Recursive Types in Python
Explore Python's type hinting evolution: from forward references for complex recursive types to the innovative Self type for simpler, more precise code. Stay updated on Python's advancements and best practices with our insightful articles. Subscribe for more Python programming insights.
Overview
In computer science, some of the more valuable data structures are recursive, meaning they are either defined in terms of themselves like lists or composed of different parts that refer to each other, like Abstract Syntax Trees.
Being dynamically typed, Python does not perform static type checking at runtime. However, the interpreter has a name resolution step requiring that any identifiers we use, including in Type Hints, are declared beforehand.
Python's type checkers, such as mypy
, can implement a more complex process, including multiple passes. Unfortunately, programs with recursive types that type-check cleanly might result in a NameError at runtime.
This limitation makes it difficult to directly declare recursive or mutually recursive types in Python. This article will discuss two solutions, one generally applicable in all cases and one for particular types referencing themselves.
Forward References
In Python, identifiers have to be defined before use.; Python's name resolution will throw a NameError otherwise. This presents a challenge for using Type Hints in recursive data structures.
We could avoid adding Type Hints in such situations, limiting the usability of the type system. Fortunately, PEP 484 introduced Forward References to enable us to declare direct or mutually recursive types.
Forward references use strings instead of identifiers to specify type hints. When Python's interpreter encounters these strings, it skips the name resolution because they are strings, not names. This approach allows the use of types defined later in the code, circumventing the limitations imposed by Python's one-pass interpreter.
However, type checkers are not limited to a single pass and are aware of forward references. They treat these strings as regular type hints, using them for type-checking our programs.
Consider an application designed to evaluate mathematical expressions. We define two primary classes: Expression
and Operator
. The Expression
class represents a single expression, which could be as simple as a numeric value or as complex as a nested operation. The Operator
class encapsulates an operation (like addition or multiplication) and operates on multiple Expression
objects.
However, there's a catch: each Operator
contains Expression
objects, and each Expression
can encapsulate an Operator
. This circular dependency creates a mutually recursive relationship, posing a challenge for Python's type system.
To resolve this, we use forward references. Here's how the classes might be defined:
from typing import Union, List
class Value:
def __init__(self, data: int):
self.data = data
def evaluate(self) -> int:
return self.data
class Operation:
def __init__(self, operands: List['Expression']):
self.operands = operands
def evaluate(self) -> int:
# Placeholder for operation logic (e.g., addition, multiplication)
# For simplicity, let's assume it's a sum of operands
return sum(operand.evaluate() for operand in self.operands)
class Expression:
def __init__(self, content: Union[Value, Operation]):
self.content = content
def evaluate(self) -> int:
return self.content.evaluate()
Understanding the Self Type in Python
Traditionally, if a Python method within a class needed to return an instance of its class or accept parameters of the same class type, it required us to declare Type Variables (TypeVar
).
This could complicate the code, making it harder to understand for those new to the language.
PEP 673 addresses this by providing a concise, straightforward way to express that a method or attribute is inherently tied to its class. It Introduced the Self
Type annotation into Python, simplifying the tasks of writing Type Hints for recursive Data Structures.
We can use Self anywhere inside a class or a protocol where we need to refer to the current object type. This reduces the need to introduce type variables and type bounds on those variables.
Self always refers to the immediately enclosing class; it will reference the innermost class scope if used within a nested class.
Let's understand Self Types using a recursive data structure known as Tries. A Trie (or Prefix Tree) is a tree-like data structure used to store a dynamic set of strings efficiently. In a Trie, each node represents a single character of a string, and a path from the root to a node represents a prefix of strings stored in the Trie.
In our TrieNode
class example, each instance of TrieNode
represents a single character and potentially leads to further characters. The children
dictionary in each TrieNode
instance maps characters to their subsequent TrieNode
instances:
from typing import Dict, Optional, Self
class TrieNode:
def __init__(self):
self.children: Dict[str, TrieNode] = {}
self.is_end_of_word: bool = False
def insert(self, word: str) -> Self:
node = self
for char in word:
if char not in node.children:
node.children[char] = TrieNode()
node = node.children[char]
node.is_end_of_word = True
return self
def count_words_with_prefix(self, prefix: str) -> int:
node = self._find_node(prefix)
if not node:
return 0
return self._count_words(node)
def _count_words(self, node: 'TrieNode') -> int:
count = 1 if node.is_end_of_word else 0
for child in node.children.values():
count += self._count_words(child)
return count
def _find_node(self, string: str) -> Optional[Self]:
node = self
for char in string:
if char not in node.children:
return None
node = node.children[char]
return node
Breakdown of the TrieNode Operations
Inserting a Word:
- When inserting a word, the
insert
method starts at the root node (an instance ofTrieNode
). - For each character in the word, the method checks if the character already exists as a key in the
children
dictionary of the current node. - If the character is absent, it creates a new
TrieNode
and adds it tochildren
. - It then moves to this new node and continues with the next character.
- When the end of the word is reached, it marks the current node as the end of a word (
is_end_of_word = True
).
Searching for a Word:
- The
search
method also starts at the root. - It traverses the Trie according to the characters in the word being searched.
- If, at any point, the required character is not found in the
children
of the current node, it returnsFalse
. - If all characters are located, and the final node is marked as
is_end_of_word
, it returnsTrue
.
Checking for a Prefix:
- The
starts_with
method is similar tosearch but
doesn't require the final node to be marked asis_end_of_word
. - It only checks if the path for the prefix exists in the Trie.
Counting Occurrences of a Prefix:
- The method begins by finding the TrieNode corresponding to the last character of the given prefix. This is accomplished using the
_find_node
helper method. _find_node
traverses the Trie following the characters of the prefix. If any character in the prefix is not found, it returnsNone
, indicating that no words start with that prefix.- Once the node corresponding to the last character of the prefix is found, the
_count_words
method is called on this node. _count_words
is a recursive method that traverses all the child nodes starting from the given node.- For each node it visits, it checks if the node represents the end of a word (
is_end_of_word
isTrue
). If so, it increments the count. - The method then recursively calls itself for each of the current node's children, adding up the counts from each subtree.
- The total count of words that start with the given prefix is returned.
In this implementation, the insert
method adds a word to the Trie, while count_words_with_prefix
counts the number of words starting with a given prefix. The Self
type simplifies the representation of the TrieNode's methods, making the code more readable and maintainable.
Using Self
in recursive data structures like Tries enhances code readability and a self-explanatory nature. It immediately clarifies that methods return an instance of their class or operate on parameters of their class type. This not only improves the clarity of the code but also maintains consistency in type annotations, making the codebase more maintainable and easier to understand.
Using the Trie
The true power of this Trie implementation is showcased in scenarios like counting the number of words that start with a particular prefix, a common requirement in applications like autocomplete systems or search engines:
words = ["apple", "app", "apricot", "banana", "berry", "blueberry", "blackberry"]
trie = TrieNode()
for word in words:
trie.insert(word)
# Counting words that start with 'app'
prefix = "app"
count = trie.count_words_with_prefix(prefix)
print(f"There are {count} words that start with '{prefix}'")
In this example, we efficiently insert a set of words into our Trie and then demonstrate how to count the number of words starting with a specific prefix. The Trie makes these operations highly efficient, as it reduces the search space with each character in the prefix, and the recursive count_words
method enables quick counting of qualifying words.
Practical Applications of the Self-Type in Python
Self Type is advantageous in scenarios where a method in a class needs to refer to the class itself. Below are several practical applications where the Self Type can be particularly beneficial:
- Fluent Interfaces: In fluent interfaces, methods often return an instance of the class to allow for method chaining. Using
Self
as the return type clarifies that these methods are part of a fluent interface design pattern. - Recursive Data Structures: Data structures like linked lists, trees, and graphs often have methods that return instances of the same class. The
Self
type simplifies the type hinting in these cases, making the recursive nature of these structures more apparent. - Factory Methods: Factory methods within a class that return a new instance can use the Self type to indicate their purpose. This is particularly useful in classes designed to be extended or inherited.
- Copy or Clone Methods: Methods intended to create a copy or clone of the current instance can use
Self
to show that they return a new instance of the same class. - Builders and Constructors: In builder patterns or alternative constructors, using
Self
can denote methods that build or construct instances of the class. - Type Checking in Dynamic Environments: In dynamic environments where the exact class type might vary due to inheritance or extension,
Self
can provide a more accurate and flexible type hint than directly using the class name. - Polymorphic Methods: In polymorphic class hierarchies, where methods may return instances of the base class or any of its subclasses,
Self
can be used to maintain generality and flexibility in the type hints. - Method Overriding in Inheritance: In cases of method overriding in class hierarchies,
Self
can be used in the return type to ensure that the overridden methods in subclasses return instances of the subclass.
Conclusion
Python's journey in enhancing its type hinting capabilities has been remarkable. With the introduction of forward references, Python made it possible to declare complex recursive type hints, a significant leap in handling intricate data structures. Building upon this, introducing Self types represents a further development in Python's type system. Self-types streamline the process of writing type hints for simple recursive types, making the code more readable and intuitive for developers.
If you're interested in modern Python and passionate about staying updated with the latest in programming, don't forget to subscribe.
Addendum: A Special Note for Our Readers
I decided to delay the introduction of subscriptions. You can read the full story here.
If you find our content helpful, there are several ways you can support us:
- The easiest way is to share our articles and links page on social media; it is free and helps us greatly.
- If you want a great experience during the Chinese New Year, I am renting my timeshare in Phuket. A five-night stay in this resort in Phuket costs 11,582 € on Expedia. I am offering it in USD at an over 40% discount compared to that price. I received the Year of the Snake in style.
- If your finances permit it, we are happy over any received donation. It helps us offset the site's running costs and an unexpected tax bill. Any amount is greatly appreciated:
- Finally, some articles have links to relevant goods and services; buying through them will not cost you more. And if you like programming swag, please visit the TuringTacoTales Store on Redbubble. Take a look. Maybe you can find something you like: