The algorithm starts with an unbounded mapping and iteratively improves it until the bounds converge, at which point the optimal edit sequence is discovered.
EditDistance(from_node: graphtage.TreeNode, to_node: graphtage.TreeNode, from_seq: Sequence[graphtage.TreeNode], to_seq: Sequence[graphtage.TreeNode], insert_remove_penalty: int = 1)¶
An edit that computes the minimum sequence of sub-edits necessary to transform one node to another.
The algorithm works by iteratively constructing the Levenshtein matrix one diagonal at a time, starting from the upper left cell and ending at the lower right cell. Each successive call to
EditDistance.tighten_bounds()constructs a new diagonal of the matrix and fully tightens the bounds of its edits. Once the lower right cell is expanded, the matrix is complete and the optimal sequence of edits can be reconstructed.
Bounds of this edit are updated after each diagonal is added by observing that the final cost is bounded above by the minimum cost of an edit in the last-expanded diagonal. This results in a monotonically decreasing upper bound.
__init__(from_node: graphtage.TreeNode, to_node: graphtage.TreeNode, from_seq: Sequence[graphtage.TreeNode], to_seq: Sequence[graphtage.TreeNode], insert_remove_penalty: int = 1)¶
Initializes the edit distance edit.
from_node – The node that will be transformed.
to_node – The node into which
from_nodewill be transformed.
from_seq – A sequence of nodes that comprise
to_seq – A sequence of nodes that comprise
insert_remove_penalty – The penalty for inserting or removing a node (default is 1).
__iter__() → Iterator[graphtage.Edit]¶
Returns an iterator over this edit’s sub-edits.
The result of
- Return type
Tests whether the bounds of this edit are less than the bounds of
bounds() → graphtage.bounds.Range¶
Calculates bounds on the cost of this edit.
If the Levenshtein matrix has been fully constructed, return the cost of the lower right cell.
If the matrix is incomplete, then use
super().bounds().lower_boundas the lower bound and the minimum cost in the last completed matrix diagonal as the upper bound.
The bounds on the cost of this edit.
- Return type
has_non_zero_cost() → bool¶
Returns whether this edit has a non-zero cost.
This will tighten the edit’s bounds until either its lower bound is greater than zero or its bounds are definitive.
is_complete() → bool¶
An edit distance edit is only complete once its Levenshtein edit matrix has been fully constructed.
A callback for when an edit is assigned to an
This default implementation adds the edit to the node, and recursively calls
Edit.on_diff()on all of the sub-edits:
from_node.edit = self from_node.edit_list.append(self) for edit in self.edits(): edit.on_diff(edit.from_node)
from_node – The edited node that was added to the diff
Prints this edit.
This is equivalent to:
Returns the sequence being edited.
This is a convenience function solely to aid in automated type checking. It is equivalent to:
tighten_bounds() → bool¶
Tightens the bounds of this edit, if possible.
If the Levenshtein matrix is not yet complete, construct and fully tighten the next diagonal of the matrix.
Returns whether this edit is valid