graphtage.xml

A graphtage.Filetype for parsing, diffing, and rendering XML files.

This class is also currently used for parsing HTML.

The parser is implemented atop xml.etree.ElementTree. Any XML or HTML accepted by that module will also be accepted by this module.

xml classes

HTML

class graphtage.xml.HTML

Bases: graphtage.xml.XML

The HTML file type.

__init__()

Initializes the HTML file type.

By default, HTML associates itself with the “html”, “text/html”, and “application/xhtml+xml” MIME types.

build_tree(path: str, options: Optional[graphtage.BuildOptions] = None)graphtage.TreeNode

Builds an intermediate representation tree from a file of this Filetype.

Parameters
  • path – Path to the file to parse

  • options – An optional set of options for building the tree

Returns

The root tree node of the provided file

Return type

TreeNode

build_tree_handling_errors(path: str, options: Optional[graphtage.BuildOptions] = None) → Union[str, graphtage.TreeNode]

Same as Filetype.build_tree(), but it should return a human-readable error string on failure.

This function should never throw an exception.

Parameters
  • path – Path to the file to parse

  • options – An optional set of options for building the tree

Returns

On success, the root tree node, or a string containing the error message on failure.

Return type

Union[str, TreeNode]

get_default_formatter()graphtage.xml.XMLFormatter

Returns the default formatter for printing files of this type.

XML

class graphtage.xml.XML

Bases: graphtage.Filetype

The XML file type.

__init__()

Initializes the XML file type.

By default, XML associates itself with the “xml”, “application/xml”, and “text/xml” MIME types.

build_tree(path: str, options: Optional[graphtage.BuildOptions] = None)graphtage.TreeNode

Builds an intermediate representation tree from a file of this Filetype.

Parameters
  • path – Path to the file to parse

  • options – An optional set of options for building the tree

Returns

The root tree node of the provided file

Return type

TreeNode

build_tree_handling_errors(path: str, options: Optional[graphtage.BuildOptions] = None) → Union[str, graphtage.TreeNode]

Same as Filetype.build_tree(), but it should return a human-readable error string on failure.

This function should never throw an exception.

Parameters
  • path – Path to the file to parse

  • options – An optional set of options for building the tree

Returns

On success, the root tree node, or a string containing the error message on failure.

Return type

Union[str, TreeNode]

get_default_formatter()graphtage.xml.XMLFormatter

Returns the default formatter for printing files of this type.

XMLChildFormatter

class graphtage.xml.XMLChildFormatter(*args, **kwargs)

Bases: graphtage.formatter.Formatter[Union[TreeNode, Edit]]

DEFAULT_INSTANCE: Formatter[T] = <graphtage.xml.XMLChildFormatter object>
__init__()

Initializes a sequence formatter.

Parameters
  • start_symbol – The symbol to print at the start of the sequence.

  • end_symbol – The symbol to print at the end of the sequence.

  • delimiter – A delimiter to print between items.

  • delimiter_callback

    A callback for when a delimiter is to be printed. If omitted, this defaults to:

    lambda p: p.write(delimiter)
    

static __new__(cls, *args, **kwargs)graphtage.formatter.Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

delimiter_callback: Callable[[Printer], Any]
edit_print(printer: graphtage.printer.Printer, edit: graphtage.Edit)

Called when the edit for an item is to be printed.

If the SequenceNode being printed either is not edited or has no edits, then the edit passed to this function will be a Match(child, child, 0).

This implementation simply delegates the print to the Formatting Protocol:

self.print(printer, edit)
get_formatter(item: T) → Optional[Callable[[graphtage.printer.Printer, T], Any]]

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = True
item_newline(printer: graphtage.printer.Printer, is_first: bool = False, is_last: bool = False)

Called before each node is printed.

This is also called one extra time after the last node, if there is at least one node printed.

The default implementation is simply:

printer.newline()
items_indent(printer: graphtage.printer.Printer)graphtage.printer.Printer

Returns a Printer context with an indentation.

This is called as:

with self.items_indent(printer) as p:

immediately after the self.start_symbol is printed, but before any of the items have been printed.

This default implementation is equivalent to:

return printer.indent()
parent: Optional[Formatter[T]] = None
print(printer: graphtage.printer.Printer, node_or_edit: Union[graphtage.TreeNode, graphtage.Edit], with_edits: bool = True)

Prints the given node or edit.

Parameters
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If :keyword:True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_ListNode(*args, **kwargs)
print_SequenceNode(printer: graphtage.printer.Printer, node: graphtage.sequences.SequenceNode)

Formats a sequence node.

The protocol for this function is as follows:

  • Print self.start_symbol

  • With the printer returned by self.items_indent:
    • For each edit in the sequence (or just a sequence of graphtage.Match for each child, if the node is not edited):
      • Call self.item_newline(printer, is_first=index == 0)

      • Call self.edit_print(printer, edit)

  • If at least one edit was printed, then call self.item_newline(printer, is_last=True)

  • Print self.start_symbol

property root

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = ()
sub_formatters: List[Formatter[T]] = []

XMLElement

class graphtage.xml.XMLElement(tag: graphtage.StringNode, attrib: Optional[Dict[graphtage.StringNode, graphtage.StringNode]] = None, text: Optional[graphtage.StringNode] = None, children: Sequence[graphtage.xml.XMLElement] = (), allow_key_edits: bool = True)

Bases: graphtage.TreeNode, collections.abc.Iterable, Sized, abc.ABC

“A node representing an XML element.

__init__(tag: graphtage.StringNode, attrib: Optional[Dict[graphtage.StringNode, graphtage.StringNode]] = None, text: Optional[graphtage.StringNode] = None, children: Sequence[graphtage.xml.XMLElement] = (), allow_key_edits: bool = True)

Initializes an XML element.

Parameters
  • tag – The tag of the element.

  • attrib – The attributes of the element.

  • text – The text of the element.

  • children – The children of the element.

  • allow_key_edits – Whether or not to allow keys to be edited when matching element attributes.

all_children_are_leaves()bool

Tests whether all of the children of this container are leaves.

Equivalent to:

all(c.is_leaf for c in self)
Returns

True if all children are leaves.

Return type

bool

attrib: graphtage.DictNode

The attributes of this element.

calculate_total_size()int

Calculates the size of this node. This is an arbitrary, immutable value that is used to calculate the bounded costs of edits on this node.

Returns

An arbitrary integer representing the size of this node.

Return type

int

children() → Collection[graphtage.TreeNode]

The children of this node.

Equivalent to:

list(self)
dfs() → Iterator[graphtage.TreeNode]

Performs a depth-first traversal over all of this node’s descendants.

self is always included and yielded first.

This implementation is equivalent to:

stack = [self]
while stack:
    node = stack.pop()
    yield node
    stack.extend(reversed(node.children()))
diff(node: graphtage.TreeNode) → Union[graphtage.EditedTreeNode, T]

Performs a diff against the provided node.

Parameters

node – The node against which to perform the diff.

Returns

An edited version of this node with all edits being completed.

Return type

Union[EditedTreeNode, T]

edit_modifiers: Optional[List[Callable[[graphtage.TreeNode, graphtage.TreeNode], Optional[graphtage.Edit]]]] = None
editable_dict() → Dict[str, Any]

Copies self.__dict__, calling TreeNode.editable_dict() on any TreeNode objects therein.

This is equivalent to:

ret = dict(self.__dict__)
if not self.is_leaf:
    for key, value in ret.items():
        if isinstance(value, TreeNode):
            ret[key] = value.make_edited()
return ret

This is used by TreeNode.make_edited().

property edited

Returns whether this node has been edited.

The default implementation returns False, whereas EditedTreeNode.edited() returns True.

classmethod edited_type() → Type[Union[graphtage.EditedTreeNode, T]]

Dynamically constructs a new class that is both a TreeNode and an EditedTreeNode.

The edited type’s member variables are populated by the result of TreeNode.editable_dict() of the TreeNode it wraps:

new_node.__dict__ = dict(wrapped_tree_node.editable_dict())
Returns

A class that is both a TreeNode and an EditedTreeNode. Its constructor accepts a TreeNode that it will wrap.

Return type

Type[Union[EditedTreeNode, T]]

edits(node)graphtage.Edit

Calculates the best edit to transform this node into the provided node.

Parameters

node – The node to which to transform.

Returns

The best possible edit.

Return type

Edit

get_all_edits(node: graphtage.TreeNode) → Iterator[graphtage.Edit]

Returns an iterator over all edits that will transform this node into the provided node.

Parameters

node – The node to which to transform this one.

Returns

An iterator over edits. Note that this iterator will automatically explode any CompoundEdit in the sequence.

Return type

Iterator[Edit]

property is_leaf

Container nodes are never leaves, even if they have no children.

Returns

False

Return type

bool

make_edited() → Union[graphtage.EditedTreeNode, T]

Returns a new, copied instance of this node that is also an instance of EditedTreeNode.

This is equivalent to:

return self.edited_type()(self)
Returns

A copied version of this node that is also an instance of EditedTreeNode and thereby mutable.

Return type

Union[EditedTreeNode, T]

print(printer: graphtage.printer.Printer)

Prints this node.

tag: graphtage.StringNode

The tag of this element.

text: Optional[graphtage.StringNode]

The text of this element.

to_obj()

Returns a pure Python representation of this node.

For example, a node representing a list, like graphtage.ListNode, should return a Python list. A node representing a mapping, like graphtage.MappingNode, should return a Python dict. Container nodes should recursively call TreeNode.to_obj() on all of their children.

This is used solely for the providing objects to operate on in the commandline expressions evaluation, for options like –match-if and –match-unless.

property total_size

The size of this node.

This is an arbitrary, immutable value that is used to calculate the bounded costs of edits on this node.

The first time this property is called, its value will be set and memoized by calling TreeNode.calculate_total_size().

Returns

An arbitrary integer representing the size of this node.

Return type

int

XMLElementAttribFormatter

class graphtage.xml.XMLElementAttribFormatter(*args, **kwargs)

Bases: graphtage.formatter.Formatter[Union[TreeNode, Edit]]

DEFAULT_INSTANCE: Formatter[T] = <graphtage.xml.XMLElementAttribFormatter object>
__init__()

Initializes a sequence formatter.

Parameters
  • start_symbol – The symbol to print at the start of the sequence.

  • end_symbol – The symbol to print at the end of the sequence.

  • delimiter – A delimiter to print between items.

  • delimiter_callback

    A callback for when a delimiter is to be printed. If omitted, this defaults to:

    lambda p: p.write(delimiter)
    

static __new__(cls, *args, **kwargs)graphtage.formatter.Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

delimiter_callback: Callable[[Printer], Any]
edit_print(printer: graphtage.printer.Printer, edit: graphtage.Edit)

Called when the edit for an item is to be printed.

If the SequenceNode being printed either is not edited or has no edits, then the edit passed to this function will be a Match(child, child, 0).

This implementation simply delegates the print to the Formatting Protocol:

self.print(printer, edit)
get_formatter(item: T) → Optional[Callable[[graphtage.printer.Printer, T], Any]]

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = True
item_newline(printer: graphtage.printer.Printer, is_first: bool = False, is_last: bool = False)

Called before each node is printed.

This is also called one extra time after the last node, if there is at least one node printed.

The default implementation is simply:

printer.newline()
items_indent(printer: graphtage.printer.Printer)graphtage.printer.Printer

Returns a Printer context with an indentation.

This is called as:

with self.items_indent(printer) as p:

immediately after the self.start_symbol is printed, but before any of the items have been printed.

This default implementation is equivalent to:

return printer.indent()
parent: Optional[Formatter[T]] = None
print(printer: graphtage.printer.Printer, node_or_edit: Union[graphtage.TreeNode, graphtage.Edit], with_edits: bool = True)

Prints the given node or edit.

Parameters
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If :keyword:True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_KeyValuePairNode(printer: graphtage.printer.Printer, node: graphtage.KeyValuePairNode)
print_MappingNode(*args, **kwargs)
print_MultiSetNode(*args, **kwargs)
print_SequenceNode(printer: graphtage.printer.Printer, node: graphtage.sequences.SequenceNode)

Formats a sequence node.

The protocol for this function is as follows:

  • Print self.start_symbol

  • With the printer returned by self.items_indent:
    • For each edit in the sequence (or just a sequence of graphtage.Match for each child, if the node is not edited):
      • Call self.item_newline(printer, is_first=index == 0)

      • Call self.edit_print(printer, edit)

  • If at least one edit was printed, then call self.item_newline(printer, is_last=True)

  • Print self.start_symbol

property root

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = ()
sub_formatters: List[Formatter[T]] = []

XMLElementEdit

class graphtage.xml.XMLElementEdit(from_node: graphtage.xml.XMLElement, to_node: graphtage.xml.XMLElement)

Bases: graphtage.AbstractCompoundEdit

An edit on an XML element.

__init__(from_node: graphtage.xml.XMLElement, to_node: graphtage.xml.XMLElement)

Initializes an XML element edit.

Parameters
  • from_node – The node being edited.

  • to_node – The node to which from_node will be transformed.

__iter__() → Iterator[graphtage.Edit]

Returns an iterator over this edit’s sub-edits.

Returns

The result of AbstractCompoundEdit.edits()

Return type

Iterator[Edit]

__lt__(other)

Tests whether the bounds of this edit are less than the bounds of other.

attrib_edit: graphtage.Edit

The edit to transform this element’s attributes.

bounds()graphtage.bounds.Range

Returns the bounds of this edit.

This defaults to the bounds provided when this AbstractEdit was constructed. If an upper bound was not provided to the constructor, the upper bound defaults to:

self.from_node.total_size + self.to_node.total_size + 1
Returns

A range bounding the cost of this edit.

Return type

Range

child_edit: graphtage.Edit

The edit to transform this node’s children.

edits() → Iterator[graphtage.Edit]

Returns an iterator over this edit’s sub-edits

from_node: graphtage.TreeNode
has_non_zero_cost()bool

Returns whether this edit has a non-zero cost.

This will tighten the edit’s bounds until either its lower bound is greater than zero or its bounds are definitive.

initial_bounds: graphtage.bounds.Range
is_complete()bool

An edit is complete when no further calls to Edit.tighten_bounds() will change the nature of the edit.

This implementation considers an edit complete if it is valid and its bounds are definitive:

return not self.valid or self.bounds().definitive()

If an edit is able to discern that it has a unique solution even if its final bounds are unknown, it should reimplement this method to define that check.

For example, in the case of a CompoundEdit, this method should only return True if no future calls to Edit.tighten_bounds() will affect the result of CompoundEdit.edits().

Returns

True if subsequent calls to Edit.tighten_bounds() will only serve to tighten the bounds of this edit and will not affect the semantics of the edit.

Return type

bool

on_diff(from_node: graphtage.EditedTreeNode)

A callback for when an edit is assigned to an EditedTreeNode in TreeNode.diff().

This default implementation adds the edit to the node, and recursively calls Edit.on_diff() on all of the sub-edits:

from_node.edit = self
from_node.edit_list.append(self)
for edit in self.edits():
    edit.on_diff(edit.from_node)
Parameters

from_node – The edited node that was added to the diff

print(formatter: graphtage.GraphtageFormatter, printer: graphtage.printer.Printer)

Edits can optionally implement a printing method

This function is called automatically from the formatter in the Printing Protocol and should never be called directly unless you really know what you’re doing! Raising NotImplementedError will cause the formatter to fall back on its own printing implementations.

This implementation is equivalent to:

for edit in self.edits():
    edit.print(formatter, printer)
tag_edit: graphtage.Edit

The edit to transform this element’s tag.

text_edit: Optional[graphtage.Edit]

The edit to transform this element’s text.

tighten_bounds()bool

Tightens the Edit.bounds() on the cost of this edit, if possible.

Returns

True if the bounds have been tightened.

Return type

bool

Note

Implementations of this function should return False if and only if self.bounds().definitive().

property valid

Returns whether this edit is valid

XMLElementObj

class graphtage.xml.XMLElementObj(tag: str, attrib: Dict[str, str], text: Optional[str] = None, children: Optional[Sequence[graphtage.xml.XMLElementObj]] = ())

Bases: object

An object for interacting with XMLElement from command line expressions.

__init__(tag: str, attrib: Dict[str, str], text: Optional[str] = None, children: Optional[Sequence[graphtage.xml.XMLElementObj]] = ())

Initializes an XML Element Object.

Parameters
  • tag – The tag of the element.

  • attrib – The attributes of the element.

  • text – The text of the element.

  • children – The children of the element.

attrib: Dict[str, str]

The attributes of this element.

children: Optional[Sequence[XMLElementObj]]

The children of this element.

tag: str

The tag of this element.

text: Optional[str]

The text of this element.

XMLFormatter

class graphtage.xml.XMLFormatter(*args, **kwargs)

Bases: graphtage.formatter.Formatter[Union[TreeNode, Edit]]

DEFAULT_INSTANCE: Formatter[T] = <graphtage.xml.XMLFormatter object>
__init__()

Initialize self. See help(type(self)) for accurate signature.

static __new__(cls, *args, **kwargs)graphtage.formatter.Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

get_formatter(item: T) → Optional[Callable[[graphtage.printer.Printer, T], Any]]

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = False
parent: Optional[Formatter[T]] = None
print(printer: graphtage.printer.Printer, node_or_edit: Union[graphtage.TreeNode, graphtage.Edit], with_edits: bool = True)

Prints the given node or edit.

Parameters
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If :keyword:True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_LeafNode(printer: graphtage.printer.Printer, node: graphtage.LeafNode)
print_XMLElement(printer: graphtage.printer.Printer, node: graphtage.xml.XMLElement)
property root

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = [<class 'graphtage.xml.XMLStringFormatter'>, <class 'graphtage.xml.XMLChildFormatter'>, <class 'graphtage.xml.XMLElementAttribFormatter'>]
sub_formatters: List[Formatter[T]] = []

XMLStringFormatter

class graphtage.xml.XMLStringFormatter(*args, **kwargs)

Bases: graphtage.formatter.Formatter[Union[TreeNode, Edit]]

DEFAULT_INSTANCE: Formatter[T] = <graphtage.xml.XMLStringFormatter object>
__init__()

Initialize self. See help(type(self)) for accurate signature.

static __new__(cls, *args, **kwargs)graphtage.formatter.Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

context(printer: graphtage.printer.Printer)
escape(c: str)str

String escape.

This function is called once for each character in the string.

Returns

The escaped version of c, or c itself if no escaping is required.

Return type

str

This is equivalent to:

html.escape(c)
get_formatter(item: T) → Optional[Callable[[graphtage.printer.Printer, T], Any]]

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = True
is_quoted: bool = False
parent: Optional[Formatter[T]] = None
print(printer: graphtage.printer.Printer, node_or_edit: Union[graphtage.TreeNode, graphtage.Edit], with_edits: bool = True)

Prints the given node or edit.

Parameters
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If :keyword:True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_StringEdit(printer: graphtage.printer.Printer, edit: graphtage.StringEdit)
print_StringNode(printer: graphtage.printer.Printer, node: graphtage.StringNode)
property root

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = ()
sub_formatters: List[Formatter[T]] = []
write_char(printer: graphtage.printer.Printer, c: str, index: int, num_edits: int, removed=False, inserted=False)

Writes a character to the printer.

Note

This function calls graphtage.StringFormatter.escape(); classes extending graphtage.StringFormatter should also call graphtage.StringFormatter.escape() when reimplementing this function.

Note

There is no need to specially format characters that have been removed or inserted; the printer will have already automatically been configured to format them prior to the call to StringFormatter.write_char().

Parameters
  • printer – The printer to which to write the character.

  • c – The character to write.

  • index – The index of the character in the string.

  • num_edits – The total number of characters that will be printed.

  • removed – Whether this character was removed from the source string.

  • inserted – Whether this character is inserted into the source string.

write_end_quote(printer: graphtage.printer.Printer, edit: graphtage.StringEdit)

Prints an ending quote for the string, if necessary

write_start_quote(printer: graphtage.printer.Printer, edit: graphtage.StringEdit)

Prints a starting quote for the string, if necessary

xml functions

build_tree

graphtage.xml.build_tree(path_or_element_tree: Union[str, xml.etree.ElementTree.Element, xml.etree.ElementTree.ElementTree], options: Optional[graphtage.BuildOptions] = None)graphtage.xml.XMLElement

Constructs an XML element node from an XML file.