graphtage.csv

A graphtage.Filetype for parsing, diffing, and rendering CSV files.

csv classes

CSV

class graphtage.csv.CSV

Bases: Filetype

The CSV filetype.

__init__()

Initializes the CSV filetype.

CSV identifies itself with the MIME types csv and text/csv.

build_tree(path: str, options: BuildOptions | None = None) TreeNode

Equivalent to the module-level function graphtage.csv.build_tree().

build_tree_handling_errors(path: str, options: BuildOptions | None = None) TreeNode

Same as Filetype.build_tree(), but it should return a human-readable error string on failure.

This function should never throw an exception.

Parameters:
  • path – Path to the file to parse

  • options – An optional set of options for building the tree

Returns:

On success, the root tree node, or a string containing the error message on failure.

Return type:

Union[str, TreeNode]
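
The contract described above (return an error string instead of raising) can be sketched with the standard library alone. This is only an illustration: the names `read_csv_rows` and `build_handling_errors` and the message format are hypothetical and not part of graphtage, and nested lists stand in for a TreeNode.

```python
import csv
from typing import Callable, List, Union

Rows = List[List[str]]


def read_csv_rows(path: str) -> Rows:
    # Stand-in builder: parse the file into nested lists with csv.reader.
    with open(path, newline="") as f:
        return [list(row) for row in csv.reader(f)]


def build_handling_errors(build: Callable[[str], Rows], path: str) -> Union[str, Rows]:
    # Never raise: convert any failure into a human-readable message.
    try:
        return build(path)
    except Exception as e:
        return f"Error parsing {path}: {e}"


# A missing file yields a string rather than an exception.
result = build_handling_errors(read_csv_rows, "/nonexistent/input.csv")
if isinstance(result, str):
    print(result)
```

Callers distinguish success from failure by checking whether the result is a str, which mirrors the Union[str, TreeNode] return type listed above.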

get_default_formatter() CSVFormatter

Returns the default formatter for printing files of this type.

CSVFormatter

class graphtage.csv.CSVFormatter(*args, **kwargs)

Bases: GraphtageFormatter

Top-level formatter for CSV files.

DEFAULT_INSTANCE: Formatter[T] = <graphtage.csv.CSVFormatter object>

A default instance of this formatter, automatically instantiated by the FormatterChecker metaclass.

__init__()
static __new__(cls, *args, **kwargs) Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

get_formatter(item: T) Callable[[Printer, T], Any] | None

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = False
parent: Formatter[T] | None = None

The parent formatter for this formatter instance.

This is automatically populated by Formatter.__new__() and should never be manually modified.

print(printer: Printer, node_or_edit: TreeNode | Edit, with_edits: bool = True)

Prints the given node or edit.

Parameters:
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_LeafNode(printer: Printer, node: LeafNode)

Prints a leaf node, which should always be a column in a CSV row.

The node is escaped by first writing it to csv.writer():

csv.writer(...).writerow([node.object])
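
As a rough illustration of the escaping step, the following standalone sketch writes a single value through csv.writer() into an in-memory buffer. The helper name `escape_csv_field` is hypothetical; it only shows what the standard library does with the default dialect and is not graphtage's implementation.

```python
import csv
import io


def escape_csv_field(value):
    # Write a one-column row into a string buffer and return the escaped text.
    buffer = io.StringIO()
    csv.writer(buffer).writerow([value])
    # csv.writer terminates each row with "\r\n" by default; drop it.
    return buffer.getvalue().rstrip("\r\n")


plain = escape_csv_field("plain")
with_comma = escape_csv_field("a,b")
with_quote = escape_csv_field('say "hi"')
```

Values containing the delimiter or the quote character come back wrapped in quotes, with embedded quotes doubled; other values pass through unchanged.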
property root: Formatter[T]

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = [<class 'graphtage.csv.CSVRows'>, <class 'graphtage.json.JSONFormatter'>]

A list of formatter types that should be used as sub-formatters in the Formatting Protocol.

sub_formatters: List[Formatter[T]] = []

The list of instantiated formatters corresponding to Formatter.sub_format_types.

This list is automatically populated by Formatter.__new__() and should never be manually modified.

CSVNode

class graphtage.csv.CSVNode(nodes: Iterable[T], allow_list_edits: bool = True, allow_list_edits_when_same_length: bool = True)

Bases: ListNode[CSVRow]

A node representing zero or more CSV rows.

__init__(nodes: Iterable[T], allow_list_edits: bool = True, allow_list_edits_when_same_length: bool = True)

Initializes a List node.

Parameters:
  • nodes – The set of nodes in this list.

  • allow_list_edits – Whether to consider removal and insertion when editing this list.

  • allow_list_edits_when_same_length – Whether to consider removal and insertion when comparing this list to another list of the same length.

__iter__() Iterator[TreeNode]

Iterates over this sequence’s child nodes.

This is equivalent to:

return iter(self._children)
__len__() int

The number of children of this sequence.

This is equivalent to:

return len(self._children)
all_children_are_leaves() bool

Tests whether all of the children of this container are leaves.

Equivalent to:

all(c.is_leaf for c in self)
Returns:

True if all children are leaves.

Return type:

bool

calculate_total_size()

Calculates the total size of this sequence.

This is equivalent to:

return sum(c.total_size for c in self)
child_indexes: Dict[TreeNode, int]
children() T

The children of this node.

Equivalent to:

list(self)
property container_type: Type[Tuple[T, ...]]

The container type required by graphtage.sequences.SequenceNode

Returns:

tuple

Return type:

Type[Tuple[T, …]]

dfs() Iterator[TreeNode]

Performs a depth-first traversal over all of this node’s descendants.

self is always included and yielded first.

This implementation is equivalent to:

stack = [self]
while stack:
    node = stack.pop()
    yield node
    stack.extend(reversed(node.children()))
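
The traversal order this produces can be checked with a small standalone sketch. The `SketchNode` class below is a hypothetical stand-in that exposes only a `children()` method; it is not a graphtage class.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node with a children() method."""

    def __init__(self, name, children=()):
        self.name = name
        self._children = list(children)

    def children(self):
        return list(self._children)

    def dfs(self):
        # Same loop as above: pop a node, yield it, then push its children
        # in reverse so that the leftmost child is visited next.
        stack = [self]
        while stack:
            node = stack.pop()
            yield node
            stack.extend(reversed(node.children()))


tree = SketchNode("csv", [
    SketchNode("row0", [SketchNode("a"), SketchNode("b")]),
    SketchNode("row1", [SketchNode("c")]),
])
order = [node.name for node in tree.dfs()]
```

Reversing the children before pushing them is what makes the result a left-to-right pre-order traversal.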
diff(node: TreeNode) EditedTreeNode | T

Performs a diff against the provided node.

Parameters:

node – The node against which to perform the diff.

Returns:

An edited version of this node with all edits completed.

Return type:

Union[EditedTreeNode, T]

edit_modifiers: List[Callable[[TreeNode, TreeNode], Edit | None]] | None = None
editable_dict() Dict[str, Any]

Copies self.__dict__, calling TreeNode.make_edited() on all children.

This is equivalent to:

ret = dict(self.__dict__)
ret['_children'] = self.container_type(n.make_edited() for n in self)
return ret

This is used by SequenceNode.make_edited().

property edited: bool

Returns whether this node has been edited.

The default implementation returns False, whereas EditedTreeNode.edited() returns True.

edits(node: TreeNode) Edit

Calculates the best edit to transform this node into the provided node.

Parameters:

node – The node to which to transform.

Returns:

The best possible edit.

Return type:

Edit

get_all_edit_contexts(node: TreeNode) Iterator[Tuple[Tuple[TreeNode, ...], Edit]]

Returns an iterator over all edit contexts that will transform this node into the provided node.

Parameters:

node – The node to which to transform this one.

Returns:

An iterator over pairs of paths from node to the edited node, as well as its edit. Note that this iterator will automatically explode any CompoundEdit in the sequence.

Return type:

Iterator[Tuple[Tuple[TreeNode, ...], Edit]]

get_all_edits(node: TreeNode) Iterator[Edit]

Returns an iterator over all edits that will transform this node into the provided node.

Parameters:

node – The node to which to transform this one.

Returns:

An iterator over edits. Note that this iterator will automatically explode any CompoundEdit in the sequence.

Return type:

Iterator[Edit]

property is_leaf: bool

Container nodes are never leaves, even if they have no children.

Returns:

False

Return type:

bool

make_edited() EditedTreeNode | T

Returns a new, copied instance of this node that is also an instance of EditedTreeNode.

This is equivalent to:

return self.__class__.edited_type()(self)
Returns:

A copied version of this node that is also an instance of EditedTreeNode and thereby mutable.

Return type:

Union[EditedTreeNode, T]

property parent: TreeNode | None

The parent node of this node, or None if it has no parent.

The setter for this property should only be called by the parent node setting itself as the parent of its child.

ContainerNode subclasses automatically set this property for all of their children. However, if you define a subclass of TreeNode that does not extend ContainerNode and for which len(self.children()) > 0, then each child’s parent must be set manually.

print(printer: Printer)

Prints a sequence node.

By default, sequence nodes are printed like lists:

SequenceFormatter('[', ']', ',').print(printer, self)
print_parent_context(printer: Printer, for_child: TreeNode)

Prints the context for the given child node.

For example, if this node represents a list and the child is the element at index 3, then “[3]” might be printed.

The child is expected to be one of this node’s children, but this is not validated.

The default implementation prints nothing.

to_obj()

Returns a pure Python representation of this node.

For example, a node representing a list, like graphtage.ListNode, should return a Python list. A node representing a mapping, like graphtage.MappingNode, should return a Python dict. Container nodes should recursively call TreeNode.to_obj() on all of their children.

This is used solely to provide objects for command-line expression evaluation, for options like --match-if and --match-unless.

property total_size: int

The size of this node.

This is an arbitrary, immutable value that is used to calculate the bounded costs of edits on this node.

The first time this property is called, its value will be set and memoized by calling TreeNode.calculate_total_size().

Returns:

An arbitrary integer representing the size of this node.

Return type:

int
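
The compute-once behaviour of total_size can be sketched as follows. All class names here are hypothetical stand-ins, the leaf sizes are arbitrary, and the `calculations` counter exists only to make the memoization visible; none of this is graphtage's code.

```python
class SketchNode:
    """Hypothetical base class that memoizes calculate_total_size()."""

    def __init__(self):
        self._total_size = None
        self.calculations = 0  # illustration only: counts real computations

    def calculate_total_size(self):
        raise NotImplementedError

    @property
    def total_size(self):
        # Compute on first access, then reuse the stored value.
        if self._total_size is None:
            self.calculations += 1
            self._total_size = self.calculate_total_size()
        return self._total_size


class SketchLeaf(SketchNode):
    def __init__(self, size):
        super().__init__()
        self._size = size

    def calculate_total_size(self):
        return self._size


class SketchSequence(SketchNode):
    def __init__(self, children):
        super().__init__()
        self._children = tuple(children)

    def calculate_total_size(self):
        # Mirrors the documented sum over the children's total_size.
        return sum(c.total_size for c in self._children)


row = SketchSequence([SketchLeaf(3), SketchLeaf(4)])
first = row.total_size
second = row.total_size
```

The second access returns the stored value without calling calculate_total_size() again.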

CSVRow

class graphtage.csv.CSVRow(nodes: Iterable[T], allow_list_edits: bool = True, allow_list_edits_when_same_length: bool = True)

Bases: ListNode[TreeNode]

A node representing a row of a CSV file.

__init__(nodes: Iterable[T], allow_list_edits: bool = True, allow_list_edits_when_same_length: bool = True)

Initializes a List node.

Parameters:
  • nodes – The set of nodes in this list.

  • allow_list_edits – Whether to consider removal and insertion when editing this list.

  • allow_list_edits_when_same_length – Whether to consider removal and insertion when comparing this list to another list of the same length.

__iter__() Iterator[TreeNode]

Iterates over this sequence’s child nodes.

This is equivalent to:

return iter(self._children)
__len__() int

The number of children of this sequence.

This is equivalent to:

return len(self._children)
all_children_are_leaves() bool

Tests whether all of the children of this container are leaves.

Equivalent to:

all(c.is_leaf for c in self)
Returns:

True if all children are leaves.

Return type:

bool

calculate_total_size()

Calculates the total size of this sequence.

This is equivalent to:

return sum(c.total_size for c in self)
child_indexes: Dict[TreeNode, int]
children() T

The children of this node.

Equivalent to:

list(self)
property container_type: Type[Tuple[T, ...]]

The container type required by graphtage.sequences.SequenceNode

Returns:

tuple

Return type:

Type[Tuple[T, …]]

dfs() Iterator[TreeNode]

Performs a depth-first traversal over all of this node’s descendants.

self is always included and yielded first.

This implementation is equivalent to:

stack = [self]
while stack:
    node = stack.pop()
    yield node
    stack.extend(reversed(node.children()))
diff(node: TreeNode) EditedTreeNode | T

Performs a diff against the provided node.

Parameters:

node – The node against which to perform the diff.

Returns:

An edited version of this node with all edits completed.

Return type:

Union[EditedTreeNode, T]

edit_modifiers: List[Callable[[TreeNode, TreeNode], Edit | None]] | None = None
editable_dict() Dict[str, Any]

Copies self.__dict__, calling TreeNode.make_edited() on all children.

This is equivalent to:

ret = dict(self.__dict__)
ret['_children'] = self.container_type(n.make_edited() for n in self)
return ret

This is used by SequenceNode.make_edited().

property edited: bool

Returns whether this node has been edited.

The default implementation returns False, whereas EditedTreeNode.edited() returns True.

edits(node: TreeNode) Edit

Calculates the best edit to transform this node into the provided node.

Parameters:

node – The node to which to transform.

Returns:

The best possible edit.

Return type:

Edit

get_all_edit_contexts(node: TreeNode) Iterator[Tuple[Tuple[TreeNode, ...], Edit]]

Returns an iterator over all edit contexts that will transform this node into the provided node.

Parameters:

node – The node to which to transform this one.

Returns:

An iterator over pairs of paths from node to the edited node, as well as its edit. Note that this iterator will automatically explode any CompoundEdit in the sequence.

Return type:

Iterator[Tuple[Tuple[TreeNode, ...], Edit]]

get_all_edits(node: TreeNode) Iterator[Edit]

Returns an iterator over all edits that will transform this node into the provided node.

Parameters:

node – The node to which to transform this one.

Returns:

An iterator over edits. Note that this iterator will automatically explode any CompoundEdit in the sequence.

Return type:

Iterator[Edit]

property is_leaf: bool

Container nodes are never leaves, even if they have no children.

Returns:

False

Return type:

bool

make_edited() EditedTreeNode | T

Returns a new, copied instance of this node that is also an instance of EditedTreeNode.

This is equivalent to:

return self.__class__.edited_type()(self)
Returns:

A copied version of this node that is also an instance of EditedTreeNode and thereby mutable.

Return type:

Union[EditedTreeNode, T]

property parent: TreeNode | None

The parent node of this node, or None if it has no parent.

The setter for this property should only be called by the parent node setting itself as the parent of its child.

ContainerNode subclasses automatically set this property for all of their children. However, if you define a subclass of TreeNode that does not extend ContainerNode and for which len(self.children()) > 0, then each child’s parent must be set manually.

print(printer: Printer)

Prints a sequence node.

By default, sequence nodes are printed like lists:

SequenceFormatter('[', ']', ',').print(printer, self)
print_parent_context(printer: Printer, for_child: TreeNode)

Prints the context for the given child node.

For example, if this node represents a list and the child is the element at index 3, then “[3]” might be printed.

The child is expected to be one of this node’s children, but this is not validated.

The default implementation prints nothing.

to_obj()

Returns a pure Python representation of this node.

For example, a node representing a list, like graphtage.ListNode, should return a Python list. A node representing a mapping, like graphtage.MappingNode, should return a Python dict. Container nodes should recursively call TreeNode.to_obj() on all of their children.

This is used solely to provide objects for command-line expression evaluation, for options like --match-if and --match-unless.
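
The recursive conversion described for to_obj() might look like the following sketch. `SketchLeaf` and `SketchList` are hypothetical stand-ins for a leaf node and a list-like container; they are not graphtage classes.

```python
class SketchLeaf:
    """Hypothetical leaf: wraps a single Python value."""

    def __init__(self, obj):
        self.object = obj

    def to_obj(self):
        return self.object


class SketchList:
    """Hypothetical list-like container of nodes."""

    def __init__(self, children):
        self._children = tuple(children)

    def __iter__(self):
        return iter(self._children)

    def to_obj(self):
        # Containers recurse into every child and return a plain list.
        return [child.to_obj() for child in self]


table = SketchList([
    SketchList([SketchLeaf("name"), SketchLeaf("age")]),
    SketchList([SketchLeaf("ada"), SketchLeaf("36")]),
])
plain = table.to_obj()
```

For a CSV-shaped tree, the result is a list of rows, each of which is a list of cell values.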

property total_size: int

The size of this node.

This is an arbitrary, immutable value that is used to calculate the bounded costs of edits on this node.

The first time this property is called, its value will be set and memoized by calling TreeNode.calculate_total_size().

Returns:

An arbitrary integer representing the size of this node.

Return type:

int

CSVRowFormatter

class graphtage.csv.CSVRowFormatter(*args, **kwargs)

Bases: SequenceFormatter

A formatter for CSV rows.

DEFAULT_INSTANCE: Formatter[T] = <graphtage.csv.CSVRowFormatter object>

A default instance of this formatter, automatically instantiated by the FormatterChecker metaclass.

__init__()

Initializes the formatter.

Equivalent to:

super().__init__('', '', ',')
static __new__(cls, *args, **kwargs) Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

edit_print(printer: Printer, edit: Edit)

Called when the edit for an item is to be printed.

If the SequenceNode being printed either is not edited or has no edits, then the edit passed to this function will be a Match(child, child, 0).

This implementation simply delegates the print to the Formatting Protocol:

self.print(printer, edit)
get_formatter(item: T) Callable[[Printer, T], Any] | None

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = True

This is a partial formatter; it will not be automatically used in the Formatting Protocol.

item_newline(printer: Printer, is_first: bool = False, is_last: bool = False)

An empty implementation, since each row should be printed as a single line.

items_indent(printer: Printer) Printer

Returns a Printer context with an indentation.

This is called as:

with self.items_indent(printer) as p:

immediately after the self.start_symbol is printed, but before any of the items have been printed.

This default implementation is equivalent to:

return printer.indent()
parent: Formatter[T] | None = None

The parent formatter for this formatter instance.

This is automatically populated by Formatter.__new__() and should never be manually modified.

print(printer: Printer, node_or_edit: TreeNode | Edit, with_edits: bool = True)

Prints the given node or edit.

Parameters:
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_CSVRow(*args, **kwargs)

Prints a CSV row.

Equivalent to:

super().print_SequenceNode(*args, **kwargs)
print_SequenceNode(printer: Printer, node: SequenceNode)

Formats a sequence node.

The protocol for this function is as follows:

  • Print self.start_symbol

  • With the printer returned by self.items_indent:
    • For each edit in the sequence (or just a sequence of graphtage.Match for each child, if the node is not edited):
      • Call self.item_newline(printer, is_first=index == 0)

      • Call self.edit_print(printer, edit)

  • If at least one edit was printed, then call self.item_newline(printer, is_last=True)

  • Print self.end_symbol

property root: Formatter[T]

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = ()

A list of formatter types that should be used as sub-formatters in the Formatting Protocol.

sub_formatters: List[Formatter[T]] = []

The list of instantiated formatters corresponding to Formatter.sub_format_types.

This list is automatically populated by Formatter.__new__() and should never be manually modified.

CSVRows

class graphtage.csv.CSVRows(*args, **kwargs)

Bases: SequenceFormatter

A sub formatter for printing the sequence of rows in a CSV file.

DEFAULT_INSTANCE: Formatter[T] = <graphtage.csv.CSVRows object>

A default instance of this formatter, automatically instantiated by the FormatterChecker metaclass.

__init__()

Initializes the formatter.

Equivalent to:

super().__init__('', '', '')
static __new__(cls, *args, **kwargs) Formatter[T]

Instantiates a new formatter.

This automatically instantiates and populates Formatter.sub_formatters and sets their parent to this new formatter.

edit_print(printer: Printer, edit: Edit)

Called when the edit for an item is to be printed.

If the SequenceNode being printed either is not edited or has no edits, then the edit passed to this function will be a Match(child, child, 0).

This implementation simply delegates the print to the Formatting Protocol:

self.print(printer, edit)
get_formatter(item: T) Callable[[Printer, T], Any] | None

Looks up a formatter for the given item using this formatter as a base.

Equivalent to:

get_formatter(item.__class__, base_formatter=self)
is_partial: bool = True

This is a partial formatter; it will not be automatically used in the Formatting Protocol.

item_newline(printer: Printer, is_first: bool = False, is_last: bool = False)

Prints a newline on all but the first and last items.

items_indent(printer: Printer)

Returns printer because CSV rows do not need to be indented.

parent: Formatter[T] | None = None

The parent formatter for this formatter instance.

This is automatically populated by Formatter.__new__() and should never be manually modified.

print(printer: Printer, node_or_edit: TreeNode | Edit, with_edits: bool = True)

Prints the given node or edit.

Parameters:
  • printer – The printer to which to write.

  • node_or_edit – The node or edit to print.

  • with_edits – If True, print any edits associated with the node.

Note

The protocol for determining how a node or edit should be printed is very complex due to its extensibility. See the Printing Protocol for a detailed description.

print_CSVNode(*args, **kwargs)

Prints a CSV node.

Equivalent to:

super().print_SequenceNode(*args, **kwargs)
print_SequenceNode(printer: Printer, node: SequenceNode)

Formats a sequence node.

The protocol for this function is as follows:

  • Print self.start_symbol

  • With the printer returned by self.items_indent:
    • For each edit in the sequence (or just a sequence of graphtage.Match for each child, if the node is not edited):
      • Call self.item_newline(printer, is_first=index == 0)

      • Call self.edit_print(printer, edit)

  • If at least one edit was printed, then call self.item_newline(printer, is_last=True)

  • Print self.end_symbol
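
The steps above can be sketched in plain Python to show how the row and rows formatters differ. Everything here is hypothetical: `format_sequence`, `row_newline`, and `rows_newline` are illustrative names, strings stand in for nodes and for the Printer, and writing the delimiter between items is an assumption, since the listed steps do not say where it is emitted.

```python
def row_newline(is_first=False, is_last=False):
    # Like the row formatter: never break the line inside a row.
    return ""


def rows_newline(is_first=False, is_last=False):
    # Like the rows formatter: a newline on all but the first and last calls.
    return "" if (is_first or is_last) else "\n"


def format_sequence(items, start_symbol, end_symbol, delimiter, item_newline):
    parts = [start_symbol]
    for index, item in enumerate(items):
        if index > 0:
            parts.append(delimiter)  # assumption: delimiter goes between items
        parts.append(item_newline(is_first=index == 0))
        parts.append(item)  # stands in for edit_print()
    if items:
        parts.append(item_newline(is_last=True))
    parts.append(end_symbol)
    return "".join(parts)


# Symbols taken from the documented __init__ calls: ('', '', ',') and ('', '', '').
rows = [format_sequence(cells, "", "", ",", row_newline)
        for cells in (["a", "b"], ["c", "d"])]
document = format_sequence(rows, "", "", "", rows_newline)
```

With empty start and end symbols, each row becomes one comma-separated line and the rows are joined by newlines.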

property root: Formatter[T]

Returns the root formatter.

sub_format_types: Sequence[Type[Formatter[T]]] = [<class 'graphtage.csv.CSVRowFormatter'>]

A list of formatter types that should be used as sub-formatters in the Formatting Protocol.

sub_formatters: List[Formatter[T]] = []

The list of instantiated formatters corresponding to Formatter.sub_format_types.

This list is automatically populated by Formatter.__new__() and should never be manually modified.

csv functions

build_tree

graphtage.csv.build_tree(path: str, options: BuildOptions | None = None, *args, **kwargs) CSVNode

Constructs a CSVNode from a CSV file.

The file is parsed using Python’s csv.reader(). The elements in each row are constructed by delegating to graphtage.json.build_tree():

CSVRow([json.build_tree(i, options=options) for i in row])
Parameters:
  • path – The path to the file to be parsed.

  • options – Optional build options to pass on to graphtage.json.build_tree().

  • *args – Any extra positional arguments are passed on to csv.reader().

  • **kwargs – Any extra keyword arguments are passed on to csv.reader().

Returns:

The resulting CSV node object.

Return type:

CSVNode
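
The parsing step can be approximated without graphtage. This is a sketch under stated assumptions: the helper `read_rows` is hypothetical, nested lists stand in for CSVNode and CSVRow, and no call to graphtage.json.build_tree() is made. It shows only how extra arguments reach csv.reader().

```python
import csv
import os
import tempfile


def read_rows(path, *args, **kwargs):
    # Extra positional and keyword arguments are forwarded to csv.reader().
    with open(path, newline="") as f:
        return [list(row) for row in csv.reader(f, *args, **kwargs)]


# Write a small semicolon-delimited sample file to parse.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as tmp:
    tmp.write('name;note\nada;"semi;colon"\n')
    sample_path = tmp.name

rows = read_rows(sample_path, delimiter=";")
os.remove(sample_path)
```

Every cell arrives as a string, so any further typing of the elements is left to whatever builds the per-cell nodes.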