Using Graphtage Programmatically
Graphtage is a command line utility, but it can just as easily be used as a library. This section documents how to interact with Graphtage directly from Python.
The Intermediate Representation
Graphtage’s diffing algorithms operate on an
intermediate representation rather than on the data
structures of the original file format. This allows Graphtage to have generic comparison algorithms that can work on
any input file type. The intermediate representation is a tree of graphtage.TreeNode
objects.
Therefore, the first step is to convert the files being diffed into Graphtage’s intermediate representation. The JSON filetype has a function to convert arbitrary Python objects (comprised of standard Python types) into Graphtage trees:
>>> from graphtage import json
>>> from_tree = json.build_tree({"foo": [1, 2, 3, 4]})
>>> from_tree
DictNode([KeyValuePairNode(key=StringNode('foo'), value=ListNode((IntegerNode(1), IntegerNode(2), IntegerNode(3), IntegerNode(4))))])
Transforming Nodes with Edits
To see the sequence of edits to transform this tree to another, we call graphtage.TreeNode.get_all_edits()
:
>>> to_tree = json.build_tree({"bar": [2, 3, 4]})
>>> to_tree
DictNode([KeyValuePairNode(key=StringNode('bar'), value=ListNode((IntegerNode(2), IntegerNode(3), IntegerNode(4))))])
>>> for edit in from_tree.get_all_edits(to_tree):
... print(edit)
Remove(IntegerNode(1), remove_from=ListNode((IntegerNode(1), IntegerNode(2), IntegerNode(3), IntegerNode(4))))
StringEdit(from_node=StringNode('foo'), to_node=StringNode('bar'))
Applying Edits to Nodes
Both nodes and edits are immutable. We can perform a diff to apply edits to nodes, producing a new tree constructed of
graphtage.EditedTreeNode
objects. Using some Python magic, the new tree’s nodes maintain all of the same
characteristics of the source nodes—including their source node class types—but are also instanceof()
graphtage.EditedTreeNode
, too.
Here is how to diff two nodes:
>>> from_node.diff(to_node)
>>> diff = from_tree.diff(to_tree)
>>> diff
EditedDictNode([EditedKeyValuePairNode(key=EditedStringNode('foo'), value=EditedListNode((EditedIntegerNode(1), EditedIntegerNode(2), EditedIntegerNode(3), EditedIntegerNode(4))))])
As you can see, the tree was reconstructed with edited versions of each node. Each node will have a new member variable,
graphtage.EditedTreeNode.edit
, containing the edit that that chose to apply to itself (or None
if the
node did not need to be edited). There are also additional member variables to indicate whether the node has been
removed from its parent container.
Formatting and Printing Results
There are two components to outputting a tree or diff: a graphtage.formatter.Formatter
, which is responsible
for the syntax of the output, and a graphtage.printer.Printer
, which is responsible for rendering that output
to a stream. For example, to print our diff in JSON format to the default printer (STDOUT), we would do:
>>> from graphtage import printer
>>> with printer.DEFAULT_PRINTER as p:
... json.JSONFormatter.DEFAULT_INSTANCE.print(printer.DEFAULT_PRINTER, diff)
...
{
"++bar++~~foo~~": [
~~1~~,
2,
3,
4
]
}
Since Graphtage’s formatters are independent of the input format, thanks to the intermediate representation, we can just as easily output the diff in another format, like YAML:
>>> from graphtage import yaml
>>> with printer.DEFAULT_PRINTER as p:
... yaml.YAMLFormatter.DEFAULT_INSTANCE.print(printer.DEFAULT_PRINTER, diff)
...
++bar++~~foo~~:
- ~~1~~
- 2
- 3
- 4
Diffing In-Memory Python Objects
When used as a library, Graphtage has the ability to diff in-memory Python objects. This can be useful when debugging, for example, to quickly determine the difference between two Python objects that cause a differential.:
>>> from graphtage.pydiff import print_diff
>>> with printer.DEFAULT_PRINTER as p:
... obj1 = [1, 2, {3: "three"}, 4]
... obj2 = [1, 2, {3: 3}, "four"]
... print_diff(obj1, obj2, printer=p)
[1,2,{3: "three" -> 3},++"four"++~~4~~]
Python object diffing also works with custom classes:
>>> class Foo:
... def __init__(self, bar, baz):
... self.bar = bar
... self.baz = baz
>>> with printer.DEFAULT_PRINTER as p:
... print_diff(Foo("bar", "baz"), Foo("bar", "bak"), printer=p)
Foo(bar="bar", baz="ba++k++~~z~~")