Defining New Filetypes

Implementing support for a new Graphtage filetype entails extending the graphtage.Filetype class. Subclassing graphtage.Filetype automatically registers it with Graphtage.

Filetype Matching

Input files are matched to an associated graphtage.Filetype using MIME types. Each graphtage.Filetype registers one or more MIME types for which it will be responsible. Input file MIME types are classified using the mimetypes module. Sometimes a filetype does not have a standardized MIME type or is not properly classified by the mimetypes module. For example, Graphtage’s graphtage.pickle.Pickle filetype has neither. You can add support for such a filetype as follows:

import mimetypes

if '.pkl' not in mimetypes.types_map and '.pickle' not in mimetypes.types_map:
    mimetypes.add_type('application/x-python-pickle', '.pkl')
    mimetypes.suffix_map['.pickle'] = '.pkl'

Implementing a New Filetype

With the MIME type registered, here is a sketch of how one might define the Pickle filetype:

from graphtage import BuildOptions, Filetype, Formatter, TreeNode

class Pickle(Filetype):
    def __init__(self):
        super().__init__(
            "pickle",                      # a unique identifier
            "application/python-pickle",   # the primary MIME type
            "application/x-python-pickle"  # an optional secondary MIME type
        )

    def build_tree(self, path: str, options: Optional[BuildOptions] = None) -> TreeNode:
        # return the root node of the tree built from the given pickle file

    def build_tree_handling_errors(self, path: str, options: Optional[BuildOptions] = None) -> Union[str, TreeNode]:
        # the same as the build_tree() function,
        # but on error return a string containing the error message
        #
        # for example:
        try:
            return self.build_tree(path=path, options=options)
        except PickleDecodeError as e:
            return f"Error deserializing {os.path.basename(path)}: {e!s}"

    def get_default_formatter(self) -> GraphtageFormatter:
        # return the formatter associated with this file type