graphtage.utils

Generic utility functions and classes.

utils classes

HashableCounter

class graphtage.utils.HashableCounter(*args, **kwargs)

Bases: Generic[T], Counter[T], Counter

A Counter that supports being hashed even though it is mutable.

__add__(other)

Add counts from two counters.

>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})

__and__(other)

Intersection is the minimum of corresponding counts.

>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})

__delitem__(elem): Like dict.__delitem__() but does not raise KeyError for missing values.

__eq__(other): True if all counts agree. Missing counts are treated as zero.

__ge__(other): True if all counts in self are a superset of those in other.

__gt__(other): True if all counts in self are a proper superset of those in other.

__iadd__(other)

Inplace add from another counter, keeping only positive counts.

>>> c = Counter('abbb')
>>> c += Counter('bcc')
>>> c
Counter({'b': 4, 'c': 2, 'a': 1})

__iand__(other)

Inplace intersection is the minimum of corresponding counts.

>>> c = Counter('abbb')
>>> c &= Counter('bcc')
>>> c
Counter({'b': 1})

__init__(*args, **kwargs)

__ior__(other)

Inplace union is the maximum of value from either counter.

>>> c = Counter('abbb')
>>> c |= Counter('bcc')
>>> c
Counter({'b': 3, 'c': 2, 'a': 1})

__isub__(other)

Inplace subtract counter, but keep only results with positive counts.

>>> c = Counter('abbbc')
>>> c -= Counter('bccd')
>>> c
Counter({'b': 2, 'a': 1})

__le__(other): True if all counts in self are a subset of those in other.

__lt__(other): True if all counts in self are a proper subset of those in other.

__missing__(key): The count of elements not in the Counter is zero.

__ne__(other): True if any counts disagree. Missing counts are treated as zero.

__neg__(): Subtracts from an empty counter. Strips positive and zero counts, and flips the sign on negative counts.

__or__(other)

Union is the maximum of value in either of the input counters.

>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})

__pos__(): Adds an empty counter, effectively stripping negative and zero counts

__sub__(other)

Subtract count, but keep only results with positive counts.

>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})

_keep_positive(): Internal method to strip elements with a negative or zero count

clear() → None. Remove all items from D.

copy(): Return a shallow copy.

elements() → Iterator

Iterator over elements repeating each as many times as its count.

Examples:

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

# Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements():     # loop over factors
...     product *= factor                       # and multiply them
>>> product
1836

Note

If an element’s count has been set to zero or is a negative number, elements() will ignore it.

classmethod fromkeys(iterable, v=None): Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

most_common(n=None)

List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]

pop(k[, d]) → v, remove specified key and return the corresponding value.: If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

subtract(iterable=None, /, **kwds)

Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1

total(): Sum of the counts

update(iterable=None, /, **kwds)

Like dict.update() but add counts instead of replacing them.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4

values() → an object providing a view on D's values

OrderedCounter

class graphtage.utils.OrderedCounter(iterable=None, /, **kwds)

Bases: Counter, OrderedDict

A counter that remembers the order elements are first encountered

__add__(other)

Add counts from two counters.

>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})

__and__(other)

Intersection is the minimum of corresponding counts.

>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})

__delitem__(elem): Like dict.__delitem__() but does not raise KeyError for missing values.

__eq__(other): True if all counts agree. Missing counts are treated as zero.

__ge__(other): True if all counts in self are a superset of those in other.

__gt__(other): True if all counts in self are a proper superset of those in other.

__iadd__(other)

Inplace add from another counter, keeping only positive counts.

>>> c = Counter('abbb')
>>> c += Counter('bcc')
>>> c
Counter({'b': 4, 'c': 2, 'a': 1})

__iand__(other)

Inplace intersection is the minimum of corresponding counts.

>>> c = Counter('abbb')
>>> c &= Counter('bcc')
>>> c
Counter({'b': 1})

__init__(iterable=None, /, **kwds)

Create a new, empty Counter object. And if given, count elements from an input iterable. Or, initialize the count from another mapping of elements to their counts.

>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
>>> c = Counter(a=4, b=2)                   # a new counter from keyword args

__ior__(other)

Inplace union is the maximum of value from either counter.

>>> c = Counter('abbb')
>>> c |= Counter('bcc')
>>> c
Counter({'b': 3, 'c': 2, 'a': 1})

__isub__(other)

Inplace subtract counter, but keep only results with positive counts.

>>> c = Counter('abbbc')
>>> c -= Counter('bccd')
>>> c
Counter({'b': 2, 'a': 1})

__le__(other): True if all counts in self are a subset of those in other.

__lt__(other): True if all counts in self are a proper subset of those in other.

__missing__(key): The count of elements not in the Counter is zero.

__ne__(other): True if any counts disagree. Missing counts are treated as zero.

__neg__(): Subtracts from an empty counter. Strips positive and zero counts, and flips the sign on negative counts.

__or__(other)

Union is the maximum of value in either of the input counters.

>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})

__pos__(): Adds an empty counter, effectively stripping negative and zero counts

__sub__(other)

Subtract count, but keep only results with positive counts.

>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})

_keep_positive(): Internal method to strip elements with a negative or zero count

clear() → None. Remove all items from od.

copy(): Return a shallow copy.

elements() → Iterator

Iterator over elements repeating each as many times as its count.

Examples:

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

# Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements():     # loop over factors
...     product *= factor                       # and multiply them
>>> product
1836

Note

If an element’s count has been set to zero or is a negative number, elements() will ignore it.

classmethod fromkeys(iterable, v=None): Create a new ordered dictionary with keys from iterable and values set to value.

get(key, default=None, /): Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

most_common(n=None)

List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]

move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(key[, default]) → v, remove specified key and return the corresponding value.: If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

subtract(iterable=None, /, **kwds)

Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1

total(): Sum of the counts

update(iterable=None, /, **kwds)

Like dict.update() but add counts instead of replacing them.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4

values() → an object providing a view on D's values

Sized

class graphtage.utils.Sized(*args, **kwargs)

Bases: Protocol

A protocol for objects that have a getsizeof function.

__init__(*args, **kwargs)

getsizeof: Callable[[], int]: Returns the size of this object.

SparseMatrix

class graphtage.utils.SparseMatrix(num_rows: int | None = None, num_cols: int | None = None, default_value: T | None = None)

Bases: Sized, Generic[T], Mapping[int, MutableMapping[int, T | None]]

A sparse matrix that can store arbitrary Python objects.

For sparse matrices storing homogeneous items and/or native types, it is more efficient to use an implementation like a scipy sparse matrix.

class SparseMatrixRow(row_num: int, num_cols: int | None = None, default_value: T | None = None)

Bases: Sized, MutableMapping[int, T | None]

A row of a sparse matrix.

__getitem__(col: int) → T | None

Returns the value of the given column of this row.

Parameters:: col – The index of the column to get.
Returns:: The value of the column, or the default value if it has not yet been set.
Return type:: Optional[T]
Raises:: IndexError – If self.num_cols is not None and col is greater than or equal to it.

__init__(row_num: int, num_cols: int | None = None, default_value: T | None = None)

Initializes a sparse matrix row.

Parameters:

row_num – The index of this row.
num_cols – An optional number of columns in this row. If omitted, this row will be unbounded and allow insertion and access at any column index.
default_value – An optional default value to use if a column is accessed before it is assigned.

__iter__() → Iterator[int]: Iterates over the indexes of columns that have been set in this row.

__len__() → int: Returns the number of cells defined this row.

clear(): Clears the contents of this row.

default_value: T | None: The default value for this row.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

getsizeof() → int

Returns an approximation of the number of bytes used by this row in memory.

This is equivalent to:

return sys.getsizeof(self) + getsizeof(self.row)

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

num_cols: int | None: The number of columns in this row.

pop(k[, d]) → v, remove specified key and return the corresponding value.: If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair: as a 2-tuple; but raise KeyError if D is empty.

row: Dict[int, T | None]: Data structure holding the contents of this row.

row_num: int: The index of this row.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D

shape() → int

The number of columns in this row.

This is equivalent to:

if self.num_cols is None:
    if self.row:
        return max(self.row.keys()) + 1
    else:
        return 0
else:
    return self.num_cols

update([E, ]**F) → None. Update D from mapping/iterable E and F.: If E present and has a .keys() method, does: for k in E.keys(): D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D's values

__getitem__(row: int) → MutableMapping[int, T | None]

Returns the value of the given row of this matrix.

Parameters:: row – The index of the row to get.
Returns:: The contents of the row.
Return type:: MutableMapping[int, Optional[T]]
Raises:: IndexError – If self.num_rows is not None and row is greater than or equal to it.

__init__(num_rows: int | None = None, num_cols: int | None = None, default_value: T | None = None)

Initializes a sparse matrix.

Parameters:

num_rows – An optional bound on the number of rows.
num_cols – An optional bound on the number of columns.
default_value – An optional default value to return if cells are accessed before they are set.

__len__() → int

Returns the number of rows in this matrix.

If self.num_rows is None, this will return the maximum row index of a cell that has been set, plus one.

clear(): Clears the contents of this matrix.

default_value: T | None: The default value to return if cells are accessed before they are set.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.

getsizeof() → int: Calculates the approximate memory footprint of this matrix in bytes.

items() → a set-like object providing a view on D's items

keys() → a set-like object providing a view on D's keys

num_cols: int | None: The number of columns in this matrox, or None if there is no bound.

num_filled_elements() → int: Counts the number of elements in this matrix that have been explicitly set.

num_rows: int | None: The number of rows in this matrix, or None if there is no bound.

rows: Dict[int, SparseMatrixRow]: The rows of this matrix.

shape() → Tuple[int, int]: Returns the (number of rows, number of columns) of this matrix.

values() → an object providing a view on D's values

Tempfile

class graphtage.utils.Tempfile(contents: bytes, prefix: str | None = None, suffix: str | None = None)

Bases: object

Creates a temporary file containing a given byte sequence.

The file will automatically be cleaned up after its use. This is useful for interacting with functions and libraries that only accept a path and not a stream object.

Examples

>>> from graphtage.utils import Tempfile
>>> with Tempfile(b"foo") as tmp:
...     print(tmp)
/var/folders/bs/hrvzrctx6tg2_j17gb6wckph0000gn/T/tmpkza5fvr_
>>> with Tempfile(b"foo") as tmp:
...     with open(tmp, 'r') as f:
...         print(f.read())
foo

__enter__() → str: Constructs a tempfile, returning the path to the file.

__exit__(type, value, traceback): Automatically cleans up the tempfile.

__init__(contents: bytes, prefix: str | None = None, suffix: str | None = None)

Initializes a Tempfile

Parameters:

contents – The contents to be populated in the file.
prefix – An optional prefix for the filename.
suffix – An optional suffix for the filename.

utils functions

getsizeof

graphtage.utils.getsizeof(obj) → int

A function to calculate the memory footprint of an object.

If the object implements the Sized protocol (i.e., if it implements a getsizeof method), then this is used:

if hasattr(obj, 'getsizeof'):
    return obj.getsizeof()

If the object is a list or tuple, then:

return sys.getsizeof(obj) + sum(getsizeof(i) for i in obj)

If the object is a dict, then:

return sys.getsizeof(obj) + sum(getsizeof(key) + getsizeof(value) for key, value in obj.items())

Otherwise:

return sys.getsizeof(obj)

Parameters:: obj – The object to measure.
Returns:: An approximation of the memory footprint of the object, in bytes.
Return type:: int

largest

graphtage.utils.largest(*sequence: T | Iterable[T], n: int = 1, key: Callable[[T], Any] | None = None) → Iterator[T]

smallest

graphtage.utils.smallest(*sequence: T | Iterable[T], n: int = 1, key: Callable[[T], Any] | None = None) → Iterator[T]