graphtage.utils

Generic utility functions and classes.

utils classes

HashableCounter

class graphtage.utils.HashableCounter(*args, **kwargs)

Bases: Generic[graphtage.utils.T], Counter[graphtage.utils.T], collections.Counter

A Counter that supports being hashed even though it is mutable.

__add__(other)

Add counts from two counters.

>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})
__and__(other)

Intersection is the minimum of corresponding counts.

>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})
__delitem__(elem)

Like dict.__delitem__() but does not raise KeyError for missing values.

__iadd__(other)

Inplace add from another counter, keeping only positive counts.

>>> c = Counter('abbb')
>>> c += Counter('bcc')
>>> c
Counter({'b': 4, 'c': 2, 'a': 1})
__iand__(other)

Inplace intersection is the minimum of corresponding counts.

>>> c = Counter('abbb')
>>> c &= Counter('bcc')
>>> c
Counter({'b': 1})
__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

__ior__(other)

Inplace union is the maximum of value from either counter.

>>> c = Counter('abbb')
>>> c |= Counter('bcc')
>>> c
Counter({'b': 3, 'c': 2, 'a': 1})
__isub__(other)

Inplace subtract counter, but keep only results with positive counts.

>>> c = Counter('abbbc')
>>> c -= Counter('bccd')
>>> c
Counter({'b': 2, 'a': 1})
__missing__(key)

The count of elements not in the Counter is zero.

__neg__()

Subtracts from an empty counter. Strips positive and zero counts, and flips the sign on negative counts.

__or__(other)

Union is the maximum of value in either of the input counters.

>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})
__pos__()

Adds an empty counter, effectively stripping negative and zero counts

__sub__(other)

Subtract count, but keep only results with positive counts.

>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})
_keep_positive()

Internal method to strip elements with a negative or zero count

clear() → None. Remove all items from D.
copy()

Return a shallow copy.

elements() → Iterator

Iterator over elements repeating each as many times as its count.

Examples:

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

# Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements():     # loop over factors
...     product *= factor                       # and multiply them
>>> product
1836

Note

If an element’s count has been set to zero or is a negative number, elements() will ignore it.

classmethod fromkeys(iterable, v=None)

Create a new dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
most_common(n=None)

List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

subtract(iterable=None, /, **kwds)

Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1
update(iterable=None, /, **kwds)

Like dict.update() but add counts instead of replacing them.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4
values() → an object providing a view on D’s values

OrderedCounter

class graphtage.utils.OrderedCounter(iterable=None, /, **kwds)

Bases: collections.Counter, collections.OrderedDict

A counter that remembers the order elements are first encountered

__add__(other)

Add counts from two counters.

>>> Counter('abbb') + Counter('bcc')
Counter({'b': 4, 'c': 2, 'a': 1})
__and__(other)

Intersection is the minimum of corresponding counts.

>>> Counter('abbb') & Counter('bcc')
Counter({'b': 1})
__delitem__(elem)

Like dict.__delitem__() but does not raise KeyError for missing values.

__iadd__(other)

Inplace add from another counter, keeping only positive counts.

>>> c = Counter('abbb')
>>> c += Counter('bcc')
>>> c
Counter({'b': 4, 'c': 2, 'a': 1})
__iand__(other)

Inplace intersection is the minimum of corresponding counts.

>>> c = Counter('abbb')
>>> c &= Counter('bcc')
>>> c
Counter({'b': 1})
__init__(iterable=None, /, **kwds)

Create a new, empty Counter object. And if given, count elements from an input iterable. Or, initialize the count from another mapping of elements to their counts.

>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'a': 4, 'b': 2})           # a new counter from a mapping
>>> c = Counter(a=4, b=2)                   # a new counter from keyword args
__ior__(other)

Inplace union is the maximum of value from either counter.

>>> c = Counter('abbb')
>>> c |= Counter('bcc')
>>> c
Counter({'b': 3, 'c': 2, 'a': 1})
__isub__(other)

Inplace subtract counter, but keep only results with positive counts.

>>> c = Counter('abbbc')
>>> c -= Counter('bccd')
>>> c
Counter({'b': 2, 'a': 1})
__missing__(key)

The count of elements not in the Counter is zero.

__neg__()

Subtracts from an empty counter. Strips positive and zero counts, and flips the sign on negative counts.

__or__(other)

Union is the maximum of value in either of the input counters.

>>> Counter('abbb') | Counter('bcc')
Counter({'b': 3, 'c': 2, 'a': 1})
__pos__()

Adds an empty counter, effectively stripping negative and zero counts

__sub__(other)

Subtract count, but keep only results with positive counts.

>>> Counter('abbbc') - Counter('bccd')
Counter({'b': 2, 'a': 1})
_keep_positive()

Internal method to strip elements with a negative or zero count

clear() → None. Remove all items from od.
copy()

Return a shallow copy.

elements() → Iterator

Iterator over elements repeating each as many times as its count.

Examples:

>>> c = Counter('ABCABC')
>>> sorted(c.elements())
['A', 'A', 'B', 'B', 'C', 'C']

# Knuth's example for prime factors of 1836:  2**2 * 3**3 * 17**1
>>> prime_factors = Counter({2: 2, 3: 3, 17: 1})
>>> product = 1
>>> for factor in prime_factors.elements():     # loop over factors
...     product *= factor                       # and multiply them
>>> product
1836

Note

If an element’s count has been set to zero or is a negative number, elements() will ignore it.

classmethod fromkeys(iterable, v=None)

Create a new ordered dictionary with keys from iterable and values set to value.

get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
most_common(n=None)

List the n most common elements and their counts from the most common to the least. If n is None, then list all element counts.

>>> Counter('abracadabra').most_common(3)
[('a', 5), ('b', 2), ('r', 2)]
move_to_end(key, last=True)

Move an existing element to the end (or beginning if last is false).

Raise KeyError if the element does not exist.

pop(k[, d]) → v, remove specified key and return the corresponding

value. If key is not found, d is returned if given, otherwise KeyError is raised.

popitem(last=True)

Remove and return a (key, value) pair from the dictionary.

Pairs are returned in LIFO order if last is true or FIFO order if false.

setdefault(key, default=None)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

subtract(iterable=None, /, **kwds)

Like dict.update() but subtracts counts instead of replacing them. Counts can be reduced below zero. Both the inputs and outputs are allowed to contain zero and negative counts.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.subtract('witch')             # subtract elements from another iterable
>>> c.subtract(Counter('watch'))    # subtract elements from another counter
>>> c['h']                          # 2 in which, minus 1 in witch, minus 1 in watch
0
>>> c['w']                          # 1 in which, minus 1 in witch, minus 1 in watch
-1
update(iterable=None, /, **kwds)

Like dict.update() but add counts instead of replacing them.

Source can be an iterable, a dictionary, or another Counter instance.

>>> c = Counter('which')
>>> c.update('witch')           # add elements from another iterable
>>> d = Counter('watch')
>>> c.update(d)                 # add elements from another counter
>>> c['h']                      # four 'h' in which, witch, and watch
4
values() → an object providing a view on D’s values

Sized

class graphtage.utils.Sized(*args, **kwargs)

Bases: Protocol

A protocol for objects that have a getsizeof function.

__init__(*args, **kwargs)

Initialize self. See help(type(self)) for accurate signature.

getsizeof: Callable[], int]

Returns the size of this object.

SparseMatrix

class graphtage.utils.SparseMatrix(num_rows: Optional[int] = None, num_cols: Optional[int] = None, default_value: Optional[T] = None)

Bases: graphtage.utils.Sized, Generic[graphtage.utils.T], Mapping[int, MutableMapping[int, Optional[graphtage.utils.T]]]

A sparse matrix that can store arbitrary Python objects.

For sparse matrices storing homogeneous items and/or native types, it is more efficient to use an implementation like a scipy sparse matrix.

class SparseMatrixRow(row_num: int, num_cols: Optional[int] = None, default_value: Optional[T] = None)

Bases: graphtage.utils.Sized, MutableMapping[int, Optional[graphtage.utils.T]]

A row of a sparse matrix.

__getitem__(col: int) → Optional[T]

Returns the value of the given column of this row.

Parameters

col – The index of the column to get.

Returns

The value of the column, or the default value if it has not yet been set.

Return type

Optional[T]

Raises

IndexError – If self.num_cols is not None and col is greater than or equal to it.

__init__(row_num: int, num_cols: Optional[int] = None, default_value: Optional[T] = None)

Initializes a sparse matrix row.

Parameters
  • row_num – The index of this row.

  • num_cols – An optional number of columns in this row. If omitted, this row will be unbounded and allow insertion and access at any column index.

  • default_value – An optional default value to use if a column is accessed before it is assigned.

__iter__() → Iterator[int]

Iterates over the indexes of columns that have been set in this row.

__len__()int

Returns the number of cells defined this row.

clear()

Clears the contents of this row.

default_value: Optional[T]

The default value for this row.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getsizeof()int

Returns an approximation of the number of bytes used by this row in memory.

This is equivalent to:

return sys.getsizeof(self) + getsizeof(self.row)
items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
num_cols: Optional[int]

The number of columns in this row.

pop(k[, d]) → v, remove specified key and return the corresponding value.

If key is not found, d is returned if given, otherwise KeyError is raised.

popitem() → (k, v), remove and return some (key, value) pair

as a 2-tuple; but raise KeyError if D is empty.

row: Dict[int, Optional[T]]

Data structure holding the contents of this row.

row_num: int

The index of this row.

setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D
shape()int

The number of columns in this row.

This is equivalent to:

if self.num_cols is None:
    if self.row:
        return max(self.row.keys()) + 1
    else:
        return 0
else:
    return self.num_cols
update([E, ]**F) → None. Update D from mapping/iterable E and F.

If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v

values() → an object providing a view on D’s values
__getitem__(row: int) → MutableMapping[int, Optional[T]]

Returns the value of the given row of this matrix.

Parameters

row – The index of the row to get.

Returns

The contents of the row.

Return type

MutableMapping[int, Optional[T]]

Raises

IndexError – If self.num_rows is not None and row is greater than or equal to it.

__init__(num_rows: Optional[int] = None, num_cols: Optional[int] = None, default_value: Optional[T] = None)

Initializes a sparse matrix.

Parameters
  • num_rows – An optional bound on the number of rows.

  • num_cols – An optional bound on the number of columns.

  • default_value – An optional default value to return if cells are accessed before they are set.

__len__()int

Returns the number of rows in this matrix.

If self.num_rows is None, this will return the maximum row index of a cell that has been set, plus one.

clear()

Clears the contents of this matrix.

default_value: Optional[T]

The default value to return if cells are accessed before they are set.

get(k[, d]) → D[k] if k in D, else d. d defaults to None.
getsizeof()int

Calculates the approximate memory footprint of this matrix in bytes.

items() → a set-like object providing a view on D’s items
keys() → a set-like object providing a view on D’s keys
num_cols: Optional[int]

The number of columns in this matrox, or None if there is no bound.

num_filled_elements()int

Counts the number of elements in this matrix that have been explicitly set.

num_rows: Optional[int]

The number of rows in this matrix, or None if there is no bound.

rows: Dict[int, graphtage.utils.SparseMatrix.SparseMatrixRow]

The rows of this matrix.

shape() → Tuple[int, int]

Returns the (number of rows, number of columns) of this matrix.

values() → an object providing a view on D’s values

Tempfile

class graphtage.utils.Tempfile(contents: bytes, prefix: Optional[str] = None, suffix: Optional[str] = None)

Bases: object

Creates a temporary file containing a given byte sequence.

The file will automatically be cleaned up after its use. This is useful for interacting with functions and libraries that only accept a path and not a stream object.

Examples

>>> from graphtage.utils import Tempfile
>>> with Tempfile(b"foo") as tmp:
...     print(tmp)
/var/folders/bs/hrvzrctx6tg2_j17gb6wckph0000gn/T/tmpkza5fvr_
>>> with Tempfile(b"foo") as tmp:
...     with open(tmp, 'r') as f:
...         print(f.read())
foo
__enter__()str

Constructs a tempfile, returning the path to the file.

__exit__(type, value, traceback)

Automatically cleans up the tempfile.

__init__(contents: bytes, prefix: Optional[str] = None, suffix: Optional[str] = None)

Initializes a Tempfile

Parameters
  • contents – The contents to be populated in the file.

  • prefix – An optional prefix for the filename.

  • suffix – An optional suffix for the filename.

utils functions

getsizeof

graphtage.utils.getsizeof(obj)int

A function to calculate the memory footprint of an object.

If the object implements the Sized protocol (i.e., if it implements a getsizeof method), then this is used:

if hasattr(obj, 'getsizeof'):
    return obj.getsizeof()

If the object is a list or tuple, then:

return sys.getsizeof(obj) + sum(getsizeof(i) for i in obj)

If the object is a dict, then:

return sys.getsizeof(obj) + sum(getsizeof(key) + getsizeof(value) for key, value in obj.items())

Otherwise:

return sys.getsizeof(obj)
Parameters

obj – The object to measure.

Returns

An approximation of the memory footprint of the object, in bytes.

Return type

int

largest

graphtage.utils.largest(*sequence: Union[T, Iterable[T]], n: int = 1, key: Optional[Callable[[T], Any]] = None) → Iterator[T]

smallest

graphtage.utils.smallest(*sequence: Union[T, Iterable[T]], n: int = 1, key: Optional[Callable[[T], Any]] = None) → Iterator[T]