API reference

class Automaton

Automata can be used to efficiently apply complex search patterns to the keys of maps and sets. Use one of the classmethods never(), always(), str(), or subsequence() to create a new automaton. Add more complex behavior on top with starts_with(), complement(), intersection(), or union(). E.g., an automaton that matches keys that start with b"foo" or b"bar":

a_foo = Automaton.str(b"foo")
a_bar = Automaton.str(b"bar")
a_foobar = a_foo.union(a_bar).starts_with()
classmethod always(cls) Automaton

Create a new Automaton that always matches.

classmethod never(cls) Automaton

Create a new Automaton that never matches.

classmethod str(cls, str: bytes) Automaton

Create a new Automaton that matches str exactly.

classmethod subsequence(cls, str: bytes) Automaton

Create a new Automaton that subsequence matches str. E.g., b"bd" matches the key b"abcde".

classmethod hamming_subsequence(cls, str: bytes, distance: int) Automaton

Create a new Automaton that subsequence matches str. if bytes are within the given hamming distance. E.g., both b"be" and b"bf" match the key b"abceg" if distance is 1. With distance 0, only b"be" would match.

complement(self) Automaton

Modify this automaton to match any key that would previously not match. Returns self to allow chaining with other methods.

starts_with(self) Automaton

Modify this automaton to match any key that starts with a prefix that previously matched, e.g., if self matched b"abc", it will now match b"abcde". Returns self to allow chaining with other methods.

intersection(self, other: Automaton) Automaton

Modify this automaton to match any key that both self and other matches. other must be an instance of Automaton. Returns self to allow chaining with other methods.

union(self, other: Automaton) Automaton

Modify this automaton to match any key that either self or other matches. other must be an instance of Automaton. Returns self to allow chaining with other methods.

class Buffer

A read-only buffer returned by Map.build() and Set.build() when Path is ":memory:". Use to create new Map or Set instances, or write to file:

from ducer import Set
buf = Set.build([b"a", b"b"], ":memory:")
s = Set(buf)
for k in s:
    print(k)
with open("my.set", "wb") as f:
    f.write(buf)
class Map(data: SupportsBytes)

An immutable map of bytes keys and non-negative integers, based on finite-state-transducers. Typically uses a fraction of the memory as the builtin dict and can be streamed from a file.

data can be any object that supports the buffer protocol, e.g., Buffer, bytes, memoryview, mmap, etc. Use Map.build() to create suitable data.

Important

data needs to be contiguous.

To the extent that it’s feasible, ducer maps are intended to be direct replacements for the builtin dict. For m, o: Map and k: bytes, the following works as intended:

k in m
m == o
m[k]
m.get(k)
m.get(k, 42)
len(m)
for k in m:
    pass
for k in m.keys():
    pass
for v in m.values():
    pass
for k, v in m.items():
    pass

Since maps are immutable, the following are not implemented:

  • clear

  • fromkeys

  • pop

  • popitem

  • setdefault

  • update, |=

Further, the |, &, -, ^ operators are also not implemented, since it is not possible to specify the storage path. Use Map.union(), Map.intersection(), Map.difference(), and Map.symmetric_difference() instead.

classmethod build(cls, path: str | Path, iterable: Iterable[Tuple[SupportsBytes, SupportsInt]]) Buffer | None

Build a map from an iterable of items (key: bytes, value: int) and write it to the given path. If path is ":memory:", returns a Buffer containing the map data. path can be str or Path.

Hint

Items can really be any sequence of length 2, but building from tuple is fastest. However, avoid converting items in Python for best performance. Ideally, create tuples directly, e.g., if using msgpack, set use_list=False for msgpack.unpackb() or msgpack.Unpacker.

copy(self) Map

Since maps are immutable, returns self.

get(self, key, default=None) int | None

Returns the given key if present, default otherwise.

keys(self) Iterator[bytes]

Iterate over all keys.

values(self) Iterator[int]

Iterate over all values.

items(self) Iterator[Tuple[bytes, int]]

Iterate over all key-value items.

range(self, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]

Iterate over all key-value items with optional range limits for the key ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than). If no limits are given this is equivalent to iter(self).

starts_with(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]

Iterate over all key-value items whose key starts with str. Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

subsequence(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]

Iterate over all key-value items whose key contain the subsequence str. Keys don’t need to contain the subsequence consecutively, e.g., b"bd" will match the key b"abcde". Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

search(self, automaton: Automaton, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]

Iterate over all key-value items whose key matches the given Automaton. Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

difference(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None

Build a new map that is the difference between self and all others, meaning the resulting map will contain all keys that are in self, but not in others. others must be instances of Map. select specifies how conflicts are resolved if keys are present more than once. If Path is ":memory:", returns a Buffer containing the map data instead of writing to path. Path can be str or Path.

intersection(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None

Build a new map that is the intersection of self and others. others must be instances of Map. select specifies how conflicts are resolved if keys are present more than once. If Path is ":memory:", returns a Buffer containing the map data instead of writing to path. Path can be str or Path.

symmetric_difference(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None

Build a new map that is the symmetric difference between self and others. The resulting map will contain all keys that appear an odd number of times, i.e., if only one other map is given, it will contain all keys that are in either self or others, but not in both. others must be instances of Map. select specifies how conflicts are resolved if keys are present more than once. If path is ":memory:", returns a Buffer containing the map data instead of writing to path. path can be str or Path.

union(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None

Build a new map that is the union of self and others. others must be instances of Map. select specifies how conflicts are resolved if keys are present more than once. If path is ":memory:", returns a Buffer containing the map data instead of writing to path. path can be str or Path.

class Op

Conflict resolution strategies for set operations on maps.

Avg

Select average, i.e., sum(values) // len.

First

Select first value.

Last

Select last value.

Max

Select maximum.

Median

Select median, i.e., with values = sorted(values) and mid = len // 2, select values[mid] for odd length, and (values[mid-1] + values[mid]) // 2 for even length.

Mid

Select middle value, i.e., values[len // 2].

Min

Select minimum.

class Set(data: SupportsBytes)

An immutable set of bytes keys, based on finite-state-transducers. Typically uses a fraction of the memory as the builtin set and can be streamed from a file.

data can be any object that supports the buffer protocol, e.g., Buffer, bytes, memoryview, mmap, etc. Use Map.build() to create suitable data.

Important

data needs to be contiguous.

To the extent that it’s feasible, ducer sets are intended to be direct replacements for the builtin set. For s, o: Set, and k: bytes, the following works as intended:

k in s
s == o
len(s)
for k in s:
    pass
s.isdisjoint(o)
s.issubset(o)
s <= o  # subset
s < o  # proper subset
s.issuperset(o)
s >= o  # superset
s > o  # proper superset

Since sets are immutable, the following are not implemented:

  • add

  • clear

  • difference_update, -=

  • discard

  • intersection_update, &=

  • pop

  • remove

  • symmetric_difference_update, ^=

  • update, |=

Further, the |, &, -, ^ operators are also not implemented, since it is not possible to specify the storage path. Use Set.union(), Set.intersection(), Set.difference(), and Set.symmetric_difference() instead.

classmethod build(cls, path: str | Path, iterable: Iterable[SupportsBytes]) Buffer | None

Build a set from an iterable of bytes and write it to the given path. If path is ":memory:", returns a Buffer containing the set data. path can be str or Path.

copy(self) Set

Since sets are immutable, returns self.

keys(self) Iterator[bytes]

Iterate over all keys.

range(self, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]

Iterate over all keys with optional range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than). If no limits are given this is equivalent to iter(self).

starts_with(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]

Iterate over all keys that start with str. Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

subsequence(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]

Iterate over all keys that contain the subsequence str. Keys don’t need to contain the subsequence consecutively, e.g., b"bd" will match the key b"abcde". Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

search(self, automaton: Automaton, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]

Iterate over all keys that match the given Automaton. Optionally apply range limits ge (greater than or equal), gt (greater than), le (less than or equal), and lt (less than).

difference(self, path: str | Path, *others: Set) Buffer | None

Build a new set that is the difference between self and all others, meaning the resulting set will contain all keys that are in self, but not in others. others must be instances of Set. If path is ":memory:", returns a Buffer containing the set data instead of writing to path. path can be str or Path.

intersection(self, path: str | Path, *others: Set) Buffer | None

Build a new set that is the intersection of self and others. others must be instances of Set. If path is ":memory:", returns a Buffer containing the set data instead of writing to path. path can be str or Path.

symmetric_difference(self, path: str | Path, *others: Set) Buffer | None

Build a new set that is the symmetric difference between self and others, The resulting set will contain all keys that appear an odd number of times, i.e., if only one other set is given, it will contain all keys that are in either self or others, but not in both. others must be instances of Set. If path is ":memory:", returns a Buffer containing the set data instead of writing to path. path can be str or Path.

union(self, path: str | Path, *others: Set) Buffer | None

Build a new set that is the union of self and others. others must be instances of Set. If path is ":memory:", returns a Buffer containing the set data instead of writing to path. path can be str or Path.