API reference¶
- class Automaton¶
Automata can be used to efficiently apply complex search patterns to the keys of maps and sets. Use one of the classmethods
never(),always(),str(), orsubsequence()to create a new automaton. Add more complex behavior on top withstarts_with(),complement(),intersection(), orunion(). E.g., an automaton that matches keys that start withb"foo"orb"bar":a_foo = Automaton.str(b"foo") a_bar = Automaton.str(b"bar") a_foobar = a_foo.union(a_bar).starts_with()
- classmethod subsequence(cls, str: bytes) Automaton¶
Create a new
Automatonthat subsequence matchesstr. E.g.,b"bd"matches the keyb"abcde".
- classmethod hamming_subsequence(cls, str: bytes, distance: int) Automaton¶
Create a new
Automatonthat subsequence matchesstr. if bytes are within the given hamming distance. E.g., bothb"be"andb"bf"match the keyb"abceg"if distance is 1. With distance 0, onlyb"be"would match.
- complement(self) Automaton¶
Modify this automaton to match any key that would previously not match. Returns
selfto allow chaining with other methods.
- starts_with(self) Automaton¶
Modify this automaton to match any key that starts with a prefix that previously matched, e.g., if
selfmatchedb"abc", it will now matchb"abcde". Returnsselfto allow chaining with other methods.
- class Buffer¶
A read-only buffer returned by
Map.build()andSet.build()whenPathis":memory:". Use to create newMaporSetinstances, or write to file:from ducer import Set buf = Set.build([b"a", b"b"], ":memory:") s = Set(buf) for k in s: print(k) with open("my.set", "wb") as f: f.write(buf)
- class Map(data: SupportsBytes)¶
An immutable map of bytes keys and non-negative integers, based on finite-state-transducers. Typically uses a fraction of the memory as the builtin
dictand can be streamed from a file.datacan be any object that supports the buffer protocol, e.g.,Buffer,bytes,memoryview,mmap, etc. UseMap.build()to create suitabledata.Important
dataneeds to be contiguous.To the extent that it’s feasible, ducer maps are intended to be direct replacements for the builtin
dict. Form, o: Mapandk: bytes, the following works as intended:k in m m == o m[k] m.get(k) m.get(k, 42) len(m) for k in m: pass for k in m.keys(): pass for v in m.values(): pass for k, v in m.items(): pass
Since maps are immutable, the following are not implemented:
clearfromkeyspoppopitemsetdefaultupdate,|=
Further, the
|,&,-,^operators are also not implemented, since it is not possible to specify the storage path. UseMap.union(),Map.intersection(),Map.difference(), andMap.symmetric_difference()instead.- classmethod build(cls, path: str | Path, iterable: Iterable[Tuple[SupportsBytes, SupportsInt]]) Buffer | None¶
Build a map from an iterable of items
(key: bytes, value: int)and write it to the given path. Ifpathis":memory:", returns aBuffercontaining the map data.pathcan bestrorPath.Hint
Items can really be any sequence of length 2, but building from
tupleis fastest. However, avoid converting items in Python for best performance. Ideally, create tuples directly, e.g., if using msgpack, setuse_list=Falseformsgpack.unpackb()ormsgpack.Unpacker.
- range(self, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]¶
Iterate over all key-value items with optional range limits for the key
ge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than). If no limits are given this is equivalent toiter(self).
- starts_with(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]¶
Iterate over all key-value items whose key starts with
str. Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- subsequence(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]¶
Iterate over all key-value items whose key contain the subsequence
str. Keys don’t need to contain the subsequence consecutively, e.g.,b"bd"will match the keyb"abcde". Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- search(self, automaton: Automaton, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[Tuple[bytes, int]]¶
Iterate over all key-value items whose key matches the given
Automaton. Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- difference(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None¶
Build a new map that is the difference between
selfand allothers, meaning the resulting map will contain all keys that are inself, but not inothers.othersmust be instances ofMap.selectspecifies how conflicts are resolved if keys are present more than once. IfPathis":memory:", returns aBuffercontaining the map data instead of writing to path. Path can bestrorPath.
- intersection(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None¶
Build a new map that is the intersection of
selfandothers.othersmust be instances ofMap.selectspecifies how conflicts are resolved if keys are present more than once. IfPathis":memory:", returns aBuffercontaining the map data instead of writing to path.Pathcan bestrorPath.
- symmetric_difference(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None¶
Build a new map that is the symmetric difference between
selfandothers. The resulting map will contain all keys that appear an odd number of times, i.e., if only one other map is given, it will contain all keys that are in eitherselforothers, but not in both.othersmust be instances ofMap.selectspecifies how conflicts are resolved if keys are present more than once. Ifpathis":memory:", returns aBuffercontaining the map data instead of writing to path.pathcan bestrorPath.
- union(self, path: str | Path, *others: Map, select: Op = Op.Last) Buffer | None¶
Build a new map that is the union of
selfandothers.othersmust be instances ofMap.selectspecifies how conflicts are resolved if keys are present more than once. Ifpathis":memory:", returns aBuffercontaining the map data instead of writing to path.pathcan bestrorPath.
- class Op¶
Conflict resolution strategies for set operations on maps.
- Avg¶
Select average, i.e.,
sum(values) // len.
- First¶
Select first value.
- Last¶
Select last value.
- Max¶
Select maximum.
- Median¶
Select median, i.e., with
values = sorted(values)andmid = len // 2, selectvalues[mid]for odd length, and(values[mid-1] + values[mid]) // 2for even length.
- Mid¶
Select middle value, i.e.,
values[len // 2].
- Min¶
Select minimum.
- class Set(data: SupportsBytes)¶
An immutable set of bytes keys, based on finite-state-transducers. Typically uses a fraction of the memory as the builtin
setand can be streamed from a file.datacan be any object that supports the buffer protocol, e.g.,Buffer,bytes,memoryview,mmap, etc. UseMap.build()to create suitabledata.Important
dataneeds to be contiguous.To the extent that it’s feasible, ducer sets are intended to be direct replacements for the builtin
set. Fors, o: Set, andk: bytes, the following works as intended:k in s s == o len(s) for k in s: pass s.isdisjoint(o) s.issubset(o) s <= o # subset s < o # proper subset s.issuperset(o) s >= o # superset s > o # proper superset
Since sets are immutable, the following are not implemented:
addcleardifference_update,-=discardintersection_update,&=popremovesymmetric_difference_update,^=update,|=
Further, the
|,&,-,^operators are also not implemented, since it is not possible to specify the storage path. UseSet.union(),Set.intersection(),Set.difference(), andSet.symmetric_difference()instead.- classmethod build(cls, path: str | Path, iterable: Iterable[SupportsBytes]) Buffer | None¶
Build a set from an iterable of
bytesand write it to the given path. Ifpathis":memory:", returns aBuffercontaining the set data.pathcan bestrorPath.
- range(self, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]¶
Iterate over all keys with optional range limits
ge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than). If no limits are given this is equivalent toiter(self).
- starts_with(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]¶
Iterate over all keys that start with
str. Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- subsequence(self, str: bytes, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]¶
Iterate over all keys that contain the subsequence
str. Keys don’t need to contain the subsequence consecutively, e.g.,b"bd"will match the keyb"abcde". Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- search(self, automaton: Automaton, ge: bytes | None = None, gt: bytes | None = None, le: bytes | None = None, lt: bytes | None = None) Iterator[bytes]¶
Iterate over all keys that match the given
Automaton. Optionally apply range limitsge(greater than or equal),gt(greater than),le(less than or equal), andlt(less than).
- difference(self, path: str | Path, *others: Set) Buffer | None¶
Build a new set that is the difference between
selfand allothers, meaning the resulting set will contain all keys that are inself, but not inothers.othersmust be instances ofSet. Ifpathis":memory:", returns aBuffercontaining the set data instead of writing to path.pathcan bestrorPath.
- intersection(self, path: str | Path, *others: Set) Buffer | None¶
Build a new set that is the intersection of
selfandothers.othersmust be instances ofSet. Ifpathis":memory:", returns aBuffercontaining the set data instead of writing to path.pathcan bestrorPath.
- symmetric_difference(self, path: str | Path, *others: Set) Buffer | None¶
Build a new set that is the symmetric difference between
selfandothers, The resulting set will contain all keys that appear an odd number of times, i.e., if only one other set is given, it will contain all keys that are in eitherselforothers, but not in both.othersmust be instances ofSet. Ifpathis":memory:", returns aBuffercontaining the set data instead of writing to path.pathcan bestrorPath.