2. dossier.store — store feature collections

A simple storage interface for feature collections.

dossier.store provides a convenient interface to a kvlayer table for storing dossier.fc.FeatureCollection objects. The interface consists of methods to query, search, add and remove feature collections from the store. It also provides functions for defining and searching indexes.

Using a storage backend in your code requires a working kvlayer configuration, which is usually written in a YAML file like so:

kvlayer:
  app_name: store
  namespace: dossier
  storage_type: redis
  storage_addresses: ["redis.example.com:6379"]

And here’s a full working example that uses local memory to store feature collections:

from dossier.fc import FeatureCollection
from dossier.store import Store
import kvlayer
import yakonfig

yaml = """
kvlayer:
  app_name: store
  namespace: dossier
  storage_type: local
"""
with yakonfig.defaulted_config([kvlayer], yaml=yaml):
    store = Store(kvlayer.client())

    fc = FeatureCollection({u'NAME': {'Foo': 1, 'Bar': 2}})
    store.put([('1', fc)])
    print(store.get('1'))

See the documentation for yakonfig for more details on the configuration setup.

Another example showing how to store, retrieve and delete a feature collection:

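import dossier.fc
import dossier.store
import kvlayer

fc = dossier.fc.FeatureCollection()
fc[u'NAME'][u'foo'] += 1
fc[u'NAME'][u'bar'] = 42

# Assumes kvlayer is already configured, as in the example above.
kvl = kvlayer.client()
store = dossier.store.Store(kvl)

# `put` takes an iterable of (content_id, feature collection) pairs.
store.put([('{yourid}', fc)])
assert store.get('{yourid}')[u'NAME'][u'bar'] == 42
store.delete('{yourid}')
assert store.get('{yourid}') is None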

Here is another example that demonstrates use of indexing to enable a poor man’s case insensitive search:

import dossier.fc
import dossier.store
import kvlayer
from dossier.store import feature_index

fc = dossier.fc.FeatureCollection()
fc[u'NAME'][u'foo'] += 1
fc[u'NAME'][u'bar'] = 42

kvl = kvlayer.client()
store = dossier.store.Store(kvl)

# Index transforms must be defined on every instance of `Store`.
# (The index data is persisted; the transforms themselves are
# ephemeral.)
store.define_index(u'name_casei',
                   create=feature_index(u'NAME'),
                   transform=lambda s: s.lower().encode('utf-8'))

# `put` takes an iterable of pairs and automatically updates indexes.
store.put([('{yourid}', fc)])
assert list(store.index_scan(u'name_casei', u'FoO'))[0] == '{yourid}'

class dossier.store.Store(kvlclient, impl=None, feature_indexes=None)

A feature collection database.

A feature collection database stores feature collections for content objects like profiles from external knowledge bases.

Every feature collection is keyed by its content_id, which is a byte string. The value of a content_id is specific to the type of content represented by the feature collection. In other words, its representation is unspecified.

__init__(kvlclient, impl=None, feature_indexes=None)

Connects to a feature collection store.

This also initializes the underlying kvlayer namespace.

Parameters: kvlclient (kvlayer.AbstractStorage) – kvlayer storage client
Return type: Store

get(content_id)

Retrieve a feature collection from the store. This behaves like get_many([content_id]) for a single id, but returns the feature collection itself rather than a (content_id, data) tuple.

If the feature collection does not exist, None is returned.

Return type: dossier.fc.FeatureCollection

get_many(content_id_list)

Yield (content_id, data) tuples for ids in list.

As with get(), if a content_id in the list is missing, then it is yielded with a data value of None.

Return type: yields tuple(str, dossier.fc.FeatureCollection)
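Because missing ids are yielded with None rather than skipped, callers often need to separate hits from misses. The following is a minimal sketch that assumes only the (content_id, data) pair shape described above. split_found_missing is a hypothetical helper, not part of dossier.store, and the example list stands in for the output of get_many().

```python
def split_found_missing(pairs):
    """Split (content_id, data) pairs into found items and missing ids.

    `pairs` is any iterable shaped like the output described for
    get_many(): data is None when the content_id is not in the store.
    """
    found = {}
    missing = []
    for content_id, data in pairs:
        if data is None:
            missing.append(content_id)
        else:
            found[content_id] = data
    return found, missing


# Stand-in for `store.get_many(['a', 'b', 'c'])`; plain dicts play the
# role of feature collections here (an assumption of this sketch).
example_pairs = [('a', {'NAME': {'foo': 1}}), ('b', None), ('c', {'NAME': {}})]
found, missing = split_found_missing(example_pairs)
```

Testing for `is None` rather than truthiness keeps empty feature collections in the found set.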

put(items, indexes=True)

Add feature collections to the store.

Given an iterable of tuples of the form (content_id, feature collection), add each to the store and overwrite any that already exist.

This method accepts an optional keyword argument indexes, which defaults to True. When it is True, new index entries are created for each content object, for every index defined on this store.

Note that this will not update existing index entries. (There is currently no way to do this without running some sort of garbage collection process.)

Parameters: items (iterable) – iterable of (content_id, FeatureCollection).
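One reading of the note about existing indexes is that overwriting a feature collection adds index entries for the new values but leaves the old entries in place. The toy model below illustrates that reading only. ToyStore is hypothetical, uses plain dicts instead of feature collections, and is not the dossier.store implementation.

```python
class ToyStore(object):
    """Hypothetical in-memory model; NOT the dossier.store implementation."""

    def __init__(self):
        self.items = {}      # content_id -> feature collection (plain dict)
        self.index = set()   # (index value, content_id) entries

    def put(self, items, indexes=True):
        for content_id, fc in items:
            self.items[content_id] = fc   # overwrite any existing item
            if indexes:
                # Only add entries; entries written by an earlier put
                # for the same content_id are left untouched.
                for name in fc.get('NAME', {}):
                    self.index.add((name.lower(), content_id))

    def index_scan(self, val):
        return sorted(cid for v, cid in self.index if v == val.lower())


toy = ToyStore()
toy.put([('1', {'NAME': {'Foo': 1}})])
toy.put([('1', {'NAME': {'Bar': 1}})])   # overwrites the stored item
stale_hits = toy.index_scan('foo')       # the old entry is still indexed
fresh_hits = toy.index_scan('bar')
```

In this model a scan for the old name still returns the id, which is the kind of leftover a garbage collection pass would have to clean up.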

delete(content_id)

Delete a feature collection from the store.

Deletes the content item from the store with identifier content_id.

Parameters: content_id (str) – identifier for the content object represented by a feature collection

delete_all()

Deletes all storage.

This includes every content object and all index data.

scan(*key_ranges)

Retrieve feature collections in a range of ids.

Returns a generator of content objects corresponding to the content identifier ranges given. key_ranges can be a possibly empty list of 2-tuples, where the first element of the tuple is the beginning of a range and the second element is the end of a range. To specify the beginning or end of the table, use an empty tuple ().

If the list is empty, then this yields all content objects in the storage.

Parameters: key_ranges – as described in kvlayer._abstract_storage.AbstractStorage()
Return type: generator of (content_id, dossier.fc.FeatureCollection).
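The range semantics described for key_ranges can be sketched with a plain dictionary. toy_scan is a hypothetical stand-in, not the kvlayer or dossier.store implementation. It assumes tuple keys and treats both range bounds as inclusive, and neither assumption is confirmed by the text above.

```python
def toy_scan(table, *key_ranges):
    """Yield (key, value) pairs of `table` whose key falls in any range.

    `table` maps tuple keys to values. Each range is a (start, end)
    pair of tuples; () means "no bound" on that side. Both bounds are
    treated as inclusive here, which is an assumption of this sketch.
    """
    if not key_ranges:
        key_ranges = [((), ())]       # no ranges given: scan everything
    for key in sorted(table):
        for start, end in key_ranges:
            after_start = start == () or key >= start
            before_end = end == () or key <= end
            if after_start and before_end:
                yield key, table[key]
                break                 # never yield the same key twice


table = {('a',): 1, ('b',): 2, ('c',): 3, ('d',): 4}
everything = list(toy_scan(table))
middle = list(toy_scan(table, (('b',), ('c',))))
from_c = list(toy_scan(table, (('c',), ())))
```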

scan_ids(*key_ranges)

Retrieve content ids in a range of ids.

Returns a generator of content_id corresponding to the content identifier ranges given. key_ranges can be a possibly empty list of 2-tuples, where the first element of the tuple is the beginning of a range and the second element is the end of a range. To specify the beginning or end of the table, use an empty tuple ().

If the list is empty, then this yields all content ids in the storage.

Parameters: key_ranges – as described in kvlayer._abstract_storage.AbstractStorage()
Return type: generator of content_id

scan_prefix(prefix)

Returns a generator of content objects matching a prefix.

The prefix here is a prefix for content_id.

Return type: generator of (content_id, dossier.fc.FeatureCollection).

scan_prefix_ids(prefix)

Returns a generator of content ids matching a prefix.

The prefix here is a prefix for content_id.

Return type: generator of content_id
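A prefix scan over sorted ids can be pictured as a binary search for the first candidate followed by a walk until the prefix stops matching. The sketch below shows that general technique with the standard bisect module. toy_scan_prefix_ids and the sample ids are hypothetical, and the sketch does not claim to show how the store performs the scan.

```python
import bisect


def toy_scan_prefix_ids(sorted_ids, prefix):
    """Yield ids from a sorted list that start with `prefix`."""
    # Every id that starts with `prefix` sorts at or after `prefix`.
    start = bisect.bisect_left(sorted_ids, prefix)
    for content_id in sorted_ids[start:]:
        if not content_id.startswith(prefix):
            break                     # sorted order: no later id can match
        yield content_id


ids = sorted(['doc|1', 'doc|2', 'web|1', 'web|10', 'wiki|7'])
web_ids = list(toy_scan_prefix_ids(ids, 'web|'))
no_ids = list(toy_scan_prefix_ids(ids, 'zzz'))
```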

Methods for indexing:

index_scan(idx_name, val)

Returns ids that match an indexed value.

Returns a generator of content identifiers that have an entry in the index idx_name with value val (after index transforms are applied).

If the index named by idx_name is not registered, then a KeyError is raised.

Parameters:
  • idx_name (unicode) – name of index
  • val (unspecified (depends on the index, usually unicode)) – the value to use to search the index
Return type: generator of content_id
Raises: KeyError
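The phrase "after index transforms are applied" means the query value goes through the same transform as the values that were indexed. A minimal sketch of that idea follows. ToyIndexes is a hypothetical in-memory model, not the library's code, and its add() method is invented for illustration.

```python
class ToyIndexes(object):
    """Hypothetical model of named indexes; not the library's code."""

    def __init__(self):
        self.transforms = {}   # idx_name -> transform function
        self.entries = {}      # idx_name -> set of (index value, content_id)

    def define_index(self, idx_name, transform):
        self.transforms[idx_name] = transform
        self.entries.setdefault(idx_name, set())

    def add(self, idx_name, raw_value, content_id):
        transform = self.transforms[idx_name]
        self.entries[idx_name].add((transform(raw_value), content_id))

    def index_scan(self, idx_name, val):
        # An unregistered name raises KeyError here; because this is a
        # generator, the error surfaces when it is first advanced.
        transform = self.transforms[idx_name]
        wanted = transform(val)   # the query uses the same transform
        for index_value, content_id in sorted(self.entries[idx_name]):
            if index_value == wanted:
                yield content_id


idx = ToyIndexes()
idx.define_index(u'name_casei', lambda s: s.lower().encode('utf-8'))
idx.add(u'name_casei', u'Foo', 'id-1')
idx.add(u'name_casei', u'BAR', 'id-2')
hits = list(idx.index_scan(u'name_casei', u'FOO'))
```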

index_scan_prefix(idx_name, val_prefix)

Returns ids that match a prefix of an indexed value.

Returns a generator of content identifiers that have an entry in the index idx_name with prefix val_prefix (after index transforms are applied).

If the index named by idx_name is not registered, then a KeyError is raised.

Parameters:
  • idx_name (unicode) – name of index
  • val_prefix – the value to use to search the index
Return type: generator of content_id
Raises: KeyError

index_scan_prefix_and_return_key(idx_name, val_prefix)

Returns ids that match a prefix of an indexed value, and the specific key that matched the search prefix.

Returns a generator of (index key, content identifier) that have an entry in the index idx_name with prefix val_prefix (after index transforms are applied).

If the index named by idx_name is not registered, then a KeyError is raised.

Parameters:
  • idx_name (unicode) – name of index
  • val_prefix – the value to use to search the index
Return type: generator of (index key, content_id)
Raises: KeyError
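Since each result pairs the matching index key with a content id, a caller may want the ids grouped by key, for instance to show which indexed value each hit came from. The helper below is a hypothetical client-side sketch. It assumes only the (index key, content_id) pair shape described above, and the example list stands in for real results.

```python
from collections import defaultdict


def group_ids_by_index_key(pairs):
    """Group content ids by the index key that matched the prefix."""
    grouped = defaultdict(list)
    for index_key, content_id in pairs:
        grouped[index_key].append(content_id)
    return dict(grouped)


# Stand-in for the pairs yielded by a prefix search on b'john'.
example = [(b'john', 'id-1'), (b'johnny', 'id-2'), (b'john', 'id-3')]
by_key = group_ids_by_index_key(example)
```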

define_index(idx_name, create, transform)

Add an index to this store instance.

Adds an index transform to the current FC store. Once an index with name idx_name is added, it will be available in all index_* methods. Additionally, the index will be automatically updated on calls to put().

If an index with name idx_name already exists, then it is overwritten.

Note that indexes do not persist. They must be re-defined for each instance of Store.

For example, to add an index on the boNAME feature, you can use the feature_index helper function:

store.define_index('boNAME',
                   feature_index('boNAME'),
                   lambda s: s.encode('utf-8'))

Another example for creating an index on names:

store.define_index('NAME',
                   feature_index('canonical_name', 'NAME'),
                   lambda s: s.lower().encode('utf-8'))

Parameters:
  • idx_name (unicode) – The name of the index. Must be UTF-8 encodable.
  • create – A function that accepts the transform function and a pair of (content_id, fc) and produces a generator of index values from the pair given using transform.
  • transform – A function that accepts an arbitrary value and applies a transform to it. This transforms the stored value to the index value. This must produce a value with type str (or bytes).
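Because index definitions do not persist, it can help to keep them in one place and apply them to every new store instance. The sketch below assumes only the define_index(idx_name, create, transform) signature documented above. register_indexes, RecordingStore and placeholder_create are hypothetical names, and RecordingStore is a test double rather than a real Store.

```python
def register_indexes(store, definitions):
    """Define every index in `definitions` on `store`.

    `definitions` is an iterable of (idx_name, create, transform)
    triples, matching the define_index() parameters.
    """
    for idx_name, create, transform in definitions:
        store.define_index(idx_name, create, transform)
    return store


class RecordingStore(object):
    """Stand-in for Store that only records define_index() calls."""

    def __init__(self):
        self.defined = {}

    def define_index(self, idx_name, create, transform):
        self.defined[idx_name] = (create, transform)


def placeholder_create(transform, pair):
    """Placeholder create function; yields no index values."""
    return iter(())


INDEX_DEFINITIONS = [
    (u'name_casei', placeholder_create, lambda s: s.lower().encode('utf-8')),
]

# Each new instance gets the same definitions applied.
store_a = register_indexes(RecordingStore(), INDEX_DEFINITIONS)
store_b = register_indexes(RecordingStore(), INDEX_DEFINITIONS)
```

With a real store, the create entry would typically come from feature_index rather than a placeholder.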

dossier.store.feature_index(*feature_names)

Returns an index creation function.

Returns a valid index create function for the feature names given. This can be used with the Store.define_index() method to create indexes on any combination of features in a feature collection.

Return type: (val -> index val) -> (content_id, FeatureCollection) -> generator of [index val]
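To make the shape of a create function concrete, here is a toy version that follows the description given for define_index(): it takes the transform and a (content_id, fc) pair and yields index values. toy_feature_index is hypothetical and is not the library's feature_index. It assumes each feature is a dict-like mapping from values to counts and uses a plain dict in place of a FeatureCollection.

```python
def toy_feature_index(*feature_names):
    """Return a create function over the named features (toy version)."""

    def create(transform, pair):
        content_id, fc = pair
        for feature_name in feature_names:
            feature = fc.get(feature_name)
            if feature is None:
                continue              # feature absent: nothing to index
            for value in feature:     # iterate the feature's keys
                yield transform(value)

    return create


create = toy_feature_index(u'canonical_name', u'NAME')
fc = {u'NAME': {u'Foo': 1, u'Bar': 2}}
index_values = sorted(create(lambda s: s.lower(), ('1', fc)))
```

The missing canonical_name feature is skipped, so only the keys of NAME are indexed.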