superduperdb.backends package#



superduperdb.backends.query_dataset module#

class superduperdb.backends.query_dataset.CachedQueryDataset(select: Select, keys=None, fold='train', suppress=(), transform=None, database=None, prefetch_size: int = 100)[source]#

Bases: object

A dataset class that fetches the document corresponding to a given index. It prefetches documents from the database and stores them in memory.

This can drastically reduce database read operations and hence reduce the overall load on the database.
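The effect of prefetching can be illustrated with a minimal, self-contained sketch. It does not use superduperdb: `PrefetchingDataset`, `fetch_batch`, and the `reads` counter are hypothetical names introduced here, and the block-aligned caching strategy is an assumption rather than a description of how `CachedQueryDataset` is implemented.

```python
from typing import Callable, Dict, List


class PrefetchingDataset:
    """Hypothetical stand-in that caches one block of documents at a time."""

    def __init__(
        self,
        fetch_batch: Callable[[int, int], List[dict]],
        total: int,
        prefetch_size: int = 100,
    ):
        self.fetch_batch = fetch_batch  # reads documents [start, stop) from storage
        self.total = total
        self.prefetch_size = prefetch_size
        self._cache: Dict[int, dict] = {}
        self.reads = 0  # number of round trips to the "database"

    def __len__(self) -> int:
        return self.total

    def __getitem__(self, index: int) -> dict:
        if not 0 <= index < self.total:
            raise IndexError(index)
        if index not in self._cache:
            # Cache miss: load the whole block containing `index` in one read.
            start = (index // self.prefetch_size) * self.prefetch_size
            stop = min(start + self.prefetch_size, self.total)
            batch = self.fetch_batch(start, stop)
            self._cache = dict(enumerate(batch, start=start))
            self.reads += 1
        return self._cache[index]


# An in-memory list stands in for a database collection.
documents = [{"_id": i, "x": i * i} for i in range(250)]
dataset = PrefetchingDataset(
    lambda start, stop: documents[start:stop],
    total=len(documents),
)

items = [dataset[i] for i in range(len(dataset))]
print(dataset.reads)  # 3 block reads instead of 250 single-document reads
```

With sequential access, the number of reads drops from one per document to one per block of `prefetch_size` documents. Random access patterns would benefit less, since each miss replaces the cached block.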

count_documents() int[source]#

Return the number of documents matching the select query.

property database#

class superduperdb.backends.query_dataset.ExpiryCache(iterable=(), /)[source]#

Bases: list

class superduperdb.backends.query_dataset.QueryDataset(select: Select, keys: List[str] | None = None, fold: str | None = 'train', suppress: Sequence[str] = (), transform: Callable | None = None, db=None, ids: List[str] | None = None, in_memory: bool = True, extract: str | None = None, **kwargs)[source]#

Bases: object

A dataset class that can be used to define a torch dataset.

  • select – A select query object which defines the query to be executed.

  • keys – A list of keys to be returned from the dataset.

  • fold – The fold to be used for the dataset.

  • suppress – A list of keys to be suppressed from the dataset.

  • transform – A callable which can be used to transform the dataset.

  • db – A DB object to be used for the dataset.

  • ids – A list of ids to be used for the dataset.

  • in_memory – A boolean flag to indicate if the dataset should be loaded in memory.

  • extract – A key to be extracted from the dataset.
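As a rough illustration of how the per-document parameters above might interact, the following sketch applies `keys`, `suppress`, `extract`, and `transform` to a plain dictionary. The helper `shape_document` is hypothetical, and the order of the steps is an assumption; the actual behaviour of `QueryDataset` may differ.

```python
from typing import Any, Callable, Optional, Sequence


def shape_document(
    document: dict,
    keys: Optional[Sequence[str]] = None,
    suppress: Sequence[str] = (),
    transform: Optional[Callable[[Any], Any]] = None,
    extract: Optional[str] = None,
) -> Any:
    """Hypothetical helper: shape one document the way a dataset item might be."""
    # Keep only the requested keys (every key when `keys` is None).
    out: Any = {k: v for k, v in document.items() if keys is None or k in keys}
    # Drop any suppressed keys.
    for key in suppress:
        out.pop(key, None)
    # Optionally pull a single value out of the document.
    if extract is not None:
        out = out[extract]
    # Optionally apply a user-supplied transform to the result.
    if transform is not None:
        out = transform(out)
    return out


doc = {"_id": "a1", "text": "hello", "label": 1, "_fold": "train"}

selected = shape_document(doc, keys=["text", "label"], suppress=["label"])
print(selected)  # {'text': 'hello'}

extracted = shape_document(doc, extract="text", transform=str.upper)
print(extracted)  # HELLO
```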

property database#

superduperdb.backends.query_dataset.query_dataset_factory(data_prefetch: bool = False, **kwargs)[source]#
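The signature suggests that `data_prefetch` selects between a prefetching dataset and a plain one, although this page does not say so. The sketch below shows that pattern under this assumption, using hypothetical stand-in classes (`PlainDataset`, `PrefetchDataset`) and a hypothetical `dataset_factory` rather than the library's own code.

```python
class PlainDataset:
    """Hypothetical stand-in for a dataset without prefetching."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs


class PrefetchDataset:
    """Hypothetical stand-in for a dataset that prefetches documents."""

    def __init__(self, prefetch_size: int = 100, **kwargs):
        self.prefetch_size = prefetch_size
        self.kwargs = kwargs


def dataset_factory(data_prefetch: bool = False, **kwargs):
    """Return a prefetching dataset when requested, otherwise a plain one."""
    if data_prefetch:
        return PrefetchDataset(**kwargs)
    # Assumption: the plain dataset has no use for `prefetch_size`, so drop it.
    kwargs.pop("prefetch_size", None)
    return PlainDataset(**kwargs)


cached = dataset_factory(data_prefetch=True, fold="train", prefetch_size=50)
plain = dataset_factory(fold="train", prefetch_size=50)

print(type(cached).__name__, cached.prefetch_size)  # PrefetchDataset 50
print(type(plain).__name__, plain.kwargs)  # PlainDataset {'fold': 'train'}
```

A factory of this kind lets calling code pass one set of keyword arguments and toggle prefetching with a single flag.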

Module contents#