superduperdb package#

Subpackages#

Module contents#

class superduperdb.Dataset(identifier: str, select: Select | None = None, sample_size: int | None = None, random_seed: int | None = None, creation_date: str | None = None, raw_data: Artifact | Any | None = None, version: int | None = None)[source]#

Bases: Component

A dataset is an immutable collection of documents used for training.

Parameters:
  • identifier – A unique identifier for the dataset

  • select – A query to select the documents for the dataset

  • sample_size – The number of documents to sample from the query

  • random_seed – The random seed to use for sampling

  • creation_date – The date the dataset was created

  • raw_data – The raw data for the dataset

  • version – The version of the dataset
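The interplay of `sample_size` and `random_seed` can be sketched with the standard library (a hypothetical helper, not the library's implementation): a fixed seed makes the sampled snapshot reproducible.

```python
import random

def sample_documents(documents, sample_size=None, random_seed=None):
    """Deterministically sample documents, mirroring how a fixed
    random_seed makes a Dataset snapshot reproducible."""
    if sample_size is None or sample_size >= len(documents):
        return list(documents)
    rng = random.Random(random_seed)
    return rng.sample(documents, sample_size)

docs = [{"_id": i} for i in range(100)]
a = sample_documents(docs, sample_size=5, random_seed=42)
b = sample_documents(docs, sample_size=5, random_seed=42)
assert a == b  # same seed, same snapshot
```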

creation_date: str | None = None#
identifier: str#
on_load(db: Datalayer) None[source]#

Called when this component is loaded from the data store

Parameters:

db – the db that loaded the component

post_create(db: Datalayer) None[source]#

Called after the first time this component is created. Generally used if self.version is important in this logic.

Parameters:

db – the db that creates the component

pre_create(db: Datalayer) None[source]#

Called the first time this component is created

Parameters:

db – the db that creates the component

property random#
random_seed: int | None = None#
raw_data: Artifact | Any | None = None#
sample_size: int | None = None#
select: Select | None = None#
type_id: ClassVar[str] = 'dataset'#
version: int | None = None#
class superduperdb.Document(content: Dict | Encodable)[source]#

Bases: object

A wrapper around an instance of dict or an Encodable, which may be used to dump that resource to a mix of JSON-able data and bytes

Parameters:

content – The content to wrap

content: Dict | Encodable#
static decode(r: Dict, encoders: Dict) Any[source]#
dump_bson() bytes[source]#

Dump this document into BSON and encode as bytes

encode(schema: Schema | None = None) Any[source]#

Make a copy of the content with all the Encodables encoded

outputs(key: str, model: str, version: int | None = None) Any[source]#

Get document outputs on key from model

Parameters:
  • key – Document key to get outputs from.

  • model – Model name to get outputs from.

  • version – Version of the model to get outputs from (optional)

unpack() Any[source]#

Returns the content, but with any Encodables replaced by their contents
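
The recursive replacement that `unpack` performs can be sketched in plain Python (the `Encodable` stand-in below is illustrative, not the library's class):

```python
class Encodable:
    """Stand-in for superduperdb's Encodable wrapper (illustrative only)."""
    def __init__(self, x):
        self.x = x

def unpack(content):
    """Recursively replace Encodable wrappers with their raw contents,
    mirroring Document.unpack."""
    if isinstance(content, Encodable):
        return unpack(content.x)
    if isinstance(content, dict):
        return {k: unpack(v) for k, v in content.items()}
    if isinstance(content, list):
        return [unpack(v) for v in content]
    return content

doc = {"img": Encodable([1, 2, 3]), "label": "cat"}
assert unpack(doc) == {"img": [1, 2, 3], "label": "cat"}
```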

class superduperdb.Encoder(identifier: str, decoder: ~typing.Callable | ~superduperdb.base.artifact.Artifact = <factory>, encoder: ~typing.Callable | ~superduperdb.base.artifact.Artifact = <factory>, shape: ~typing.Sequence | None = None, load_hybrid: bool = True, version: int | None = None)[source]#

Bases: Component

Storable Component allowing byte encoding of primary data, i.e. data inserted using db.base.db.Datalayer.insert

Parameters:
  • identifier – Unique identifier

  • decoder – Callable converting bytes to an Encodable of this Encoder

  • encoder – Callable converting an Encodable of this Encoder to bytes

  • shape – Shape of the data

  • version – Version of the encoder (don’t use this)

  • load_hybrid – Whether to load the data from the URI or return the URI in CFG.hybrid mode
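
The `encoder`/`decoder` callables must be mutually inverse: one serializes an object to bytes, the other restores it. A minimal sketch, assuming pickle-based serialization (the library itself does not mandate pickle):

```python
import pickle

def encoder(x) -> bytes:
    """Hypothetical encoder callable: object -> bytes."""
    return pickle.dumps(x)

def decoder(b: bytes):
    """Hypothetical decoder callable: bytes -> object, inverting encoder."""
    return pickle.loads(b)

obj = {"weights": [0.1, 0.2]}
assert decoder(encoder(obj)) == obj  # round trip
```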

artifact_artibutes: ClassVar[Sequence[str]] = ['decoder', 'encoder']#
decode(b: bytes) Any[source]#
decoder: Callable | Artifact#
dump(other)[source]#
encode(x: Any | None = None, uri: str | None = None, wrap: bool = True) str | None | Dict[str, Any][source]#
encoder: Callable | Artifact#
encoders: ClassVar[List] = ['_default']#
identifier: str#
load_hybrid: bool = True#
shape: Sequence | None = None#
type_id: ClassVar[str] = 'encoder'#
version: int | None = None#
class superduperdb.JSONable[source]#

Bases: BaseModel

A base class for classes that can be converted to and from JSON

Create a new model by parsing and validating input data from keyword arguments.

Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.

__init__ uses __pydantic_self__ instead of the more common self for the first arg to allow self as a field name.

class Config[source]#

Bases: object

extra = 'forbid'#
ignored_types = (<class 'functools.cached_property'>,)#
SUBCLASSES: ClassVar[Set[Type]] = {<class 'superduperdb.base.config.BaseConfigJSONable'>, <class 'superduperdb.base.config.Cluster'>, <class 'superduperdb.base.config.Config'>, <class 'superduperdb.base.config.Retry'>}#
TYPE_ID_TO_CLASS: ClassVar[Dict[str, Type]] = {}#
deepcopy() JSONable[source]#
dict(*, include: IncEx = None, exclude: IncEx = None, by_alias: bool = False, exclude_unset: bool = False, exclude_defaults: bool = False, exclude_none: bool = False) Dict[str, Any]#
model_config: ClassVar[ConfigDict] = {'extra': 'forbid', 'ignored_types': (<class 'functools.cached_property'>,)}#

Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

model_fields: ClassVar[dict[str, FieldInfo]] = {}#

Metadata about the fields defined on the model, mapping of field names to [FieldInfo][pydantic.fields.FieldInfo].

This replaces Model.__fields__ from Pydantic V1.

class superduperdb.Listener(key: str, model: str | ~superduperdb.components.model.Model, select: ~superduperdb.backends.base.query.CompoundSelect, active: bool = True, identifier: str | None = None, predict_kwargs: ~typing.Dict | None = <factory>, version: int | None = None)[source]#

Bases: Component

Listener object which is used to process a column or key of a collection or table, and to store the outputs.

Parameters:
  • key – Key to be bound to model

  • model – Model for processing data

  • select – Object for selecting which data is processed

  • active – Toggle to False to deactivate change data triggering

  • identifier – A string used to identify the model.

  • predict_kwargs – Keyword arguments to self.model.predict

  • version – Version number of the listener
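
What a Listener does to the selected data can be sketched with plain Python: apply a model to the bound key of each document and store the result under an outputs field. The `_outputs` layout below is illustrative, not the library's storage format.

```python
def run_listener(documents, key, model, model_name):
    """Sketch of a Listener: apply ``model`` to ``key`` of each selected
    document and record the result under an outputs field."""
    for doc in documents:
        doc.setdefault("_outputs", {})[(key, model_name)] = model(doc[key])
    return documents

docs = [{"text": "hello"}, {"text": "world"}]
run_listener(docs, "text", str.upper, "upper")
assert docs[0]["_outputs"][("text", "upper")] == "HELLO"
```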

active: bool = True#
property child_components: Sequence[Tuple[str, str]]#

Returns a list of child components as pairs of strings

cleanup(database: Datalayer) None[source]#

Clean up when the listener is deleted

Parameters:

database – The DB instance to process

property dependencies: List[str]#
property id_key: str#
identifier: str | None = None#
key: str#
model: str | Model#
property outputs#
post_create(db: Datalayer) None[source]#

Called after the first time this component is created. Generally used if self.version is important in this logic.

Parameters:

db – the db that creates the component

pre_create(db: Datalayer) None[source]#

Called the first time this component is created

Parameters:

db – the db that creates the component

predict_kwargs: Dict | None#
schedule_jobs(db: Datalayer, dependencies: Sequence[Job] = (), verbose: bool = False) Sequence[Any][source]#

Schedule jobs for the listener

Parameters:
  • db – The DB instance to process

  • dependencies – A list of dependencies

  • verbose – Whether to print verbose output

select: CompoundSelect#
type_id: ClassVar[str] = 'listener'#
version: int | None = None#
class superduperdb.Metric(identifier: str, object: Artifact | Callable | None = None, version: int | None = None)[source]#

Bases: Component

Metric base object with which to evaluate performance on a dataset. These objects are callable and are applied row-wise to the data, then averaged.

Parameters:
  • identifier – unique identifier

  • object – callable or Artifact to be applied to the data

  • version – version of the Metric
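
The row-wise-then-averaged evaluation described above can be sketched directly (the `evaluate` helper is hypothetical, not part of the library):

```python
def evaluate(metric, predictions, targets):
    """Apply a Metric-style callable row-wise and average the results."""
    scores = [metric(p, t) for p, t in zip(predictions, targets)]
    return sum(scores) / len(scores)

def accuracy(p, t):
    # A simple row-wise metric: 1.0 on a match, 0.0 otherwise.
    return float(p == t)

assert evaluate(accuracy, [1, 0, 1, 1], [1, 1, 1, 1]) == 0.75
```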

artifacts: ClassVar[List[str]] = ['object']#
identifier: str#
object: Artifact | Callable | None = None#
type_id: ClassVar[str] = 'metric'#
version: int | None = None#
class superduperdb.Model(identifier: str, object: t.Union[Artifact, t.Any], flatten: bool = False, output_schema: t.Optional[t.Union[Schema, dict]] = None, encoder: EncoderArg = None, preprocess: t.Union[t.Callable, Artifact, None] = None, postprocess: t.Union[t.Callable, Artifact, None] = None, collate_fn: t.Union[t.Callable, Artifact, None] = None, metrics: t.Sequence[t.Union[str, Metric, None]] = (), predict_method: t.Optional[str] = None, model_to_device_method: t.Optional[str] = None, batch_predict: bool = False, takes_context: bool = False, train_X: t.Optional[str] = None, train_y: t.Optional[str] = None, training_select: t.Union[Select, None] = None, metric_values: t.Optional[t.Dict] = <factory>, training_configuration: t.Union[str, _TrainingConfiguration, None] = None, model_update_kwargs: dict = <factory>, serializer: str = 'dill', device: str = 'cpu', preferred_devices: t.Union[None, t.Sequence[str]] = ('cuda', 'mps', 'cpu'), validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, version: t.Optional[int] = None)[source]#

Bases: Component, Predictor

Model component which wraps a model to become serializable

Parameters:
  • identifier – Unique identifier of model

  • object – Model object, e.g. an sklearn model

  • encoder – Encoder instance (optional)

  • flatten – Flatten the model outputs

  • output_schema – Output schema (mapping of encoders) (optional)

  • preprocess – Preprocess function (optional)

  • postprocess – Postprocess function (optional)

  • collate_fn – Collate function (optional)

  • metrics – Metrics to use (optional)

  • predict_method – The method to use for prediction (optional)

  • model_to_device_method – The method to transfer the model to a device

  • batch_predict – Whether to batch predict (optional)

  • takes_context – Whether the model takes context into account (optional)

  • train_X – The key of the input data to use for training (optional)

  • train_y – The key of the target data to use for training (optional)

  • training_select – The select to use for training (optional)

  • metric_values – The metric values (optional)

  • training_configuration – The training configuration (optional)

  • model_update_kwargs – The kwargs to use for model update (optional)

  • serializer – Serializer to store model to artifact store (optional)

  • device – The device to use (optional)

  • preferred_devices – The preferred devices to use (optional)
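
The prediction flow implied by the `preprocess`, `predict_method`, and `postprocess` parameters can be sketched as follows (a simplified stand-in, not the library's implementation; `float` stands in for a wrapped model object):

```python
def model_predict(x, object_, preprocess=None, postprocess=None,
                  predict_method=None):
    """Sketch of Model's prediction flow: optional preprocess, then the
    wrapped object (or a named method on it), then optional postprocess."""
    if preprocess is not None:
        x = preprocess(x)
    fn = getattr(object_, predict_method) if predict_method else object_
    out = fn(x)
    if postprocess is not None:
        out = postprocess(out)
    return out

result = model_predict(
    " 3.5 ",
    object_=float,                 # stands in for e.g. an sklearn model
    preprocess=str.strip,
    postprocess=lambda y: round(y),
)
assert result == 4
```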

append_metrics(d: Dict[str, float]) None[source]#
artifact_attributes: t.ClassVar[t.Sequence[str]] = ['object']#
batch_predict: bool = False#
property child_components: Sequence[Tuple[str, str]]#
collate_fn: t.Union[t.Callable, Artifact, None] = None#
create_fit_job(X: str | Sequence[str], select: Select | None = None, y: str | None = None, **kwargs)[source]#
device: str = 'cpu'#
encoder: EncoderArg = None#
fit(X: t.Any, y: t.Any = None, configuration: t.Optional[_TrainingConfiguration] = None, data_prefetch: bool = False, db: t.Optional[Datalayer] = None, dependencies: t.Sequence[Job] = (), metrics: t.Optional[t.Sequence[Metric]] = None, select: t.Optional[Select] = None, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, **kwargs) t.Optional[Pipeline][source]#

Fit the model on the given data.

Parameters:
  • X – The key of the input data to use for training

  • y – The key of the target data to use for training

  • configuration – The training configuration (optional)

  • data_prefetch – Whether to prefetch the data (optional)

  • db – The datalayer (optional)

  • dependencies – The dependencies (optional)

  • metrics – The metrics to evaluate on (optional)

  • select – The select to use for training (optional)

  • validation_sets – The validation Dataset instances to use (optional)

flatten: bool = False#
identifier: str#
metric_values: t.Optional[t.Dict]#
metrics: t.Sequence[t.Union[str, Metric, None]] = ()#
model_to_device_method: t.Optional[str] = None#
model_update_kwargs: dict#
object: t.Union[Artifact, t.Any]#
on_load(db: Datalayer) None[source]#

Called when this component is loaded from the data store

Parameters:

db – the db that loaded the component

output_schema: t.Optional[t.Union[Schema, dict]] = None#
post_create(db: Datalayer) None[source]#

Called after the first time this component is created. Generally used if self.version is important in this logic.

Parameters:

db – the db that creates the component

postprocess: t.Union[t.Callable, Artifact, None] = None#
pre_create(db: Datalayer)[source]#

Called the first time this component is created

Parameters:

db – the db that creates the component

predict_method: t.Optional[str] = None#
preferred_devices: t.Union[None, t.Sequence[str]] = ('cuda', 'mps', 'cpu')#
preprocess: t.Union[t.Callable, Artifact, None] = None#
schedule_jobs(db: Datalayer, dependencies: t.Sequence[Job] = (), verbose: bool = False) t.Sequence[t.Any][source]#

Schedule jobs for this model

Parameters:
  • db – The db to process

  • dependencies – A sequence of dependencies

  • verbose – If True, print more information

serializer: str = 'dill'#
takes_context: bool = False#
train_X: t.Optional[str] = None#
train_y: t.Optional[str] = None#
training_configuration: t.Union[str, _TrainingConfiguration, None] = None#
property training_keys: List#
training_select: t.Union[Select, None] = None#
type_id: t.ClassVar[str] = 'model'#
validate(db, validation_set: t.Union[Dataset, str], metrics: t.Sequence[Metric])[source]#
validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None#
version: t.Optional[int] = None#
class superduperdb.Schema(identifier: str, fields: Mapping[str, superduperdb.components.encoder.Encoder | str], version: int | None = None)[source]#

Bases: Component

decode(data: Mapping[str, Any]) Mapping[str, Any][source]#

Decode data using the schema’s encoders

Parameters:

data – data to decode

encode(data: Mapping[str, Any])[source]#

Encode data using the schema’s encoders

Parameters:

data – data to encode
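
The field-wise encoding a Schema performs can be sketched with a mapping of field names to encode/decode pairs (JSON here is only an example codec; the hypothetical `fields` dict mirrors `Schema.fields`):

```python
import json

# Hypothetical per-field codecs, keyed like Schema.fields:
# each value is an (encode, decode) pair for that field.
fields = {"vec": (json.dumps, json.loads)}

def encode(data):
    """Encode each field that has a registered codec; pass others through."""
    return {k: fields[k][0](v) if k in fields else v for k, v in data.items()}

def decode(data):
    """Invert encode using each field's decoder."""
    return {k: fields[k][1](v) if k in fields else v for k, v in data.items()}

row = {"vec": [1.0, 2.0], "id": "a"}
assert decode(encode(row)) == row
```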

property encoded_types#
property encoders#
fields: Mapping[str, Encoder | str]#
identifier: str#
pre_create(db) None[source]#

Called the first time this component is created

Parameters:

db – the db that creates the component

property raw#
property trivial#
type_id: ClassVar[str] = 'schema'#
version: int | None = None#
class superduperdb.Serializer(identifier: str, object: Type, version: int | None)[source]#

Bases: Component

identifier: str#
object: Type#
pre_create(db: Datalayer)[source]#

Called the first time this component is created

Parameters:

db – the db that creates the component

type_id: ClassVar[str] = 'serializer'#
version: int | None#
class superduperdb.VectorIndex(identifier: str, indexing_listener: ~superduperdb.components.listener.Listener | str, compatible_listener: None | ~superduperdb.components.listener.Listener | str = None, measure: ~superduperdb.vector_search.base.VectorIndexMeasureType = VectorIndexMeasureType.cosine, version: int | None = None, metric_values: ~typing.Dict | None = <factory>)[source]#

Bases: Component

A component carrying the information to apply a vector index to a DB instance

Parameters:
  • identifier – Unique string identifier of index

  • indexing_listener – Listener which is applied to created vectors

  • compatible_listener – Listener which is applied to vectors to be compared

  • measure – Measure to use for comparison

  • version – version of this index

  • metric_values – Metric values for this index

property child_components: Sequence[Tuple[str, str]]#
compatible_listener: None | Listener | str = None#
property dimensions: int#
get_nearest(like: Document, db: Any, id_field: str = '_id', outputs: Dict | None = None, ids: Sequence[str] | None = None, n: int = 100) Tuple[List[str], List[float]][source]#

Given a document, find the nearest results in this vector index, returned as two parallel lists of result IDs and scores

Parameters:
  • like – The document to compare against

  • db – The datastore to use

  • outputs – An optional dictionary

  • ids – A list of ids to match

  • n – Number of items to return
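
The parallel ids/scores return shape can be illustrated with a brute-force sketch under the default cosine measure (the real index delegates to a vector-search backend; this is not the library's implementation):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def get_nearest(query, vectors, n=100):
    """Brute-force sketch of get_nearest: rank stored vectors by cosine
    similarity and return parallel lists of ids and scores."""
    scored = sorted(
        ((id_, cosine(query, v)) for id_, v in vectors.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )[:n]
    return [id_ for id_, _ in scored], [s for _, s in scored]

vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ids, scores = get_nearest([1.0, 0.1], vectors, n=2)
assert ids == ["a", "c"]
```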

get_vector(like: Document, models: List[str], keys: List[str], db: Any = None, outputs: Dict | None = None)[source]#
identifier: str#
indexing_listener: Listener | str#
measure: VectorIndexMeasureType = 'cosine'#
metric_values: Dict | None#
property models_keys: Tuple[List[str], List[str]]#

Return a list of model and keys for each listener

on_load(db: Datalayer) None[source]#

Called when this component is loaded from the data store

Parameters:

db – the db that loaded the component

post_create(db: Datalayer) None[source]#

Called after the first time this component is created. Generally used if self.version is important in this logic.

Parameters:

db – the db that creates the component

type_id: ClassVar[str] = 'vector_index'#
version: int | None = None#
superduperdb.logging#

alias of Logging

superduperdb.superduper(item: Any | None = None, **kwargs) Any[source]#

Attempts to automatically wrap an item in a superduperdb component by using duck typing to recognize it.

Parameters:

item – A database or model

superduperdb.vector(shape)[source]#

Create an encoder for a vector (list of ints/floats) of a given shape

Parameters:

shape – The shape of the vector
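
What such an encoder must do for a fixed one-dimensional shape can be sketched with `struct` (the `make_vector_codec` helper and float32 layout are illustrative assumptions, not the library's actual serialization):

```python
import struct

def make_vector_codec(shape: int):
    """Sketch of a vector(shape)-style codec: serialize a fixed-length
    list of floats to little-endian float32 bytes and back."""
    fmt = f"<{shape}f"

    def encode(v) -> bytes:
        return struct.pack(fmt, *v)

    def decode(b: bytes):
        return list(struct.unpack(fmt, b))

    return encode, decode

enc, dec = make_vector_codec(3)
v = [1.0, 2.0, 3.0]
assert dec(enc(v)) == v  # exact round trip for float32-representable values
```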