superduperdb.ext.openai package#

Submodules#

superduperdb.ext.openai.model module#

class superduperdb.ext.openai.model.OpenAIAudioTranscription(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = True, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI audio transcription predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'whisper-1'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:
  • takes_context – Whether the model takes context into account.

  • prompt – The prompt to guide the model’s style. Should contain {context}.

async _apredict_a_batch(files: List[BinaryIO], **kwargs)[source]#

Converts multiple file-like audio recordings to text.

async _apredict_one(file: BinaryIO, context: List[str] | None = None, **kwargs)[source]#

Converts a file-like audio recording to text.

_predict_a_batch(files: List[BinaryIO], **kwargs)[source]#

Converts multiple file-like audio recordings to text.

_predict_one(file: BinaryIO, context: List[str] | None = None, **kwargs)[source]#

Converts a file-like audio recording to text.

pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
takes_context: bool = True#
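
The batch methods above accept a list of file-like recordings; how many go into a single call is governed by `predict_max_chunk_size`. The following is a minimal, self-contained sketch of that chunking step (illustrative only, not the library's actual implementation):

```python
from typing import List, Sequence


def chunk_batches(items: Sequence[str], max_chunk_size: int) -> List[Sequence[str]]:
    """Split inputs into chunks of at most `max_chunk_size` items.

    A sketch of how a `predict_max_chunk_size`-style limit could bound
    how many audio files are passed to each batch call such as
    `_predict_a_batch`.
    """
    return [items[i:i + max_chunk_size] for i in range(0, len(items), max_chunk_size)]


# Seven recordings with a chunk size of 3 yield batches of 3, 3 and 1.
batches = chunk_batches(
    ["a.wav", "b.wav", "c.wav", "d.wav", "e.wav", "f.wav", "g.wav"], 3
)
```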
class superduperdb.ext.openai.model.OpenAIAudioTranslation(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = True, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI audio translation predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'whisper-1'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:
  • takes_context – Whether the model takes context into account.

  • prompt – The prompt to guide the model’s style. Should contain {context}.

async _apredict_a_batch(files: List[BinaryIO], **kwargs)[source]#

Translates multiple file-like audio recordings to English.

async _apredict_one(file: BinaryIO, context: List[str] | None = None, **kwargs)[source]#

Translates a file-like audio recording to English.

_predict_a_batch(files: List[BinaryIO], **kwargs)[source]#

Translates multiple file-like audio recordings to English.

_predict_one(file: BinaryIO, context: List[str] | None = None, **kwargs)[source]#

Translates a file-like audio recording to English.

pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
takes_context: bool = True#
class superduperdb.ext.openai.model.OpenAIChatCompletion(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = False, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI chat completion predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'gpt-3.5-turbo'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:

prompt – The prompt to use to seed the response.

pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
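
A short, self-contained sketch of how a seed `prompt` containing a `{context}` placeholder might be combined with retrieved context and the user's input before being sent to the chat-completion endpoint. The `build_messages` helper below is hypothetical, not part of the package:

```python
from typing import Dict, List, Optional


def build_messages(
    prompt: str, user_input: str, context: Optional[List[str]] = None
) -> List[Dict[str, str]]:
    # Hypothetical helper: fill the {context} placeholder in the seed
    # prompt, then assemble OpenAI-style chat messages.
    if context is not None:
        prompt = prompt.format(context="\n".join(context))
    return [
        {"role": "system", "content": prompt},
        {"role": "user", "content": user_input},
    ]


messages = build_messages(
    "Answer using only this context:\n{context}",
    "What port does the service use?",
    context=["The service listens on port 8080."],
)
```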
class superduperdb.ext.openai.model.OpenAIEmbedding(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = False, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, shape: ~typing.Sequence[int] | None = None)[source]#

Bases: _OpenAI

OpenAI embedding predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'text-embedding-ada-002'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:

shape – The shape of the embedding, as a tuple.

pre_create(db)[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

shape: Sequence[int] | None = None#
shapes: ClassVar[Dict] = {'text-embedding-ada-002': (1536,)}#
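
The `shapes` class variable maps model names to known embedding dimensions, which suggests that `pre_create` can resolve a missing `shape` from `model`. A minimal sketch of that lookup, under the assumption that this is how the resolution works (the helper below is illustrative, not the package's own code):

```python
from typing import Dict, Optional, Sequence, Tuple

# Mirrors the `shapes` ClassVar documented above.
SHAPES: Dict[str, Tuple[int, ...]] = {"text-embedding-ada-002": (1536,)}


def infer_shape(
    model: str, shape: Optional[Sequence[int]] = None
) -> Sequence[int]:
    # An explicit shape wins; otherwise fall back to the known-shapes
    # table, failing loudly for models with no recorded dimension.
    if shape is not None:
        return shape
    try:
        return SHAPES[model]
    except KeyError:
        raise ValueError(f"Unknown model {model!r}; pass `shape` explicitly")


resolved = infer_shape("text-embedding-ada-002")
```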
class superduperdb.ext.openai.model.OpenAIImageCreation(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = True, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI image creation predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'text-embedding-ada-002'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:
  • takes_context – Whether the model takes context into account.

  • prompt – The prompt to use to seed the response.

pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
takes_context: bool = True#
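
Image endpoints can return results as base64-encoded payloads; before the bytes can be stored or handed to an encoder they must be decoded. This is a generic sketch of that decoding step, assuming a `b64_json`-style response format (nothing here is the package's own code):

```python
import base64


def decode_image_payload(b64_payload: str) -> bytes:
    # Decode a base64-encoded image into raw bytes, e.g. before
    # handing it to a datatype/encoder for storage.
    return base64.b64decode(b64_payload)


# Round-trip a tiny fake PNG header to show the decoding step.
fake_png = b"\x89PNG\r\n\x1a\n" + b"payload"
payload = base64.b64encode(fake_png).decode("ascii")
raw = decode_image_payload(payload)
```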
class superduperdb.ext.openai.model.OpenAIImageEdit(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = True, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI image edit predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'text-embedding-ada-002'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:
  • takes_context – Whether the model takes context into account.

  • prompt – The prompt to use to seed the response.

pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
takes_context: bool = True#
class superduperdb.ext.openai.model._OpenAI(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = False, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>)[source]#

Bases: APIModel

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'text-embedding-ada-002'

public_api(beta): This API is in beta and may change before becoming stable.

client_kwargs: dict | None#

Module contents#

class superduperdb.ext.openai.OpenAIChatCompletion(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = False, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, prompt: str = '')[source]#

Bases: _OpenAI

OpenAI chat completion predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'gpt-3.5-turbo'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:

prompt – The prompt to use to seed the response.

client_kwargs: t.Optional[dict]#
identifier: str#
model_update_kwargs: t.Dict#
pre_create(db: Datalayer) None[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

prompt: str = ''#
class superduperdb.ext.openai.OpenAIEmbedding(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, preprocess: t.Optional[t.Callable] = None, postprocess: t.Optional[t.Callable] = None, collate_fn: t.Optional[t.Callable] = None, batch_predict: bool = False, takes_context: bool = False, metrics: t.Sequence[t.Union[str, Metric, None]] = (), model_update_kwargs: t.Dict = <factory>, validation_sets: t.Optional[t.Sequence[t.Union[str, Dataset]]] = None, predict_X: t.Optional[str] = None, predict_select: t.Optional[CompoundSelect] = None, predict_max_chunk_size: t.Optional[int] = None, predict_kwargs: t.Optional[t.Dict] = None, model: t.Optional[str] = None, client_kwargs: dict | None = <factory>, shape: ~typing.Sequence[int] | None = None)[source]#

Bases: _OpenAI

OpenAI embedding predictor.

Parameters:
  • identifier – A unique identifier for the component

  • datatype – DataType instance used to encode the model outputs

  • output_schema – Output schema (mapping of encoders)

  • flatten – Flatten the model outputs

  • preprocess – Preprocess function

  • postprocess – Postprocess function

  • collate_fn – Collate function

  • batch_predict – Whether to batch predict

  • takes_context – Whether the model takes context into account

  • metrics – The metrics to evaluate on

  • model_update_kwargs – The kwargs to use for model update

  • validation_sets – The validation Dataset instances to use

  • predict_X – The key of the input data to use for .predict

  • predict_select – The select to use for .predict

  • predict_max_chunk_size – The max chunk size to use for .predict

  • predict_kwargs – The kwargs to use for .predict

  • model – The model to use, e.g. 'text-embedding-ada-002'

public_api(beta): This API is in beta and may change before becoming stable.

Parameters:

shape – The shape of the embedding, as a tuple.

client_kwargs: t.Optional[dict]#
identifier: str#
model_update_kwargs: t.Dict#
pre_create(db)[source]#

Called the first time this component is created.

Parameters:

db – the Datalayer that creates the component

shape: Sequence[int] | None = None#
shapes: ClassVar[Dict] = {'text-embedding-ada-002': (1536,)}#