
model

superduperdb.ext.transformers.model


LLM

```python
LLM(self,
    db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
    uuid: str = <factory>,
    *,
    trainer: 't.Optional[Trainer]' = None,
    identifier: str = '',
    artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
    datatype: 'EncoderArg' = None,
    output_schema: 't.Optional[Schema]' = None,
    flatten: 'bool' = False,
    model_update_kwargs: 't.Dict' = <factory>,
    predict_kwargs: 't.Dict' = <factory>,
    compute_kwargs: 't.Dict' = <factory>,
    validation: 't.Optional[Validation]' = None,
    metric_values: 't.Dict' = <factory>,
    prompt: str = '{input}',
    prompt_func: Optional[Callable] = None,
    max_batch_size: Optional[int] = 4,
    model_name_or_path: Optional[str] = None,
    adapter_id: Union[str, superduperdb.ext.transformers.training.Checkpoint, NoneType] = None,
    model_kwargs: Dict = <factory>,
    tokenizer_kwargs: Dict = <factory>,
    prompt_template: str = '{input}') -> None
```
| Parameter | Description |
| --- | --- |
| identifier | Model identifier. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute-backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| prompt | The template to use for the prompt. |
| prompt_func | Prompt function; default is None. |
| max_batch_size | The maximum batch size to use for batch generation. |
| model_name_or_path | Model name or path. |
| adapter_id | Adapter ID or Checkpoint; default is None. Adds an adapter to the base model for inference. |
| model_kwargs | Model kwargs; all kwargs are passed to transformers.AutoModelForCausalLM.from_pretrained. |
| tokenizer_kwargs | Tokenizer kwargs; all kwargs are passed to transformers.AutoTokenizer.from_pretrained. |
| prompt_template | Prompt template; default is "{input}". |

LLM model based on the transformers library.

All model_kwargs are passed to transformers.AutoModelForCausalLM.from_pretrained, and all tokenizer_kwargs are passed to transformers.AutoTokenizer.from_pretrained. Instances with the same model_name_or_path, bits, model_kwargs, and tokenizer_kwargs share the same cached base model and tokenizer.
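The prompt parameter is a plain Python format template containing an {input} placeholder. A minimal sketch of how such a template renders (the template wording and question text below are made-up examples, not library defaults beyond '{input}'):

```python
# The default template is '{input}', which passes the input through unchanged.
default_prompt = '{input}'
assert default_prompt.format(input='Hello') == 'Hello'

# A custom template (hypothetical wording) wraps each input before generation.
qa_prompt = 'Q: {input}\nA:'
rendered = qa_prompt.format(input='What does the LLM wrapper do?')
print(rendered)
```

For logic that cannot be expressed as a static template, prompt_func can be supplied instead of prompt.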

TextClassificationPipeline

```python
TextClassificationPipeline(self,
    identifier: str,
    db: dataclasses.InitVar[typing.Optional[ForwardRef('Datalayer')]] = None,
    uuid: str = <factory>,
    *,
    preferred_devices: 't.Sequence[str]' = ('cuda', 'mps', 'cpu'),
    device: 't.Optional[str]' = None,
    trainer: 't.Optional[Trainer]' = None,
    artifacts: 'dc.InitVar[t.Optional[t.Dict]]' = None,
    signature: Literal['*args', '**kwargs', '*args,**kwargs', 'singleton'] = 'singleton',
    datatype: 'EncoderArg' = None,
    output_schema: 't.Optional[Schema]' = None,
    flatten: 'bool' = False,
    model_update_kwargs: 't.Dict' = <factory>,
    predict_kwargs: 't.Dict' = <factory>,
    compute_kwargs: 't.Dict' = <factory>,
    validation: 't.Optional[Validation]' = None,
    metric_values: 't.Dict' = <factory>,
    tokenizer_name: Optional[str] = None,
    tokenizer_cls: object = <class 'transformers.models.auto.tokenization_auto.AutoTokenizer'>,
    tokenizer_kwargs: Dict = <factory>,
    model_name: Optional[str] = None,
    model_cls: object = <class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>,
    model_kwargs: Dict = <factory>,
    pipeline: Optional[transformers.pipelines.base.Pipeline] = None,
    task: str = 'text-classification') -> None
```
| Parameter | Description |
| --- | --- |
| identifier | Identifier of the leaf. |
| db | Datalayer instance. |
| uuid | UUID of the leaf. |
| artifacts | A dictionary of artifact paths and DataType objects. |
| signature | Model signature. |
| datatype | DataType instance. |
| output_schema | Output schema (mapping of encoders). |
| flatten | Flatten the model outputs. |
| model_update_kwargs | The kwargs to use for model update. |
| predict_kwargs | Additional arguments to use at prediction time. |
| compute_kwargs | Kwargs used for compute-backend job submission. Example (Ray backend): compute_kwargs = dict(resources=...). |
| validation | The validation Dataset instances to use. |
| metric_values | The metrics to evaluate on. |
| tokenizer_name | Tokenizer name. |
| tokenizer_cls | Tokenizer class, e.g. transformers.AutoTokenizer. |
| tokenizer_kwargs | Tokenizer kwargs; passed to tokenizer_cls. |
| model_name | Model name; passed to model_cls. |
| model_cls | Model class, e.g. AutoModelForSequenceClassification. |
| model_kwargs | Model kwargs; passed to model_cls. |
| pipeline | Pipeline instance; default is None, in which case one is built. |
| task | Task of the pipeline. |
| trainer | TransformersTrainer instance. |
| preferred_devices | Preferred devices, in order of preference. |
| device | Device to use. |
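The preferred_devices tuple suggests a fallback order when no explicit device is given. The library's actual resolution logic is not shown here; this is a minimal sketch of the assumed first-available-wins behaviour (pick_device and its available parameter are hypothetical names, not part of the API):

```python
def pick_device(preferred=('cuda', 'mps', 'cpu'), available=('cpu',)):
    """Return the first preferred device that is actually available.

    Hypothetical helper illustrating the assumed semantics of
    `preferred_devices`; not part of the superduperdb API.
    """
    for device in preferred:
        if device in available:
            return device
    raise RuntimeError('none of the preferred devices are available')

# On a machine exposing 'mps' and 'cpu', 'mps' wins because it
# appears earlier in the default preference order.
print(pick_device(available=('mps', 'cpu')))
```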

A wrapper for transformers.Pipeline.

```python
# Example:
model = TextClassificationPipeline(...)
```