superduperdb.ext.vllm package#

Submodules#

superduperdb.ext.vllm.model module#

class superduperdb.ext.vllm.model.VllmAPI(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, api_url: str = '', *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, model_update_kwargs: t.Dict = <factory>, predict_kwargs: t.Dict = <factory>, compute_kwargs: t.Dict = <factory>, prompt: str = '{input}', prompt_func: ~typing.Callable | None = None, max_batch_size: int | None = 4)[source]#

Bases: BaseLLMAPI

Wrapper for querying a vLLM API service (API-server format, as started by vllm.entrypoints.api_server).

Parameters:
  • api_url – The URL of the vLLM API server.

  • prompt – The template string used to build the prompt (default '{input}').

  • prompt_func – An optional callable used to build the prompt instead of the template.

  • max_batch_size – The maximum batch size to use for batch generation.

  • predict_kwargs – Additional parameters passed at inference time.

public_api(beta): This API is in beta and may change before becoming stable.

_generate(prompt: str, **kwargs) → str | List[str][source]#

Batch generate text from a prompt.

build_post_data(prompt: str, **kwargs: dict[str, Any]) → dict[str, Any][source]#

Build the JSON payload for a generation request to the vLLM API server.
full_import_path = 'superduperdb.ext.vllm.model.VllmAPI'#
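Example (a minimal sketch, assuming a vLLM API server is already running at api_url; the identifier, URL, prompt, and max_tokens values are illustrative, and the call goes through the documented _generate method, since the higher-level predict entry point varies across superduperdb versions):

    from superduperdb.ext.vllm import VllmAPI

    # Hypothetical identifier and endpoint; point api_url at a server
    # started with `python -m vllm.entrypoints.api_server`.
    llm = VllmAPI(
        identifier='my-vllm-api',
        api_url='http://localhost:8000/generate',
        prompt='Answer briefly: {input}',
        max_batch_size=4,
    )

    # _generate is documented above; extra kwargs are assumed to be
    # forwarded to the server via build_post_data.
    print(llm._generate('What is vLLM?', max_tokens=64))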
class superduperdb.ext.vllm.model.VllmModel(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, model_update_kwargs: t.Dict = <factory>, predict_kwargs: t.Dict = <factory>, compute_kwargs: t.Dict = <factory>, prompt: str = '{input}', prompt_func: ~typing.Callable | None = None, max_batch_size: int | None = 4, model_name: str = '', tensor_parallel_size: int = 1, trust_remote_code: bool = True, vllm_kwargs: dict = <factory>, on_ray: bool = False, ray_address: str | None = None, ray_config: dict = <factory>)[source]#

Bases: BaseLLM

Load a large language model with vLLM.

Parameters:
  • model_name – The name of the model to load.

  • tensor_parallel_size – The number of GPUs to use for tensor-parallel inference.

  • trust_remote_code – Whether to trust remote code when loading the model.

  • dtype – The data type to use for the model weights.

  • vllm_kwargs – Additional keyword arguments passed to the vLLM engine.

  • on_ray – Whether to run the model on a Ray cluster.

  • ray_address – The address of the Ray cluster to connect to.

  • ray_config – Configuration passed to Ray when running remotely.

  • prompt – The template string used to build the prompt (default '{input}').

  • prompt_func – An optional callable used to build the prompt instead of the template.

  • max_batch_size – The maximum batch size to use for batch generation.

  • predict_kwargs – Additional parameters passed at inference time.

public_api(beta): This API is in beta and may change before becoming stable.

full_import_path = 'superduperdb.ext.vllm.model.VllmModel'#
init()[source]#
model_name: str = ''#
on_ray: bool = False#
ray_address: str | None = None#
ray_config: dict#
tensor_parallel_size: int = 1#
trust_remote_code: bool = True#
vllm_kwargs: dict#
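
Example (a minimal sketch; the model name and vllm_kwargs values are illustrative assumptions, and vllm_kwargs is assumed to be forwarded to the vLLM engine):

    from superduperdb.ext.vllm import VllmModel

    # Hypothetical model name; any Hugging Face model supported by vLLM.
    model = VllmModel(
        identifier='mistral-7b',
        model_name='mistralai/Mistral-7B-v0.1',
        tensor_parallel_size=1,
        trust_remote_code=True,
        vllm_kwargs={'dtype': 'float16', 'max_model_len': 2048},
    )

    model.init()  # documented above: loads the vLLM engine locally
    # _generate is inherited from BaseLLM; the public predict entry point
    # varies across superduperdb versions.
    print(model._generate('Write a haiku about databases.'))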

Module contents#

class superduperdb.ext.vllm.VllmAPI(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, api_url: str = '', *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, model_update_kwargs: t.Dict = <factory>, predict_kwargs: t.Dict = <factory>, compute_kwargs: t.Dict = <factory>, prompt: str = '{input}', prompt_func: ~typing.Callable | None = None, max_batch_size: int | None = 4)[source]#

Bases: BaseLLMAPI

Wrapper for querying a vLLM API service (API-server format, as started by vllm.entrypoints.api_server).

Parameters:
  • api_url – The URL of the vLLM API server.

  • prompt – The template string used to build the prompt (default '{input}').

  • prompt_func – An optional callable used to build the prompt instead of the template.

  • max_batch_size – The maximum batch size to use for batch generation.

  • predict_kwargs – Additional parameters passed at inference time.

public_api(beta): This API is in beta and may change before becoming stable.

_generate(prompt: str, **kwargs) → str | List[str][source]#

Batch generate text from a prompt.

build_post_data(prompt: str, **kwargs: dict[str, Any]) → dict[str, Any][source]#

Build the JSON payload for a generation request to the vLLM API server.
compute_kwargs: t.Dict#
full_import_path = 'superduperdb.ext.vllm.model.VllmAPI'#
identifier: str#
model_update_kwargs: t.Dict#
predict_kwargs: t.Dict#
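
A sketch of customizing the request payload by overriding the documented build_post_data hook; the temperature and max_tokens fields follow the legacy vLLM api_server JSON format and are assumptions, not guarantees of this class:

    from typing import Any

    from superduperdb.ext.vllm import VllmAPI

    class GreedyVllmAPI(VllmAPI):
        """Hypothetical subclass that defaults to greedy decoding."""

        def build_post_data(self, prompt: str, **kwargs: Any) -> dict[str, Any]:
            data = {'prompt': prompt, 'temperature': 0.0, 'max_tokens': 128}
            data.update(kwargs)  # caller-supplied kwargs take precedence
            return data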
class superduperdb.ext.vllm.VllmModel(identifier: str, artifacts: dc.InitVar[t.Optional[t.Dict]] = None, *, datatype: EncoderArg = None, output_schema: t.Optional[Schema] = None, flatten: bool = False, model_update_kwargs: t.Dict = <factory>, predict_kwargs: t.Dict = <factory>, compute_kwargs: t.Dict = <factory>, prompt: str = '{input}', prompt_func: ~typing.Callable | None = None, max_batch_size: int | None = 4, model_name: str = '', tensor_parallel_size: int = 1, trust_remote_code: bool = True, vllm_kwargs: dict = <factory>, on_ray: bool = False, ray_address: str | None = None, ray_config: dict = <factory>)[source]#

Bases: BaseLLM

Load a large language model with vLLM.

Parameters:
  • model_name – The name of the model to load.

  • tensor_parallel_size – The number of GPUs to use for tensor-parallel inference.

  • trust_remote_code – Whether to trust remote code when loading the model.

  • dtype – The data type to use for the model weights.

  • vllm_kwargs – Additional keyword arguments passed to the vLLM engine.

  • on_ray – Whether to run the model on a Ray cluster.

  • ray_address – The address of the Ray cluster to connect to.

  • ray_config – Configuration passed to Ray when running remotely.

  • prompt – The template string used to build the prompt (default '{input}').

  • prompt_func – An optional callable used to build the prompt instead of the template.

  • max_batch_size – The maximum batch size to use for batch generation.

  • predict_kwargs – Additional parameters passed at inference time.

public_api(beta): This API is in beta and may change before becoming stable.

compute_kwargs: t.Dict#
full_import_path = 'superduperdb.ext.vllm.model.VllmModel'#
identifier: str#
init()[source]#
model_name: str = ''#
model_update_kwargs: t.Dict#
on_ray: bool = False#
predict_kwargs: t.Dict#
ray_address: str | None = None#
ray_config: dict#
tensor_parallel_size: int = 1#
trust_remote_code: bool = True#
vllm_kwargs: dict#
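
A sketch of the on_ray fields above, running the engine on a Ray cluster; the address and ray_config keys are illustrative assumptions (ray_config is assumed to be passed through to Ray):

    from superduperdb.ext.vllm import VllmModel

    # Hypothetical cluster address in Ray client format.
    model = VllmModel(
        identifier='llama-2-7b-ray',
        model_name='meta-llama/Llama-2-7b-hf',
        on_ray=True,
        ray_address='ray://head-node:10001',
        ray_config={'num_gpus': 1},  # assumed to be forwarded to Ray
    )
    model.init()  # connects to Ray and loads the engine remotely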