Environment setup

SuperDuperDB can run in two modes:

Development: All functions and computations run in the foreground, blocking the main thread.
Cluster: Long-running computations, requests, and models run asynchronously on separate workers and services.
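
The difference between the two modes can be illustrated with Python's standard library alone. The sketch below is not SuperDuperDB code: `compute` is a hypothetical stand-in for a long-running job, and a thread pool stands in for the separate workers of a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compute(x):
    """Hypothetical stand-in for a long-running computation."""
    time.sleep(0.1)
    return x * 2


def run_blocking(inputs):
    """Development-style: each call blocks the caller until it finishes."""
    return [compute(x) for x in inputs]


def run_on_workers(inputs, max_workers=4):
    """Cluster-style: jobs are handed to workers and run concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compute, x) for x in inputs]
        # The caller is free to do other work here before collecting results.
        return [f.result() for f in futures]
```

Both functions return the same results; only where and when the work runs differs. The blocking variant is the easier one to step through with a debugger, which is the trade-off the development mode makes.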

Development mode

By default, SuperDuperDB runs in development mode. This makes it easy for developers to test code snippets and use cases.

In development mode, all computations and configurations take place in a single process. Computations block the process in the foreground, and developers can easily set breakpoints during computation for debugging purposes.

In this mode, connecting to superduperdb is as simple as this:

from superduperdb import superduper

db = superduper('<your-database-uri>')

Cluster mode

In cluster mode, the above snippet will not work, since superduperdb doesn't currently propagate this configuration to the rest of the cluster. For that reason, the data_backend URI should be specified in a configuration file common to all services in the cluster, and developers should connect with:

from superduperdb import superduper

db = superduper()

Services

In cluster mode, several individual services are set up, each responsible for a different part of the workflow:

Ray cluster

By specifying a Ray cluster, computations requested in SuperDuperDB are pushed down to the configured Ray cluster, which may be set up with optimized hardware, specific settings, etc. Read more here.

Vector-search service

By specifying a vector-search service, the vector-comparison computation in vector-search queries is sent to this service, which may be set up to optimize for recall speed and performance. Read more here.

Change-data capture service

By specifying a change-data capture service, developers can insert data into their data_backend without directly using the superduperdb package, or even using Python. Read more here.

Rest API

By specifying a REST API service, developers may access superduperdb through FastAPI REST endpoints, which provide a documentation and experimentation interface as well as the ability to integrate from non-Python programs. Read more here.

Configuration

The configuration file should include the URIs of the services required:

# Settings pertaining to cluster mode
cluster:

  # change data capture
  cdc:
    strategy: null
    
    # How to connect to the service
    uri: http://<cdc-host>:<cdc-port>

  # ray compute settings
  compute:

    # How to connect to a ray service
    uri: ray://<ray-host>:<ray-port>

  # vector-search settings
  vector_search:

    # How to connect to the service
    uri: http://<vector_search-host>:<vector_search-port>
    backfill_batch_size: 100

  # REST API settings (experimental)
  rest:

    # How to connect to the service
    uri: http://<rest-host>:<rest-port>

It should also specify the required database in data_backend:

data_backend: <database-uri>
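
Because every service reads the same file, a malformed URI affects the whole cluster, so it can be worth checking the values before starting anything. The following is a minimal sketch, assuming the configuration has already been parsed into a Python dict with the layout shown above. The helper `invalid_uris` and all hosts and ports are hypothetical and not part of superduperdb.

```python
from urllib.parse import urlparse

# Mirrors the layout of the configuration shown above; all hosts and
# ports are made-up placeholders.
config = {
    "data_backend": "mongodb://localhost:27017/documents",
    "cluster": {
        "cdc": {"strategy": None, "uri": "http://localhost:8001"},
        "compute": {"uri": "ray://localhost:10001"},
        "vector_search": {
            "uri": "http://localhost:8000",
            "backfill_batch_size": 100,
        },
        "rest": {"uri": "http://localhost:8002"},
    },
}


def invalid_uris(cfg):
    """Return the names of settings whose URI looks malformed.

    Service URIs must have a scheme and a host; the database URI is only
    checked for a scheme, since some database URIs carry no host part.
    """
    bad = []
    if not urlparse(cfg.get("data_backend") or "").scheme:
        bad.append("data_backend")
    for name, section in cfg.get("cluster", {}).items():
        parsed = urlparse(section.get("uri") or "")
        if not parsed.scheme or not parsed.netloc:
            bad.append(f"cluster.{name}.uri")
    return bad


print(invalid_uris(config))
```

The check is purely syntactic: it does not confirm that a service is reachable, only that each value has the shape of a URI.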
