Environment setup

SuperDuperDB can run in two modes:

  • Development: All functions and computations run in the foreground, blocking the main thread.
  • Cluster: Long-running computations, requests, and models run asynchronously on separate workers and services.

Development mode​

By default, SuperDuperDB runs in development mode, which makes it easy for developers to test code snippets and use cases.

In development mode, all computations and configurations take place in a single process. Computations block the process in the foreground, and developers can easily set breakpoints during computation for debugging purposes.

In this mode, connecting to superduperdb is as simple as this:

from superduperdb import superduper

db = superduper('<your-database-uri>')
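
To make the difference between the two modes concrete, here is a minimal sketch using only the Python standard library. It is not how SuperDuperDB is implemented; `compute`, `run_blocking`, and `run_on_workers` are hypothetical names chosen for illustration, and a thread pool stands in for the separate workers of a cluster.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def compute(x):
    """Hypothetical stand-in for a long-running computation."""
    time.sleep(0.05)
    return x * x


def run_blocking(values):
    # Development-style: each call runs in the foreground and blocks
    # the caller until it returns, so breakpoints are easy to set.
    return [compute(v) for v in values]


def run_on_workers(values, max_workers=4):
    # Cluster-style: work is submitted to separate workers; the caller
    # receives futures immediately and collects the results later.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(compute, v) for v in values]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(run_blocking([1, 2, 3]))
    print(run_on_workers([1, 2, 3]))
```

Both functions return the same results; only where and when the work executes differs.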

Cluster mode​

In cluster mode, the above snippet will not work, because superduperdb does not currently propagate a URI passed to superduper() to the rest of the cluster. Instead, the data_backend URI should be specified in a configuration file shared by all services in the cluster, and developers should connect with:

from superduperdb import superduper

db = superduper()

Services​

In cluster mode, several individual services are set up, each responsible for a different part of the workflow:

Ray cluster​

When a Ray cluster is specified, computations requested in SuperDuperDB are pushed down to the configured Ray cluster, which may be set up with optimized hardware, specific settings, and so on. Read more here.

Vector-search service​

When a vector-search service is specified, the vector-comparison computation in vector-search queries is sent to this service, which may be set up to optimize for recall speed and performance. Read more here.
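
As a rough illustration of the kind of vector-comparison work that is offloaded, the sketch below ranks stored vectors against a query vector. It assumes cosine similarity as the measure and uses hypothetical helper names (`cosine_similarity`, `top_k`); the actual service may use a different measure and an optimized index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, vectors, k=2):
    """Return the keys of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, v), key) for key, v in vectors.items()]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


if __name__ == "__main__":
    stored = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], stored, k=2))
```

A brute-force comparison like this scales linearly with the number of stored vectors, which is one reason to run it on a dedicated service.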

Change-data capture service​

When a change-data capture service is specified, developers can insert data into their data_backend without using the superduperdb package directly, or even without using Python. Read more here.

REST API

When a REST API service is specified, developers can access superduperdb through FastAPI REST endpoints, which provide a documentation and experimentation interface as well as the ability to integrate from non-Python programs. Read more here.

Configuration​

The configuration file should include the URIs of the services required:

# Settings pertaining to cluster mode
cluster:

  # change data capture
  cdc:
    strategy: null

    # How to connect to the service
    uri: http://<cdc-host>:<cdc-port>

  # ray compute settings
  compute:

    # How to connect to a ray service
    uri: ray://<ray-host>:<ray-port>

  # vector-search settings
  vector_search:

    # How to connect to the service
    uri: http://<vector_search-host>:<vector_search-port>
    backfill_batch_size: 100

  # REST API settings (experimental)
  rest:

    # How to connect to the service
    uri: http://<rest-host>:<rest-port>

The file must also specify the required database under data_backend:

data_backend: <database-uri>
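
Before starting the services, it can help to sanity-check the settings. The following is a minimal sketch, not part of superduperdb: it assumes the configuration has already been loaded into a Python dictionary with the same nesting as the file shown above, and the hosts, ports, and the helper name `find_config_problems` are illustrative assumptions.

```python
from urllib.parse import urlparse

# Hypothetical in-memory mirror of the configuration file;
# hosts and ports are placeholders chosen for illustration.
config = {
    "data_backend": "mongodb://localhost:27017/documents",
    "cluster": {
        "cdc": {"strategy": None, "uri": "http://localhost:8001"},
        "compute": {"uri": "ray://localhost:10001"},
        "vector_search": {"uri": "http://localhost:8000", "backfill_batch_size": 100},
        "rest": {"uri": "http://localhost:8002"},
    },
}

# URI schemes used for each service in the example configuration.
EXPECTED_SCHEMES = {
    "cdc": {"http", "https"},
    "compute": {"ray"},
    "vector_search": {"http", "https"},
    "rest": {"http", "https"},
}


def find_config_problems(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if not cfg.get("data_backend"):
        problems.append("data_backend is missing")
    # Only services that appear in the cluster section are checked.
    for name, section in cfg.get("cluster", {}).items():
        uri = (section or {}).get("uri")
        if not uri:
            problems.append(f"cluster.{name}.uri is missing")
            continue
        parsed = urlparse(uri)
        allowed = EXPECTED_SCHEMES.get(name)
        if allowed and parsed.scheme not in allowed:
            problems.append(f"cluster.{name}.uri has unexpected scheme {parsed.scheme!r}")
        if not parsed.hostname:
            problems.append(f"cluster.{name}.uri has no host")
    return problems


if __name__ == "__main__":
    print(find_config_problems(config))
```

The check only looks at the shape of each URI; it does not verify that a service is actually reachable.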