Welcome to SuperDuperDB!

What is SuperDuperDB?​

SuperDuperDB is an open-source Python framework for modern AI (post 2023) to:

  • connect an AI development environment directly to data
  • connect an AI production environment directly to data
  • create your own flexible platform connecting your AI and data for all AI stakeholders to collaborate on

SuperDuperDB can handle classical AI/machine learning paradigms (classification, regression, forecasting, clustering, etc.) as well as the most up-to-date techniques (generative AI, LLMs, retrieval-augmented generation (RAG), computer vision, multimodal AI, etc.).

What problem does SuperDuperDB solve?​

AI development involves multiple phases, tooling universes, and stakeholders:

Phases

  • Data ingestion & preparation
  • Model development and training
  • Production computation, inference and fine-tuning

Tooling

  • Database, lake, warehouse, object storage
  • IDEs, notebooks, software packages
  • ETL jobs, cloud compute

Stakeholders

  • AI researchers
  • Data scientists and analysts
  • Engineers: MLOps, cloud
  • Decision makers

important

A central problem in operationalizing AI is that these phases, tools, and stakeholders lack a single accepted environment in which to co-exist, collaborate, and interface, and which fits the operational needs of developers and organizations.

For more information about SuperDuperDB and why we believe it is much needed, read this blog post.

How can developers use SuperDuperDB?​

SuperDuperDB boils down to 3 key patterns:

1. Connect to your data​

from superduperdb import superduper

db = superduper('<your-database-uri>')

2. Apply AI to your data​


component = ...  # build your AI with anything from the Python ecosystem

db.apply(component)

3. Query your data to obtain predictions, select data or perform vector-searches​

db.execute(query)

What does "applying AI to data" mean?

"Applying AI" to data can mean many things, and developers decide which. Any of the following is possible:

  • Compute outputs on incoming data
  • Train a model on database data
  • Configure vector-search on a database
  • Measure the performance of models
  • Configure models to work together
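
As a rough illustration of the first item, computing outputs on incoming data, here is a minimal, self-contained sketch. It does not use the SuperDuperDB API: `TinyStore`, `Listener`, and `word_count` are hypothetical stand-ins, and an in-memory list takes the place of a real database. It only shows the idea of registering a model once and having its outputs computed whenever new records arrive.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Listener:
    """Hypothetical: ties a model (any callable) to one field of incoming records."""
    key: str                      # field to read from each record
    model: Callable[[Any], Any]   # any Python callable acts as the "model"
    output_key: str               # field where the model output is stored


@dataclass
class TinyStore:
    """Hypothetical in-memory stand-in for a database; not the SuperDuperDB API."""
    records: list = field(default_factory=list)
    listeners: list = field(default_factory=list)

    def apply(self, listener: Listener) -> None:
        # Register the listener and back-fill outputs for existing records.
        self.listeners.append(listener)
        for record in self.records:
            record[listener.output_key] = listener.model(record[listener.key])

    def insert(self, record: dict) -> None:
        # Compute outputs for every registered listener as the record arrives.
        stored = dict(record)
        for listener in self.listeners:
            stored[listener.output_key] = listener.model(stored[listener.key])
        self.records.append(stored)


def word_count(text: str) -> int:
    """Toy "model": counts whitespace-separated words."""
    return len(text.split())


store = TinyStore()
store.insert({"txt": "data first"})  # arrives before the model is applied
store.apply(Listener(key="txt", model=word_count, output_key="n_words"))
store.insert({"txt": "models follow the data"})  # output computed on arrival
print(store.records)
```

In this toy version, applying the listener back-fills the record that was already stored, and later inserts receive their output as they arrive, so the outputs sit next to the data they were computed from.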

Why is the "DB" so important in AI?​

SuperDuperDB builds on the fact that AI development always starts with data, ends with data, and interfaces with data from conception to productionized deployment. Any environment that has a chance of uniting the diverse tools and stakeholders involved in AI development needs a single way for AI models and algorithms to be connected to data. That way is SuperDuperDB.

important

By integrating AI directly at the data source, SuperDuperDB enables developers to avoid implementing MLOps.

What integrations does SuperDuperDB include?​

Data​

  • MongoDB
  • PostgreSQL
  • SQLite
  • Snowflake
  • MySQL
  • Oracle
  • MSSQL
  • ClickHouse
  • Pandas

AI frameworks​

  • OpenAI
  • Cohere
  • Anthropic
  • PyTorch
  • Scikit-learn (Sklearn)
  • Transformers
  • Sentence-Transformers

What important additional aspects does SuperDuperDB include?​

Developers may:

  • Choose whether to deploy SuperDuperDB in a single blocking process or in a scalable, non-blocking mode via Ray
  • Choose whether to use their own home-grown models, or integrate AI APIs and open-source frameworks
  • Choose which type of data they use, including images, videos, audio, or custom datatypes
  • Automatically version and track all functionality they use
  • Keep control over which data is exposed to API services (if any) by leveraging model self-hosting
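
The first choice above, blocking versus non-blocking execution, can be sketched with the standard library alone. This is only an analogy: `slow_model`, `run_blocking`, and `run_non_blocking` are hypothetical names, and a `ThreadPoolExecutor` stands in for the Ray-based mode, whose actual configuration is not shown here.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_model(x: int) -> int:
    """Toy "model" that takes a moment to compute."""
    time.sleep(0.05)
    return x * x


def run_blocking(inputs: list[int]) -> list[int]:
    # The caller waits until every output has been computed.
    return [slow_model(x) for x in inputs]


def run_non_blocking(pool: ThreadPoolExecutor, inputs: list[int]) -> list[Future]:
    # Returns immediately with futures; workers compute in the background.
    return [pool.submit(slow_model, x) for x in inputs]


inputs = [1, 2, 3, 4]
blocking_outputs = run_blocking(inputs)

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = run_non_blocking(pool, inputs)
    # The caller is free to do other work here before collecting results.
    non_blocking_outputs = [f.result() for f in futures]

print(blocking_outputs, non_blocking_outputs)
```

Both modes produce the same outputs. The difference is whether the process that submits the data has to wait for the computation, which is the trade-off behind choosing a single process for development and a worker-based setup for larger workloads.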

Key Features:​

  • Integration of AI with your existing data infrastructure: Integrate any AI models and APIs with your databases in a single scalable deployment without the need for additional pre-processing steps, ETL, or boilerplate code.
  • Streaming Inference: Have your models compute outputs automatically and immediately as new data arrives, keeping your deployment always up-to-date.
  • Scalable Model Training: Train AI models on large, diverse datasets simply by querying your training data. Built-in computational optimizations help ensure performance.
  • Model Chaining: Easily set up complex workflows by connecting models and APIs to work together in an interdependent and sequential manner.
  • Simple, but Extendable Interface: Add and leverage any function, program, script, or algorithm from the Python ecosystem to enhance your workflows and applications. Drill down to any layer of implementation, including the inner workings of your models, while operating SuperDuperDB with simple Python commands.
  • Difficult Data Types: Work directly in your database with images, video, audio, and any type that can be encoded as bytes in Python.
  • Feature Storing: Turn your database into a centralized repository for storing and managing inputs and outputs of AI models of arbitrary data types, making them available in a structured format and known environment.
  • Vector Search: No need to duplicate and migrate your data to additional specialized vector databases - turn your existing battle-tested database into a fully-fledged multi-modal vector-search database, including easy generation of vector embeddings and vector indexes of your data with preferred models and APIs.
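
To make the vector-search item concrete, the sketch below ranks stored records by cosine similarity to a query vector. It is a brute-force toy in pure Python, not SuperDuperDB's implementation: the `embed` function (letter-frequency counts) is a hypothetical placeholder for a real embedding model, and `cosine` and `search` are illustrative helpers.

```python
import math
from collections import Counter


def embed(text: str) -> list[float]:
    """Hypothetical embedding: letter frequencies over a-z (placeholder for a real model)."""
    counts = Counter(c for c in text.lower() if "a" <= c <= "z")
    return [float(counts.get(chr(ord("a") + i), 0)) for i in range(26)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(records: list[dict], query: str, n: int = 2) -> list[dict]:
    """Return the n records whose stored vectors are most similar to the query."""
    q = embed(query)
    return sorted(records, key=lambda r: cosine(q, r["vector"]), reverse=True)[:n]


# Vectors are stored next to the original data rather than in a separate system.
records = [{"txt": t, "vector": embed(t)} for t in ["cat", "dog", "act", "zebra"]]
top = search(records, "cat", n=2)
print([r["txt"] for r in top])
```

A production setup would replace the linear scan with a vector index and the toy embedding with a trained model, but the data flow is the same: embed on write, embed the query, rank by similarity.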