
Basic RAG tutorial

In this tutorial we show you how to do retrieval-augmented generation (RAG) with superduperdb. RAG is just one example of the flexibility superduperdb gives developers; superduperdb is about much more than RAG and LLMs.

As in the vector-search tutorial, we'll use the superduperdb documentation as our dataset. We'll download a data snapshot and add it to a test database:

!curl -O https://superduperdb-public-demo.s3.amazonaws.com/text.json
Outputs

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  720k  100  720k    0     0   679k      0  0:00:01  0:00:01 --:--:--  681k

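import json

from superduperdb import superduper

# Connect to an in-memory MongoDB mock, so no database server is needed
db = superduper('mongomock://test')

# Load the documentation snapshot downloaded above
with open('text.json') as f:
    data = json.load(f)

# Store each text snippet as a document with a single 'txt' field
_ = db['docu'].insert_many([{'txt': r} for r in data]).execute()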

Let's verify the data in the db by querying one datapoint:

db['docu'].find_one().execute()

The first step in a RAG application is to create a VectorIndex. The results of searching with this index will be used as input to the LLM for answering questions.

Read more about VectorIndex in the documentation, and follow along with the vector-search tutorial for more detail.
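Conceptually, searching a VectorIndex configured with measure="cosine" (as in the code below) ranks the stored embeddings by cosine similarity to the query embedding and returns the closest matches. The following sketch illustrates that idea in plain Python with toy 3-dimensional vectors; it is not superduperdb's implementation, and the helper names cosine and top_k are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 5) -> list[str]:
    """Return the texts of the k entries most similar to query_vec.

    `index` is a hypothetical list of (text, vector) pairs standing in
    for the embeddings a real vector index would store.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy 3-dimensional "embeddings" (all-MiniLM-L6-v2 produces 384 dimensions).
index = [
    ("vector indexes speed up search", [0.9, 0.1, 0.0]),
    ("listeners compute model outputs", [0.1, 0.9, 0.1]),
    ("applications bundle components", [0.0, 0.2, 0.9]),
]
context = top_k([1.0, 0.2, 0.0], index, k=2)
print(context)
```

The retrieved texts in `context` are what a RAG pipeline would hand to the LLM as supporting facts.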

from superduperdb import VectorIndex, Listener, vector
from superduperdb.ext.sentence_transformers.model import SentenceTransformer
from superduperdb.base.code import Code

# Convert the model's array output into a plain list so it can be stored
def postprocess(x):
    return x.tolist()

# 384-dimensional vectors, matching the output size of all-MiniLM-L6-v2
datatype = vector(shape=384, identifier="my-vec")

model = SentenceTransformer(
    identifier="my-embedding",
    datatype=datatype,
    predict_kwargs={"show_progress_bar": True},
    signature="*args,**kwargs",
    model="all-MiniLM-L6-v2",
    device="cpu",
    postprocess=Code.from_object(postprocess),
)

# Compute embeddings of the 'txt' field for every document in 'docu'
listener = Listener(
    identifier="my-listener",
    model=model,
    key='txt',
    select=db['docu'].find(),
    predict_kwargs={'max_chunk_size': 50},
)

# Index the listener's outputs and compare vectors by cosine similarity
vector_index = VectorIndex(
    identifier="my-index",
    indexing_listener=listener,
    measure="cosine",
)

db.apply(vector_index)
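The listener above passes predict_kwargs={'max_chunk_size': 50}. Assuming this caps how many documents are sent to the embedding model in a single call, the batching behaves roughly like the sketch below. The helpers chunked and embed_all are hypothetical, and fake_embed is a stand-in for the SentenceTransformer model, not part of superduperdb.

```python
from typing import Callable, Iterator


def chunked(items: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def embed_all(
    texts: list[str],
    embed: Callable[[list[str]], list[list[float]]],
    max_chunk_size: int = 50,
) -> list[list[float]]:
    """Embed texts chunk by chunk (assumed meaning of max_chunk_size)."""
    vectors: list[list[float]] = []
    for chunk in chunked(texts, max_chunk_size):
        vectors.extend(embed(chunk))
    return vectors


calls: list[int] = []


def fake_embed(batch: list[str]) -> list[list[float]]:
    # Hypothetical stand-in for the embedding model: records each batch size
    # and returns a 1-dimensional "vector" (the text length) per input.
    calls.append(len(batch))
    return [[float(len(t))] for t in batch]


texts = [f"doc {i}" for i in range(120)]
vectors = embed_all(texts, fake_embed, max_chunk_size=50)
print(calls)  # [50, 50, 20]
```

Chunking keeps memory use bounded on large collections, at the cost of a few more model calls.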

Now that we've set up a VectorIndex, we can connect this index with an LLM in a number of ways. A simple way to do that is with the SequentialModel. The first part of the SequentialModel executes a query and provides the results to the LLM in the second part.
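In plain Python terms, chaining the two parts amounts to function composition: the output of each step becomes the input of the next. The sketch below illustrates that idea only; sequential, retrieve and fake_llm are hypothetical stand-ins, not superduperdb classes, and the real SequentialModel also handles persistence and signatures.

```python
from functools import reduce
from typing import Any, Callable


def sequential(*steps: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Compose steps left to right: sequential(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda acc, step: step(acc), steps, x)


def retrieve(question: str) -> str:
    # Hypothetical first step: look up context and build a prompt.
    facts = ["A VectorIndex searches embeddings."]
    return "\n".join(facts) + "\nQuestion: " + question


def fake_llm(prompt: str) -> str:
    # Hypothetical second step: a real LLM would generate an answer here;
    # this stub only reports the last line of the prompt it received.
    return "LLM received: " + prompt.splitlines()[-1]


rag = sequential(retrieve, fake_llm)
print(rag("What is a VectorIndex?"))
# LLM received: Question: What is a VectorIndex?
```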

The RetrievalPrompt component takes as input a query containing a "free" variable, written as <var:...> (in this tutorial, <var:prompt>), which is filled in when the model is called. This gives users great flexibility in how they fetch the context for their downstream models.
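A <var:...> placeholder works like a template slot. The sketch below shows the idea on a plain nested dict standing in for the query defined in the next code cell; fill_variables is a hypothetical helper, and RetrievalPrompt may resolve variables differently internally.

```python
import re
from typing import Any

# Matches placeholders such as <var:prompt> and captures the variable name.
VAR_PATTERN = re.compile(r"<var:(\w+)>")


def fill_variables(obj: Any, values: dict[str, str]) -> Any:
    """Recursively replace <var:name> placeholders in strings, lists and dicts."""
    if isinstance(obj, str):
        return VAR_PATTERN.sub(lambda m: values[m.group(1)], obj)
    if isinstance(obj, list):
        return [fill_variables(item, values) for item in obj]
    if isinstance(obj, dict):
        return {key: fill_variables(val, values) for key, val in obj.items()}
    return obj


# A plain-dict stand-in (hypothetical) for the vector-search query.
query_template = {"like": {"txt": "<var:prompt>"}, "vector_index": "my-index", "n": 5}
query = fill_variables(query_template, {"prompt": "Tell me about vector-indexes"})
print(query["like"])  # {'txt': 'Tell me about vector-indexes'}
```

Because the template is left untouched, the same query definition can be reused for every question.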

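We're using OpenAI, but you can use any type of LLM with superduperdb. There are several native integrations, and you can also bring your own model. OpenAIChatCompletion needs an OpenAI API key, which the OpenAI client typically reads from the OPENAI_API_KEY environment variable.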

from superduperdb import Document
from superduperdb.base.code import Code
from superduperdb.components.model import SequentialModel
from superduperdb.ext.llm.prompter import RetrievalPrompt
from superduperdb.ext.openai import OpenAIChatCompletion

# Vector-search query with a free variable, <var:prompt>, filled in at prediction time
q = db['docu'].like(Document({'txt': '<var:prompt>'}), vector_index='my-index', n=5).find().limit(10)

# Keep only the text of each retrieved document
def get_output(c):
    return [r['txt'] for r in c]

prompt_template = RetrievalPrompt('my-prompt', select=q, postprocess=Code.from_object(get_output))

llm = OpenAIChatCompletion('gpt-3.5-turbo')

# Run the retrieval prompt first, then pass its output to the LLM
seq = SequentialModel('rag', models=[prompt_template, llm])

db.apply(seq)
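The app.info() printout at the end of this tutorial shows the defaults RetrievalPrompt uses: a prompt_explanation, a prompt_introduction and join='\n---\n'. Assuming the final prompt is laid out as the explanation, then the retrieved facts joined by ---, then the introduction and the question (this ordering is an assumption, not confirmed by the source), it would look roughly like the output of the hypothetical build_prompt below.

```python
# Default strings copied from the app.info() output; the layout is assumed.
PROMPT_EXPLANATION = (
    "HERE ARE SOME FACTS SEPARATED BY '---' IN OUR DATA REPOSITORY "
    "WHICH WILL HELP YOU ANSWER THE QUESTION."
)
PROMPT_INTRODUCTION = (
    "HERE IS THE QUESTION WHICH YOU SHOULD ANSWER BASED ONLY ON THE PREVIOUS FACTS:"
)
JOIN = "\n---\n"


def build_prompt(facts: list[str], question: str) -> str:
    """Hypothetical prompt assembly: explanation, facts, introduction, question."""
    return "\n\n".join([
        PROMPT_EXPLANATION,
        JOIN.join(facts),
        PROMPT_INTRODUCTION,
        question,
    ])


# `facts` plays the role of get_output's return value: the 'txt' of each hit.
hits = [
    {"txt": "A VectorIndex is built from a Listener's outputs."},
    {"txt": "The Listener applies a model to the 'txt' field."},
]
facts = [r["txt"] for r in hits]
print(build_prompt(facts, "Tell me about vector-indexes"))
```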

Now we can test the SequentialModel with a sample question:

seq.predict('Tell me about vector-indexes')
Tip: Did you know you can use any tool from the Python ecosystem with superduperdb? That includes langchain and llamaindex, which can be very useful for RAG applications.

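Finally, we bundle the vector index and the RAG model into an Application, so the whole setup can be encoded, exported to disk and reloaded later:

from superduperdb import Application

# Bundle the components defined above into a single application
app = Application('rag-app', components=[vector_index, seq])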
app.encode()
app.export('rag-app')
!cat rag-app/requirements.txt
from superduperdb import *

app = Component.read('rag-app')

app.info()
Outputs

2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.vector_index.VectorIndex'> with identifier: my-index
2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.listener.Listener'> with identifier: my-listener
2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.sentence_transformers.model.SentenceTransformer'> with identifier: my-embedding
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.datatype.DataType'> with identifier: my-vec
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.base.code.Code'> with identifier: postprocess
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.backends.mongodb.query.MongoQuery'> with identifier: docu-find
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.model.SequentialModel'> with identifier: rag
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.llm.prompter.RetrievalPrompt'> with identifier: my-prompt
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.base.code.Code'> with identifier: get_output
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.backends.mongodb.query.MongoQuery'> with identifier: docu-like-txt-var-prompt-vector-index-my-index-n-5-find-limit-10
2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.openai.model.OpenAIChatCompletion'> with identifier: gpt-3.5-turbo

╭──────────────────────────────────────────────────── rag-app ────────────────────────────────────────────────────╮
│ identifier: rag-app │
│ uuid: 9115f5ec-5575-4a11-8678-664f3904bab7 │
│ components: [VectorIndex(identifier='my-index', uuid='650db68c-8786-4204-bc2d-6cc4f1d2511c', │
│ indexing_listener=Listener(identifier='my-listener', uuid='02f5b3d4-7a0a-48d8-990c-bdae29424038', key='txt', │
│ model=SentenceTransformer(preferred_devices=('cuda', 'mps', 'cpu'), device='cpu', identifier='my-embedding', │
│ uuid='b1351454-3714-4c57-bacf-2f2a667d5fdc', signature='*args,**kwargs', datatype=DataType(identifier='my-vec', │
│ uuid='ecfbe6d5-5c1f-4b80-b224-aaf0a1f3ee1d', encoder=None, decoder=None, info=None, shape=(384,), │
│ directory=None, encodable='native', bytes_encoding=<BytesEncoding.BYTES: 'Bytes'>, intermediate_type='bytes', │
│ media_type=None), output_schema=None, flatten=False, model_update_kwargs={}, │
│ predict_kwargs={'show_progress_bar': True}, compute_kwargs={}, validation=None, metric_values={}, │
│ num_workers=0, object=SentenceTransformer( │
│  (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel │
│  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': │
│ True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, │
│ 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) │
│  (2): Normalize() │
│ ), model='all-MiniLM-L6-v2', preprocess=None, postprocess=Code(identifier='postprocess', │
│ uuid='fadfa78c-4c6b-4914-885a-e1372da93078', code='from superduperdb import code\n\n@code\ndef │
│ postprocess(x):\n return x.tolist()\n')), select=docu.find(), active=True, predict_kwargs={'max_chunk_size': │
│ 50}), compatible_listener=None, measure=<VectorIndexMeasureType.cosine: 'cosine'>, metric_values={}), │
│ SequentialModel(identifier='rag', uuid='fa46eb15-112c-496f-965f-c935494825c5', signature='**kwargs', │
│ datatype=None, output_schema=None, flatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, │
│ validation=None, metric_values={}, num_workers=0, models=[RetrievalPrompt(identifier='my-prompt', │
│ uuid='ded3b9b8-828d-41a4-bc37-02217fe0bc08', signature='**kwargs', datatype=None, output_schema=None, │
│ flatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, validation=None, metric_values={}, │
│ num_workers=0, preprocess=None, postprocess=Code(identifier='get_output', │
│ uuid='c1d6fb70-b6c7-42b4-8872-8bfd243ddf07', code="from superduperdb import code\n\n@code\ndef get_output(c):\n │
│ return [r['txt'] for r in c]\n"), select=docu.like({'txt': '<var:prompt>'}, vector_index="my-index", │
│ n=5).find().limit(10), prompt_explanation="HERE ARE SOME FACTS SEPARATED BY '---' IN OUR DATA REPOSITORY WHICH │
│ WILL HELP YOU ANSWER THE QUESTION.", prompt_introduction='HERE IS THE QUESTION WHICH YOU SHOULD ANSWER BASED │
│ ONLY ON THE PREVIOUS FACTS:', join='\n---\n'), OpenAIChatCompletion(identifier='gpt-3.5-turbo', │
│ uuid='bc04fcdf-3217-4cb7-9517-38fc632fc8f7', signature='singleton', datatype=None, output_schema=None, │
│ flatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, validation=None, metric_values={}, │
│ num_workers=0, model='gpt-3.5-turbo', max_batch_size=8, openai_api_key=None, openai_api_base=None, │
│ client_kwargs={}, batch_size=1, prompt='')])] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────── Component Metadata ───────────────────────────────────────────────╮
│ Variables │
│ prompt │
│ │
│ │
│ Leaves │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
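The Component Metadata panel lists prompt under Variables: the free variable declared with <var:prompt> in the retrieval query. The sketch below shows one way such variables could be collected, by walking a nested, serialized component definition with a regex. This is an assumption for illustration, not superduperdb's actual code; find_variables and app_definition are hypothetical.

```python
import re
from typing import Any

# Matches placeholders such as <var:prompt> and captures the variable name.
VAR_PATTERN = re.compile(r"<var:(\w+)>")


def find_variables(obj: Any) -> set[str]:
    """Collect every <var:name> placeholder name found anywhere in obj."""
    if isinstance(obj, str):
        return set(VAR_PATTERN.findall(obj))
    if isinstance(obj, dict):
        items = list(obj.keys()) + list(obj.values())
    elif isinstance(obj, (list, tuple)):
        items = list(obj)
    else:
        return set()
    found: set[str] = set()
    for item in items:
        found |= find_variables(item)
    return found


# Hypothetical serialized form of the app, reduced to the relevant parts.
app_definition = {
    "identifier": "rag-app",
    "components": [
        {"identifier": "my-index", "measure": "cosine"},
        {"identifier": "rag", "models": [
            {"identifier": "my-prompt",
             "select": {"like": {"txt": "<var:prompt>"}, "n": 5}},
            {"identifier": "gpt-3.5-turbo"},
        ]},
    ],
}
print(sorted(find_variables(app_definition)))  # ['prompt']
```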