Basic RAG tutorial
In this tutorial we show you how to do retrieval augmented generation (RAG) with superduperdb
.
Note that this is just an example of the flexibility and power which superduperdb
gives
to developers. superduperdb
is about much more than RAG and LLMs.
As in the vector-search tutorial we'll use superduperdb
documentation for the tutorial.
We'll add this to a testing database by downloading the data snapshot:
!curl -O https://superduperdb-public-demo.s3.amazonaws.com/text.json
Outputs
% Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 720k 100 720k 0 0 679k 0 0:00:01 0:00:01 --:--:-- 681k
import json
from superduperdb import superduper, Document
db = superduper('mongomock://test')
with open('text.json') as f:
data = json.load(f)
_ = db['docu'].insert_many([{'txt': r} for r in data]).execute()
Outputs
Let's verify the data in the db
by querying one datapoint:
db['docu'].find_one().execute()
Outputs
The first step in a RAG application is to create a VectorIndex
. The results of searching
with this index will be used as input to the LLM for answering questions.
Read about VectorIndex
here and follow along the tutorial on
vector-search here.
import requests
from superduperdb import Application, Document, VectorIndex, Listener, vector
from superduperdb.ext.sentence_transformers.model import SentenceTransformer
from superduperdb.base.code import Code
def postprocess(x):
return x.tolist()
datatype = vector(shape=384, identifier="my-vec")
model = SentenceTransformer(
identifier="my-embedding",
datatype=datatype,
predict_kwargs={"show_progress_bar": True},
signature="*args,**kwargs",
model="all-MiniLM-L6-v2",
device="cpu",
postprocess=Code.from_object(postprocess),
)
listener = Listener(
identifier="my-listener",
model=model,
key='txt',
select=db['docu'].find(),
predict_kwargs={'max_chunk_size': 50},
)
vector_index = VectorIndex(
identifier="my-index",
indexing_listener=listener,
measure="cosine"
)
db.apply(vector_index)
Outputs
Now that we've set up a VectorIndex
, we can connect this index with an LLM in a number of ways.
A simple way to do that is with the SequentialModel
. The first part of the SequentialModel
executes a query and provides the results to the LLM in the second part.
The RetrievalPrompt
component takes a query with a "free" variable as input, signified with <var:???>
.
This gives users great flexibility with regard to how they fetch the context
for their downstream models.
We're using OpenAI, but you can use any type of LLm with superduperdb
. We have several
native integrations (see here) but you can also bring your own model.
from superduperdb.ext.llm.prompter import *
from superduperdb import Document
from superduperdb.components.model import SequentialModel
from superduperdb.ext.openai import OpenAIChatCompletion
q = db['docu'].like(Document({'txt': '<var:prompt>'}), vector_index='my-index', n=5).find().limit(10)
def get_output(c):
return [r['txt'] for r in c]
prompt_template = RetrievalPrompt('my-prompt', select=q, postprocess=Code.from_object(get_output))
llm = OpenAIChatCompletion('gpt-3.5-turbo')
seq = SequentialModel('rag', models=[prompt_template, llm])
db.apply(seq)
Outputs
Now we can test the SequentialModel
with a sample question:
seq.predict('Tell be about vector-indexes')
Outputs
Did you know you can use any tools from the Python ecosystem with superduperdb
.
That includes langchain
and llamaindex
which can be very useful for RAG applications.
from superduperdb import Application
app = Application('rag-app', components=[vector_index, seq, plugin_1, plugin_2])
Outputs
app.encode()
Outputs
app.export('rag-app')
Outputs
!cat rag-app/requirements.txt
Outputs
from superduperdb import *
app = Component.read('rag-app')
Outputs
/Users/dodo/.pyenv/versions/3.11.7/envs/superduperdb-3.11/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning:
resume_download
is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, useforce_download=True
. warnings.warn(
app.info()
Outputs
2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.vector_index.VectorIndex'> with identifier: my-index 2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.listener.Listener'> with identifier: my-listener 2024-Jun-17 09:42:33.43| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.sentence_transformers.model.SentenceTransformer'> with identifier: my-embedding 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.datatype.DataType'> with identifier: my-vec 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.base.code.Code'> with identifier: postprocess 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.backends.mongodb.query.MongoQuery'> with identifier: docu-find 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.components.model.SequentialModel'> with identifier: rag 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.llm.prompter.RetrievalPrompt'> with identifier: my-prompt 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.base.code.Code'> with identifier: get_output 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.backends.mongodb.query.MongoQuery'> with identifier: docu-like-txt-var-prompt-vector-index-my-index-n-5-find-limit-10 2024-Jun-17 09:42:33.44| INFO | Duncans-MBP.fritz.box| superduperdb.base.document:362 | Building leaf <class 'superduperdb.ext.openai.model.OpenAIChatCompletion'> with identifier: gpt-3.5-turbo
[1;32mâ•â”€[0m[1;32m───────────────────────────────────────────────────[0m[1;32m rag-app [0m[1;32m───────────────────────────────────────────────────[0m[1;32m─╮[0m [1;32m│[0m [35midentifier[0m: [34mrag-app[0m [1;32m│[0m [1;32m│[0m [35muuid[0m: [34m9115f5ec-5575-4a11-8678-664f3904bab7[0m [1;32m│[0m [1;32m│[0m [35mcomponents[0m: [34m[VectorIndex(identifier='my-index', uuid='650db68c-8786-4204-bc2d-6cc4f1d2511c', [0m [1;32m│[0m [1;32m│[0m [34mindexing_listener=Listener(identifier='my-listener', uuid='02f5b3d4-7a0a-48d8-990c-bdae29424038', key='txt', [0m [1;32m│[0m [1;32m│[0m [34mmodel=SentenceTransformer(preferred_devices=('cuda', 'mps', 'cpu'), device='cpu', identifier='my-embedding', [0m [1;32m│[0m [1;32m│[0m [34muuid='b1351454-3714-4c57-bacf-2f2a667d5fdc', signature='*args,**kwargs', datatype=DataType(identifier='my-vec',[0m [1;32m│[0m [1;32m│[0m [34muuid='ecfbe6d5-5c1f-4b80-b224-aaf0a1f3ee1d', encoder=None, decoder=None, info=None, shape=(384,), [0m [1;32m│[0m [1;32m│[0m [34mdirectory=None, encodable='native', bytes_encoding=<BytesEncoding.BYTES: 'Bytes'>, intermediate_type='bytes', [0m [1;32m│[0m [1;32m│[0m [34mmedia_type=None), output_schema=None, flatten=False, model_update_kwargs={}, [0m [1;32m│[0m [1;32m│[0m [34mpredict_kwargs={'show_progress_bar': True}, compute_kwargs={}, validation=None, metric_values={}, [0m [1;32m│[0m [1;32m│[0m [34mnum_workers=0, object=SentenceTransformer([0m [1;32m│[0m [1;32m│[0m [34m (0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: BertModel [0m [1;32m│[0m [1;32m│[0m [34m (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': [0m [1;32m│[0m [1;32m│[0m [34mTrue, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, [0m [1;32m│[0m [1;32m│[0m [34m'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})[0m [1;32m│[0m [1;32m│[0m [34m (2): Normalize()[0m [1;32m│[0m [1;32m│[0m [34m), model='all-MiniLM-L6-v2', preprocess=None, postprocess=Code(identifier='postprocess', [0m [1;32m│[0m [1;32m│[0m [34muuid='fadfa78c-4c6b-4914-885a-e1372da93078', code='from superduperdb import code\n\n@code\ndef [0m [1;32m│[0m [1;32m│[0m [34mpostprocess(x):\n return x.tolist()\n')), select=docu.find(), active=True, predict_kwargs={'max_chunk_size':[0m [1;32m│[0m [1;32m│[0m [34m50}), compatible_listener=None, measure=<VectorIndexMeasureType.cosine: 'cosine'>, metric_values={}), [0m [1;32m│[0m [1;32m│[0m [34mSequentialModel(identifier='rag', uuid='fa46eb15-112c-496f-965f-c935494825c5', signature='**kwargs', [0m [1;32m│[0m [1;32m│[0m [34mdatatype=None, output_schema=None, flatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={},[0m [1;32m│[0m [1;32m│[0m [34mvalidation=None, metric_values={}, num_workers=0, models=[RetrievalPrompt(identifier='my-prompt', [0m [1;32m│[0m [1;32m│[0m [34muuid='ded3b9b8-828d-41a4-bc37-02217fe0bc08', signature='**kwargs', datatype=None, output_schema=None, [0m [1;32m│[0m [1;32m│[0m [34mflatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, validation=None, metric_values={},[0m [1;32m│[0m [1;32m│[0m [34mnum_workers=0, preprocess=None, postprocess=Code(identifier='get_output', [0m [1;32m│[0m [1;32m│[0m [34muuid='c1d6fb70-b6c7-42b4-8872-8bfd243ddf07', code="from superduperdb import code\n\n@code\ndef get_output(c):\n[0m [1;32m│[0m [1;32m│[0m [34mreturn [r['txt'] for r in c]\n"), select=docu.like({'txt': '<var:prompt>'}, vector_index="my-index", [0m [1;32m│[0m [1;32m│[0m [34mn=5).find().limit(10), prompt_explanation="HERE ARE SOME FACTS SEPARATED BY '---' IN OUR DATA REPOSITORY WHICH [0m [1;32m│[0m [1;32m│[0m [34mWILL HELP YOU ANSWER THE QUESTION.", prompt_introduction='HERE IS THE QUESTION WHICH YOU SHOULD ANSWER BASED [0m [1;32m│[0m [1;32m│[0m [34mONLY ON THE PREVIOUS FACTS:', join='\n---\n'), OpenAIChatCompletion(identifier='gpt-3.5-turbo', [0m [1;32m│[0m [1;32m│[0m [34muuid='bc04fcdf-3217-4cb7-9517-38fc632fc8f7', signature='singleton', datatype=None, output_schema=None, [0m [1;32m│[0m [1;32m│[0m [34mflatten=False, model_update_kwargs={}, predict_kwargs={}, compute_kwargs={}, validation=None, metric_values={},[0m [1;32m│[0m [1;32m│[0m [34mnum_workers=0, model='gpt-3.5-turbo', max_batch_size=8, openai_api_key=None, openai_api_base=None, [0m [1;32m│[0m [1;32m│[0m [34mclient_kwargs={}, batch_size=1, prompt='')])][0m [1;32m│[0m [1;32m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯[0m [34mâ•â”€[0m[34m─────────────────────────────────────────────[0m[34m Component Metadata [0m[34m──────────────────────────────────────────────[0m[34m─╮[0m [34m│[0m [33mVariables[0m [34m│[0m [34m│[0m [35mprompt[0m [34m│[0m [34m│[0m [34m│[0m [34m│[0m [34m│[0m [34m│[0m [33mLeaves[0m [34m│[0m [34m╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯[0m