Swiftide enables you to build indexing pipelines in a modular fashion, allowing for experimentation and blazing fast, production ready performance for Retrieval Augmented Generation (RAG).
Rust is great at performance and reliability, but for data analytics Python with Jupyter notebooks is king.
Ragas (RAG Assessment) is a framework that helps you evaluate your Retrieval Augmented Generation (RAG) pipelines. RAG denotes a class of LLM applications that use external data to augment the LLM’s context. Quantifying how well such a pipeline performs can be hard, and that is exactly where Ragas comes in.
In this article we will explore how to index and query code, experiment with different features, and evaluate the results.
We only provide snippets for brevity. You can find the full code on github. For the same reason, refer to other posts or our documentation for setting up Swiftide and Python with Jupyter.
To learn more about Swiftide, head over to swiftide.rs or check us out on github.
Determining features we want to evaluate
Ragas offers metrics tailored for evaluating each step of the RAG pipeline.
As an example, we want to evaluate the impact that chunking and synthetic questions have on the performance of the pipeline.
We will generate a Ragas evaluation for each combination of these features: each enabled individually, both together, and neither. We will run the pipelines on the Swiftide codebase and evaluate the results in a Python notebook.
In a production setting, you would likely evaluate on a much larger dataset, with features that are more complex and have a larger impact.
Laying out the project
We will be building two parts. On the Rust side we will build an indexing and query pipeline that can toggle different features so we can evaluate them. On the Python side we will create a notebook that takes this output and uses Ragas to evaluate and plot the results.
The Rust part
For this example, we will set up a project with the following features:
- A code and markdown indexing pipeline that optionally:
  - Chunks into smaller parts
  - Adds metadata with synthetic questions
- A query pipeline that:
  - Generates subquestions to increase the semantic coverage
  - Retrieves documents to generate an answer
- Clap for command line arguments
- A Ragas evaluator that exports to the Ragas format
Setting it up
First, let’s create the crate with all the dependencies we need:
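Something along these lines will do. Exact crate versions and feature flags may differ from the full example on github, so treat the feature names as indicative:

```bash
cargo new swiftide-ragas && cd swiftide-ragas

# Swiftide with its OpenAI and Qdrant integrations; check the Swiftide docs
# for the exact feature names your version uses.
cargo add swiftide --features openai,qdrant
cargo add clap --features derive
cargo add tokio --features full
cargo add anyhow
cargo add serde --features derive
cargo add serde_json
```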
Next, let’s set up a main function, with clap, to kick it off:
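Something like the following works. The flag names here are illustrative rather than the exact ones from the full example, but the clap and tokio wiring is standard:

```rust
use std::path::PathBuf;

use clap::Parser;

/// Hypothetical command line interface; the flag names are illustrative.
#[derive(Parser, Debug)]
struct Args {
    /// Path to the repository to index
    #[arg(short, long, default_value = ".")]
    path: PathBuf,

    /// JSON file with the questions to run through the query pipeline
    #[arg(short, long)]
    questions: PathBuf,

    /// Where to write the Ragas-formatted export
    #[arg(short, long)]
    output: PathBuf,

    /// Optional earlier export (e.g. base.json) to use as ground truth
    #[arg(long)]
    ground_truth: Option<PathBuf>,

    /// Record the generated answers as ground truth in the export
    #[arg(long)]
    record_ground_truth: bool,
}

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let args = Args::parse();

    // index_all and query_all are sketched in the sections below; the ground
    // truth handling from the full example is omitted here for brevity.
    index_all(&args.path).await?;
    query_all(&args.questions, &args.output).await?;

    Ok(())
}
```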
Indexing the repository
Next, the fun part. In the Cargo.toml, add chunk and metadata features, and include them in default:
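The features themselves are empty; they only gate code paths in the pipeline:

```toml
[features]
default = ["chunk", "metadata"]
chunk = []
metadata = []
```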
Now we can implement the index_all function. It loads markdown and code files from the given path, splits the stream by extension, and then conditionally chunks the nodes and adds metadata. It then merges the streams, batch embeds the nodes with OpenAI, and stores them in Qdrant.
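Below is a simplified sketch of such a pipeline, indexing only the Rust sources and leaving out the markdown stream and the split/merge for brevity. It assumes Swiftide's FileLoader, ChunkCode, MetadataQACode and Embed transformers and the OpenAI and Qdrant integrations roughly as shown in the Swiftide README; exact builder methods and the then_in_batch signature vary between Swiftide versions, so treat this as a sketch and see the full example on github for the complete version:

```rust
use swiftide::indexing::{
    self,
    loaders::FileLoader,
    transformers::{ChunkCode, Embed, MetadataQACode},
};
use swiftide::integrations::{openai::OpenAI, qdrant::Qdrant};

async fn index_all(path: &std::path::Path) -> anyhow::Result<()> {
    let openai = OpenAI::builder()
        .default_embed_model("text-embedding-3-small")
        .default_prompt_model("gpt-4o-mini")
        .build()?;

    let qdrant = Qdrant::builder()
        .vector_size(1536)
        .collection_name("swiftide-ragas")
        .build()?;

    // Load only the Rust sources in this sketch; the full example also loads
    // markdown, splits the stream by extension and merges it again afterwards.
    let pipeline =
        indexing::Pipeline::from_loader(FileLoader::new(path).with_extensions(&["rs"]));

    // Optionally chunk the code into smaller parts
    #[cfg(feature = "chunk")]
    let pipeline = pipeline.then_chunk(ChunkCode::try_for_language_and_chunk_size(
        "rust",
        10..2048,
    )?);

    // Optionally add synthetic questions and answers as metadata
    #[cfg(feature = "metadata")]
    let pipeline = pipeline.then(MetadataQACode::new(openai.clone()));

    // Batch embed the nodes and store them in Qdrant
    // (the exact then_in_batch signature differs between Swiftide versions)
    pipeline
        .then_in_batch(Embed::new(openai.clone()))
        .then_store_with(qdrant)
        .run()
        .await?;

    Ok(())
}
```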
Querying the data
We also need to provide a query pipeline so we can query the data we indexed. This is also where the evaluator will jump in. Ragas primarily uses questions, answers, retrieved documents and ground truth (if provided) as its source for evaluation.
In Swiftide, an evaluator can be hooked into the query pipeline. Additionally, we provide a way to record the answers as ground truth, so they can be included in our export as a baseline.
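Here is a sketch of what the query side can look like, loosely based on Swiftide's query pipeline and Ragas evaluator examples. The evaluator and method names used here (Ragas::from_prepared_questions, evaluate_with, query_all, to_json) may differ between Swiftide versions, so check the full example on github for the exact API:

```rust
use std::path::Path;

use swiftide::integrations::{openai::OpenAI, qdrant::Qdrant};
use swiftide::query::{self, answers, evaluators::ragas::Ragas, query_transformers};

async fn query_all(questions_file: &Path, output: &Path) -> anyhow::Result<()> {
    // Same clients and collection as in index_all
    let openai = OpenAI::builder()
        .default_embed_model("text-embedding-3-small")
        .default_prompt_model("gpt-4o-mini")
        .build()?;
    let qdrant = Qdrant::builder()
        .vector_size(1536)
        .collection_name("swiftide-ragas")
        .build()?;

    // Questions to run through the pipeline, loaded from a plain JSON array
    let questions: Vec<String> =
        serde_json::from_str(&std::fs::read_to_string(questions_file)?)?;

    // The evaluator records the question, retrieved documents and answer for
    // every query that passes through the pipeline
    let ragas = Ragas::from_prepared_questions(questions);

    let pipeline = query::Pipeline::default()
        .evaluate_with(ragas.clone())
        // Generate subquestions to widen the semantic coverage of the query
        .then_transform_query(query_transformers::GenerateSubquestions::from_client(
            openai.clone(),
        ))
        .then_transform_query(query_transformers::Embed::from_client(openai.clone()))
        .then_retrieve(qdrant)
        .then_answer(answers::Simple::from_client(openai.clone()));

    pipeline.query_all(ragas.questions().await).await?;

    // Export questions, answers, retrieved documents and ground truth in the
    // format Ragas expects
    std::fs::write(output, ragas.to_json().await)?;

    Ok(())
}
```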
You can find the full code for this example on github.
Setting up a Python notebook
Now it’s time to do some experimentation with a notebook. Make sure you have Python set up, either globally or using venv, poetry or uv, with Jupyter installed. You will also need ragas, datasets, pandas, seaborn and matplotlib.
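For example, with plain pip (swap in poetry or uv as you prefer):

```bash
pip install jupyter ragas datasets pandas seaborn matplotlib
```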
Then run jupyter notebook to create a new notebook and open it. In the examples repository, you can find a questions.json with a large number of synthetic questions for the Swiftide project. You can also use the example code there to generate your own.
Generating our data
First, we will generate our ground truths by running with all features enabled on a set of questions and exporting the results to base.json.
Run with all features and use the answers as the ground truths
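With the hypothetical flags from the clap sketch earlier (the flags in the full example differ), that looks like:

```bash
# All features are in `default`, so a plain run enables chunking and QA metadata
cargo run --release -- \
  --questions questions.json \
  --output base.json \
  --record-ground-truth
```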
Next, let’s run it for each remaining feature combination, using base.json as input and exporting to separate JSON files, as shown in the commands below:
- Chunking enabled, QA metadata disabled
- Chunking disabled, QA metadata enabled
- Both chunking and QA metadata disabled
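Using cargo’s standard feature switches and the same hypothetical flags (the output file names are illustrative):

```bash
# Chunking only
cargo run --release --no-default-features --features chunk -- \
  --questions questions.json --ground-truth base.json --output chunk_only.json

# QA metadata only
cargo run --release --no-default-features --features metadata -- \
  --questions questions.json --ground-truth base.json --output metadata_only.json

# Neither feature
cargo run --release --no-default-features -- \
  --questions questions.json --ground-truth base.json --output neither.json
```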
Loading the data
We need to load the separate JSON files into datasets, one per feature combination, so we can evaluate each of them.
Load all the data using datasets:
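A sketch of the loading step, assuming the exports are named as in the commands above and are JSON lists of records with the question, answer, contexts and ground_truth fields Ragas expects:

```python
import json

from datasets import Dataset


def load_ragas_dataset(path: str) -> Dataset:
    """Load one Swiftide export (a JSON list of records) as a Hugging Face Dataset."""
    with open(path) as f:
        records = json.load(f)
    return Dataset.from_list(records)


# Output files from the runs above; names are illustrative
files = {
    "both": "base.json",
    "chunk_only": "chunk_only.json",
    "metadata_only": "metadata_only.json",
    "neither": "neither.json",
}

# One dataset per feature combination, keyed by its name
datasets_by_feature = {name: load_ragas_dataset(path) for name, path in files.items()}
```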
Evaluating with Ragas
For each dataset we can now run the evaluation. After that, we combine the results into a single pandas DataFrame so we can explore and visualize the evaluation.
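For example, evaluating each dataset with a handful of standard Ragas metrics and tagging the rows with the feature combination they came from. This assumes the ragas 0.1-style question/answer/contexts/ground_truth schema and an OpenAI key in the environment; adjust for your ragas version:

```python
import pandas as pd
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

metrics = [faithfulness, answer_relevancy, context_precision, context_recall]

frames = []
for name, dataset in datasets_by_feature.items():
    # Run the Ragas evaluation for this feature combination
    result = evaluate(dataset, metrics=metrics)

    # Convert to pandas and remember which pipeline variant it belongs to
    df = result.to_pandas()
    df["feature"] = name
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
```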
Finally, some graphs
Let’s generate a bar chart and a heatmap of the mean scores for each feature combination:
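Something like the following works, averaging each metric per feature combination and plotting both a grouped bar chart and a heatmap (metric column names may differ slightly depending on your ragas version):

```python
import matplotlib.pyplot as plt
import seaborn as sns

metric_cols = ["faithfulness", "answer_relevancy", "context_precision", "context_recall"]

# Mean score per metric for every feature combination
means = combined.groupby("feature")[metric_cols].mean()

# Grouped bar chart of the mean scores
means.plot(kind="bar", figsize=(10, 5))
plt.ylabel("mean score")
plt.ylim(0, 1)
plt.tight_layout()
plt.show()

# Heatmap of the same means
sns.heatmap(means, annot=True, vmin=0, vmax=1, cmap="viridis")
plt.tight_layout()
plt.show()
```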
And you will have some spiffy graphs!
Conclusion
It’s really interesting to see that synthetic question generation has a large impact on the performance of the pipeline. I suspect that because Swiftide has compact files and is not a massive codebase, chunking has less of an impact. In our own future experiments, we will be looking at more complex features, like custom prompt tuning, hybrid search, and more.
Ragas makes evaluating a RAG application straightforward and enables rapid iteration on blazing fast RAG applications built with Swiftide.
Check out the Ragas documentation to see what else they have to offer! The full Rust code and Python notebook are on github.
To learn more about Swiftide, head over to swiftide.rs or check us out on github.