Swiftide 0.12 - Hybrid Search, search filters, parquet loader, and a giant speed bump

Published by Timon Vonk

Introducing a huge update for Swiftide, 0.12, with hybrid search for Qdrant, filters for similarity search, a parquet loader, a massive performance boost, and many other improvements.

Trumpets and a big thanks to @ephraimkunz for his first contribution!

Swiftide is a Rust-native library for building LLM applications. Large language models are amazing, but they need context to solve real problems. Swiftide allows you to ingest, transform, and index large amounts of data fast, and then query that data so it can be injected into prompts. This process is called Retrieval Augmented Generation.

To get started with Swiftide, head over to swiftide.rs, check us out on github, or hit us up on discord.

Hybrid search support

Retrieving the most relevant information for a given query is the key challenge in Retrieval Augmented Generation. Research and our own experience show that similarity search on dense vectors alone is not enough. The idea behind hybrid search is fairly simple: combine dense, semantic similarity search with sparse, keyword-style search, then fuse the results so that both meaning and exact terms are taken into account.

There are two broad ways to go about this: either use multiple data stores, or use a database that can do both.
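To make the fusion step concrete, here is a minimal, illustrative sketch of reciprocal rank fusion (RRF), one common way to merge a dense and a sparse result list. This is not Swiftide's API (Qdrant performs the fusion server-side); the document ids and the constant k below are assumptions for illustration:

use std::collections::HashMap;

// Merge two ranked lists of document ids; documents ranked highly in either
// list accumulate a larger fused score.
fn reciprocal_rank_fusion(dense: &[&str], sparse: &[&str], top_k: usize) -> Vec<String> {
    let k = 60.0; // common RRF damping constant
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in [dense, sparse] {
        for (rank, id) in ranking.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused.into_iter().take(top_k).map(|(id, _)| id).collect()
}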

Qdrant supports hybrid search with sparse vectors. They recently reworked their implementation and Swiftide now fully supports it.

To use hybrid search in Qdrant, both the indexed data and the query need sparse embeddings. You can then build a query pipeline with the HybridSearch strategy.

It can be implemented as follows:

let fastembed = FastEmbed::try_default()?;
let fastembed_sparse = FastEmbed::try_default_sparse()?;

// Ensure Qdrant has vectors configured for both dense and sparse
// In this case we're working with combined vectors (chunk + any metadata)
let qdrant = Qdrant::builder()
    .batch_size(batch_size)
    .vector_size(384)
    .with_vector(EmbeddedField::Combined)
    .with_sparse_vector(EmbeddedField::Combined)
    .collection_name("swiftide-hybrid-example")
    .build()?;

// Then add sparse embeddings for indexing:
// <snip> rest of pipeline
indexing_pipeline.then_in_batch(
    256,
    transformers::SparseEmbed::new(fastembed_sparse),
);

// And set up the query pipeline with hybrid search and sparse embeddings
let query_pipeline = query::Pipeline::from_search_strategy(
    // By default it uses the Combined fields with a top_k of 10 and a top_n of 10;
    // no extra configuration needed
    HybridSearch::default(),
)
// Generate sub questions on the initial query to increase our query coverage
.then_transform_query(query_transformers::GenerateSubquestions::from_client(
    openai.clone(),
))
// Generate the same embeddings we used for indexing
.then_transform_query(query_transformers::Embed::from_client(fastembed.clone()))
.then_transform_query(query_transformers::SparseEmbed::from_client(
    fastembed_sparse.clone(),
))
.then_retrieve(qdrant.clone())
// Answer with Simple, which takes the documents as-is (in this case) or with any
// transformations applied after retrieval
.then_answer(answers::Simple::from_client(openai.clone()));
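Once built, running the pipeline is a single call. A minimal sketch, assuming the pipeline above; the question is illustrative:

let answer = query_pipeline
    .query("What is Swiftide used for?")
    .await?;
println!("{}", answer.answer());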

The full example is available on github.

Unfortunately, LanceDB does not support hybrid search in their Rust client yet. Shoot us a message on discord if your solution needs more elaborate search, and we're happy to see what is possible.

Search filters

Both LanceDB and Qdrant now support search filters in their native filter syntax with the SimilaritySingleEmbedding search strategy. This exposes the full filter API of both databases, without the need for a wrapper.

Filters are set in the SearchStrategy when creating the query pipeline.

For example in Qdrant:

// Given we have a field "filter" on our data (e.g. from indexed metadata,
// which Qdrant stores by default)
let search_strategy = SimilaritySingleEmbedding::from_filter(qdrant::Filter::must([
    qdrant::Condition::matches("filter", "true".to_string()),
]));

// Then build the pipeline from the strategy like this:
query::Pipeline::from_search_strategy(search_strategy)

LanceDB filters with strings in a SQL-like format. Unlike Qdrant, LanceDB requires the filterable fields to be configured when indexing. Once that is set up, the same filter query looks like this:

let search_strategy =
    SimilaritySingleEmbedding::from_filter("filter = \"true\"".to_string());

query::Pipeline::from_search_strategy(search_strategy)
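For completeness, here is a sketch of what configuring that field at indexing time could look like. This assumes Swiftide's LanceDB builder exposes a with_metadata method for declaring schema fields; treat the exact builder calls as an assumption and check the LanceDB example in the repository:

// Assumed builder API: declare the "filter" metadata field up front so it
// ends up in the LanceDB schema and can be filtered on later.
let lancedb = LanceDB::builder()
    .uri("/tmp/lancedb")
    .vector_size(384)
    .with_vector(EmbeddedField::Combined)
    .with_metadata("filter") // the field used by the filter query above
    .table_name("swiftide_test")
    .build()?;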

Parquet loader

Swiftide can now load parquet files. Parquet is quickly becoming the de facto standard for ML datasets. This enables experimenting with datasets from HuggingFace, as well as using your own. Currently, only plain text columns are supported.

For example:

let loader = Parquet::builder()
    .path(path)
    .column_name(column)
    .build()?;

indexing::Pipeline::from_loader(loader)

The loader fully streams the content of the parquet file.
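Putting it together, here is a minimal end-to-end sketch, assuming a local dataset.parquet with a text column and the FastEmbed setup from earlier; the file name, column name, and batch size are illustrative:

let loader = Parquet::builder()
    .path("dataset.parquet")
    .column_name("text")
    .build()?;

indexing::Pipeline::from_loader(loader)
    // Embed chunks in batches with a dense embedder
    .then_in_batch(64, transformers::Embed::new(FastEmbed::try_default()?))
    .then_store_with(qdrant.clone())
    .run()
    .await?;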

Massive performance boost

Well, this is a bit embarrassing and very exciting at the same time. Concurrency was not fully working in streaming pipelines when a future did not yield back to the runtime.

With this fixed, when testing large datasets both compute-bound (with FastEmbed and local models) and IO-bound (with OpenAI), we see a 30% to 50% improvement in overall performance. Tests were done on a MacBook M3 Pro. The benchmark indexes around 10k chunks, with concurrency set to the number of available CPUs. Of course, in the IO-bound case we could go a lot higher in concurrency.
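To illustrate the class of bug (not Swiftide's actual internals), here is a self-contained sketch: with buffer_unordered, a future that does blocking work without ever awaiting can starve the other futures, while spawning each future onto the runtime restores real parallelism. Names and workload are illustrative:

use futures::{stream, StreamExt};

// CPU-heavy future with no .await inside: it never yields to the runtime,
// so without spawning, buffer_unordered effectively runs these one by one.
async fn cpu_heavy(i: u64) -> u64 {
    (0..5_000_000u64).fold(i, |acc, x| acc.wrapping_add(x))
}

#[tokio::main]
async fn main() {
    // Spawning moves each future to its own task, so the runtime's worker
    // threads can actually run them concurrently.
    let results: Vec<u64> = stream::iter(0..8u64)
        .map(|i| tokio::spawn(cpu_heavy(i)))
        .buffer_unordered(8)
        .map(|res| res.expect("task panicked"))
        .collect()
        .await;
    println!("{results:?}");
}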

OpenAI

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| :--- | ---: | ---: | ---: | ---: |
| swiftide-0.12 | 4.202 | 4.202 | 4.202 | 1.00 |
| swiftide-0.11 | 6.352 | 6.352 | 6.352 | 1.51 |

FastEmbed

| Command | Mean [s] | Min [s] | Max [s] | Relative |
| :--- | ---: | ---: | ---: | ---: |
| swiftide-0.12 | 30.385 ± 0.795 | 29.505 | 31.051 | 1.00 |
| swiftide-0.11 | 41.290 ± 0.127 | 41.161 | 41.415 | 1.36 ± 0.04 |

Other notable improvements

Since our last release post on 0.9 there have been many more notable improvements; you can find them all in the changelog linked below.

Call for contributors

There is a large list of desired features, and many more unlisted, over at our issues page, ranging from great starter issues to fun, complex challenges.


You can find the full changelog here.

To get started with Swiftide, head over to swiftide.rs or check us out on github.