Swiftide 0.8 - Sparse vector and hybrid search, Ollama support, code outlines, and more

Published by Timon Vonk

Introducing Swiftide 0.8! This is a major release with a lot of new features and improvements: sparse vectors, Ollama, pipeline defaults, and more.

Before diving in: we have created a Discord channel. If you have questions, feedback, want to show off, or simply want to get in touch, this is where to find us.

To get started with Swiftide, head over to swiftide.rs or check us out on GitHub.

Expanding search beyond simple dense vectors is quickly becoming the norm in RAG. Dense search is great at retrieving semantically similar content, but falls short when searching for specific symbols.

One way to solve this is by embedding data with sparse vectors. Similarity search with sparse vectors is more akin to an exact symbol search. Qdrant fully supports them and even has hybrid (and, recently, fusion) search support. When doing a hybrid search, Qdrant retrieves results for both a dense and a sparse vector, then reranks the documents and returns the top matches.
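The fusion step behind Qdrant's Fusion::Rrf is Reciprocal Rank Fusion (RRF): each result list contributes 1 / (k + rank) per document, so a document ranked highly by either the dense or the sparse search floats to the top. A minimal, self-contained sketch of the idea (the rrf helper and the k = 60 default are illustrative, not a Qdrant API):

```rust
use std::collections::HashMap;

// Reciprocal Rank Fusion: sum 1 / (k + rank) over all result lists.
// k dampens the influence of top ranks; 60 is a common default.
fn rrf(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for ranking in rankings {
        for (rank, doc) in ranking.iter().enumerate() {
            // Ranks are 1-based in the RRF formula.
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + (rank + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    let dense = vec!["doc_a", "doc_b", "doc_c"];
    let sparse = vec!["doc_c", "doc_a", "doc_d"];
    for (doc, score) in rrf(&[dense, sparse], 60.0) {
        println!("{doc}: {score:.4}");
    }
}
```

Here doc_a appears in both lists (ranks 1 and 2) and edges out doc_c (ranks 3 and 1), which is exactly the behavior you want from hybrid search: agreement across retrievers beats a single high rank.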

Sparse vectors are represented differently from dense vectors: instead of one value per dimension, they carry a list of indices into the vocabulary and a list of weights, one for each of those indices.
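To make that concrete, here is a toy sparse vector and the dot product that similarity search boils down to (the SparseVector struct is illustrative, not Swiftide's actual type):

```rust
// Illustrative only: a sparse vector as parallel index/weight lists.
struct SparseVector {
    indices: Vec<u32>, // positions in the vocabulary
    values: Vec<f32>,  // weight for each of those positions
}

// Similarity is a dot product over the indices both vectors share;
// every other dimension is implicitly zero.
fn dot(a: &SparseVector, b: &SparseVector) -> f32 {
    let (mut i, mut j, mut sum) = (0, 0, 0.0);
    while i < a.indices.len() && j < b.indices.len() {
        if a.indices[i] == b.indices[j] {
            sum += a.values[i] * b.values[j];
            i += 1;
            j += 1;
        } else if a.indices[i] < b.indices[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    sum
}

fn main() {
    // Only indices 17 and 42 overlap, so only they contribute.
    let a = SparseVector { indices: vec![3, 17, 42], values: vec![0.5, 1.2, 0.8] };
    let b = SparseVector { indices: vec![17, 42, 99], values: vec![0.4, 1.0, 0.3] };
    println!("{}", dot(&a, &b));
}
```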

fastembed-rs now supports SPLADE, which is used by default for sparse embeddings. As more sparse models are added, Swiftide should support them out of the box as well.

Note that SPLADE is a fairly large model, so you may need some serious hardware when testing.

You can index with sparse and dense vectors like this:

let fastembed_sparse = FastEmbed::try_default_sparse()
    .unwrap()
    .with_batch_size(batch_size)
    .to_owned();
let fastembed = FastEmbed::try_default()
    .unwrap()
    .with_batch_size(batch_size)
    .to_owned();

indexing::Pipeline::from_loader(FileLoader::new("swiftide-core/").with_extensions(&["rs"]))
    .then_chunk(ChunkCode::try_for_language_and_chunk_size("rust", 10..2048)?)
    .then_in_batch(
        batch_size,
        transformers::SparseEmbed::new(fastembed_sparse.clone()),
    )
    .then_in_batch(batch_size, transformers::Embed::new(fastembed.clone()))
    .then_store_with(
        Qdrant::builder()
            .batch_size(batch_size)
            .vector_size(384)
            .with_vector(EmbeddedField::Combined)
            .with_sparse_vector(EmbeddedField::Combined)
            .collection_name("swiftide-hybrid")
            .build()?,
    )
    .run()
    .await?;

And you can use Qdrant directly for the hybrid search:

let search_response = qdrant
    .query(
        QueryPointsBuilder::new("swiftide-hybrid")
            .with_payload(true)
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(VectorInput::new_sparse(
                        sparse.indices,
                        sparse.values,
                    )))
                    .using("Combined_sparse")
                    .limit(20u64),
            )
            .add_prefetch(
                PrefetchQueryBuilder::default()
                    .query(Query::new_nearest(dense))
                    .using("Combined")
                    .limit(20u64),
            )
            .query(Query::new_fusion(Fusion::Rrf)),
    )
    .await
    .unwrap();

Hybrid search will soon be implemented for the query pipeline as well!

Ollama support

With the release of Llama 3.1, we really had to push Ollama support. It’s an impressive model and we’re happy to support it. You can now use Ollama in Swiftide:

let ollama_client = integrations::ollama::Ollama::default()
    .with_default_prompt_model("llama3.1")
    .to_owned();

pipeline.then(MetadataQAText::new(ollama_client))
    ...

Pipeline defaults

When using many LLM-based transformers, the pipeline could get rather repetitive with all the client.clone() calls. We’ve added a default that all our provided transformers support; it can be used as follows:

pipeline
    .with_default_llm_client(ollama_client)
    .then(MetadataQAText::default())
    .then(MetadataSummary::default())
    ...

Code outline transformers

We are experimenting with different ways of using tree-sitter more effectively when dealing with code. We’ve introduced two transformers, OutlineCodeTree and CompressCodeOutline.

When chunking code, important context from the parent file, like definitions and dependencies, can get lost. The OutlineCodeTree transformer outlines the code, similar to a symbol list in an editor, and adds it to the metadata. The CompressCodeOutline transformer uses an LLM to compress the outline so that only the symbols relevant to the chunk remain.
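To illustrate what an outline is (independent of Swiftide's implementation), here is a naive toy outliner that keeps only top-level signatures; the real transformers build the outline from the tree-sitter syntax tree, not from line matching:

```rust
// Toy illustration: reduce source code to a list of signatures,
// like the symbol list in an editor. Swiftide builds this with
// tree-sitter; this naive line filter only demonstrates the idea.
fn outline(source: &str) -> Vec<String> {
    source
        .lines()
        .map(str::trim)
        .filter(|l| {
            l.starts_with("pub fn")
                || l.starts_with("fn ")
                || l.starts_with("struct ")
                || l.starts_with("impl ")
        })
        .map(|l| l.trim_end_matches('{').trim().to_string())
        .collect()
}

fn main() {
    let src = "struct Indexer {\n    batch_size: usize,\n}\n\nimpl Indexer {\n    fn run(&self) {\n    }\n}\n";
    // Prints the outline: struct Indexer / impl Indexer / fn run(&self)
    for symbol in outline(src) {
        println!("{symbol}");
    }
}
```

Attaching such an outline to each chunk's metadata gives the LLM the surrounding symbols it would otherwise lose when the file is split.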

Indexing transformers boilerplate macro

With more and more transformers for indexing, maintaining them became a hurdle. For the pipeline defaults we had to add an additional trait (with a default NOOP implementation), and we figured, with all the code duplication across transformers, this is where an attribute macro can shine. Fundamentally a Transformer is still a single trait with a single method, but for our own API we want to provide a builder, default prompts, and various helpers to make things easier.
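Schematically, the shape involved looks something like this; the names and signatures below are simplified for illustration and are not copied from Swiftide's source:

```rust
// A node flowing through the indexing pipeline (simplified).
struct Node {
    chunk: String,
    metadata: Vec<(String, String)>,
}

// The single trait with a single method that every transformer
// ultimately implements (the real version is async and fallible).
trait Transformer {
    fn transform_node(&self, node: Node) -> Node;
}

// A hand-written transformer also needs a builder, default prompts,
// and helpers -- that is the boilerplate the attribute macro generates.
struct Uppercase;

impl Transformer for Uppercase {
    fn transform_node(&self, mut node: Node) -> Node {
        node.metadata
            .push(("upper".into(), node.chunk.to_uppercase()));
        node
    }
}

fn main() {
    let node = Node { chunk: "fn main() {}".into(), metadata: vec![] };
    let node = Uppercase.transform_node(node);
    println!("{:?}", node.metadata);
}
```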

Feel free to use the swiftide_macros::indexing_transformer attribute macro and try it out! We’d love to hear your feedback.

What’s next?

A scientific, data-driven approach is key to building a great pipeline. Rust is great at performance, but Python shines at data analysis. We are working with Ragas to make that a reality.

Call for contributors

There is a large list of desired features, and many more unlisted, over at our issues page, ranging from great starter issues to fun, complex challenges.


You can find the full changelog here.

To get started with Swiftide, head over to swiftide.rs or check us out on GitHub.