Transformer-Based Retrieval
Supervisor: Dr. Jian Zhu · UBC · Dec 2025 – Present
This project investigates how reasoning can be integrated into late-interaction retrieval models. Building on the ColBERT family of architectures, the goal is to improve retrieval quality on complex, multi-hop, and reasoning-intensive queries, where standard bi-encoder and sparse models fall short. The work involves model implementation, training on HPC GPU clusters, and systematic evaluation across standard retrieval benchmarks.
Presentations
Sparse-to-Dense Retrieval on BRIGHT: SPLADE Retrieval with ColBERT Reranking
Canadian AI 2026 · Responsible AI Track · Poster / 3MT
This work evaluates a sparse-to-dense retrieval pipeline on BRIGHT, a challenging benchmark for reasoning-intensive retrieval. On the biology domain, SPLADE serves as the first-stage retriever and ColBERT as a late-interaction reranker. We reproduce and stress-test this pipeline under realistic evaluation conditions, examining where sparse retrieval succeeds and where dense reranking is necessary to close the performance gap. The study contributes a reproducibility perspective on the interaction between sparse and dense retrieval in complex, knowledge-intensive tasks.
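The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: the function names (`maxsim_score`, `two_stage_search`), the cutoffs `first_stage_k` and `final_k`, and the toy 2-dimensional token vectors are all assumptions made for the example. It assumes the first stage has already produced one sparse score per document, and that the reranker scores each candidate with the late-interaction MaxSim rule (for every query token, take the best-matching document token, then sum).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product between two token embeddings."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late-interaction score: sum over query tokens of the best document-token match."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


def two_stage_search(
    sparse_scores: Dict[str, float],
    query_tokens: List[Vector],
    doc_token_index: Dict[str, List[Vector]],
    first_stage_k: int = 100,  # hypothetical cutoff, not taken from the study
    final_k: int = 10,
) -> List[str]:
    # Stage 1: keep only the top-k documents by sparse (first-stage) score.
    candidates = sorted(sparse_scores, key=sparse_scores.get, reverse=True)[:first_stage_k]
    # Stage 2: rerank the surviving candidates by MaxSim.
    reranked = sorted(
        candidates,
        key=lambda doc_id: maxsim_score(query_tokens, doc_token_index[doc_id]),
        reverse=True,
    )
    return reranked[:final_k]


# Toy data (made up): three documents with per-token embeddings.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_token_index = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],
    "d2": [[1.0, 0.0]],
    "d3": [[0.0, 1.0], [0.5, 0.5]],
}
sparse_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}

# With a tight first-stage cutoff, d1 never reaches the reranker.
filtered = two_stage_search(sparse_scores, query_tokens, doc_token_index, first_stage_k=2, final_k=2)
# With no effective filter, the reranker sees the whole toy corpus.
unfiltered = two_stage_search(sparse_scores, query_tokens, doc_token_index, first_stage_k=3, final_k=2)
print(filtered, unfiltered)
```

In this toy setup the document with the highest MaxSim score has the lowest sparse score, so it is lost whenever the first-stage cutoff is too tight. That is the kind of interaction between the two stages that the comparison against a full-corpus dense baseline is meant to expose.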
Best Recall@10: 0.348 (ColBERT)
Best nDCG@10: 0.309 (SPLADE + ColBERT)
Best MRR: 0.419 (ColBERT)
Results — BRIGHT Biology · 103 Queries
| Pipeline | nDCG@10 | Recall@10 | MRR | MAP |
|---|---|---|---|---|
| SPLADE (first-stage only, no reranking) | 0.218 | 0.254 | 0.291 | — |
| ★ SPLADE + ColBERT (two-stage pipeline, sparse → dense reranker) | **0.309** | 0.345 | 0.411 | **0.255** |
| ColBERT (dense baseline, full corpus, no filter) | 0.308 | **0.348** | **0.419** | **0.255** |
Best values per metric are in bold. ★ marks the proposed pipeline.
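For readers unfamiliar with the reported metrics, the sketch below computes Recall@10, reciprocal rank, and nDCG@10 for a single query using the standard textbook definitions with binary relevance. It is an illustration only: the function names and the toy ranking are assumptions, and the evaluation toolkit behind the table may differ in details such as graded relevance or tie handling. Corpus-level figures like those in the table are the mean of the per-query values (MRR is the mean of the reciprocal ranks).

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG: discounted gain of the ranking over that of an ideal ranking."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Toy query (made up): relevant documents are b, d and z; z is never retrieved.
ranked = ["a", "b", "c", "d"]
relevant = {"b", "d", "z"}

print(recall_at_k(ranked, relevant))      # 2 of 3 relevant documents retrieved
print(reciprocal_rank(ranked, relevant))  # first relevant document is at rank 2
print(ndcg_at_k(ranked, relevant))
```

Recall@10 ignores the order of results within the top 10, while MRR and nDCG@10 reward placing relevant documents earlier, which is why one pipeline can lead on one metric and trail on another.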