Transformer-Based Retrieval

Supervisor: Dr. Jian Zhu · UBC · Dec 2025 – Present

This project investigates how reasoning can be integrated into late-interaction retrieval models. Building on the ColBERT family of architectures, it aims to improve retrieval quality on complex, multi-hop, and reasoning-intensive queries, where standard bi-encoder and sparse models fall short. The work involves model implementation, training on HPC GPU clusters, and systematic evaluation on standard retrieval benchmarks.

ColBERT · SPLADE · Neural IR · PyTorch · HPC / H100 · BEIR · MS MARCO

Presentations

Sparse-to-Dense Retrieval on BRIGHT: SPLADE Retrieval with ColBERT Reranking

Canadian AI 2026 · Responsible AI Track · Poster / 3MT

NSERC CREATE Scholarship · May 2026

This work evaluates a sparse-to-dense retrieval pipeline on BRIGHT, a reasoning-intensive retrieval benchmark. On the biology domain, we use SPLADE as the first-stage retriever and ColBERT as a late-interaction reranker. We reproduce and stress-test this pipeline under realistic evaluation conditions, examining where sparse retrieval succeeds and where dense reranking is needed to close the performance gap. The study contributes a reproducibility perspective on how sparse and dense retrieval interact in complex, knowledge-intensive tasks.
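The retrieve-then-rerank flow described above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: it assumes the sparse stage scores documents by a dot product over term weights and the reranker uses ColBERT-style MaxSim over token embeddings. All function names (`sparse_score`, `maxsim_score`, `two_stage`) and the toy weights and vectors are hypothetical; in practice they would come from trained SPLADE and ColBERT encoders.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sparse_score(query_w: Dict[str, float], doc_w: Dict[str, float]) -> float:
    """Sparse relevance: dot product over the terms shared by query and document."""
    return sum(w * doc_w.get(term, 0.0) for term, w in query_w.items())


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: List[Vector], doc_vecs: List[Vector]) -> float:
    """Late interaction: each query token keeps its best-matching document
    token (max dot product); the per-token maxima are summed."""
    return sum(
        max((dot(q, d) for d in doc_vecs), default=0.0) for q in query_vecs
    )


def two_stage(
    query_w: Dict[str, float],
    query_vecs: List[Vector],
    sparse_index: Dict[str, Dict[str, float]],
    dense_index: Dict[str, List[Vector]],
    depth: int,
    top_k: int = 10,
) -> List[str]:
    """Retrieve `depth` candidates with the sparse scorer, then rerank them
    with MaxSim and return the top_k document ids."""
    candidates = sorted(
        sparse_index,
        key=lambda doc_id: sparse_score(query_w, sparse_index[doc_id]),
        reverse=True,
    )[:depth]
    reranked = sorted(
        candidates,
        key=lambda doc_id: maxsim_score(query_vecs, dense_index[doc_id]),
        reverse=True,
    )
    return reranked[:top_k]


# Toy data (hand-made, not model output).
sparse_index = {
    "d1": {"cell": 0.8, "membrane": 0.2},
    "d2": {"cell": 0.5, "membrane": 1.0},
    "d3": {"plant": 1.0},
}
dense_index = {
    "d1": [[1.0, 0.0], [0.0, 1.0]],
    "d2": [[1.0, 0.0], [0.5, 0.5]],
    "d3": [[0.0, 1.0]],
}
query_w = {"cell": 1.0, "membrane": 0.5}
query_vecs = [[1.0, 0.0], [0.0, 1.0]]

ranking = two_stage(query_w, query_vecs, sparse_index, dense_index, depth=2)
print(ranking)
```

In this toy case the sparse stage ranks d2 above d1 and drops d3, and the reranker then promotes d1. Documents filtered out by the first stage can never be recovered by the reranker, which is why the candidate depth matters when comparing against a dense-only baseline.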

Best Recall@10 · 0.348 · ColBERT

Best nDCG@10 · 0.309 · SPLADE + ColBERT

Best MRR · 0.419 · ColBERT

Results — BRIGHT Biology · 103 Queries

Pipeline	Description	nDCG@10	Recall@10	MRR	MAP
SPLADE	First-stage only, no reranking	0.218	0.254	0.291	—
★ Two-stage pipeline	SPLADE + ColBERT (sparse → dense reranker)	0.309	0.345	0.411	0.255
Dense baseline	ColBERT dense only (full corpus, no filter)	0.308	0.348	0.419	0.255

★ marks the proposed pipeline. Best values per metric: nDCG@10 0.309 (SPLADE + ColBERT); Recall@10 0.348 and MRR 0.419 (ColBERT dense only); MAP is tied at 0.255.
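For readers unfamiliar with the reported metrics, the sketch below shows one common way to compute them for a single query under binary relevance. It is an illustration only: the evaluation tooling and relevance grading used for the numbers above are not specified here, and the function names and toy ranking are hypothetical. Benchmark figures are typically the mean of these per-query values over all queries.

```python
import math
from typing import List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-gain nDCG: discounted gain of the ranking divided by the
    gain of an ideal ranking that places all relevant documents first."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked[:k], start=1)
        if doc_id in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of precision@rank taken at each relevant document's rank."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy ranking: relevant documents d1 and d2 appear at ranks 2 and 4.
ranked = ["d3", "d1", "d4", "d2"]
relevant = {"d1", "d2"}

print(recall_at_k(ranked, relevant, k=10))
print(reciprocal_rank(ranked, relevant))
print(round(ndcg_at_k(ranked, relevant, k=10), 4))
print(average_precision(ranked, relevant))
```

MRR and MAP in the table correspond to the mean of `reciprocal_rank` and `average_precision` across queries under this definition.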

SPLADE · ColBERT · BRIGHT Benchmark · Sparse-to-Dense · Reranking · Reproducibility