Interview Questions
Top 5 Vector Database Questions
- What are the core trade-offs between HNSW (Hierarchical Navigable Small World) and IVF (Inverted File) indexing in terms of memory, speed, and accuracy?
- How does Metadata Filtering (pre-filtering vs. post-filtering) impact the recall and latency of a vector search?
- When would you use Product Quantization (PQ) or Scalar Quantization (SQ), and what is the typical impact on retrieval precision?
- How do you handle Multi-Tenancy in a vector database to ensure data isolation and prevent "noisy neighbor" performance issues for SaaS users?
- What is the strategy for Re-indexing a production database when you switch to a new embedding model (e.g., moving from 768 to 1536 dimensions) without downtime?
Answers
1. HNSW vs. IVF
- HNSW: A graph-based index. It is extremely fast at query time and offers high recall, but has a high memory footprint because it keeps a complex graph structure in RAM. It's best for "always-on," low-latency apps.
- IVF: A cluster-based index. It divides vectors into "buckets" around trained centroids. It is more memory-efficient and faster to build, but usually has slightly higher latency and lower recall than HNSW because it only searches the most relevant clusters.
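The IVF idea can be seen in a toy NumPy sketch: assign vectors to clusters, then probe only the few clusters nearest the query instead of scanning everything. This is an illustrative simplification (random centroids stand in for the k-means training a real library performs), not how a production index is built.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, nlist, nprobe = 8, 1000, 10, 2  # dims, vectors, clusters, clusters probed

xb = rng.standard_normal((n, d)).astype(np.float32)

# "Train": pick nlist random vectors as centroids (real IVF runs k-means here)
centroids = xb[rng.choice(n, nlist, replace=False)]
assignments = np.argmin(
    ((xb[:, None, :] - centroids[None, :, :]) ** 2).sum(-1), axis=1
)

def ivf_search(q, k=5):
    # Probe only the nprobe closest clusters instead of scanning all n vectors
    probe = np.argsort(((centroids - q) ** 2).sum(-1))[:nprobe]
    candidates = np.where(np.isin(assignments, probe))[0]
    d2 = ((xb[candidates] - q) ** 2).sum(-1)
    return candidates[np.argsort(d2)[:k]]

def flat_search(q, k=5):
    # Exhaustive scan: exact results, but touches every vector
    d2 = ((xb - q) ** 2).sum(-1)
    return np.argsort(d2)[:k]

q = rng.standard_normal(d).astype(np.float32)
approx, exact = ivf_search(q), flat_search(q)
recall = len(set(approx) & set(exact)) / len(exact)
```

Raising `nprobe` trades speed for recall, which is exactly the knob real IVF implementations expose.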
2. Metadata Filtering
- Pre-filtering: Filters the data before the vector search. It ensures 100% accurate filtering but can be slow if the filter is "weak" (e.g., filtering for a common tag), as the index may not be optimized for that combination.
- Post-filtering: Performs the vector search first, then removes results that don't match the filter. This is fast but risky; if your search returns 10 results and all 10 are filtered out, the user gets zero results even though matching data exists deeper in the database.
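The two orderings can be contrasted in a few lines of NumPy. This is a brute-force sketch, not a real index; the point is that `post_filter_search` can return fewer than `k` results, which is the recall risk described above.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 200
vectors = rng.standard_normal((n, d)).astype(np.float32)
tags = rng.choice(["a", "b"], size=n)  # metadata attached to each vector

def pre_filter_search(q, tag, k=5):
    # Restrict the candidate set first, then rank: always k results
    # (if enough rows match) and 100% filter accuracy.
    idx = np.where(tags == tag)[0]
    d2 = ((vectors[idx] - q) ** 2).sum(-1)
    return idx[np.argsort(d2)[:k]]

def post_filter_search(q, tag, k=5, fetch=10):
    # Rank everything, fetch the global top `fetch`, then drop rows that
    # fail the filter; may return fewer than k results.
    d2 = ((vectors - q) ** 2).sum(-1)
    top = np.argsort(d2)[:fetch]
    return top[tags[top] == tag][:k]
```

A common mitigation for the post-filtering shortfall is to over-fetch (e.g., request 10x `k`) and accept the extra latency.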
3. Quantization (PQ vs. SQ)
- Scalar Quantization (SQ): Maps each floating-point value to a lower-precision integer (e.g., 32-bit float to 8-bit int). It reduces memory by 4x with minimal accuracy loss.
- Product Quantization (PQ): Splits vectors into sub-vectors and compresses each into a codebook entry. It can reduce memory by 10x-50x, but it significantly impacts precision. Use PQ for billion-scale datasets where RAM cost is the primary bottleneck.
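A minimal per-vector scalar quantizer makes the 4x figure concrete: each 32-bit float becomes one 8-bit code, with reconstruction error bounded by half a quantization step. This is a simplified sketch; production SQ typically calibrates the range on a training sample rather than per vector.

```python
import numpy as np

def sq_encode(x):
    # Per-vector min/max scalar quantization: float32 -> uint8 (4x smaller)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = np.where(hi > lo, hi - lo, 1.0)  # guard against constant vectors
    codes = np.round((x - lo) / scale * 255).astype(np.uint8)
    return codes, lo, scale

def sq_decode(codes, lo, scale):
    # Approximate reconstruction; error is at most ~scale / 255 / 2 per value
    return codes.astype(np.float32) / 255.0 * scale + lo

x = np.random.default_rng(2).standard_normal((10, 16)).astype(np.float32)
codes, lo, scale = sq_encode(x)
ratio = x.nbytes // codes.nbytes  # 4: four float32 bytes per uint8 code
```

Distance computations can then run directly on the compact codes, which is where the memory and bandwidth savings pay off.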
4. Multi-Tenancy Strategies
- Metadata Tagging: Store all users in one index and filter by user_id. Simple to manage, but carries a risk of data leakage if a bug occurs in the filter logic.
- Namespacing/Partitioning: Uses logical partitions within the DB. Offers better performance isolation.
- Collection-per-User: Physical isolation. Most secure, but doesn't scale well if you have 100,000+ small users, as each collection has its own overhead.
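For the metadata-tagging strategy, the main defense against the leakage bug is to inject the tenant filter inside the data-access layer rather than trusting each caller to remember it. A toy sketch (the `TenantIndex` class and its brute-force scoring are illustrative, not any particular database's API):

```python
class TenantIndex:
    """Toy shared index using the metadata-tagging strategy."""

    def __init__(self):
        self.items = []  # list of (tenant_id, doc_id, vector)

    def upsert(self, tenant_id, doc_id, vector):
        self.items.append((tenant_id, doc_id, vector))

    def search(self, tenant_id, query, k=5):
        # The tenant filter is applied unconditionally here, not by the
        # caller, which closes the "forgot the filter" leakage bug.
        scored = [
            (sum((a - b) ** 2 for a, b in zip(vec, query)), doc_id)
            for t, doc_id, vec in self.items
            if t == tenant_id
        ]
        return [doc_id for _, doc_id in sorted(scored)[:k]]
```

The same principle applies to namespaces: resolve the namespace from the authenticated tenant server-side, never from client input.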
5. Zero-Downtime Re-indexing
You cannot "convert" old embeddings to a new model's dimensions. You must:
- Dual-Write: Start writing new incoming data to both the old index and a new, empty index.
- Backfill: Run a background job to re-embed historical data and upsert it into the new index.
- Shadow Testing: Run queries against both indexes and compare results.
- Cutover: Once the new index is ready and validated, flip the API traffic to the new index and decommission the old one.
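The dual-write, backfill, and cutover steps above can be sketched as a thin routing layer. `FakeIndex`, `Migrator`, and their methods are hypothetical names for illustration; a real migration would embed documents with each index's own model instead of storing raw text.

```python
class FakeIndex:
    """Stand-in for a vector index; real code would embed `text` first."""

    def __init__(self):
        self.docs = {}

    def upsert(self, doc_id, text):
        self.docs[doc_id] = text

    def search(self, query, k=5):
        # Toy substring match in place of a nearest-neighbor query
        return [i for i, t in self.docs.items() if query in t][:k]


class Migrator:
    def __init__(self, old, new):
        self.old, self.new, self.cut_over = old, new, False

    def write(self, doc_id, text):
        # Dual-write: every new document lands in both indexes
        self.old.upsert(doc_id, text)
        self.new.upsert(doc_id, text)

    def backfill(self):
        # Re-embed historical data into the new index (here: a plain copy)
        for doc_id, text in self.old.docs.items():
            self.new.upsert(doc_id, text)

    def read(self, query, k=5):
        # Serve from the old index until cutover is flipped
        index = self.new if self.cut_over else self.old
        return index.search(query, k)
```

Shadow testing would call both indexes inside `read` and log the diff before `cut_over` is ever set to True.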