How Similarity Search Works
Understand cosine similarity with a plain-language explanation
From vectors to search
In the previous lesson, you turned every PDF chunk into a vector. Each embedding captures the meaning of that chunk as a list of numbers. Now you need a way to find which chunk vectors are closest to the user's question.
That means you need to measure how similar two vectors are.
Why direction matters
Cosine similarity measures how similar two vectors are on a scale from -1 to 1:
- 1.0 — identical direction (very similar meaning)
- 0.0 — perpendicular (unrelated meaning)
- -1.0 — opposite directions (contradictory meaning)
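A minimal sketch of those three regimes, using hand-picked 2D vectors (real embeddings have hundreds of dimensions, but the math is the same):

```python
import numpy as np

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

right = np.array([1.0, 0.0])
up = np.array([0.0, 1.0])
left = np.array([-1.0, 0.0])

print(cosine_similarity(right, right))  # 1.0  — identical direction
print(cosine_similarity(right, up))     # 0.0  — perpendicular
print(cosine_similarity(right, left))   # -1.0 — opposite directions
```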
Why compare direction instead of distance? A long document produces a longer vector than a short document, even when both discuss the same topic. Direction strips out that length difference and focuses on meaning alone. Two chunks about "neural networks" will point in a similar direction regardless of their word count.
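You can see the length difference cancel out with a quick check: scale a vector by 10 and its cosine similarity to the original stays at 1.0, because both the dot product and the norms grow by the same factor.

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

short_vec = np.array([1.0, 2.0, 3.0])
long_vec = 10 * short_vec  # same direction, ten times the magnitude

print(cosine_similarity(short_vec, long_vec))  # ≈ 1.0 (up to floating-point rounding)
```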
The building blocks
The cosine similarity formula uses two operations:
- Dot product — multiply matching elements of two vectors, then sum the results. This tells you how much two vectors "agree" in each dimension.
- Norm — the length (magnitude) of a vector. You divide by the norms to cancel out differences in vector length.
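Both building blocks are one numpy call each. The small example vectors below are chosen so the arithmetic is easy to verify by hand:

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 0.0, 1.0])

# Dot product: multiply matching elements, then sum.
dot = np.dot(a, b)          # 1*2 + 2*0 + 2*1 = 4.0

# Norm: the length (magnitude) of a vector.
norm_a = np.linalg.norm(a)  # sqrt(1 + 4 + 4) = 3.0
norm_b = np.linalg.norm(b)  # sqrt(4 + 0 + 1) ≈ 2.236
```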
The formula
cosine_similarity(A, B) = (A · B) / (|A| × |B|)

where:

- A · B is the dot product of vectors A and B
- |A| is the norm of vector A
- |B| is the norm of vector B
With numpy, this takes two lines:
dot = numpy.dot(vec_a, vec_b)
similarity = dot / (numpy.linalg.norm(vec_a) * numpy.linalg.norm(vec_b))

How search uses this
To answer a question:
- Embed the question to get a query vector.
- Compute cosine similarity between the query vector and every chunk vector.
- Sort chunks by score, highest first.
- Return the top k chunks.
Those top chunks become the context you pass to the language model.
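The steps above can be sketched as a small top-k search function. The chunk texts and vectors here are toy 2D stand-ins for real embeddings, and `top_k_chunks` is an illustrative helper name, not part of any library:

```python
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def top_k_chunks(query_vec, chunk_vecs, chunks, k=3):
    # Score every chunk against the query, then keep the k highest scores.
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

# Toy data standing in for real chunk embeddings.
chunks = ["about cats", "about dogs", "about stocks"]
chunk_vecs = [np.array([0.9, 0.1]), np.array([0.8, 0.2]), np.array([0.1, 0.9])]
query_vec = np.array([1.0, 0.0])  # pretend this embeds the user's question

for score, text in top_k_chunks(query_vec, chunk_vecs, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

In a real pipeline, `query_vec` comes from running the question through the same embedding model used for the chunks, and the returned texts are concatenated into the prompt for the language model.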