
How Similarity Search Works

Understand cosine similarity with a plain-language explanation

From vectors to search

In the previous lesson, you turned every PDF chunk into a vector. Each embedding captures the meaning of that chunk as a list of numbers. Now you need a way to find which chunk vectors are closest to the user's question.

That means you need to measure how similar two vectors are.

Why direction matters

Cosine similarity measures how similar two vectors are on a scale from -1 to 1:

  • 1.0 — identical direction (very similar meaning)
  • 0.0 — perpendicular (unrelated meaning)
  • -1.0 — opposite directions (contradictory meaning)

Why compare direction instead of distance? A long, information-dense chunk can produce a vector with a larger magnitude than a short one, even when both discuss the same topic. Comparing direction strips out that magnitude difference and focuses on meaning alone. Two chunks about "neural networks" will point in a similar direction regardless of their word count.

The building blocks

The cosine similarity formula uses two operations:

  • Dot product — multiply matching elements of two vectors, then sum the results. This tells you how much two vectors "agree" in each dimension.
  • Norm — the length (magnitude) of a vector. You divide by the norms to cancel out differences in vector length.
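
Both operations can be sketched in plain Python before reaching for numpy. The vectors below are made-up three-element examples; real embeddings have hundreds of dimensions, but the arithmetic is identical:

```python
import math

# Toy 3-dimensional vectors (illustrative values only).
vec_a = [1.0, 2.0, 2.0]
vec_b = [2.0, 0.0, 1.0]

# Dot product: multiply matching elements, then sum.
dot = sum(x * y for x, y in zip(vec_a, vec_b))  # 1*2 + 2*0 + 2*1 = 4.0

# Norm: square root of the sum of squared elements.
norm_a = math.sqrt(sum(x * x for x in vec_a))   # sqrt(1 + 4 + 4) = 3.0
norm_b = math.sqrt(sum(y * y for y in vec_b))   # sqrt(4 + 0 + 1) ≈ 2.236
```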

The formula

cosine_similarity(A, B) = (A · B) / (|A| × |B|)

  • A · B is the dot product of vectors A and B
  • |A| is the norm of vector A
  • |B| is the norm of vector B

With numpy, the formula itself takes two lines:

import numpy

dot = numpy.dot(vec_a, vec_b)  # A · B
similarity = dot / (numpy.linalg.norm(vec_a) * numpy.linalg.norm(vec_b))  # divide by |A| × |B|
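
A quick check with toy two-dimensional vectors (made-up values chosen so the angles are obvious) reproduces the three reference points on the scale. Note the results are only approximately exact because of floating-point rounding:

```python
import numpy

def cosine_similarity(vec_a, vec_b):
    # (A · B) / (|A| × |B|)
    dot = numpy.dot(vec_a, vec_b)
    return dot / (numpy.linalg.norm(vec_a) * numpy.linalg.norm(vec_b))

a = numpy.array([1.0, 2.0])
cosine_similarity(a, numpy.array([2.0, 4.0]))    # ≈ 1.0 — same direction (b is just 2 × a)
cosine_similarity(a, numpy.array([-2.0, 1.0]))   # ≈ 0.0 — perpendicular
cosine_similarity(a, numpy.array([-1.0, -2.0]))  # ≈ -1.0 — opposite direction
```

The first case also shows the length-invariance point from earlier: doubling a vector's magnitude leaves its direction, and therefore its similarity score, unchanged.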

How search uses this

To answer a question:

  1. Embed the question to get a query vector.
  2. Compute cosine similarity between the query vector and every chunk vector.
  3. Sort chunks by score, highest first.
  4. Return the top k chunks.

Those top chunks become the context you pass to the language model.
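
The four steps above can be sketched as a single function. The chunk texts and vectors here are made-up stand-ins for the embeddings from the previous lesson, and names like `search` and `top_k` are assumptions, not a prescribed API:

```python
import numpy

def cosine_similarity(vec_a, vec_b):
    return numpy.dot(vec_a, vec_b) / (
        numpy.linalg.norm(vec_a) * numpy.linalg.norm(vec_b)
    )

def search(query_vector, chunks, chunk_vectors, top_k=3):
    """Rank chunks by cosine similarity to the query vector, highest first."""
    scores = [cosine_similarity(query_vector, v) for v in chunk_vectors]
    ranked = sorted(zip(scores, chunks), key=lambda pair: pair[0], reverse=True)
    return [chunk for score, chunk in ranked[:top_k]]

# Toy example: two chunks with hand-made 2-D "embeddings".
chunks = ["about cats", "about neural networks"]
chunk_vectors = [numpy.array([1.0, 0.0]), numpy.array([0.0, 1.0])]

query_vector = numpy.array([0.1, 0.9])  # points mostly toward the second chunk
search(query_vector, chunks, chunk_vectors, top_k=1)  # ["about neural networks"]
```

In practice step 1 (embedding the question) uses the same embedding model as the chunks; here the query vector is precomputed to keep the sketch self-contained.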
