
Vector Databases Compared: pgvector vs Pinecone vs Qdrant vs Weaviate

Real benchmarks, operational trade-offs, and code examples for pgvector, Pinecone, Qdrant, and Weaviate. Which vector DB to use and when.

Anil Gulecha
Ex-HackerRank, Ex-Google
TL;DR
  • pgvector is our default for most RAG systems: it runs inside Postgres, eliminates a separate managed service, and handles 2M vectors without special tuning
  • Pinecone delivers sub-20ms p95 latency at 5M+ vectors but costs 3-8x more than a Supabase Postgres instance at the same vector count
  • Qdrant leads the four on self-hosted throughput: ~850 QPS at p95 ~8ms on 1M vectors, per Qdrant's published benchmarks
  • Weaviate lasted two projects on our stack. Schema-first design, v3-to-v4 breaking changes, and GraphQL verbosity cost more time than the features were worth
  • Pick based on three things: scale, whether your team runs Postgres today, and whether you can afford a separate managed service

We’ve run all four in production client systems. Weaviate didn’t survive past the second project. Here’s the full comparison: benchmarks, operational trade-offs, code examples, and the decision matrix we actually use.

In our earlier RAG post, we covered the pgvector vs Pinecone choice at a high level. This post goes deeper across all four options, including the operational realities that the documentation doesn’t cover.

What We’re Comparing

Four databases, four different design philosophies:

  • pgvector 0.7.x: a Postgres extension. Vectors live in the same table as the rest of your data. No new service to run or authenticate against.
  • Pinecone (serverless and pod-based): purpose-built, fully managed. Zero ops overhead. Fast at scale. The bill grows quickly.
  • Qdrant 1.9.x: open-source, written in Rust. Self-host or use Qdrant Cloud. Built specifically for high-throughput vector search with good filtering support.
  • Weaviate 1.24.x: open-source with a cloud tier. Multi-modal out of the box, hybrid search built in. Complex schema model with a steeper learning curve than the docs suggest.

All four support cosine, dot-product, and L2 distance. All four support metadata filtering. All four accept embeddings from your own model. The differences show up in latency, throughput, cost, and the operational complexity you’re signing up for on day 60 of a project.
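For intuition on what those three metrics actually compute, here is a plain-Python sketch. The databases implement these natively over indexed structures; this is just the math, assuming `a` and `b` are equal-length embedding lists:

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def dot_product_distance(a, b):
    # Negated dot product, so "smaller is more similar" holds here too
    return -sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    # Straight-line (Euclidean) distance between the two vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

For embeddings normalized to unit length (as OpenAI's are), cosine and dot-product orderings are identical, which is why the metric choice rarely changes retrieval results in practice.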

Test conditions used throughout: 1536-dimensional vectors (OpenAI text-embedding-3-small dimensions), 1M vector corpus unless otherwise noted, cosine distance. Where we cite Qdrant’s benchmarks, those use 768-dimensional vectors on their test hardware. We note this where it matters.

Raw Performance: What the Numbers Actually Say

I’ll be direct about sourcing here. Our pgvector and Pinecone numbers come from production systems. For Qdrant and Weaviate, we reference Qdrant’s published benchmarks and our own spot-check testing. Qdrant runs those benchmarks on their own hardware, so treat them as directionally correct, not a neutral comparison.

From Qdrant’s published benchmarks on 1M vectors (768 dims), HNSW index, ~99% recall:

Database               QPS    p95 Latency   RAM Used
Qdrant (self-hosted)   ~850   ~8ms          ~1.4 GB
Weaviate               ~380   ~18ms         ~2.1 GB

From our own production measurements on 500K–1M vectors at 1536 dims:

Config                                     QPS    p95 Latency
pgvector HNSW (m=16, ef_construction=64)   ~220   ~48ms
pgvector HNSW (4 parallel workers)         ~360   ~58ms
pgvector IVFFlat (lists=100)               ~90    ~70ms
Pinecone Serverless (us-east-1)            ~340   ~28ms

HNSW beats IVFFlat on query latency for online systems. The trade-off: HNSW index builds are slower and more memory-intensive. At 1M vectors, an HNSW index took roughly 3-4x the memory of an IVFFlat index in our testing. On a constrained Postgres instance, IVFFlat is worth considering.

One more number: at 5M vectors, pgvector p95 latency climbs to 80-140ms depending on ef_search. Pinecone stays under 30ms on pod-based deployments. That gap drives most of our database switches at scale.

pgvector: Still Our Default

pgvector is our starting point for new RAG systems. Not because it’s the fastest, but because the operational trade-off favors it heavily for teams already running Postgres.

Setting it up

-- Requires PostgreSQL 13+ and pgvector 0.7+
-- See the pgvector HNSW docs: https://github.com/pgvector/pgvector#hnsw
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id          UUID         PRIMARY KEY DEFAULT gen_random_uuid(),
  content     TEXT         NOT NULL,
  embedding   vector(1536),
  tenant_id   UUID         NOT NULL,
  metadata    JSONB        DEFAULT '{}',
  created_at  TIMESTAMPTZ  DEFAULT NOW()
);

-- HNSW index for online queries
-- m=16 and ef_construction=64 are solid starting defaults
-- Higher ef_construction improves recall but slows index builds
CREATE INDEX idx_documents_embedding
  ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Row-level security for multi-tenancy
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;

CREATE POLICY tenant_isolation ON documents
  USING (tenant_id = current_setting('app.current_tenant')::UUID);

The query that makes it worth using

The real advantage isn’t raw speed. It’s this:

-- Vector similarity + SQL predicates in one query.
-- No application-level joins between two systems.
SELECT
  id,
  content,
  metadata,
  1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id        = $2
  AND metadata->>'category' = 'policy'
  AND created_at       > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1::vector
LIMIT 5;

Replicating this in Pinecone requires a vector query followed by application-level filtering, or pushing the date range into metadata filters which bypass the HNSW graph. In pgvector, it’s one query. Your data is already there, your transactions work, your backups cover the vectors.
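As a sketch of what that application-level step looks like on the Pinecone side, here is a client-side date filter applied after a vector query returns. This is illustrative, not Pinecone API code, and it assumes each match carries an ISO-format created_at in its metadata (as in the upsert example later in the post):

```python
from datetime import datetime, timedelta

def post_filter_by_date(matches, days=90):
    # Keep only matches whose metadata "created_at" falls inside the window.
    # This runs in your application after the vector query returns -- the
    # extra hop that the single pgvector SQL query above avoids.
    cutoff = datetime.utcnow() - timedelta(days=days)
    return [
        m for m in matches
        if datetime.fromisoformat(m["metadata"]["created_at"]) > cutoff
    ]
```

The catch: if the filter discards most of your top_k, you have to over-fetch and query again, which is exactly the round-trip cost a combined SQL predicate never pays.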

Where pgvector breaks

Past 2M vectors on a single Postgres instance, index build times exceed 20 minutes and VACUUM operations start competing with query traffic. At 5M+ vectors, you’re looking at read replicas, partitioning, or switching tools.

Also: pgvector’s metadata filtering happens as a post-filter on the candidate set, not inside the HNSW graph. On a 5M vector collection with a 10% selectivity filter, you’re scanning 500K candidates to return 50K. Qdrant handles this inside the graph traversal and it’s materially faster for selective queries.
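The arithmetic behind that claim is worth making explicit: with a post-filter, the candidates you must scan scale inversely with filter selectivity. A rough, illustrative estimate (not the actual planner behavior of any of these databases):

```python
def candidates_needed(top_k, selectivity_pct):
    # If only selectivity_pct% of rows pass the metadata filter, a
    # post-filtering search must scan roughly 100/selectivity_pct times
    # as many nearest-neighbor candidates as it ultimately returns.
    return top_k * 100 // selectivity_pct
```

At 10% selectivity, returning 50K rows means scanning ~500K candidates. In-graph filtering skips non-matching candidates during the HNSW traversal instead, which is where Qdrant's advantage on selective queries comes from.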

Pinecone: Fast and Expensive

Pinecone is the obvious choice when someone says “I need a vector database” and doesn’t want to think about infrastructure. Zero ops. Good documentation. Consistent performance. The cost at scale is the real constraint.

Pricing across scales

These are estimates based on Pinecone’s published pricing. Serverless charges per read/write unit; pod-based charges hourly. The table below uses a mix of both tiers at typical usage levels:

Scale                             pgvector (Supabase Pro)   Pinecone Estimate
500K vectors, moderate traffic    $25/month                 ~$15–25/month
1M vectors, moderate traffic      $25/month                 ~$50–80/month
5M vectors, production traffic    $60–100/month             ~$250–400/month
10M vectors, production traffic   $120–200/month            ~$600+/month

At 500K vectors, Pinecone is competitive. Past 2M vectors, the gap compounds. The difference between $100/month and $400/month is significant for a startup running multiple RAG systems.

The API

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create a serverless index (Pinecone v3 SDK)
pc.create_index(
    name="documents",
    dimension=1536,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)

index = pc.Index("documents")

# Upsert with metadata and namespace for multi-tenancy
index.upsert(
    vectors=[
        {
            "id": "doc-001",
            "values": embedding_list,      # list[float], length must match dimension
            "metadata": {
                "category": "policy",
                "created_at": "2026-03-15"
            }
        }
    ],
    namespace="tenant-acme-corp"
)

# Query scoped to a namespace
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "policy"}},
    namespace="tenant-acme-corp",
    include_metadata=True
)

Namespaces work cleanly for multi-tenancy. One namespace per tenant, queries are scoped to a namespace. The limitation: you can’t query across namespaces in a single API call. If you have use cases that need cross-tenant retrieval (admin views, aggregated analytics), you’re making multiple calls and merging client-side.
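Where we do need cross-tenant views, the client-side merge is simple enough at modest namespace counts. A hedged sketch, assuming each per-namespace query returned a list of match dicts with id and score fields (cosine similarity, higher is better):

```python
def merge_namespace_results(per_namespace_matches, top_k=10):
    # Flatten the per-namespace result lists, then re-rank globally.
    # N namespaces means N API calls before this merge can run.
    merged = [m for matches in per_namespace_matches for m in matches]
    merged.sort(key=lambda m: m["score"], reverse=True)
    return merged[:top_k]
```

The merge itself is trivial; the cost is the N sequential or parallel API calls feeding it, which is what makes frequent cross-tenant queries a reason to reconsider the namespace-per-tenant model.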

Where Pinecone wins

Sub-20ms p95 at 5M+ vectors, with no configuration. We’ve used Pinecone for a real-time document search feature where the client had 8M indexed documents and stated a p95 latency requirement of under 20ms. pgvector at that scale with that requirement doesn’t work without significant sharding effort. Pinecone worked out of the box.

Qdrant: The Performance-First Alternative

Qdrant is what you reach for when you want near-Pinecone latency without the Pinecone bill, and your team is comfortable running Docker. It’s written in Rust. The performance shows.

The filtering implementation is the standout feature. Qdrant indexes payload fields and runs filters inside the HNSW graph traversal. Pinecone does this too for managed indexes. pgvector doesn’t. For workloads with selective metadata filters on large collections, Qdrant is the fastest option we’ve tested.

Setup and querying

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue, Range
)

# Self-hosted: QdrantClient(url="http://localhost:6333")
# Qdrant Cloud: QdrantClient(url="https://xyz.qdrant.io", api_key="your-key")
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Index payload fields before inserting data for fast filtering
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema="keyword",
)

client.create_payload_index(
    collection_name="documents",
    field_name="created_timestamp",
    field_schema="integer",
)

# Upsert with payload (Qdrant's term for metadata)
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,                      # uint64 or UUID string
            vector=embedding_list,
            payload={
                "category": "policy",
                "tenant": "acme-corp",
                "created_timestamp": 1710000000    # Unix epoch
            }
        )
    ]
)

# Search with compound payload filter
results = client.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="category", match=MatchValue(value="policy")),
            FieldCondition(key="tenant", match=MatchValue(value="acme-corp")),
            FieldCondition(
                key="created_timestamp",
                range=Range(gte=1704067200)     # since 2024-01-01 00:00:00 UTC
            )
        ]
    ),
    limit=5,
    with_payload=True,
)

Self-hosted vs Qdrant Cloud

Self-hosting Qdrant is genuinely simple. One Docker container:

docker run -p 6333:6333 -p 6334:6334 \
  -v $(pwd)/qdrant_storage:/qdrant/storage \
  qdrant/qdrant:v1.9.0

That’s it. For production, add a volume mount (done above), configure memory limits, and use their Helm chart for Kubernetes with at least 2 replicas.

Qdrant Cloud starts at around $9/month for a 0.5 vCPU, 1GB RAM cluster. At 1M vectors with 1536 dimensions, you need at least the 4GB RAM tier, which is roughly $36/month. Still significantly cheaper than Pinecone at comparable throughput.

Where Qdrant wins

High-throughput self-hosted deployments with complex metadata filters. Teams comfortable with Docker or Kubernetes who don’t want to pay the managed database premium. Also: Qdrant’s hybrid search (dense + sparse vectors combined) is production-ready. If your retrieval needs BM25 + semantic search without external tooling, Qdrant’s implementation is clean and well-documented. We use it on one client project where the document corpus has terminology so domain-specific that pure semantic search misses exact-match queries.
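Qdrant can combine the dense and sparse result sets for you, but it helps to see what a fusion step actually does. Here is a minimal reciprocal rank fusion (RRF) sketch over ranked id lists, say one from BM25 and one from dense search; the k=60 constant comes from the original RRF paper, and this is a conceptual illustration rather than Qdrant's internal implementation:

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=5):
    # Each doc earns 1/(k + rank) per list it appears in; summing across
    # lists rewards docs that rank well under both keyword and dense search.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Rank-based fusion like this sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales, which is why it is the default fusion method in most hybrid search stacks.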

Weaviate: What We Found, and Why We Left

Weaviate has real strengths: native multi-modal support, built-in vectorization modules, and a clean hybrid search implementation. Two of those three things didn’t matter for the projects we were running. And the operational friction cost us days.

The schema problem

Weaviate requires upfront schema definition. No schema-on-write. Every property needs a type before you ingest anything. Here’s the full setup flow:

import weaviate
import weaviate.classes.query as wq
from weaviate.classes.config import Configure, Property, DataType

client = weaviate.connect_to_local()

# Schema definition required before any data ingestion
client.collections.create(
    name="Document",
    vectorizer_config=Configure.Vectorizer.none(),   # BYO embeddings
    properties=[
        Property(name="content",    data_type=DataType.TEXT),
        Property(name="category",   data_type=DataType.TEXT),
        Property(name="tenant",     data_type=DataType.TEXT),
        Property(name="created_at", data_type=DataType.DATE),
    ],
    multi_tenancy_config=Configure.multi_tenancy(enabled=True)
)

documents = client.collections.get("Document")

# Multi-tenancy requires creating each tenant explicitly
documents.tenants.create(["acme-corp", "other-client"])

acme_docs = documents.with_tenant("acme-corp")

# Insert with explicit vector
acme_docs.data.insert(
    properties={
        "content":    "The remote work policy allows 3 days per week.",
        "category":   "policy",
        "tenant":     "acme-corp",
        "created_at": "2026-03-15T00:00:00Z",
    },
    vector=embedding_list
)

# Near-vector query with property filter
results = acme_docs.query.near_vector(
    near_vector=query_embedding,
    filters=wq.Filter.by_property("category").equal("policy"),
    limit=5,
    return_metadata=wq.MetadataQuery(certainty=True)
)

client.close()

The code above uses the v4 Python client. It’s completely different from the v3 client. Different import structure, different object model, different query API. We spent a day migrating a project when Weaviate pushed v4 as stable and the v3 compatibility shim started emitting deprecation warnings we couldn’t suppress.

That migration hit us mid-project. It didn’t kill the project, but it cost a day of engineering time that we billed, the client noticed, and we had to explain. Not a situation you want.

What broke the pattern for us

The schema-first requirement means any change to metadata structure requires a schema migration, not just inserting a new field. On projects where document metadata evolves (which is most projects), this created friction on every sprint. Other databases let you add new payload keys without ceremony.

The GraphQL query API, which you had to use directly before the v4 client abstracted it, was verbose for what should be simple operations. The v4 Python client fixes some of this, but the underlying complexity still surfaces when you need to do anything non-standard.

When Weaviate makes sense

Multi-modal RAG where text and images live in the same index. Weaviate’s CLIP module handles text-image retrieval natively, with no custom pipeline to maintain. If your product needs image-text retrieval at production scale, Weaviate is worth the complexity. Also appropriate if your team already runs a mature Weaviate deployment and knows the schema model well. For greenfield text-only RAG, it’s not the right starting point.

Operational Complexity Compared

Beyond benchmarks, day-to-day operations matter. Here’s how they compare across the things that cause problems at 2am:

Factor                    pgvector                    Pinecone                 Qdrant                    Weaviate
Managed option            Supabase / AWS RDS          Native (fully managed)   Qdrant Cloud              Weaviate Cloud
Self-hosted complexity    Low (Postgres)              Not applicable           Low (one Docker image)    Medium
Backup strategy           Standard Postgres backups   Automatic, no config     Snapshots API             Backup/restore API
Multi-tenancy model       Row-Level Security          Namespaces               Payload-indexed filters   Native tenant support
Schema flexibility        High (JSONB payload)        High (open metadata)     High (open payload)       Low (defined upfront)
SDK stability             Stable                      Stable                   Stable                    v3 → v4 breaking
Hybrid search             Via pg_bm25 extension       Sparse vector support    Native (dense + sparse)   Native
Team knowledge needed     SQL                         REST / Python SDK        REST / Python SDK         Weaviate-specific concepts

For teams of 3-5 engineers, pgvector and Pinecone have the lowest friction to operate. Qdrant is one docker run from production-ready. Weaviate’s schema model and SDK history add cognitive overhead that doesn’t pay off unless you’re using the features that justify it.

When Each One Wins

Four databases, four specific scenarios:

pgvector: Your stack already runs Postgres. Your corpus stays under 2M vectors for the foreseeable future. You want vector search and relational data to live in the same transaction. You care about cost and already pay for a Postgres instance. Start here — it was the right call on our most recent SQL data analyst build too.

Pinecone: You need sub-20ms p95 at 5M+ vectors and you’d rather pay the premium than manage infrastructure. Or your team has no Postgres expertise and managed-everything is a stated requirement. Also good when you need to prototype fast without worrying about index tuning.

Qdrant: You want Pinecone-level throughput without Pinecone’s cost structure. You’re comfortable with Docker or Kubernetes. Your queries involve selective metadata filters on large collections (this is where Qdrant’s in-graph filtering genuinely pulls ahead). Or you need production-ready hybrid search without adding another system.

Weaviate: Multi-modal retrieval at production scale. Text + image in one index, using Weaviate’s vectorizer modules. Everything else: pick one of the other three.

A note on AI agent systems: agents use vector stores as one tool among many in a workflow. The database choice matters less for agent tools than for dedicated RAG endpoints, because query latency is a smaller fraction of total agent execution time. pgvector works fine for agent-backed retrieval tools in most cases.

FAQ

Can I migrate from pgvector to Qdrant or Pinecone later?

Yes, but the migration takes real work. You’ll export your embeddings, reformat them for the target API, rebuild the index (expect 2-4 hours for 1M vectors), and rewrite your query logic since each database has a different filter syntax. The data migration is straightforward. The application code changes are more significant: multi-tenancy implementation, metadata filtering, and error handling all differ across the four databases. Factor migration cost into the initial decision. Don’t prototype on Pinecone if you know you’ll ship on pgvector.
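To make the "straightforward data migration" concrete, here is the shape of the pgvector-to-Qdrant transform: a hedged sketch assuming rows exported as dicts from the documents table defined earlier in the post, with field names taken from that schema.

```python
def pgvector_row_to_qdrant_point(row):
    # One exported Postgres row -> the dict shape qdrant_client upserts.
    # tenant_id becomes an indexed payload field; JSONB metadata keys are
    # flattened into the payload alongside it.
    return {
        "id": str(row["id"]),            # Qdrant accepts UUID strings as ids
        "vector": row["embedding"],      # list[float], dimensions unchanged
        "payload": {
            "content": row["content"],
            "tenant": str(row["tenant_id"]),
            **(row.get("metadata") or {}),
        },
    }
```

The transform is the easy 10%; budgeting for the filter-syntax and multi-tenancy rewrites on the query path is the part teams underestimate.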

Is pgvector’s latency fast enough for production use?

For most RAG applications, 45-80ms vector query latency is completely acceptable. Your LLM generation step takes 1-3 seconds. Nobody notices an 80ms database query in a 2-second response. The use cases where pgvector’s latency matters are real-time search products where vector retrieval is the final user-visible step: autocomplete, search-as-you-type, live similarity feeds. In those cases, Qdrant or Pinecone is the right call.

How do vector databases handle multi-tenancy at scale?

Each database takes a different approach. pgvector uses Postgres row-level security: one table, one index, each row has a tenant_id, and a policy restricts queries to the current tenant’s rows. Works well for dozens to a few hundred tenants. Pinecone uses namespaces: each namespace is logically isolated, and queries are scoped to one namespace per call. Qdrant uses indexed payload fields: filter on tenant as a keyword field, and the filter runs inside the HNSW traversal. Weaviate has native multi-tenancy built into its data model, requiring explicit tenant creation before ingestion. All four work; pgvector’s approach is the simplest to reason about if your team knows SQL well.

What about ChromaDB or Milvus?

We haven’t run either in production client systems, so I won’t cite benchmark numbers I don’t have. ChromaDB is excellent for prototyping: it runs in-process in Python with no external service and zero configuration. For production, it lacks the operational tooling (HA, snapshots, monitoring integrations) of the four databases in this post. Milvus is worth evaluating for very large-scale deployments (100M+ vectors), but its operational complexity exceeds Qdrant’s for typical use cases and the team knowledge investment is significant. Start with one of the four databases above.

Does the vector database choice affect RAG accuracy?

Retrieval accuracy is mostly a function of your embedding model, chunking strategy, and whether you’re using reranking. Not the database. All four implement HNSW and achieve similar recall at comparable ef_search settings. The database choice affects accuracy in one specific case: selective metadata filtering. pgvector’s post-filter approach scans more candidates before applying filters, which can miss relevant results if your filters are aggressive. Qdrant’s in-graph filtering is more precise for those workloads. For the full picture on retrieval accuracy, see our RAG in production guide.


Picking a vector database for a new RAG system? Book a 30-minute technical call and I’ll walk through the right choice for your scale, stack, and filtering requirements.

Tags: vector database, pgvector, Pinecone, Qdrant, Weaviate, RAG, embeddings, production AI
