We’ve run all four in production client systems. Weaviate didn’t survive past the second project. Here’s the full comparison: benchmarks, operational trade-offs, code examples, and the decision matrix we actually use.
In our earlier RAG post, we covered the pgvector vs Pinecone choice at a high level. This post goes deeper across all four options, including the operational realities that the documentation doesn’t cover.
What We’re Comparing
Four databases, four different design philosophies:
- pgvector 0.7.x: a Postgres extension. Vectors live in the same table as the rest of your data. No new service to run or authenticate against.
- Pinecone (serverless and pod-based): purpose-built, fully managed. Zero ops overhead. Fast at scale. The bill grows quickly.
- Qdrant 1.9.x: open-source, written in Rust. Self-host or use Qdrant Cloud. Built specifically for high-throughput vector search with good filtering support.
- Weaviate 1.24.x: open-source with a cloud tier. Multi-modal out of the box, hybrid search built in. Complex schema model with a steeper learning curve than the docs suggest.
All four support cosine, dot-product, and L2 distance. All four support metadata filtering. All four accept embeddings from your own model. The differences show up in latency, throughput, cost, and the operational complexity you’re signing up for on day 60 of a project.
Test conditions used throughout: 1536-dimensional vectors (OpenAI text-embedding-3-small dimensions), 1M vector corpus unless otherwise noted, cosine distance. Where we cite Qdrant’s benchmarks, those use 768-dimensional vectors on their test hardware. We note this where it matters.
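One practical note on metrics: OpenAI's embedding models return unit-normalized vectors, and for unit vectors cosine, dot-product, and L2 all produce the same ranking, so the metric choice matters less than it first appears. A pure-Python check of that equivalence:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    return 1 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a, b = normalize([3.0, 4.0]), normalize([1.0, 2.0])

# For unit vectors: cosine distance equals 1 - dot product,
# and squared L2 distance equals 2 * cosine distance,
# so all three metrics sort neighbors identically.
assert abs(cosine_distance(a, b) - (1 - dot(a, b))) < 1e-9
assert abs(l2(a, b) ** 2 - 2 * cosine_distance(a, b)) < 1e-9
```

If your embeddings are not normalized (some self-hosted models skip it), the metrics diverge and you should match the metric to how the model was trained.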
Raw Performance: What the Numbers Actually Say
I’ll be direct about sourcing here. Our pgvector and Pinecone numbers come from production systems. For Qdrant and Weaviate, we reference Qdrant’s published benchmarks and our own spot-check testing. Qdrant runs those benchmarks on their own hardware, so treat them as directionally correct, not a neutral comparison.
From Qdrant’s published benchmarks on 1M vectors (768 dims), HNSW index, ~99% recall:
| Database | QPS | p95 Latency | RAM Used |
|---|---|---|---|
| Qdrant (self-hosted) | ~850 | ~8ms | ~1.4 GB |
| Weaviate | ~380 | ~18ms | ~2.1 GB |
From our own production measurements on 500K–1M vectors at 1536 dims:
| Config | QPS | p95 Latency |
|---|---|---|
| pgvector HNSW (m=16, ef_construction=64) | ~220 | ~48ms |
| pgvector HNSW (4 parallel workers) | ~360 | ~58ms |
| pgvector IVFFlat (lists=100) | ~90 | ~70ms |
| Pinecone Serverless (us-east-1) | ~340 | ~28ms |
HNSW beats IVFFlat on query latency for online systems. The trade-off: HNSW index builds are slower and more memory-intensive. At 1M vectors, an HNSW index took roughly 3-4x the memory of an IVFFlat index in our testing. On a constrained Postgres instance, IVFFlat is worth considering.
One more number: at 5M vectors, pgvector p95 latency climbs to 80-140ms depending on ef_search. Pinecone stays under 30ms on pod-based deployments. That gap drives most of our database switches at scale.
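For reference, the p95 figures throughout this post use the nearest-rank method over observed query latencies. A minimal sketch (the helper name is ours):

```python
import math

def p95(samples):
    # Nearest-rank percentile: sort ascending, take the value at
    # rank ceil(0.95 * n), i.e. index ceil(0.95 * n) - 1.
    ordered = sorted(samples)
    k = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return ordered[k]

latencies_ms = [22, 25, 31, 28, 24, 90, 27, 26, 23, 29]
print(p95(latencies_ms))  # 90 — one slow outlier dominates p95
```

This is also why p95 is the number to watch rather than the mean: a handful of slow queries barely moves the average but defines the tail your users feel.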
pgvector: Still Our Default
pgvector is our starting point for new RAG systems. Not because it’s the fastest, but because the operational trade-off favors it heavily for teams already running Postgres.
Setting it up
-- Requires PostgreSQL 13+ and pgvector 0.7+
-- See the pgvector HNSW docs: https://github.com/pgvector/pgvector#hnsw
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
content TEXT NOT NULL,
embedding vector(1536),
tenant_id UUID NOT NULL,
metadata JSONB DEFAULT '{}',
created_at TIMESTAMPTZ DEFAULT NOW()
);
-- HNSW index for online queries
-- m=16 and ef_construction=64 are solid starting defaults
-- Higher ef_construction improves recall but slows index builds
CREATE INDEX idx_documents_embedding
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Row-level security for multi-tenancy
ALTER TABLE documents ENABLE ROW LEVEL SECURITY;
CREATE POLICY tenant_isolation ON documents
USING (tenant_id = current_setting('app.current_tenant')::UUID);
The query that makes it worth using
The real advantage isn’t raw speed. It’s this:
-- Vector similarity + SQL predicates in one query.
-- No application-level joins between two systems.
SELECT
id,
content,
metadata,
1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE tenant_id = $2
AND metadata->>'category' = 'policy'
AND created_at > NOW() - INTERVAL '90 days'
ORDER BY embedding <=> $1::vector
LIMIT 5;
Replicating this in Pinecone means encoding the date as a numeric metadata field and expressing every predicate in Pinecone's filter syntax, while any relational data the query would join against lives in a separate system. In pgvector, it's one query. Your data is already there, your transactions work, your backups cover the vectors.
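For comparison, here is roughly what the same predicates look like as a Pinecone metadata filter. The created_at_ts field is a hypothetical numeric epoch field you would have to populate at ingestion time, since Pinecone metadata has no date type; the tenant predicate would normally be handled by the namespace rather than a filter:

```python
import time

def pinecone_filter(category: str, days: int) -> dict:
    # Hypothetical 'created_at_ts' field: dates must be stored as numbers
    # because Pinecone metadata filters support strings, numbers, booleans,
    # and lists of strings -- no native date type.
    cutoff = time.time() - days * 86400
    return {
        "$and": [
            {"category": {"$eq": category}},
            {"created_at_ts": {"$gt": cutoff}},
        ]
    }

# Would be passed as filter= in index.query(...), scoped to the tenant's namespace.
print(pinecone_filter("policy", 90))
```

Workable, but every date field now exists twice in your pipeline: once as a human-readable string and once as an epoch for filtering.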
Where pgvector breaks
Past 2M vectors on a single Postgres instance, you’ll see index build times exceed 20 minutes and VACUUM operations competing with query traffic. At 5M+ vectors, you’re looking at read replicas, partitioning, or switching tools.
Also: pgvector’s metadata filtering happens as a post-filter on the candidate set, not inside the HNSW graph. With a filter that matches 10% of rows, roughly nine of every ten candidates the index returns get discarded after the fact, so ef_search has to grow about 10x to fill the same top-k, and latency grows with it. Qdrant applies the filter inside the graph traversal, which is materially faster for selective queries.
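The arithmetic behind that is simple enough to sketch. This is a back-of-envelope estimate, not a pgvector API; the safety factor is our own fudge for candidates lost to recall misses:

```python
import math

def ef_search_for_filtered_topk(k: int, selectivity: float, safety: float = 1.5) -> int:
    # With post-filtering, only `selectivity` of HNSW candidates survive
    # the filter, so the candidate pool must be ~k / selectivity,
    # padded by a safety factor since recall is not perfect.
    return math.ceil(k / selectivity * safety)

print(ef_search_for_filtered_topk(50, 0.10))  # 750
print(ef_search_for_filtered_topk(50, 0.01))  # 7500 -- 1% selectivity gets painful
```

The second line is the real problem case: highly selective filters push ef_search toward a scan, which is exactly where in-graph filtering pulls ahead.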
Pinecone: Fast and Expensive
Pinecone is the obvious choice when someone says “I need a vector database” and doesn’t want to think about infrastructure. Zero ops. Good documentation. Consistent performance. The cost at scale is the real constraint.
Pricing across scales
These are estimates based on Pinecone’s published pricing. Serverless charges per read/write unit; pod-based charges hourly. The table below uses a mix of both tiers at typical usage levels:
| Scale | pgvector (Supabase Pro) | Pinecone Estimate |
|---|---|---|
| 500K vectors, moderate traffic | $25/month | ~$15–25/month |
| 1M vectors, moderate traffic | $25/month | ~$50–80/month |
| 5M vectors, production traffic | $60–100/month | ~$250–400/month |
| 10M vectors, production traffic | $120–200/month | ~$600+/month |
At 500K vectors, Pinecone is competitive. Past 2M vectors, the gap compounds. The difference between $100/month and $400/month is significant for a startup running multiple RAG systems.
The API
from pinecone import Pinecone, ServerlessSpec
pc = Pinecone(api_key="your-api-key")
# Create a serverless index (Pinecone v3 SDK)
pc.create_index(
name="documents",
dimension=1536,
metric="cosine",
spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
index = pc.Index("documents")
# Upsert with metadata and namespace for multi-tenancy
index.upsert(
vectors=[
{
"id": "doc-001",
"values": embedding_list, # list[float], length must match dimension
"metadata": {
"category": "policy",
"created_at": "2026-03-15"
}
}
],
namespace="tenant-acme-corp"
)
# Query scoped to a namespace
results = index.query(
vector=query_embedding,
top_k=10,
filter={"category": {"$eq": "policy"}},
namespace="tenant-acme-corp",
include_metadata=True
)
Namespaces work cleanly for multi-tenancy. One namespace per tenant, queries are scoped to a namespace. The limitation: you can’t query across namespaces in a single API call. If you have use cases that need cross-tenant retrieval (admin views, aggregated analytics), you’re making multiple calls and merging client-side.
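When you do need the cross-namespace case, the merge is a small amount of client code. A sketch, assuming Pinecone-style match dicts carrying id and score (cosine similarity, higher is better):

```python
def merge_namespace_results(per_namespace: dict, top_k: int) -> list:
    # per_namespace maps namespace -> list of match dicts from separate
    # index.query() calls; tag each match with its origin, then re-rank.
    merged = []
    for ns, matches in per_namespace.items():
        for m in matches:
            merged.append({**m, "namespace": ns})
    merged.sort(key=lambda m: m["score"], reverse=True)
    return merged[:top_k]

results = merge_namespace_results(
    {
        "tenant-a": [{"id": "a1", "score": 0.91}, {"id": "a2", "score": 0.55}],
        "tenant-b": [{"id": "b1", "score": 0.78}],
    },
    top_k=2,
)
print([m["id"] for m in results])  # ['a1', 'b1']
```

Fine for an occasional admin view; as a hot path it multiplies both latency and read-unit cost by the number of namespaces queried.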
Where Pinecone wins
Sub-20ms p95 at 5M+ vectors, with no configuration. We’ve used Pinecone for a real-time document search feature where the client had 8M indexed documents and stated a p95 latency requirement of under 20ms. pgvector at that scale with that requirement doesn’t work without significant sharding effort. Pinecone worked out of the box.
Qdrant: The Performance-First Alternative
Qdrant is what you reach for when you want near-Pinecone latency without the Pinecone bill, and your team is comfortable running Docker. It’s written in Rust. The performance shows.
The filtering implementation is the standout feature. Qdrant indexes payload fields and runs filters inside the HNSW graph traversal. Pinecone does this too for managed indexes. pgvector doesn’t. For workloads with selective metadata filters on large collections, Qdrant is the fastest option we’ve tested.
Setup and querying
from qdrant_client import QdrantClient
from qdrant_client.models import (
Distance, VectorParams, PointStruct,
Filter, FieldCondition, MatchValue, Range
)
# Self-hosted: QdrantClient(url="http://localhost:6333")
# Qdrant Cloud: QdrantClient(url="https://xyz.qdrant.io", api_key="your-key")
client = QdrantClient(url="http://localhost:6333")
client.create_collection(
collection_name="documents",
vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)
# Index payload fields before inserting data for fast filtering
client.create_payload_index(
collection_name="documents",
field_name="category",
field_schema="keyword",
)
client.create_payload_index(
collection_name="documents",
field_name="created_timestamp",
field_schema="integer",
)
# Upsert with payload (Qdrant's term for metadata)
client.upsert(
collection_name="documents",
points=[
PointStruct(
id=1, # uint64 or UUID string
vector=embedding_list,
payload={
"category": "policy",
"tenant": "acme-corp",
"created_timestamp": 1710000000 # Unix epoch
}
)
]
)
# Search with compound payload filter
results = client.search(
collection_name="documents",
query_vector=query_embedding,
query_filter=Filter(
must=[
FieldCondition(key="category", match=MatchValue(value="policy")),
FieldCondition(key="tenant", match=MatchValue(value="acme-corp")),
FieldCondition(
key="created_timestamp",
range=Range(gte=1704067200) # since 2024-01-01 00:00:00 UTC
)
]
),
limit=5,
with_payload=True,
)
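Qdrant's Range conditions compare integers, so dates go in as Unix epochs; the 1704067200 in the filter above is a stdlib one-liner:

```python
from datetime import datetime, timezone

# Qdrant payload range filters have no date type, so convert at the boundary.
since = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())
print(since)  # 1704067200
```

Always pin the timezone to UTC here; a naive datetime uses local time and silently shifts your filter boundary.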
Self-hosted vs Qdrant Cloud
Self-hosting Qdrant is genuinely simple. One Docker container:
docker run -p 6333:6333 -p 6334:6334 \
-v $(pwd)/qdrant_storage:/qdrant/storage \
qdrant/qdrant:v1.9.0
That’s it. For production, add a volume mount (done above), configure memory limits, and use their Helm chart for Kubernetes with at least 2 replicas.
Qdrant Cloud starts at around $9/month for a 0.5 vCPU, 1GB RAM cluster. At 1M vectors with 1536 dimensions, you need at least the 4GB RAM tier, which is roughly $36/month. Still significantly cheaper than Pinecone at comparable throughput.
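Memory sizing is worth sanity-checking before picking a tier. Raw float32 vectors for 1M embeddings at 1536 dimensions run close to 6 GB on their own, so fitting them into a 4 GB tier depends on Qdrant's on-disk vector storage or quantization options rather than keeping everything resident. A quick calculator (the function name is ours):

```python
def raw_vector_bytes(n_vectors: int, dims: int, bytes_per_float: int = 4) -> int:
    # float32 storage only; excludes HNSW graph links and payload indexes,
    # which add real overhead on top of this floor.
    return n_vectors * dims * bytes_per_float

gb = raw_vector_bytes(1_000_000, 1536) / 1024**3
print(f"{gb:.2f} GB")  # 5.72 GB of raw vectors alone
```

The same arithmetic explains the pgvector HNSW memory numbers earlier: dimension count is the multiplier you control, which is one argument for smaller embedding models when recall allows.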
Where Qdrant wins
High-throughput self-hosted deployments with complex metadata filters. Teams comfortable with Docker or Kubernetes who don’t want to pay the managed database premium. Also: Qdrant’s hybrid search (dense + sparse vectors combined) is production-ready. If your retrieval needs BM25 + semantic search without external tooling, Qdrant’s implementation is clean and well-documented. We use it on one client project where the document corpus has terminology so domain-specific that pure semantic search misses exact-match queries.
Weaviate: What We Found, and Why We Left
Weaviate has real strengths: native multi-modal support, built-in vectorization modules, and a clean hybrid search implementation. Two of those three things didn’t matter for the projects we were running. And the operational friction cost us days.
The schema problem
Weaviate requires upfront schema definition. No schema-on-write. Every property needs a type before you ingest anything. Here’s the full setup flow:
import weaviate
import weaviate.classes.query as wq
from weaviate.classes.config import Configure, Property, DataType
client = weaviate.connect_to_local()
# Schema definition required before any data ingestion
client.collections.create(
name="Document",
vectorizer_config=Configure.Vectorizer.none(), # BYO embeddings
properties=[
Property(name="content", data_type=DataType.TEXT),
Property(name="category", data_type=DataType.TEXT),
Property(name="tenant", data_type=DataType.TEXT),
Property(name="created_at", data_type=DataType.DATE),
],
multi_tenancy_config=Configure.multi_tenancy(enabled=True)
)
documents = client.collections.get("Document")
# Multi-tenancy requires creating each tenant explicitly
documents.tenants.create(["acme-corp", "other-client"])
acme_docs = documents.with_tenant("acme-corp")
# Insert with explicit vector
acme_docs.data.insert(
properties={
"content": "The remote work policy allows 3 days per week.",
"category": "policy",
"tenant": "acme-corp",
"created_at": "2026-03-15T00:00:00Z",
},
vector=embedding_list
)
# Near-vector query with property filter
results = acme_docs.query.near_vector(
near_vector=query_embedding,
filters=wq.Filter.by_property("category").equal("policy"),
limit=5,
return_metadata=wq.MetadataQuery(certainty=True)
)
client.close()
The code above uses the v4 Python client. It’s completely different from the v3 client. Different import structure, different object model, different query API. We spent a day migrating a project when Weaviate pushed v4 as stable and the v3 compatibility shim started emitting deprecation warnings we couldn’t suppress.
That migration hit us mid-project. It didn’t kill the project, but it cost a billed day of engineering time that the client noticed and asked us to explain. Not a situation you want.
What broke the pattern for us
The schema-first requirement means any change to metadata structure requires a schema migration, not just inserting a new field. On projects where document metadata evolves (which is most projects), this created friction on every sprint. Other databases let you add new payload keys without ceremony.
The GraphQL query API that the v3 client exposed was verbose for what should be simple operations. The v4 Python client abstracts it away and fixes some of this, but the underlying complexity still surfaces when you need to do anything non-standard.
When Weaviate makes sense
Multi-modal RAG where text and images live in the same index. Weaviate’s CLIP module handles text-image retrieval natively, with no custom pipeline to maintain. If your product needs image-text retrieval at production scale, Weaviate is worth the complexity. Also appropriate if your team already runs a mature Weaviate deployment and knows the schema model well. For greenfield text-only RAG, it’s not the right starting point.
Operational Complexity Compared
Beyond benchmarks, day-to-day operations matter. Here’s how they compare across the things that cause problems at 2am:
| Factor | pgvector | Pinecone | Qdrant | Weaviate |
|---|---|---|---|---|
| Managed option | Supabase / AWS RDS | Native (fully managed) | Qdrant Cloud | Weaviate Cloud |
| Self-hosted complexity | Low (Postgres) | Not applicable | Low (one Docker image) | Medium |
| Backup strategy | Standard Postgres backups | Automatic, no config | Snapshots API | Backup/restore API |
| Multi-tenancy model | Row-Level Security | Namespaces | Payload-indexed filters | Native tenant support |
| Schema flexibility | High (JSONB payload) | High (open metadata) | High (open payload) | Low (defined upfront) |
| SDK stability | Stable | Stable | Stable | v3 → v4 breaking |
| Hybrid search | Via pg_bm25 extension | Sparse vector support | Native (dense + sparse) | Native |
| Team knowledge needed | SQL | REST / Python SDK | REST / Python SDK | Weaviate-specific concepts |
For teams of 3-5 engineers, pgvector and Pinecone have the lowest friction to operate. Qdrant is one docker run from production-ready. Weaviate’s schema model and SDK history add cognitive overhead that doesn’t pay off unless you’re using the features that justify it.
When Each One Wins
Four databases, four specific scenarios:
pgvector: Your stack already runs Postgres. Your corpus stays under 2M vectors for the foreseeable future. You want vector search and relational data to live in the same transaction. You care about cost and already pay for a Postgres instance. Start here — it was the right call on our most recent SQL data analyst build too.
Pinecone: You need sub-20ms p95 at 5M+ vectors and you’d rather pay the premium than manage infrastructure. Or your team has no Postgres expertise and managed-everything is a stated requirement. Also good when you need to prototype fast without worrying about index tuning.
Qdrant: You want Pinecone-level throughput without Pinecone’s cost structure. You’re comfortable with Docker or Kubernetes. Your queries involve selective metadata filters on large collections (this is where Qdrant’s in-graph filtering genuinely pulls ahead). Or you need production-ready hybrid search without adding another system.
Weaviate: Multi-modal retrieval at production scale. Text + image in one index, using Weaviate’s vectorizer modules. Everything else: pick one of the other three.
A note on AI agent systems: agents use vector stores as one tool among many in a workflow. The database choice matters less for agent tools than for dedicated RAG endpoints, because query latency is a smaller fraction of total agent execution time. pgvector works fine for agent-backed retrieval tools in most cases.
FAQ
Can I migrate from pgvector to Qdrant or Pinecone later?
Yes, but the migration takes real work. You’ll export your embeddings, reformat them for the target API, rebuild the index (expect 2-4 hours for 1M vectors), and rewrite your query logic since each database has a different filter syntax. The data migration is straightforward. The application code changes are more significant: multi-tenancy implementation, metadata filtering, and error handling all differ across the four databases. Factor migration cost into the initial decision. Don’t prototype on Pinecone if you know you’ll ship on pgvector.
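The export-and-upsert loop itself is mostly batching. A generic chunking helper (batch size 100 mirrors Pinecone's commonly recommended upsert batch size; adjust for your target database and vector dimensions):

```python
from itertools import islice

def batched(iterable, size: int):
    # Pinecone and Qdrant both cap upsert request sizes, so a migration
    # script streams the pgvector export in fixed-size chunks rather than
    # loading everything into memory.
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

rows = range(250)  # stand-in for (id, embedding, metadata) rows streamed from pgvector
batches = list(batched(rows, 100))
print(len(batches), len(batches[-1]))  # 3 100-row batches, last one holds 50
```

Wrap each batch upsert in retry logic with backoff; a multi-hour migration will hit transient rate limits, and resuming from a checkpoint beats restarting.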
Is pgvector’s latency fast enough for production use?
For most RAG applications, 45-80ms vector query latency is completely acceptable. Your LLM generation step takes 1-3 seconds. Nobody notices an 80ms database query in a 2-second response. The use cases where pgvector’s latency matters are real-time search products where vector retrieval is the final user-visible step: autocomplete, search-as-you-type, live similarity feeds. In those cases, Qdrant or Pinecone is the right call.
How do vector databases handle multi-tenancy at scale?
Each database takes a different approach. pgvector uses Postgres row-level security: one table, one index, each row has a tenant_id, and a policy restricts queries to the current tenant’s rows. Works well for dozens to a few hundred tenants. Pinecone uses namespaces: each namespace is logically isolated, and queries are scoped to one namespace per call. Qdrant uses indexed payload fields: filter on tenant as a keyword field, and the filter runs inside the HNSW traversal. Weaviate has native multi-tenancy built into its data model, requiring explicit tenant creation before ingestion. All four work; pgvector’s approach is the simplest to reason about if your team knows SQL well.
What about ChromaDB or Milvus?
We haven’t run either in production client systems, so I won’t cite benchmark numbers I don’t have. ChromaDB is excellent for prototyping: it runs in-process in Python with no external service and zero configuration. For production, it lacks the operational tooling (HA, snapshots, monitoring integrations) of the four databases in this post. Milvus is worth evaluating for very large-scale deployments (100M+ vectors), but its operational complexity exceeds Qdrant’s for typical use cases and the team knowledge investment is significant. Start with one of the four databases above.
Does the vector database choice affect RAG accuracy?
Retrieval accuracy is mostly a function of your embedding model, chunking strategy, and whether you’re using reranking. Not the database. All four implement HNSW and achieve similar recall at comparable ef_search settings. The database choice affects accuracy in one specific case: selective metadata filtering. pgvector’s post-filter approach scans more candidates before applying filters, which can miss relevant results if your filters are aggressive. Qdrant’s in-graph filtering is more precise for those workloads. For the full picture on retrieval accuracy, see our RAG in production guide.
Picking a vector database for a new RAG system? Book a 30-minute technical call and I’ll walk through the right choice for your scale, stack, and filtering requirements.