In the first week of building a real RAG system, the hardest question isn't which embedding model to use or how to chunk your documents. It's where to put the vectors. Three options come up in every architecture conversation: pgvector, Pinecone, and Qdrant. Each one is a defensible choice. Each one breaks in ways the marketing pages don't mention.
I've shipped all three in production at different points. The decision isn't really about technical specs; it's about what kind of system you're building around them. Here's how the choice actually plays out for teams that have to live with it for the next two years.
What all three do the same
Before the differences, the shared ground. All three store high-dimensional vectors and support approximate nearest neighbor (ANN) search over them. Given a query vector, they return the k closest vectors based on some distance metric, usually cosine similarity for text embeddings.
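To make that shared contract concrete, here is a minimal sketch of exact (brute-force) top-k search by cosine similarity in plain Python. It is not how any of the three engines works internally, since they use approximate indexes precisely to avoid scanning every vector. The function names and the toy corpus are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the k (id, score) pairs most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
corpus = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}
nearest = top_k([1.0, 0.1], corpus, k=2)
```

An ANN index trades a little of this exactness for speed: it returns roughly the same neighbors without comparing the query against every stored vector.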
Behind a thin wrapper, the interface you'd actually use looks much the same from the application side:
# retrieve top 5 most relevant chunks
results = index.query(
    vector=query_embedding,
    top_k=5,
    filter={"category": "engineering"},
)
Under that interface, each engine has very different assumptions about how you got there, how much you should pay, and what happens when your dataset grows past what you planned for. That's where the choices diverge.
pgvector: the database you already have
Postgres with the pgvector extension has become the default choice for teams that already run Postgres. There's no new infrastructure, no separate service to operate, no second set of credentials. The data lives next to the rest of your application state, and you can do JOINs between vectors and metadata tables in a single SQL query.
Pros in practice:
- You already know how to back it up, monitor it, and replicate it.
- A single database query can filter on metadata, look up vectors, and return enriched results.
- Costs are predictable. It's just Postgres, sized like Postgres.
- Works well for datasets up to a few million vectors without exotic tuning.
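The "single database query" point is easier to see written out. The sketch below holds a SQL statement in a Python string and builds its bind parameters. The schema (a `chunks` table with an `embedding` column, joined to a `documents` table with `title` and `category`) is hypothetical, and the placeholders assume a psycopg-style driver. `<=>` is pgvector's cosine distance operator.

```python
# Hypothetical schema: chunks(id, document_id, body, embedding vector(N))
# and documents(id, title, category). Placeholders use psycopg's %(name)s style.
HYBRID_QUERY = """
SELECT c.id, d.title, c.body,
       c.embedding <=> %(query)s::vector AS cosine_distance
FROM chunks AS c
JOIN documents AS d ON d.id = c.document_id
WHERE d.category = %(category)s
ORDER BY c.embedding <=> %(query)s::vector
LIMIT %(top_k)s
"""


def to_vector_literal(values):
    """Format floats in pgvector's text input form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_params(query_embedding, category, top_k=5):
    """Bundle the bind parameters for HYBRID_QUERY."""
    return {
        "query": to_vector_literal(query_embedding),
        "category": category,
        "top_k": top_k,
    }


params = build_params([0.25, 0.5, 1.0], "engineering")
# With a live connection this would run as: cursor.execute(HYBRID_QUERY, params)
```

The metadata filter, the vector ranking, and the enrichment with `title` all happen in one round trip, which is the part a separate vector service cannot give you without a second lookup.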
Cons that bite you later:
- Search latency degrades noticeably past 10 million vectors unless you build and tune ivfflat or hnsw indexes carefully.
- The vector index lives inside Postgres's storage, so very large vector datasets compete with your transactional workload for IO and WAL bandwidth.
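- The query planner doesn't always pick the right index for hybrid queries. Postgres has no built-in planner hints, so you sometimes have to steer it with planner settings such as enable_seqscan, an extension like pg_hint_plan, or a restructured query.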
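For a sense of what "build and tune" means in practice, here is a small sketch that generates the index DDL. The table and column names are hypothetical. The `lists` heuristic and the HNSW defaults are the starting values I recall from the pgvector README, so treat them as a first guess to benchmark rather than a recommendation.

```python
import math


def suggest_ivfflat_lists(row_count):
    """Starting point: rows / 1000 up to 1M rows, sqrt(rows) above that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return max(1, round(math.sqrt(row_count)))


def ivfflat_index_sql(table, column, row_count):
    """DDL for an IVFFlat index using cosine distance."""
    lists = suggest_ivfflat_lists(row_count)
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )


def hnsw_index_sql(table, column, m=16, ef_construction=64):
    """DDL for an HNSW index; m and ef_construction default to pgvector's defaults."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


# Per-session recall/latency knobs, set before running queries.
# The values are illustrative, not tuned.
QUERY_TUNING = [
    "SET ivfflat.probes = 10;",
    "SET hnsw.ef_search = 100;",
]

ddl = ivfflat_index_sql("chunks", "embedding", row_count=4_000_000)
```

The identifiers are interpolated directly into the SQL, so this only makes sense with trusted, hard-coded table and column names. Raising `probes` or `ef_search` buys recall at the cost of latency, which is the tuning loop the bullet above refers to.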
For most early-stage RAG applications, pgvector is the right starting point. The setup time is one afternoon, and you can evaluate retrieval quality against the same database your users eventually hit.
Pinecone: managed and opinionated
Pinecone is the option most teams reach for when they want a managed vector service that they don't have to operate. It's hosted-only, fully managed, and ships with autoscaling, multi-region replication, and a serverless pricing tier that charges only for what you use.
The good parts:
- Cold start to production is fast. Spin up an index, push vectors, query.
- Serverless pricing scales with usage, no idle cost.
- Hybrid search (sparse + dense) is well-supported.
- Namespaces make multi-tenant RAG setups easier than DIY.
The not-so-good parts:
- Vendor lock-in is real. The API is proprietary, and migrating off is a real engineering project.
- Pricing gets confusing once you push past the free tier, especially with metadata filters and pod-based indexes.
- You can't run it locally. Local development needs a fake or a separate vector store.
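One way to handle the local-development gap is a small in-memory fake that your tests talk to instead of the hosted service. The sketch below mirrors the generic `query(vector, top_k, filter)` call shown earlier in this article, plus a `namespace` argument for the multi-tenant case. It is not the Pinecone client API; the class and method names are assumptions, and the filter only supports exact equality on metadata.

```python
import math


class InMemoryIndex:
    """Tiny stand-in for a vector index, for tests and local development only."""

    def __init__(self):
        self._namespaces = {}  # namespace -> {item_id: (vector, metadata)}

    def upsert(self, item_id, vector, metadata=None, namespace="default"):
        items = self._namespaces.setdefault(namespace, {})
        items[item_id] = (list(vector), dict(metadata or {}))

    def query(self, vector, top_k=5, filter=None, namespace="default"):
        """Exact cosine search; `filter` is a dict of metadata equality checks."""
        hits = []
        for item_id, (stored, metadata) in self._namespaces.get(namespace, {}).items():
            if filter and any(metadata.get(k) != v for k, v in filter.items()):
                continue
            hits.append({"id": item_id, "score": _cosine(vector, stored), "metadata": metadata})
        hits.sort(key=lambda hit: hit["score"], reverse=True)
        return hits[:top_k]


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


index = InMemoryIndex()
index.upsert("doc-1", [1.0, 0.0], {"category": "engineering"}, namespace="tenant-a")
index.upsert("doc-2", [0.9, 0.1], {"category": "sales"}, namespace="tenant-a")
index.upsert("doc-3", [1.0, 0.0], {"category": "engineering"}, namespace="tenant-b")
results = index.query(
    [1.0, 0.0], top_k=5, filter={"category": "engineering"}, namespace="tenant-a"
)
```

A fake like this keeps unit tests fast and offline, but it says nothing about latency, recall, or quota behavior, so you still want an integration test against the real service before shipping.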
Pinecone is the right answer when you want to outsource vector operations entirely and you trust the company to be around in three years. It's the wrong answer if you need to run the system on-prem, care about data leaving your VPC, or want to avoid vendor lock-in.
Qdrant: open source with a managed option
Qdrant sits in the middle. It's an open-source vector database written in Rust, with a managed cloud offering if you don't want to operate it yourself. The API is clean, the performance is excellent, and the project has been consistently improving for years.
Why people pick it:
- Written in Rust, fast even on commodity hardware.
- Production-tested at large scale (used by several AI search startups).
- Has a real filtering system that's more flexible than Pinecone's and doesn't require Postgres.
- The open-source version runs anywhere: laptop, on-prem, Kubernetes.
- Cloud offering is optional and per-cluster.
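To show what the filtering looks like, the sketch below builds a search request body as plain JSON, combining `must` and `must_not` clauses with exact-match and range conditions. The body shape follows Qdrant's filter grammar as I understand it from the docs; the payload fields (`category`, `year`, `status`) and the helper names are hypothetical, and the exact endpoint and field names are worth checking against the version you run.

```python
import json


def match(key, value):
    """Condition: payload[key] equals value."""
    return {"key": key, "match": {"value": value}}


def number_range(key, gte=None, lte=None):
    """Condition: payload[key] falls inside the given numeric bounds."""
    bounds = {name: bound for name, bound in (("gte", gte), ("lte", lte)) if bound is not None}
    return {"key": key, "range": bounds}


def search_body(vector, limit, must=(), should=(), must_not=()):
    """Build a JSON-serializable search request with an optional filter."""
    clauses = {"must": list(must), "should": list(should), "must_not": list(must_not)}
    body = {"vector": list(vector), "limit": limit, "with_payload": True}
    active = {name: conditions for name, conditions in clauses.items() if conditions}
    if active:
        body["filter"] = active
    return body


body = search_body(
    vector=[0.1, 0.2, 0.3],
    limit=5,
    must=[match("category", "engineering"), number_range("year", gte=2023)],
    must_not=[match("status", "archived")],
)
payload = json.dumps(body)  # what you would send over HTTP
```

The boolean clauses nest, so "engineering docs from 2023 onward that are not archived" stays one request instead of a vector query followed by application-side filtering.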
Where it falls short:
- Smaller community than Postgres or Pinecone.
- Fewer integrations with the rest of the data stack.
- Cloud pricing is cheaper than Pinecone at low scale but only comparable at high scale, so the cost advantage narrows as you grow.
Qdrant is a particularly good choice when you want the performance of a purpose-built vector database, the option to run it yourself, and a managed path available when you don't want to operate it. For production AI engineering work where you're handling hundreds of millions of vectors, it's the most defensible open-source option.
When each option is genuinely the right answer
Pick pgvector if:
- You already have Postgres for your application data.
- You're under a few million vectors and don't expect to outgrow that soon.
- Your team knows Postgres well and adding another database would be painful.
- You want a single source of truth and strong consistency with metadata.
Pick Pinecone if:
- You don't want to operate another piece of infrastructure.
- You can accept vendor lock-in for now and can architect around it.
- Your workload is spiky or unpredictable and you want serverless pricing.
- You're shipping a product, not building infrastructure.
Pick Qdrant if:
- You want the option to self-host but might go managed later.
- You have a large vector workload and need tight performance control.
- You want filtering power that goes beyond what's natural in SQL or in Pinecone.
- Your team has Rust or systems expertise.
A useful rule of thumb: if you've already chosen Postgres, start with pgvector. If you don't have an opinion yet, start with Qdrant. If you want to delegate the whole category to a vendor and move on, Pinecone.
Things the comparison pages don't tell you
Three operational gotchas that catch teams late in a project:
Cold start latency. Pinecone and Qdrant both warm up to sub-100ms p99 after a few minutes of traffic. pgvector's first query after an idle period can take much longer, especially if the working set doesn't fit in shared buffers. Plan for this in your latency budgets.
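To put numbers on this for your own setup, time the first query after an idle period separately from the warm ones. The helper below is a minimal, engine-agnostic sketch: `run_query` is whatever callable issues one query against your store, and the injectable `clock` exists only to make the function testable.

```python
import time


def measure_latencies(run_query, warm_runs=50, clock=time.perf_counter):
    """Time one cold call, then `warm_runs` warm calls.

    Returns (cold_seconds, warm_p99_seconds).
    """
    if warm_runs < 1:
        raise ValueError("warm_runs must be at least 1")

    start = clock()
    run_query()
    cold = clock() - start

    warm = []
    for _ in range(warm_runs):
        start = clock()
        run_query()
        warm.append(clock() - start)
    warm.sort()

    # Nearest-rank p99: rank = ceil(0.99 * n), computed with integer arithmetic.
    rank = (99 * len(warm) + 99) // 100
    return cold, warm[rank - 1]


# Stand-in workload; replace the lambda with a real query against your store.
cold_s, warm_p99_s = measure_latencies(lambda: sum(range(1000)), warm_runs=20)
```

If the cold number is the problem on Postgres, the pg_prewarm extension is one option for loading the index into cache ahead of traffic.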
Filter + vector query performance. All three claim to support metadata filtering alongside vector search. In practice, very selective filters (matching less than 1% of the corpus) can cause latency to spike on all three engines. Profile the query patterns you actually expect, not the ones in the vendor benchmarks.
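The reason selective filters hurt is easiest to see with a toy model of the two naive strategies. Filtering after the nearest-neighbor step can leave you with too few results; filtering before it means ranking every matching item without help from the index. Real engines use more sophisticated strategies than either, so the sketch below only illustrates the trade-off. The corpus and scoring function are invented.

```python
def post_filter_search(items, score, predicate, top_k, candidates):
    """Take the `candidates` best-scoring items first, then drop non-matching ones."""
    best = sorted(items, key=score, reverse=True)[:candidates]
    return [item for item in best if predicate(item)][:top_k]


def pre_filter_search(items, score, predicate, top_k):
    """Drop non-matching items first, then rank everything that is left."""
    matching = [item for item in items if predicate(item)]
    return sorted(matching, key=score, reverse=True)[:top_k]


def score(item):
    # Toy relevance: lower ids are "closer" to the query.
    return -item["id"]


def is_rare(item):
    return item["tag"] == "rare"


# 1,000 items; exactly 1% of them carry the rare tag.
items = [{"id": i, "tag": "rare" if i % 100 == 99 else "common"} for i in range(1000)]

post = post_filter_search(items, score, is_rare, top_k=5, candidates=50)
pre = pre_filter_search(items, score, is_rare, top_k=5)
```

Here the post-filter path returns nothing, because none of the 50 nearest candidates carries the rare tag, while the pre-filter path finds five matches only by looking at every tagged item. Either failure mode shows up as a latency or quality problem once filters get selective.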
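Embedding model upgrades. Switching embedding models means re-embedding every vector. Pinecone and Qdrant both let you build the new vectors alongside the old ones (a separate namespace or index in Pinecone, a new collection behind an alias in Qdrant) and cut over without downtime. pgvector requires the same careful planning, but you can do it in SQL, for example with a second column or table.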
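The common pattern behind a zero-downtime upgrade is the same on every engine: write the re-embedded vectors somewhere new, keep serving reads from the old location, then flip a single pointer. The sketch below models that with a toy in-memory registry. The class, the collection names, and the stand-in "embedding" functions are all hypothetical.

```python
class AliasedCollections:
    """Toy registry: readers query an alias, which points at one concrete collection."""

    def __init__(self):
        self.collections = {}  # collection name -> {chunk_id: vector}
        self.aliases = {}      # alias -> collection name

    def write(self, collection, chunk_id, vector):
        self.collections.setdefault(collection, {})[chunk_id] = vector

    def point_alias(self, alias, collection):
        if collection not in self.collections:
            raise KeyError(f"unknown collection: {collection}")
        self.aliases[alias] = collection  # one assignment, so readers never see a mix

    def read(self, alias, chunk_id):
        return self.collections[self.aliases[alias]][chunk_id]


def migrate(store, alias, chunks, embed_new, new_collection):
    """Re-embed every chunk into a new collection, then flip the alias."""
    for chunk_id, text in chunks.items():
        store.write(new_collection, chunk_id, embed_new(text))
    old_collection = store.aliases.get(alias)
    store.point_alias(alias, new_collection)
    return old_collection  # safe to drop once the new model is validated


chunks = {"c1": "hello", "c2": "vector stores"}
store = AliasedCollections()
for chunk_id, text in chunks.items():
    store.write("chunks_v1", chunk_id, [float(len(text))])  # stand-in for model v1
store.point_alias("chunks", "chunks_v1")

before = store.read("chunks", "c1")
retired = migrate(store, "chunks", chunks, lambda t: [float(len(t)), 1.0], "chunks_v2")
after = store.read("chunks", "c1")
```

Keeping the old collection around until the new model has been evaluated also gives you a one-line rollback: point the alias back.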
The decision I'd make today
For most RAG applications in 2026, I'd start with pgvector. The reasoning is mostly operational: you already know how to back it up, secure it, and observe it. Once you outgrow pgvector, the migration path to Qdrant is well-trodden: the query APIs differ (SQL on one side, HTTP/gRPC on the other), but the data model of an id, a vector, and a metadata payload maps over directly.
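As a rough illustration of what the data side of that migration involves, the sketch below reshapes rows read from Postgres into batched upsert bodies in the `{"points": [{"id", "vector", "payload"}]}` shape that Qdrant's points API uses. The row layout and the table it might come from are hypothetical, and Qdrant point ids need to be unsigned integers or UUIDs, so other key types would need mapping first.

```python
def rows_to_points(rows):
    """Map (id, embedding, metadata) rows to Qdrant-style point dicts."""
    return [
        {"id": row_id, "vector": list(embedding), "payload": dict(metadata)}
        for row_id, embedding, metadata in rows
    ]


def batched(points, batch_size):
    """Yield upsert request bodies of at most `batch_size` points each."""
    for start in range(0, len(points), batch_size):
        yield {"points": points[start:start + batch_size]}


# Rows as they might come back from: SELECT id, embedding, metadata FROM chunks
rows = [
    (1, [0.1, 0.2], {"category": "engineering"}),
    (2, [0.3, 0.4], {"category": "sales"}),
    (3, [0.5, 0.6], {"category": "engineering"}),
]
bodies = list(batched(rows_to_points(rows), batch_size=2))
```

The reshaping is the easy part. The real work in a migration is rewriting queries that relied on JOINs and re-validating retrieval quality on the new index.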
For teams operating at scale from day one (handling tens of millions of vectors at high QPS), I'd start with Qdrant. The performance is real, the open-source path gives you options, and the managed offering is a safe fallback if the team can't take on ops work.
For teams whose core product is not the vector store itself, Pinecone makes sense as long as you've accepted the vendor lock-in as a feature rather than a bug.
The mistake most teams make is picking too late. By the time you've spent six months building a RAG system around one database, switching is real work. Spend a day on the decision up front. The wrong choice won't sink you, but you'll live with it for years, so it's worth that day.



