Understanding MongoDB Text Search Capabilities and Limitations
Problem / Context
MongoDB offers built-in text search for self-managed deployments, but it is not always clear when to use it rather than a dedicated search engine such as Elasticsearch. Understanding the trade-offs is essential to making the right architectural decision.
Key question: Should you use MongoDB’s text search or integrate a separate search solution?
Solution / Approach
MongoDB provides text indexes that enable full-text search on string fields. They are convenient, but they add storage and write overhead and lack several features that dedicated search engines provide. Knowing these costs helps you decide when a text index is appropriate.
Code / Examples
Creating a Text Index
// Create a text index on multiple fields
db.stores.createIndex({
  name: "text",
  description: "text"
})
// MongoDB creates an index entry for each unique stemmed word
// in both the 'name' and 'description' fields
Searching with Text Indexes
Exact Phrase Search (case-insensitive by default):
// Search for the exact phrase "coffee shop"
db.stores.find({
  $text: {
    $search: "\"coffee shop\""
  }
})
// Note: the phrase is wrapped in escaped double quotes inside the search string.
// Phrases are matched literally: no stemming is applied and stop words are not dropped.
Regular Text Search:
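// Search for documents containing "coffee" OR "tea"
// (space-separated terms are OR-ed)
db.stores.find({
  $text: {
    $search: "coffee tea"
  }
})
// Search for documents containing both "coffee" AND "tea":
// wrap each term in escaped quotes so that every term must match
db.stores.find({
  $text: {
    $search: "\"coffee\" \"tea\"",
    $caseSensitive: false // the default, shown here for clarity
  }
})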
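The OR, AND, and exclusion forms of a $search string differ only in how the terms are quoted or prefixed, so the string can be assembled programmatically. The sketch below is illustrative only: buildTextFilter and its mode and exclude options are hypothetical names, not part of MongoDB, and it assumes each term is a single word.

```javascript
// Hypothetical helper: builds a $text filter from a list of single-word terms.
// mode "any" -> terms are space-separated, which $text treats as OR
// mode "all" -> each term is wrapped in double quotes so every term must appear
// exclude    -> each term is prefixed with "-" to negate it
function buildTextFilter(terms, { mode = "any", exclude = [] } = {}) {
  // Drop embedded quotes so user input cannot break the phrase syntax.
  const clean = (t) => String(t).replace(/"/g, "").trim();
  const included = terms
    .map(clean)
    .filter(Boolean)
    .map((t) => (mode === "all" ? `"${t}"` : t));
  const excluded = exclude
    .map(clean)
    .filter(Boolean)
    .map((t) => `-${t}`);
  return { $text: { $search: [...included, ...excluded].join(" ") } };
}

const anyFilter = buildTextFilter(["coffee", "tea"]);
const allFilter = buildTextFilter(["coffee", "tea"], { mode: "all" });
const noDecaf = buildTextFilter(["coffee"], { exclude: ["decaf"] });

console.log(JSON.stringify(allFilter));

// In mongosh, assuming the `stores` collection and its text index exist,
// results can also be ranked by relevance:
// db.stores.find(allFilter, { score: { $meta: "textScore" } })
//          .sort({ score: { $meta: "textScore" } })
```

The filters are plain objects, so they can be passed straight to find(). It is worth confirming the quoted-term AND behavior against your server version before relying on it.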
Performance Characteristics
Storage Requirements:
- A text index stores one entry for each unique post-stemmed word, in each indexed field, of each document
- Result: text indexes can grow large and consume a significant amount of RAM
Example calculation:
Document: {
name: "The Coffee Shop",
description: "We serve the best coffee in town"
}
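Stop words ("the", "we", "in") are dropped before indexing. Words are shown unstemmed here for readability.
Words indexed from 'name': ["coffee", "shop"] (2 entries)
Words indexed from 'description': ["serve", "best", "coffee", "town"] (4 entries)
Total index entries: 6 per document ("coffee" counts once per field)
100k such documents × 6 entries = 600k index entries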
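The same arithmetic can be reproduced with a rough estimator. This is a minimal sketch under stated assumptions: the stop-word list is a tiny illustrative subset rather than MongoDB's actual list, no stemming is applied, and indexableWords and estimateIndexEntries are hypothetical names.

```javascript
// Illustrative stop-word subset only; MongoDB's real English list is longer.
const STOP_WORDS = new Set(["the", "we", "in", "a", "an", "of", "and"]);

// Returns the unique, lowercased, non-stop-word tokens of a string.
// Unlike the real indexer, this does not stem the words.
function indexableWords(text) {
  const tokens = String(text).toLowerCase().match(/[a-z0-9]+/g) || [];
  return [...new Set(tokens.filter((t) => !STOP_WORDS.has(t)))];
}

// Sums the unique words of each indexed field, mirroring the
// "per unique word, per field, per document" rule.
function estimateIndexEntries(doc, fields) {
  return fields.reduce(
    (sum, field) => sum + indexableWords(doc[field] || "").length,
    0
  );
}

const store = {
  name: "The Coffee Shop",
  description: "We serve the best coffee in town",
};
const perDoc = estimateIndexEntries(store, ["name", "description"]);
console.log(perDoc, perDoc * 100000); // prints: 6 600000
```

Real documents vary in length and vocabulary, so treat a figure like this as an order-of-magnitude estimate, not a prediction of actual index size.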
Write Performance Impact:
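// Every insert or update of an indexed field requires MongoDB to:
// 1. Tokenize the text
// 2. Apply stemming
// 3. Create index entries for each unique word
// 4. Update the index structure
// A regular index on a scalar field adds only one entry per document,
// so writes to a text-indexed collection are noticeably slower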
Important Limitations
1. No Multi-Word Proximity: Text indexes don’t store word positions, so you can’t search for “words within 5 words of each other”.
2. No Partial Word Matching:
// This WON'T work:
db.stores.find({ $text: { $search: "coff*" } }) // ❌ No wildcards
// You need the full word:
db.stores.find({ $text: { $search: "coffee" } }) // ✅ Works
3. RAM Requirements: Because the index stores neither phrases nor word positions, phrase queries run much more effectively when the entire collection fits in RAM.
4. File Descriptor Limits: When building a large text index on an existing collection, make sure the limit on open file descriptors is high enough. Check with:
ulimit -n
# Should be at least 64000 for large text indexes
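For the partial-word limitation (item 2 above), one common workaround is to skip $text for that query and use an anchored $regex on a single field. The sketch below assumes that approach. prefixFilter and escapeRegex are hypothetical helper names, and the technique covers prefix matching only, with no stemming or relevance scoring.

```javascript
// Escapes regex metacharacters so user input is matched literally.
function escapeRegex(input) {
  return String(input).replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: builds an anchored prefix filter for one field.
function prefixFilter(field, prefix, { caseInsensitive = false } = {}) {
  const filter = { [field]: { $regex: `^${escapeRegex(prefix)}` } };
  if (caseInsensitive) filter[field].$options = "i";
  return filter;
}

const coffFilter = prefixFilter("name", "Coff");
console.log(JSON.stringify(coffFilter));

// In mongosh, assuming a regular index on `name` exists:
// db.stores.createIndex({ name: 1 })
// db.stores.find(coffFilter)
```

A case-sensitive anchored prefix can use a regular index on the field efficiently. The case-insensitive variant generally cannot, so it is best kept to small collections.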
When to Use MongoDB Text Search
✅ Good use cases:
- Simple search on small-to-medium datasets (roughly under 100k documents, as a rule of thumb)
- Exact phrase matching on product names, titles
- Internal tools where search is not the primary feature
- Prototyping before committing to Elasticsearch
❌ Bad use cases:
- Large datasets (millions of documents)
- High write throughput applications
- Advanced search features (fuzzy matching, proximity, facets)
- User-facing search in production apps
- Real-time autocomplete
Alternative: MongoDB Atlas Search
If you’re on Atlas (managed MongoDB), consider Atlas Search instead:
- Built on Apache Lucene (same as Elasticsearch)
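- Richer query features and relevance scoring than text indexes
- Queried from MongoDB itself through the $search aggregation stage, with no separate search cluster to keep in sync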
- Supports fuzzy matching, autocomplete, faceting
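As a rough illustration of the fuzzy matching mentioned above, an Atlas Search query is an aggregation pipeline that starts with a $search stage. This sketch makes several assumptions: an Atlas Search index named "default" exists on the collection, the searched fields are name and description, and fuzzySearchPipeline is a hypothetical helper rather than a driver function.

```javascript
// Hypothetical helper: builds an Atlas Search pipeline with fuzzy text matching.
// The index name "default" is an assumption; use your own index name.
function fuzzySearchPipeline(
  query,
  paths,
  { index = "default", maxEdits = 1, limit = 10 } = {}
) {
  return [
    // The text operator with fuzzy.maxEdits tolerates small typos in the query.
    { $search: { index, text: { query, path: paths, fuzzy: { maxEdits } } } },
    { $limit: limit },
    // Expose the relevance score computed by Atlas Search.
    { $project: { name: 1, score: { $meta: "searchScore" } } },
  ];
}

const pipeline = fuzzySearchPipeline("coffe", ["name", "description"]);
console.log(JSON.stringify(pipeline, null, 2));

// In mongosh against an Atlas cluster:
// db.stores.aggregate(pipeline)
```

With maxEdits set to 1, a misspelled query such as "coffe" can still match "coffee", which a $text query cannot do.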