Understanding MongoDB Text Search Capabilities and Limitations

Problem / Context

MongoDB offers built-in text search for self-managed deployments, but it’s not always clear when to use it rather than a dedicated search engine such as Elasticsearch. Understanding the trade-offs is what lets you make the right architectural decision.

Key question: Should you use MongoDB’s text search or integrate a separate search solution?

Solution / Approach

MongoDB provides text indexes that enable full-text search on string fields. While convenient, they come with significant performance costs and limitations. Understanding these helps you decide when they’re appropriate.

Code / Examples

Creating a Text Index

// Create a text index on multiple fields
db.stores.createIndex({
  name: "text",
  description: "text"
})

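// MongoDB creates an index entry for each unique stemmed word
// in both 'name' and 'description' fields.
// A collection can have at most one text index, so list every
// searchable field in this one call.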
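
When several fields share the one text index, field weights control how much a match in each field counts toward the relevance score. The sketch below assumes a mongosh-style `db` handle; the index name `stores_text` and the weight values are illustrative choices, not part of the setup above. Because a collection can hold only one text index, this would replace the plain index rather than sit beside it (drop the existing one first).

```javascript
// Sketch: the same text index, but matches in `name` rank higher
// than matches in `description`.
// Assumptions: `db` is a mongosh-style database handle; the index name
// and the weight values (10 and 1) are illustrative.
function createStoresTextIndex(db) {
  return db.stores.createIndex(
    { name: "text", description: "text" },
    { name: "stores_text", weights: { name: 10, description: 1 } }
  );
}

// In mongosh:
// createStoresTextIndex(db)
```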

Searching with Text Indexes

Exact Phrase Search (case-insensitive by default):

// Search for exact phrase "coffee shop"
db.stores.find({
  $text: {
    $search: "\"coffee shop\""
  }
})

// Note: the phrase is wrapped in escaped double quotes inside the search string
// Phrase matching does NOT apply stemming or stop-word removal

Regular Text Search:

// Search for documents containing "coffee" OR "tea"
db.stores.find({
  $text: {
    $search: "coffee tea"
  }
})

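// Search for documents containing "coffee" AND "tea":
// wrap each term in escaped double quotes
db.stores.find({
  $text: {
    $search: "\"coffee\" \"tea\""
  }
})

// Note: $caseSensitive defaults to false; setting it does not turn OR into AND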
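
The different search forms differ only in how the `$search` string is written, so a small helper can make the intent explicit. This is a minimal sketch in plain JavaScript; `buildSearchString` and `textQuery` are hypothetical helper names, and the code assumes the terms themselves contain no double quotes. It also shows the `textScore` metadata used to sort results by relevance.

```javascript
// Build a $text search string from three term lists.
//   any:     terms that are ORed together (space-separated)
//   all:     terms or phrases that must all appear (each wrapped in quotes)
//   exclude: terms that must not appear (prefixed with "-")
function buildSearchString({ any = [], all = [], exclude = [] } = {}) {
  const quoted = all.map((t) => `"${t}"`);
  const negated = exclude.map((t) => `-${t}`);
  return [...quoted, ...any, ...negated].join(" ");
}

// Wrap the search string in a $text filter, plus a projection and sort
// on the relevance score MongoDB computes for each match.
function textQuery(search) {
  return {
    filter: { $text: { $search: search } },
    projection: { score: { $meta: "textScore" } },
    sort: { score: { $meta: "textScore" } },
  };
}

const q = textQuery(buildSearchString({ all: ["coffee", "tea"], exclude: ["decaf"] }));
console.log(q.filter.$text.$search); // "coffee" "tea" -decaf

// In mongosh:
// db.stores.find(q.filter, q.projection).sort(q.sort)
```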

Performance Characteristics

Storage Requirements:

  • Text indexes hold one entry per unique post-stemmed word, in each indexed field, for each document
  • Result: the index can take up a large amount of RAM on text-heavy collections
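
To see what an index actually costs on a real collection, the collection's storage statistics report a size per index. The sketch below assumes a mongosh-style `db` handle (where cursors expose a synchronous `toArray()`) and an unsharded collection; `indexSizesMiB` is a hypothetical helper name.

```javascript
// Return the size of every index on a collection, in MiB.
// Assumptions: `db` is a mongosh-style database handle and the collection
// is unsharded (a sharded collection returns one stats document per shard).
function indexSizesMiB(db, collName) {
  const [stats] = db
    .getCollection(collName)
    .aggregate([{ $collStats: { storageStats: {} } }])
    .toArray();

  const sizes = {};
  for (const [indexName, bytes] of Object.entries(stats.storageStats.indexSizes)) {
    sizes[indexName] = +(bytes / 1024 / 1024).toFixed(2);
  }
  return sizes;
}

// In mongosh:
// indexSizesMiB(db, "stores")
```

Comparing the text index's size with the `_id_` index and with available RAM gives a quick sense of whether the index is likely to stay in memory.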

Example calculation:

Document: {
  name: "The Coffee Shop",
  description: "We serve the best coffee in town"
}

Words indexed from 'name': ["coffee", "shop"]  (2 entries; stop word "the" dropped)
Words indexed from 'description': ["serve", "best", "coffee", "town"]  (4 entries; "we", "the", "in" dropped)
Total index entries: 6 per document (words shown unstemmed for readability)

With 100k documents × 6 entries = 600k index entries
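
The same arithmetic can be sketched in plain JavaScript to get a rough estimate for your own documents. This is only an approximation: the stop-word list is a tiny illustrative subset, and the real index also stems each word, which this sketch skips. `indexedTerms` and `estimateEntries` are hypothetical helper names.

```javascript
// Tiny illustrative stop-word list; the list MongoDB uses for English is much longer.
const STOP_WORDS = new Set(["the", "we", "in", "a", "an", "of", "and"]);

// Lowercase, split on non-alphanumeric characters, drop stop words, de-duplicate.
// Stemming is deliberately left out of this sketch.
function indexedTerms(text) {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
  return [...new Set(words.filter((w) => !STOP_WORDS.has(w)))];
}

// Sum the unique terms per indexed field for one document.
function estimateEntries(doc, fields) {
  return fields.reduce((n, f) => n + indexedTerms(doc[f] || "").length, 0);
}

const doc = {
  name: "The Coffee Shop",
  description: "We serve the best coffee in town",
};

const perDoc = estimateEntries(doc, ["name", "description"]);
console.log(perDoc);          // 6
console.log(perDoc * 100000); // 600000
```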

Write Performance Impact:

// Every insert/update requires:
// 1. Tokenize text
// 2. Apply stemming
// 3. Create index entries for each unique word
// 4. Update the index structure

// A regular single-field index adds one entry per document; a text index
// adds one per unique word, so insertion throughput drops as text grows

Important Limitations

1. No Multi-Word Proximity: Text indexes don’t store word positions, so you can’t search for “words within 5 words of each other”.

2. No Partial Word Matching:

// This WON'T work as a prefix search:
db.stores.find({ $text: { $search: "coff*" } })  // ❌ No wildcards: * is not treated as a pattern

// You need the full word:
db.stores.find({ $text: { $search: "coffee" } })  // ✅ Works
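
If prefix matching is needed on a self-managed deployment, one common workaround is an anchored regular expression on a single field. This is a different mechanism from `$text`: it does not use the text index, it matches only at the start of the field value (so "The Coffee Shop" would not match), and case-insensitive regex queries generally cannot use regular indexes effectively. The helper names below are hypothetical.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a filter that matches values of `field` starting with `prefix`.
// This is a plain regex query, not a $text query.
function prefixFilter(field, prefix) {
  return { [field]: { $regex: "^" + escapeRegex(prefix), $options: "i" } };
}

// In mongosh:
// db.stores.find(prefixFilter("name", "coff"))
```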

3. RAM Requirements: Because the index stores no phrase or proximity information, queries that specify multiple words run faster when the entire collection fits in RAM.

4. File Descriptor Limits: Building large text indexes requires high file descriptor limits. Check with:

ulimit -n
# Should be at least 64000 for large text indexes

✅ Good use cases:

  • Simple search on small-to-medium datasets (roughly under 100k docs, as a rule of thumb)
  • Exact phrase matching on product names, titles
  • Internal tools where search is not the primary feature
  • Prototyping before committing to Elasticsearch

❌ Bad use cases:

  • Large datasets (millions of documents)
  • High write throughput applications
  • Advanced search features (fuzzy matching, proximity, facets)
  • User-facing search in production apps
  • Real-time autocomplete

If you’re on Atlas (managed MongoDB), consider Atlas Search instead:

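  • Built on Apache Lucene (the same library Elasticsearch uses)
  • Richer query features than $text, such as analyzers and relevance scoring
  • Queried through the aggregation pipeline ($search stage), so there is no separate search cluster to keep in sync
  • Supports fuzzy matching, autocomplete, faceting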
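
As a rough illustration of what an Atlas Search query looks like, the sketch below builds an aggregation pipeline with a fuzzy text match. It assumes an Atlas Search index named `default` already exists on the collection and covers `name` and `description`; `fuzzySearchPipeline` is a hypothetical helper name, and the limit of 10 is arbitrary.

```javascript
// Build an aggregation pipeline that uses the Atlas Search $search stage.
// Assumptions: an Atlas Search index named "default" exists on the
// collection and covers the `name` and `description` fields.
function fuzzySearchPipeline(query, limit = 10) {
  return [
    {
      $search: {
        index: "default",
        text: {
          query,
          path: ["name", "description"],
          fuzzy: { maxEdits: 1 }, // tolerate one typo, e.g. "coffe"
        },
      },
    },
    { $limit: limit },
    { $project: { name: 1, description: 1, score: { $meta: "searchScore" } } },
  ];
}

// In mongosh, connected to an Atlas cluster:
// db.stores.aggregate(fuzzySearchPipeline("coffe shop"))
```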

Written on 2025-11-09 15:55:00 +0700 Edited on 2025-11-09 15:55:00 +0700