System Design - Autocomplete or TypeAhead
System design : Design Autocomplete or Typeahead Suggestions for Google search
Use Case
- Suggestions for Google Search
Modules
- Autocorrect for misspelled words
- Location based results
- Personalized results
- More than one Language
Data Structure
- Distributed Trie
Important Concepts
Request Flow
- Get a prefix, look into distributed trie and return results
- Store top k prefixes on each node
Data Collection Flow
- Getting list of strings from a background process, aggregate them and update trie
- Aggregators
- Appliers
Request Flow
- User types query in search bar
- Query goes to Load Balancer
- Load balancer routes the request to one of several nodes (using some algorithm like round robin algorithm)
- Selected node looks into distributed cache (redis/memcached)
- If not found, looks into zookeeper instance for finding the Trie that is reponsible for the given prefix.
- Get the prefixes from Trie, populate the cache for future use and return the suggestions.
Optimizations
- Cache in CDN (Content Delivery Networks)
- Cache results in Local Machine