Hey guys, today let’s talk about Databases and Full Text Search engines and how to combine the two, if possible.
As we all know, most production setups have their data source in at least two tech stacks-
- A transactional database
- A Full Text based search engine.
Let’s see why they are used, how they fulfil different requirements and how we can integrate both in a single tech stack.
A transactional database is needed to perform real time live updates and queries to retrieve the latest data stored in our database. Each update / query needs to be performed well, whether it is an insert or patch update or an algorithmic computation like group computation, sum, average or joins.
To make our queries faster most databases support building indexes based on BTree using fields as keys on different levels to speed up the lookup process inside our database. These indexes are mostly used for exact matches and range queries/sorts as is the property of the BTree. An example below shows a BTree made on Key and can see that Alex and Bob are placed before Jone and Tom is placed after Ron.
But what if we want to apply fuzzy text based search on a long text field?
Enter ElasticSearch, a famous Full Text search engine which lets you query on your textual data in a very fast method. It does this by creating an inverted index. This helps us search our database’s textual fields in a fuzzy way and can return us the most appropriate matches sorted by a match “score”. For example if we want werewolves and vampires, it’ll return all documents whose description field contains the words werewolves or vampires either together or separate and how well each document matched to our “search phrase”. It can also perform spell checking and provides tokenization capabilities.
Now, ElasticSearch is built on top of Apache Lucene, which is the main engine for this inverted text index and the algorithm building these inside. ElasticSearch is just an implementation of this engine which defines it’s own standards and query language.
The problem with two data sources
The problem having two separate data stores containing almost the same data but in different technical stacks arises when you have to sync the data between both of them. You may build your own pipelines or setup a plugin to do it for you, but each method is dependent on the source database format and the selected search engine. The second problem is of cost – when you are running your production environments, there is huge costs involved in setting up both, and you may want to remove the data duplicity.
What if I tell you that both can be combined together? There are now a few databases which support Full Text Search capabilities of which I’ll be showing my favourite and most easiest one.
MongoDb Atlas Search
Assuming you know what MongoDb is and how it is different from a relational database, I’ll talk more about implementing Full Text Search in your db.
Atlas Search is built on the same Apache Lucene framework and gives us power to implement FTS directly in our transactional database/collection alongside our own native MongoDb indexes we create. This means, whenever a new document is inserted/updated, it automatically starts getting reflected in our FTS queries without any other setup/configuration. Only thing needed is to define the mapping of your index, when you create one.!
Advantages of using MongoDb Atlas Search –
- Your FTS index resides alongside your own transactional data.
- Each document has a JSON like schema, making your search index mapping more powerful by indexing nested objects for Full Text Search.
- It gives you power to index the whole BSON format in Mongo FTS.
- You have the whole MQL (the whole vast aggregation pipeline) to play with in FTS queries. This is a major hit, as you can practically do anything with your matched response by using any aggregation stage you like.
I hope you get an idea of how well your technical stack could be, now that you have combined both of them. In the next blog, I’ll be showcasing a use case demo having Full Text Search inside your transactional database, and the performance and maintenance perks of having it.