Background
RediSearch is a distributed full-text search and aggregation engine built on top of Redis. It allows users to execute complex search queries on their Redis dataset extremely fast. Its unique architecture (implemented as a Redis module, written in C, and built from the ground up on modern, optimized data structures) makes it a true alternative to other search engines on the market when used as a standalone search engine for indexing and retrieving searchable data.
When we first launched RediSearch we benchmarked it against popular search engines like Elasticsearch and Solr to show how powerful the engine is.
This time we decided to run a slightly different benchmark that (a) gives users a clear, reproducible setup in which all search engines under test are tuned for their best performance, and (b) simulates multiple real-life scenarios based on what we see from RediSearch users.
The Search Benchmark
In this Search benchmark we compared RediSearch to Elasticsearch over two use cases:
- Index and query the Wikipedia dataset
- Fast indexing in a multi-tenant environment
Wikipedia benchmark
We first indexed 5.6 million documents (5.3 GB) from Wikipedia and then ran two-word search queries over the indexed dataset.
Indexing results
As you can see in the figure below, RediSearch completed building its index in 221 seconds versus 349 seconds for Elasticsearch, or 58% faster.
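The 58% figure follows directly from the two build times. A minimal sketch of the arithmetic, where the helper name `percent_faster` is ours and not part of any benchmark tooling:

```python
def percent_faster(baseline_seconds: float, candidate_seconds: float) -> float:
    """Return how much faster the candidate finished than the baseline, in percent."""
    return (baseline_seconds / candidate_seconds - 1.0) * 100.0


# Build times reported above.
redisearch_seconds = 221
elasticsearch_seconds = 349

indexing_gain = percent_faster(elasticsearch_seconds, redisearch_seconds)
print(f"RediSearch indexed {indexing_gain:.0f}% faster")
```

Put differently, Elasticsearch took about 1.58 times as long as RediSearch to build the same index.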
Querying results
Once the dataset was indexed, we launched two-word search queries using 32 clients running on a dedicated load-generator server. As you can see in the figure below, RediSearch throughput reached 12K ops/sec compared to 3.1K ops/sec for Elasticsearch, or roughly 4x higher. Furthermore, RediSearch latency was slightly better, although both engines averaged roughly 8 msec.
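To make the measurement concrete, here is a rough sketch of how a 32-client load generator could derive throughput and average latency. It is not the tool used for this benchmark: `run_load` and `fake_search` are hypothetical names, and `fake_search` only sleeps for a millisecond instead of contacting a search engine. In a real run you would pass a function that issues the query against RediSearch or Elasticsearch.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_load(run_query, queries, num_clients=32):
    """Run every query through `run_query` on `num_clients` worker threads.

    Returns (throughput in ops/sec, average latency in milliseconds).
    """
    def timed(query):
        # Time one request from the client's point of view.
        start = time.perf_counter()
        run_query(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        latencies = list(pool.map(timed, queries))
    wall_elapsed = time.perf_counter() - wall_start

    throughput = len(latencies) / wall_elapsed
    avg_latency_ms = mean(latencies) * 1000.0
    return throughput, avg_latency_ms


def fake_search(query):
    """Hypothetical stand-in for a real search call; it just sleeps 1 ms."""
    time.sleep(0.001)


ops_per_sec, latency_ms = run_load(fake_search, ["hello world"] * 200)
print(f"{ops_per_sec:.0f} ops/sec, {latency_ms:.2f} ms average latency")
```

Throughput is computed from wall-clock time across all clients, while latency is averaged per request, which is why the two numbers can move independently of each other.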
Multi-tenant indexing benchmark
Here we simulated a multi-tenant e-commerce application where each tenant represents a product category and maintains its own index. For this benchmark we built 50K indices (or products), each storing up to 500 documents (or items), for a total of 25 million documents. RediSearch completed building the indices in just 201 seconds, averaging about 125K documents/sec, whereas Elasticsearch crashed after 921 indices and could not cope with this load.
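The sketch below checks the sizing arithmetic and shows what the per-tenant workload could look like as RediSearch 1.x commands (`FT.CREATE` and `FT.ADD`). The index naming scheme, the document IDs, and the single `title` text field are assumptions made for illustration. The actual benchmark schema is not described here. The commands are only generated, not sent to a server.

```python
NUM_INDICES = 50_000   # one index per tenant
DOCS_PER_INDEX = 500   # upper bound on documents per index
BUILD_SECONDS = 201    # reported RediSearch build time

total_docs = NUM_INDICES * DOCS_PER_INDEX
docs_per_second = total_docs / BUILD_SECONDS


def tenant_commands(tenant_id, num_docs):
    """Yield the commands that create one tenant's index and fill it.

    The index name, document IDs, and `title` field are hypothetical.
    """
    index = f"tenant:{tenant_id}"
    yield ["FT.CREATE", index, "SCHEMA", "title", "TEXT"]
    for doc in range(num_docs):
        yield ["FT.ADD", index, f"{index}:item:{doc}", "1.0",
               "FIELDS", "title", f"item {doc}"]


print(f"{total_docs:,} documents, about {docs_per_second:,.0f} per second")
for command in tenant_commands(tenant_id=0, num_docs=2):
    print(command)
```

Each command is a list of arguments, so it could be passed to a Redis client's generic command-execution call if you wanted to replay the workload.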
Benchmark setup
Hardware
Cloud Instance Type | vCPU | Mem (GiB) | Network |
---|---|---|---|
AWS c4.8xlarge (one instance for the load generator and one for the search engine) | 36 | 60 | 10 Gigabit |
Dataset source
Name | Description and Source | #docs | size |
---|---|---|---|
wikidump | Date: Feb 7, 2019 | 5.6M | 5.3 GB |
RediSearch configuration
Name | Value |
---|---|
Number of shards | |
Doc table size | 10M |
Elasticsearch configuration
Name | Value |
---|---|
Number of shards | 5 |
JVM settings (Xms and Xmx) | 25GB |
index.refresh_interval | -1 |
index.number_of_replicas | 0 |
indices.queries.cache.size and index.queries.cache.enabled | As described here |
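As an illustration, the index-level settings in the table could be expressed as the JSON body of an Elasticsearch create-index request. This is a sketch built only from the values listed above. The query cache values are not given in the table, so they are left out. The JVM heap size is configured on the Elasticsearch process itself and is not part of this body.

```python
import json

# Index-level settings taken from the table above.
# A refresh_interval of -1 disables periodic refreshes.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 0,
            "refresh_interval": "-1",
        }
    }
}

print(json.dumps(index_settings, indent=2))
```

Disabling refreshes and replicas is a common way to speed up bulk indexing, at the cost of search visibility and redundancy while the load is running.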
Versions
Name | Value |
---|---|
RediSearch | Version 1.4.3 |
Elasticsearch | Version 6.6.0 with Lucene version 7.6.0 |
Conclusion
We benchmarked RediSearch and Elasticsearch in the following use cases:
- The simple Wikipedia use case – where we found RediSearch to be 58% faster at indexing and 4x faster when performing two-word searches on the indexed dataset
- The more advanced multi-tenant use case – where RediSearch was able to index 25 million documents in just 201 seconds, or ~125K documents/sec, while Elasticsearch crashed after 921 indices, showing that it was not designed to cope with this level of load.
Elasticsearch is a great, feature-rich search product created by the great people at Elastic.co, but when it comes to performance, it has inherent architectural deficiencies compared with RediSearch, as can be seen in the following table:
Component | RediSearch | Elasticsearch |
---|---|---|
Search engine | Dedicated engine based on modern and optimized data structures | 20-year-old Lucene engine |
Programming language | C-based, extremely optimized | Java |
Memory technology | Runs natively on DRAM and Persistent Memory | Disk-based with a caching option |
Protocol | The optimized RESP (REdis Serialization Protocol) | HTTP |
Read more about RediSearch here and the technology behind it. To get started with RediSearch – try our Redis Cloud Pro here or download Redis Enterprise Software here.