Search Benchmarking: RediSearch vs. Elasticsearch | Redis Labs

https://redislabs.com/blog/search-benchmarking-redisearch-vs-elasticsearch/

Background

RediSearch is a distributed full-text search and aggregation engine built on top of Redis. It lets users execute complex search queries on their Redis dataset extremely quickly. Its architecture (implemented as a Redis module, written in C, and built from the ground up on modern, optimized data structures) makes it a true alternative to other search engines on the market when used as a standalone search engine for indexing and retrieving searchable data.
When we first launched RediSearch, we benchmarked it against popular search engines like Elasticsearch and Solr to show how powerful the engine is.
This time we decided to run a slightly different benchmark that (a) gives users a clear, reproducible setup in which every search engine under test is tuned for its best performance, and (b) simulates several real-life scenarios based on what we see from RediSearch users.

The Search Benchmark

In this search benchmark we compared RediSearch to Elasticsearch over two use cases:

  1. Indexing and querying the Wikipedia dataset
  2. Fast indexing in a multi-tenant environment

Wikipedia benchmark

We first indexed 5.6 million documents (5.3 GB) from Wikipedia and then ran two-word search queries against the indexed dataset.

Indexing results

As you can see in the figure below, RediSearch finished building its index in 221 seconds versus 349 seconds for Elasticsearch, or 58% faster.
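
The percentage follows directly from the two build times. A minimal Python check of the arithmetic, using only the figures quoted above (the helper name `percent_faster` is ours, not part of any benchmark tooling):

```python
def percent_faster(slow_seconds: float, fast_seconds: float) -> float:
    """Return how much faster the quicker run is, relative to its own time."""
    return (slow_seconds / fast_seconds - 1.0) * 100.0


redisearch_build_s = 221     # index build time quoted in the text
elasticsearch_build_s = 349  # index build time quoted in the text

speedup_pct = percent_faster(elasticsearch_build_s, redisearch_build_s)
print(f"RediSearch indexed {speedup_pct:.0f}% faster")
```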

Querying results

Once the dataset was indexed, we launched two-word search queries using 32 clients running on a dedicated load-generator server. As you can see in the figure below, RediSearch throughput reached 12K ops/sec compared to 3.1K ops/sec for Elasticsearch, or roughly 4x higher. Furthermore, RediSearch latency was slightly better, with both engines averaging about 8 msec.
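
To make the measurement concrete, here is a rough sketch of how a 32-client load generator can derive throughput and average latency. It is not the tool used in this benchmark: `run_query` is a hypothetical placeholder that only sleeps, and you would swap in a real RediSearch or Elasticsearch client call to measure anything meaningful.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def run_query(query):
    """Hypothetical stand-in for a real search call."""
    time.sleep(0.001)  # simulated round trip, not a measured value


def client_loop(queries):
    """One client: run its queries back to back, timing each one."""
    latencies = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        latencies.append(time.perf_counter() - start)
    return latencies


def benchmark(queries, num_clients=32):
    """Spread queries over concurrent clients and summarize the run."""
    slices = [queries[i::num_clients] for i in range(num_clients)]
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        per_client = list(pool.map(client_loop, slices))
    elapsed = time.perf_counter() - started
    latencies = [lat for client in per_client for lat in client]
    return {
        "requests": len(latencies),
        "ops_per_sec": len(latencies) / elapsed,
        "avg_latency_ms": mean(latencies) * 1000,
    }


# Made-up two-word queries, purely for illustration.
result = benchmark([f"word{i} word{i + 1}" for i in range(320)])
print(result)
```

Throughput is the total number of completed requests divided by wall-clock time, while latency is averaged per request, which is why the two metrics can diverge: an engine can show a similar per-request latency and still sustain far more requests per second.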

Multi-tenant indexing benchmark

Here we simulated a multi-tenant e-commerce application where each tenant represents a product category and maintains its own index. For this benchmark we built 50K indices (one per product category), each storing up to 500 documents (items), for a total of 25 million documents. RediSearch finished building the indices in just 201 seconds, averaging about 125K documents/sec, while Elasticsearch crashed after 921 indices and could not cope with this load.

Benchmark setup

Hardware

Cloud | Instance type | vCPU | Mem (GiB) | Network
AWS | c4.8xlarge: one for the load generator and one for the search engine | 36 | 60 | 10 Gigabit

Dataset source

Name | Description and source | #docs | Size
wikidump | Date: Feb 7, 2019 | 5.6M | 5.3 GB

RediSearch configuration

Name | Value
Number of shards | 5 for the Wikipedia benchmark; 20 for the multi-tenant benchmark
Doc table size | 10M

Elasticsearch configuration

Name | Value
Number of shards | 5
JVM settings (Xms and Xmx) | 25GB
index.refresh_interval | -1
index.number_of_replicas | 0
indices.queries.cache.size and index.queries.cache.enabled | As described here
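
As an illustration of how the index-level values in this table could be expressed, the sketch below builds a settings body in Python and prints it as JSON. This is an assumption about the shape of the request, not the exact payload used in the benchmark; such a body is typically sent when creating the index. The query cache settings are left out because the source only links to their values, and the JVM heap size is configured separately from index settings.

```python
import json

# Index-level settings taken from the table above.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 5,
            "number_of_replicas": 0,
            # -1 disables periodic refresh, which favors bulk indexing speed.
            "refresh_interval": "-1",
        }
    }
}

# The 25GB heap (Xms and Xmx) is a JVM option, not an index setting,
# so it does not appear in this body.
print(json.dumps(index_settings, indent=2))
```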

Versions

Name | Value
RediSearch | Version 1.4.3
Elasticsearch | Version 6.6.0 with Lucene version 7.6.0

Conclusion

We benchmarked RediSearch and Elasticsearch in the following use cases:

  1. The simple Wikipedia use case, where we found RediSearch 58% faster on indexing and roughly 4x faster when performing two-word searches on the indexed dataset
  2. The more advanced multi-tenant use case, where RediSearch indexed 25 million documents in just 201 seconds, or ~125K documents/sec, while Elasticsearch crashed after 921 indices, showing that it was not designed to cope with this level of load.

Elasticsearch is a great, feature-rich search product created by the great people at Elastic.co, but when it comes to performance, it has inherent architectural deficiencies compared with RediSearch, as can be seen in the following table:

Component | RediSearch | Elasticsearch
Search engine | Dedicated engine based on modern and optimized data structures | 20-year-old Lucene engine
Programming language | C-based, extremely optimized | Java
Memory technology | Runs natively on DRAM and Persistent Memory | Disk-based with a caching option
Protocol | The optimized RESP (REdis Serialization Protocol) | HTTP

Read more about RediSearch here and the technology behind it. To get started with RediSearch – try our Redis Cloud Pro here or download Redis Enterprise Software here.
