
Complete Tutorial on HyperLogLog in Redis

Last Updated : 04 Sep, 2023

Redis HyperLogLog is a powerful probabilistic data structure used for approximating the cardinality of a set. It efficiently estimates the number of unique elements in a large dataset, making it ideal for applications where memory efficiency and speed are crucial. In this article, we will explore what Redis HyperLogLog is, its syntax and commands, and provide examples of how to use it in real-world scenarios.

What is Redis HyperLogLog?


The Redis HyperLogLog algorithm estimates the number of unique elements in a set without storing the elements themselves. Traditional data structures, such as a Redis Set, need memory proportional to the number of elements they hold. A HyperLogLog instead uses a small, bounded amount of memory (at most about 12 KB per key in Redis), which makes it extremely memory-efficient for huge datasets. The trade-off is that it returns an approximate count of unique elements: Redis's implementation has a standard error of 0.81%.
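Both the 12 KB budget and the 0.81% error follow from the register layout Redis uses: 16,384 (2^14) registers of 6 bits each. The short calculation below works through that arithmetic using the standard HyperLogLog error formula 1.04/√m, where m is the number of registers. It is a back-of-the-envelope check, not Redis code.

```python
import math

m = 2 ** 14              # number of registers in a Redis HyperLogLog
bits_per_register = 6    # 6 bits hold values up to 63; Redis needs at most 51

dense_bytes = m * bits_per_register // 8   # size of the dense register array
std_error = 1.04 / math.sqrt(m)            # standard HyperLogLog error formula

print(f"registers: {m}")                   # registers: 16384
print(f"dense size: {dense_bytes} bytes")  # dense size: 12288 bytes
print(f"standard error: {std_error:.2%}")  # standard error: 0.81%
```

Doubling the number of registers would only shrink the error by a factor of √2, which is why the memory/accuracy trade-off is fixed at this point in Redis.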

How Does Redis HyperLogLog Work?

  • HyperLogLog relies on a probabilistic observation: if each element is hashed to a uniformly random bit string, a hash that starts with at least k zero bits appears with probability 2^-k. So if the longest run of leading zeroes seen across all elements is k, the set has probably seen roughly 2^k distinct elements. The more leading zeroes observed, the more distinct elements the set is likely to contain. Duplicates hash to the same value, so they never change the result.
  • A single maximum is a very noisy estimate, so Redis hashes each element to a 64-bit integer, uses 14 bits of the hash to select one of 16,384 registers, and stores in that register the longest run of zeroes (plus one) seen in the remaining 50 bits. When counting, all registers are combined into a single estimate; the classic HyperLogLog formula uses their harmonic mean.
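HyperLogLog's estimation procedure can be sketched in pure Python. This is a simplified illustration, not Redis's implementation, and it rests on stated assumptions: SHA-1 truncated to 64 bits stands in for the hash Redis uses, the register index comes from the low 14 bits, and the estimate uses the classic harmonic-mean formula with the linear-counting correction for small cardinalities. The names SimpleHLL and hash64 are hypothetical.

```python
import hashlib
import math

P = 14                              # index bits: 2**14 = 16384 registers, as in Redis
M = 1 << P                          # number of registers
Q = 64 - P                          # hash bits left for counting zeroes
ALPHA = 0.7213 / (1 + 1.079 / M)    # classic bias-correction constant (M >= 128)


def hash64(item: str) -> int:
    # Assumption: SHA-1 truncated to 64 bits stands in for Redis's own 64-bit hash
    return int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")


class SimpleHLL:
    """Hypothetical, simplified HyperLogLog for illustration only."""

    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, item: str) -> None:
        h = hash64(item)
        index = h & (M - 1)   # low P bits choose a register
        rest = h >> P         # remaining Q bits
        # rank = zero bits before the first 1 bit (scanning from the least
        # significant end), plus one; Q + 1 if all Q bits are zero
        rank = 1
        while rank <= Q and (rest & 1) == 0:
            rest >>= 1
            rank += 1
        # each register keeps the largest rank it has seen
        self.registers[index] = max(self.registers[index], rank)

    def count(self) -> int:
        # classic estimate: scaled harmonic mean of 2**register
        raw = ALPHA * M * M / sum(2.0 ** -r for r in self.registers)
        empty = self.registers.count(0)
        if raw <= 2.5 * M and empty > 0:
            # small-range correction: linear counting on the empty registers
            return round(M * math.log(M / empty))
        return round(raw)


hll = SimpleHLL()
for i in range(10_000):
    hll.add(f"user{i}")
print(hll.count())  # an estimate close to 10000
```

In Redis, PFADD plays the role of add and PFCOUNT the role of count, but with a different hash, a compact sparse encoding for small sets, and a refined estimator.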

Syntax and Commands

Redis provides simple and intuitive commands to work with HyperLogLog:

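  • PFADD key element [element ...]: Adds elements to the HyperLogLog stored at key, creating the key if it does not exist. Returns 1 if at least one internal register was altered (the estimate may have changed), otherwise 0.
  • PFCOUNT key [key ...]: Returns the approximate cardinality of the HyperLogLog stored at key. When given several keys, it returns the approximate cardinality of their union, merging them into a temporary HyperLogLog on the fly.
  • PFMERGE destkey sourcekey [sourcekey ...]: Merges multiple HyperLogLogs into a single one stored in destkey, which approximates the cardinality of the union of the sources. If destkey already exists, it is treated as one of the sources.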
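The multi-key forms of PFCOUNT and PFMERGE rest on one property: two HyperLogLogs built with the same hash and register layout can be combined by keeping the larger value of each register, and the result is identical to a HyperLogLog that saw every element directly. The sketch below checks that property in pure Python. It is an illustration under stated assumptions, not Redis code: the hash (SHA-1 truncated to 64 bits) is a stand-in for Redis's own, and the helper names registers_for and merge are hypothetical.

```python
import hashlib

P = 14          # index bits (16384 registers, as in Redis)
M = 1 << P
Q = 64 - P


def registers_for(items):
    """Build a HyperLogLog register array for the given string items."""
    regs = [0] * M
    for item in items:
        # Assumption: SHA-1 truncated to 64 bits stands in for Redis's hash
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index, rest = h & (M - 1), h >> P
        rank = 1
        while rank <= Q and (rest & 1) == 0:
            rest >>= 1
            rank += 1
        regs[index] = max(regs[index], rank)
    return regs


def merge(*register_arrays):
    """Union of HyperLogLogs: keep the maximum value of each register."""
    return [max(values) for values in zip(*register_arrays)]


a = [f"user{i}" for i in range(0, 3000)]
b = [f"user{i}" for i in range(2000, 5000)]   # overlaps with a
merged = merge(registers_for(a), registers_for(b))
print(merged == registers_for(a + b))  # True
```

Because every element leaves the same "trace" in its register no matter which source saw it, the merged registers carry no double-counting of the overlapping users. The same property also explains why intersections cannot be read off the registers directly.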

Examples

Let’s see some examples to understand how to use Redis HyperLogLog.

1. Counting Unique Website Visitors

Suppose we have a website and want to count the number of unique visitors.

Java




import redis.clients.jedis.Jedis;

public class UniqueVisitors {
    public static void main(String[] args) {
        // Connect to a Redis server on localhost:6379 (closed automatically)
        try (Jedis jedis = new Jedis("localhost", 6379)) {
            // Add visitor IDs to the HyperLogLog stored at "website:visitors"
            jedis.pfadd("website:visitors", "user1", "user2", "user3");

            // Estimate the number of unique visitors
            long uniqueVisitors = jedis.pfcount("website:visitors");
            System.out.println("Approximate unique visitors: " + uniqueVisitors);
        }
    }
}


Output: Approximate unique visitors: 3

Explanation: This Java code uses the Jedis client library to interact with a Redis server. It connects to the Redis server running on localhost (port 6379) and adds three unique visitors (“user1”, “user2”, and “user3”) to the HyperLogLog stored at the key “website:visitors” using jedis.pfadd. Finally, it calls jedis.pfcount to estimate the number of unique visitors in that HyperLogLog, which is 3 in this case. Because the key persists in Redis, running the program again adds no new visitors and the estimate stays at 3.
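A common extension of this example keeps one HyperLogLog per day and asks PFCOUNT for several keys at once, which returns the approximate number of unique visitors across all of them (their union), so a user who visits on several days is counted once. The sketch below uses the pfadd and pfcount methods of the redis-py client; the key naming scheme website:visitors:<date> and the function names are hypothetical, and the client is passed in so the helpers can be tried without a server.

```python
from datetime import date, timedelta


def daily_key(day: date) -> str:
    # Hypothetical naming scheme: one HyperLogLog per calendar day
    return f"website:visitors:{day.isoformat()}"


def record_visit(r, user_id: str, day: date) -> None:
    # PFADD creates the key on first use; re-adding a user is harmless
    r.pfadd(daily_key(day), user_id)


def weekly_unique_visitors(r, last_day: date) -> int:
    # PFCOUNT over several keys estimates the size of their union
    keys = [daily_key(last_day - timedelta(days=i)) for i in range(7)]
    return r.pfcount(*keys)


# Usage (requires a running Redis server and the redis-py package):
#   import redis
#   r = redis.Redis(host="localhost", port=6379)
#   record_visit(r, "user1", date(2023, 9, 4))
#   print(weekly_unique_visitors(r, date(2023, 9, 4)))
```

Per-day keys also make retention simple: old days can be expired or deleted individually, while any range of days can still be counted together.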

2. Counting Distinct User Logins

Let’s consider a scenario where we want to count the number of distinct logins for a user.

Python




import redis

# Connect to a Redis server on localhost:6379, database 0
r = redis.Redis(host='localhost', port=6379, db=0)

# Add login identifiers to the HyperLogLog stored at "user:logins"
r.pfadd("user:logins", "login1", "login2", "login3")

# Estimate the number of distinct logins
unique_logins = r.pfcount("user:logins")
print("Approximate distinct logins:", unique_logins)


This code uses the redis-py client library to talk to Redis. To run it, install the library (pip install redis) and make sure a Redis server is running on your local machine or is reachable at the given host and port.

With the server running, the output of the code will be:

Approximate distinct logins: 3

This output indicates that three distinct logins, namely “login1,” “login2,” and “login3,” have been added to the HyperLogLog data structure in Redis. Just like in the previous example, the HyperLogLog data structure provides an approximate count of unique elements, which is generally very close to the true count but may not be exact.

Features and Uses of Redis HyperLogLog

Redis HyperLogLog offers several features and use cases:

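  • Memory Efficiency: A Redis HyperLogLog uses at most about 12 KB per key no matter how many elements are added (small ones use an even more compact sparse encoding), making it suitable for datasets with millions or billions of elements.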
  • Approximate Cardinality: It provides an estimated count of unique elements with an acceptable error rate, making it suitable for scenarios where exact counts are not critical.
  • Big Data Analytics: HyperLogLog is widely used in big data analytics, where counting distinct elements in massive datasets is a common task.
  • Set Operations: Unions are supported directly through PFMERGE and multi-key PFCOUNT, without storing the underlying sets. Intersections are not supported natively and can only be estimated indirectly, with a larger error.
  • Log Analytics: HyperLogLog is used to analyze log data, counting unique IP addresses, user agents, or event occurrences.
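Since only unions are native to HyperLogLog, a common workaround for intersections is inclusion-exclusion, |A ∩ B| = |A| + |B| − |A ∪ B|, where all three terms come from PFCOUNT (the last one from PFCOUNT with both keys). The helper below is a hypothetical sketch of that arithmetic only. The errors of the three estimates add up, so the result is meaningful only when the overlap is large compared with that error.

```python
def estimate_intersection(count_a: int, count_b: int, count_union: int) -> int:
    """Estimate |A ∩ B| from three HyperLogLog estimates via inclusion-exclusion.

    count_a, count_b: PFCOUNT on each key separately.
    count_union: PFCOUNT on both keys at once (the union).
    """
    # |A ∩ B| = |A| + |B| - |A ∪ B|; clamp at 0 because estimation noise
    # can make the raw difference negative
    return max(0, count_a + count_b - count_union)


print(estimate_intersection(10_000, 8_000, 15_000))  # 3000
```

For example, with counts near 10,000 and a 0.81% standard error, each term can be off by roughly 80, so an overlap of a few hundred elements would be dominated by noise.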

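Performance and Limits of Redis HyperLogLog

PFADD runs in O(1) time per element added, and PFCOUNT on a single key runs in O(1) time with a very small average constant. PFCOUNT on multiple keys and PFMERGE take O(N) time in the number of HyperLogLogs involved, with a noticeably larger constant, because the registers of every key must be combined. A HyperLogLog can estimate the cardinality of sets with up to 2^64 members.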

Conclusion

Redis HyperLogLog is a valuable addition to Redis’ data structures. It lets you estimate the cardinality of large datasets with minimal memory (at most about 12 KB per key) and a standard error of 0.81%. With its simple commands, speed, and predictable error, Redis HyperLogLog is an essential tool for developers and data scientists who need to count distinct elements in big data. By leveraging it, you can process and analyze large datasets with ease and make informed decisions based on the approximate cardinality of the data, as long as an approximate count is acceptable.


