
Apache Kafka Serializer and Deserializer

Last Updated : 09 Sep, 2022

Apache Kafka is a publish-subscribe messaging system. A messaging system lets you send messages between processes, applications, and servers. Broadly speaking, Kafka is software in which topics (a topic can be thought of as a category of messages) are defined and the messages published to them are processed further. Applications connect to the system and publish messages to a topic. A message can carry any kind of information: an event on your personal blog, for example, or a simple text message that triggers another event. This article discusses two of the most important concepts in Kafka: serializers and deserializers.

Kafka Serializer

To understand Kafka serializers, we first need to understand Kafka producers and message keys. Producers write data to topics, and topics are made of partitions. A producer works out which broker and partition to write each message to, and if a broker in the cluster fails, the producer recovers automatically. This resilience is one of the main reasons Kafka is so widely used. Picture a producer on the left-hand side sending data into each of the partitions of a topic.

So how does a producer decide which partition of a topic a message goes to? This is where message keys come in.

Kafka Message Keys

Alongside the message value, a producer can choose to send a message key. The key can be anything you like: a string, a number, and so on. If you don't send a key, the key is null and, put simply, the data is sent to the partitions in round-robin fashion: the first message goes to partition 0, the second to partition 1, the third to partition 2, and so on. (Newer client versions may instead fill a batch for one partition before moving to the next, but keyless messages are still spread across partitions.) If you do send a key, all messages that share the same key always go to the same partition, as long as the number of partitions in the topic does not change. This is a very important property of Kafka, because it is what gives you ordering for a specific field. For example, if you have cars and want all the GPS positions of a particular car in order, set the message key to the unique identifier of the car, i.e. carID. In the car GPS example from the article Topics, Partitions, and Offsets in Apache Kafka, choosing carID as the message key puts all the positions for one car in the same partition, in order.
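The key-to-partition rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kafka's real partitioner: the function name choose_partition and the partition count are hypothetical, and CRC32 stands in for the murmur2 hash that Kafka's default partitioner applies to the key bytes.

```python
import itertools
import zlib
from typing import Optional

NUM_PARTITIONS = 3  # hypothetical partition count for the topic

_round_robin = itertools.count()  # counter used only for keyless messages


def choose_partition(key: Optional[bytes], num_partitions: int = NUM_PARTITIONS) -> int:
    """Pick a partition for a message (illustrative, not Kafka's partitioner)."""
    if key is None:
        # No key: cycle through the partitions 0, 1, 2, 0, 1, ...
        return next(_round_robin) % num_partitions
    # Key present: a deterministic hash of the key bytes picks the partition.
    # Assumption: CRC32 is a stand-in; Kafka itself uses murmur2.
    return zlib.crc32(key) % num_partitions


# Four keyless messages are spread over the partitions in turn.
keyless = [choose_partition(None) for _ in range(4)]

# Three messages with the same key always land in the same partition.
car_a = [choose_partition(b"carID_123") for _ in range(3)]
```

Because the partition is computed as hash modulo the number of partitions, changing the partition count would change where a given key lands, which is why the same-key guarantee only holds while the partition count stays fixed.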

Note: The car GPS example used here is introduced in the article Topics, Partitions, and Offsets in Apache Kafka; refer to it for the full setup.

As a second example, suppose a producer sends data to a topic with 2 partitions and the key is carID. Then carID_123 and carID_234 might always go to partition 0, while carID_345 and carID_456 always go to partition 1. The point is that you will never find carID_123 data in partition 1, because of the key property just described.

Kafka Message Keys Example

 

Now let's look at what a Kafka message looks like.

Kafka Message Anatomy

Kafka messages are created by the producer, and the first fundamental part, already discussed, is the key. The key can be null, and its type is binary. Binary means raw bytes (0s and 1s), but in your application the key can be a string or a number; we'll see shortly how a string or a number is converted into binary.

A message has a key, which is a binary field that can be null, and a value, which is the content of your message and can also be null. The key and the value are the two most important parts of a message, but other things go into it as well. For example, a message can be compressed, and the compression type is indicated as part of the message. A compression type of none means no compression; the four compression codecs available in Kafka are listed below.

  • gzip
  • snappy
  • lz4
  • zstd

A message can also carry optional headers. Headers are key-value pairs; one message can have many of them, and they are commonly used to attach metadata to a message. A timestamp is added alongside the message, either by the user or by the system. Once the message is written to a Kafka topic, it receives a partition number and an offset, which also become part of the message. Remember that both the key and the value are binary. When we write messages to Kafka we naturally work with higher-level objects, so to transform these objects into bytes we use the producer serializer.
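The parts of a message described above can be collected into a small record type. The sketch below is only a mental model: the class name KafkaMessage, its field names, the header name, and the timestamp value are all hypothetical and do not correspond to a class in any Kafka client library.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class KafkaMessage:
    """Illustrative model of the parts of a message (not a real client class)."""

    key: Optional[bytes]                 # binary, may be None (null)
    value: Optional[bytes]               # binary, may be None (null)
    compression: str = "none"            # none, gzip, snappy, lz4 or zstd
    headers: List[Tuple[str, bytes]] = field(default_factory=list)  # optional metadata
    timestamp_ms: Optional[int] = None   # set by the user or by the system
    partition: Optional[int] = None      # known once the message is written to the topic
    offset: Optional[int] = None         # known once the message is written to the topic


# A message as the producer sees it before sending: key and value are already
# bytes, while partition and offset have not been assigned yet.
msg = KafkaMessage(
    key=b"\x00\x00\x00\x7b",             # the integer 123 as 4 bytes
    value=b"hello world",
    headers=[("source", b"gps")],        # hypothetical header
    timestamp_ms=1662681600000,          # hypothetical timestamp
)
```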

Producer Serializer

A serializer indicates how to transform an object into bytes, and serializers are used for both the key and the value. Say, for example, that the value is the string “hello world” and the key is the integer 123. In that case we set the KeySerializer to IntegerSerializer, which internally converts the integer into bytes; these bytes become the binary key. The same goes for the value: we use StringSerializer as the ValueSerializer to convert “hello world” into bytes, which become the binary value.
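To make the conversion concrete, here is a minimal Python sketch of what the two serializers in this example do to the data. The function names serialize_int and serialize_string are hypothetical; the byte layouts (a 4-byte big-endian integer and UTF-8 text) are meant to mirror what IntegerSerializer and StringSerializer produce with their default settings.

```python
import struct


def serialize_int(number: int) -> bytes:
    """Encode a 32-bit signed integer as 4 big-endian bytes."""
    return struct.pack(">i", number)


def serialize_string(text: str) -> bytes:
    """Encode a string as UTF-8 bytes."""
    return text.encode("utf-8")


key_bytes = serialize_int(123)                  # b'\x00\x00\x00{'
value_bytes = serialize_string("hello world")   # b'hello world'
```

The key always occupies exactly 4 bytes regardless of its value, while the length of the serialized string depends on its content.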

Some common serializers are listed below.

  • String (including JSON, when your data is sent as JSON text)
  • Integer and Float for numbers
  • Avro and Protobuf for more advanced, structured data

Kafka Deserializer

To understand Kafka deserializers, we first need to understand Kafka consumers. Consumers read data from a topic, and a topic, remember, is identified by its name. Consumers know which broker and which partitions to read from, and in case of broker failures they know how to recover, which is again a good property of Apache Kafka. Data is read in order within each partition. For example, a consumer reading from Topic-A/Partition-0 first reads message 0, then 1, then 2, then 3, all the way up to message 11. Another consumer reading from two partitions, say Partition-1 and Partition-2, reads each partition in order. It may read from both at the same time: within a partition the data is read in order, but across partitions there is no way of saying which message is read first or second. This is why there is no ordering across partitions in Apache Kafka.
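The ordering guarantee can be illustrated with a toy simulation. This is a sketch under assumptions, not how a real consumer is implemented: the partition contents and the consume function are hypothetical, and a random choice stands in for the unspecified interleaving between partitions.

```python
import random

# Hypothetical topic contents: partition number -> messages in offset order.
partitions = {
    1: ["p1-msg0", "p1-msg1", "p1-msg2"],
    2: ["p2-msg0", "p2-msg1", "p2-msg2"],
}


def consume(partitions, seed=None):
    """Read every message, choosing the next partition to read from at random."""
    rng = random.Random(seed)
    next_offset = {p: 0 for p in partitions}  # next offset to read, per partition
    consumed = []
    while True:
        pending = [p for p in partitions if next_offset[p] < len(partitions[p])]
        if not pending:
            return consumed
        p = rng.choice(pending)  # no guarantee about which partition comes next
        offset = next_offset[p]
        consumed.append((p, offset, partitions[p][offset]))
        next_offset[p] += 1  # within a partition, offsets only move forward


records = consume(partitions, seed=7)
```

Whatever interleaving the random choice produces, filtering records by partition always yields offsets 0, 1, 2 in that order, which is exactly the per-partition guarantee described above.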

Apache Kafka Consumer

 

Consumers read messages from Kafka as bytes, so the consumer needs a deserializer that indicates how to transform these bytes back into objects. Deserializers are used on both the key and the value of the message, which both arrive as binary fields. We use an IntegerDeserializer as the KeyDeserializer to transform the key bytes into an int and get back the number 123, and a StringDeserializer as the ValueDeserializer to transform the value bytes back into the string “hello world”.
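The reverse conversion can be sketched in the same way. The function names deserialize_int and deserialize_string are hypothetical, and the byte layouts (4-byte big-endian integer, UTF-8 text) are assumed to match what the producer's serializers wrote.

```python
import struct


def deserialize_int(data: bytes) -> int:
    """Decode 4 big-endian bytes into a 32-bit signed integer."""
    if len(data) != 4:
        raise ValueError(f"expected 4 bytes for an int, got {len(data)}")
    return struct.unpack(">i", data)[0]


def deserialize_string(data: bytes) -> str:
    """Decode UTF-8 bytes into a string."""
    return data.decode("utf-8")


# Bytes as they would arrive from the topic in the running example.
key = deserialize_int(b"\x00\x00\x00\x7b")
value = deserialize_string(b"hello world")
```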

 

Choosing the right deserializer is very important: if the deserializer does not match the serializer that wrote the data, you will not get the right data back. Some common deserializers are listed below.

  • String (including JSON, when your data is sent as JSON text)
  • Integer and Float for numbers
  • Avro and Protobuf for more advanced, structured data
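To see why a mismatched deserializer gives the wrong data, consider the following sketch. It uses plain Python byte handling rather than a Kafka client, and it assumes the same layouts as the running example: the key was written as a 4-byte big-endian integer and the value as UTF-8 text.

```python
import struct

key_bytes = struct.pack(">i", 123)            # written by an integer serializer
value_bytes = "hello world".encode("utf-8")   # written by a string serializer

# Wrong choice 1: reading the integer key as a string succeeds,
# but the result is three NUL characters followed by "{", not 123.
key_as_text = key_bytes.decode("utf-8")

# Wrong choice 2: reading the string value as a 4-byte integer fails outright,
# because the value is 11 bytes long.
try:
    struct.unpack(">i", value_bytes)
    value_error = None
except struct.error as exc:
    value_error = exc
```

The first case is the more dangerous one in practice, since nothing fails and the consumer silently works with garbage.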

