
Apache Kafka – Producer Acknowledgement and min.insync.replicas


In Apache Kafka, one of the most important producer settings is acks, which is short for acknowledgments. A Kafka producer can choose whether to receive an acknowledgment (a confirmation) of its data writes. There are three acknowledgment modes available in the Kafka producer.

  1. acks=0 (possible data loss)
  2. acks=1 (limited data loss)
  3. acks=all (no data loss, as long as enough replicas stay in sync)
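
As a minimal sketch, the three modes can be written down as producer configuration. The property names `acks` and `bootstrap.servers` are real Kafka producer settings; the helper `producer_config` and the broker address `localhost:9092` are hypothetical and exist only for illustration.

```python
# Values accepted by the Kafka producer "acks" property.
# "-1" is an alias for "all".
VALID_ACKS = {"0", "1", "all", "-1"}


def producer_config(acks, bootstrap_servers="localhost:9092"):
    """Build a producer config dict (hypothetical helper, not a Kafka API)."""
    acks = str(acks)
    if acks not in VALID_ACKS:
        raise ValueError(f"unsupported acks value: {acks!r}")
    return {
        "bootstrap.servers": bootstrap_servers,  # assumed local broker address
        "acks": acks,
    }


fire_and_forget = producer_config(0)    # no acknowledgment requested
leader_only = producer_config(1)        # leader acknowledgment only
fully_acked = producer_config("all")    # leader plus in-sync replicas
```

Python clients such as confluent-kafka take a dict of this shape when a producer is created, so the only line that changes between the three modes is the `acks` entry.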

1. acks=0

With acks=0, the producer just sends the data and does not wait for an acknowledgment. Data loss is possible in this mode: if you send data to a broker that is down, you will not find out, because no acknowledgment is expected, and the data is lost. There are cases where acks=0 is used, but it is dangerous for data you cannot afford to lose.

 

Picture a producer and Broker 101, which holds one partition for now and is the leader of that partition. The producer sends data to the leader, but because acks=0, the broker just performs the write and never sends a response. Still, this mode is useful for:

  • Metrics collection. When you send a lot of metrics, it’s okay to lose one once in a while.
  • Log collection, where losing an occasional log line is also acceptable.

Obviously, it’s not great to lose data, but acks=0 is very good for performance, because the broker never replies to the producer and the producer never waits.

2. acks=1

acks=1 means leader acknowledgment. It was the default producer setting before Kafka 3.0; since Kafka 3.0 the default is acks=all. With acks=1, the producer requests a response from the leader, but there is no guarantee of replication. Replication happens in the background and is not a prerequisite for receiving a response. If an ack is not received, the producer may retry.

 

We have the producer and the broker. The producer sends the data to the leader, and the leader says, “Yes, I got the data, and I’m going to write it to my log. Here’s your response.” The write happens first, and then the response is returned to the producer, so the producer knows that the leader has the data. But if the leader goes down before the replicas have had the chance to replicate the data, we have data loss. This is one of the most common mistakes we see: acks=1 was the default before Kafka 3.0, and many people are not aware of what it implies. Sometimes there is data loss simply because acks is equal to 1.
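
The failure window described above can be sketched with a toy model. This is not Kafka code: the `Broker` class and the `send_acks_1` and `replicate` functions are hypothetical stand-ins that only mimic the order of events.

```python
class Broker:
    """Toy broker holding one partition log (illustrative model only)."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []
        self.online = True


def send_acks_1(leader, record):
    """Leader appends and acknowledges; replication has not happened yet."""
    if not leader.online:
        return False
    leader.log.append(record)
    return True  # the ack goes back before any follower has copied the record


def replicate(leader, follower):
    """Background replication: the follower copies what it is missing."""
    if leader.online and follower.online:
        follower.log.extend(leader.log[len(follower.log):])


leader, follower = Broker(101), Broker(102)

acked_1 = send_acks_1(leader, "m1")
replicate(leader, follower)       # m1 reaches the follower in time

acked_2 = send_acks_1(leader, "m2")
leader.online = False             # leader fails before m2 is replicated
replicate(leader, follower)       # no-op: the leader is gone

# The follower that takes over never received m2, although it was acknowledged.
lost = [r for r in leader.log if r not in follower.log]
```

Both sends were acknowledged, yet `lost` contains `"m2"`: the producer was told the write succeeded, and the record still disappeared with the leader.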

3. acks=all

With acks=all, the producer is saying, “I want the replicas to acknowledge as well.” This time, acknowledgment is requested from both the leader and the in-sync replicas. Here’s a more complicated example.

 

We have Brokers 101, 102, and 103, and they all have partition 0. Broker 101 is the leader, and the other two are replicas. The partitions are exactly the same, because they are replicas of one another. The producer sends the data to the leader, and the leader says, “Yep, I got it.” The data then goes to the replicas through the replication process. This is simplified (in practice the followers fetch from the leader), but the effect is the same. Each replica says, “Yep, I got it. Here is an acknowledgment for you, leader.” When the leader has received acknowledgments from all the in-sync replicas, hence the name all, it sends a response to the producer and says, “You know what, we’re good, everyone got it.”

This adds a bit of latency, because the producer only gets a response from the leader after all the in-sync replicas have done their replication. In exchange for that latency you get safety, because you have requested stronger guarantees. This process guarantees no data loss as long as enough replicas are online. It is a necessary setting if you don’t want to lose data, so a safe producer always uses acks=all. And when you use acks=all, there is one more setting you have to know about: min.insync.replicas.
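
The ordering that acks=all enforces can be sketched in a few lines. This is a toy model under simplifying assumptions, not Kafka code: `send_acks_all` is a hypothetical function, plain lists stand in for the partition logs, and replication is shown as a direct append rather than a follower fetch.

```python
def send_acks_all(leader_log, in_sync_follower_logs, record):
    """Acknowledge only after every in-sync follower holds the record."""
    leader_log.append(record)
    for follower_log in in_sync_follower_logs:
        follower_log.append(record)  # stands in for the follower's fetch
    # All in-sync replicas now hold the record, so the leader can respond.
    return 1 + len(in_sync_follower_logs)  # number of copies behind the ack


log_101, log_102, log_103 = [], [], []  # partition 0 on Brokers 101, 102, 103
copies = send_acks_all(log_101, [log_102, log_103], "m1")
```

By the time the function returns, the record exists on all three brokers, so losing the leader right after the acknowledgment no longer loses the record.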

min.insync.replicas

acks=all should be used in conjunction with min.insync.replicas. It is a broker and topic setting: if you set it at the broker level, you can override it at the topic level. The most common setting for min.insync.replicas is

min.insync.replicas=2

That means that at least two brokers that are in-sync replicas, including the leader, must confirm that they have the data. Otherwise, the producer gets an error. So if you use replication.factor=3, min.insync.replicas=2, and acks=all, you can tolerate only one broker going down. With more than one broker down, the producer receives an exception on send. The following scenario illustrates this.
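
The arithmetic behind "you can tolerate only one broker going down" is a subtraction. The function below is a hypothetical helper written for illustration; it assumes each replica of the partition lives on a different broker.

```python
def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Brokers that can be down while acks=all writes still succeed."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError(
            "min.insync.replicas must be between 1 and the replication factor"
        )
    return replication_factor - min_insync_replicas


common_setup = tolerated_broker_failures(3, 2)  # 1
strictest = tolerated_broker_failures(3, 3)     # 0: any single failure blocks writes
```

Setting min.insync.replicas equal to the replication factor leaves no room for failure at all, which is why 2 out of 3 is the usual compromise.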

 

So, we have min.insync.replicas=2 and our three replicas set up. Now, let’s assume that Brokers 102 and 103 are both down. We have a replication factor of three, but only one broker is still online. The producer sends the data to the leader, and the leader tries to replicate it, but the replication cannot happen, because the brokers that held the other replicas are down. The number of in-sync replicas right now is therefore one. Broker 101 replies to the producer with a NotEnoughReplicas exception, meaning the data cannot be written safely to the cluster. It is then the role of the producer to retry until the write succeeds. So, if you want strong safety while still tolerating one broker failure, use min.insync.replicas=2, acks=all, and a replication.factor of at least 3.
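
The rejection and the retry can be sketched with a toy model. Nothing here is Kafka client code: `NotEnoughReplicasError`, `leader_append`, and `send_with_retries` are hypothetical names, and the list of in-sync replica counts is an assumed sequence of cluster states.

```python
class NotEnoughReplicasError(Exception):
    """Stand-in for the broker's NotEnoughReplicas error (toy model)."""


def leader_append(log, in_sync_count, min_insync_replicas, record):
    """Reject an acks=all write when too few replicas are in sync."""
    if in_sync_count < min_insync_replicas:
        raise NotEnoughReplicasError(
            f"{in_sync_count} in-sync replica(s), need {min_insync_replicas}"
        )
    log.append(record)


def send_with_retries(log, isr_sizes, min_insync_replicas, record):
    """Try once per observed ISR size; return the attempts used, or None."""
    for attempt, in_sync_count in enumerate(isr_sizes, start=1):
        try:
            leader_append(log, in_sync_count, min_insync_replicas, record)
            return attempt
        except NotEnoughReplicasError:
            continue  # a real producer would back off before retrying
    return None


log = []
# In-sync replicas seen per attempt: leader only, twice, then Broker 102 rejoins.
attempts = send_with_retries(log, [1, 1, 2], 2, "m1")
```

The write is refused while only the leader is in sync and goes through on the third attempt, once a second replica is back. In the real producer, retrying is governed by settings such as `retries` and `delivery.timeout.ms` rather than a hand-written loop.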


Last Updated : 18 Mar, 2023