NATS vs Kafka


  • Experimental Design and Setup
  • Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?
  • Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the Fastest?
  • Modern Open Source Messaging: Apache Kafka, RabbitMQ and NATS in Action

    In recent years, there has been huge growth in Internet of Things (IoT) devices, which has brought many new challenges for the IT industry, such as processing data between devices in real time. The demand for scalable, high-throughput, low-latency systems has therefore become very high. It remains unclear how stateful cloud-native applications, such as message queue systems, perform in Kubernetes clusters.

    The primary goals of this thesis have been to evaluate the message queues in Kubernetes based on the following criteria:

  • Functional properties and capabilities: What is the message queue capable of doing?
  • Performance: What throughput and latency does the message queue achieve?
  • Scalability: How well does the message queue scale in Kubernetes?
  • Overhead: Is the message queue lightweight or heavyweight?
  • Ease of use: Is the message queue easy or complicated to use?
  • Popularity and vitality: Is the software well supported, with good documentation?
  • Reliability: How robust and fault-tolerant is the message system?

    Message queue architectures using the publish-subscribe paradigm are widely implemented in event-based systems. This paradigm uses asynchronous communication between entities and suits scalable, high-throughput, low-latency systems that are well adapted to the IoT domain.

    This thesis evaluates the adaptability of three popular message queue systems in Kubernetes: Kafka, NATS Streaming (STAN), and RabbitMQ. The systems are designed differently.

    The conducted tests provide further insight into how the performance of the Kafka system is affected in multi-broker clusters using multiple partitions, which enable higher levels of parallelism for the system.

    Furthermore, choosing the message broker that best fits a given set of system requirements has proven to be a difficult task; this thesis outlines the main characteristics of the systems and eases that process.

    Experimental Design and Setup

    Testbeds and an evaluation tool have been created in order to evaluate the systems and achieve the goals of the thesis.

    Smaller tests have been conducted to identify relevant message queue settings for the evaluation plan. The general message queue evaluation plan is shown below.

    In order to run these tests, testbeds consisting of 1, 3, and 5 message brokers are created. Each test is run three times on Kubernetes clusters using e2-highcpu-8 machine types (8 vCPUs and 8 GB memory) and e2-highcpu-16 machine types (16 vCPUs and 16 GB memory). The evaluations are conducted with Masih, a concurrent evaluation tool that provides automated orchestration for benchmarking message queues. The testbeds are deployed in Kubernetes clusters on Google Kubernetes Engine (GKE) consisting of varying numbers of nodes of different machine types.

    The Terraform provisioning tool is used to simplify the process of building the message queue clusters. Each message broker system provides an automated script for configuring the Kubernetes cluster in the Message Queue Provisioning repository.

    The general design of the installation procedure for the systems is shown below. The message queue systems are deployed together with the Prometheus monitoring tool for collecting broker- and node-specific metrics in the cluster.

    The architectures of the message queue clusters using three brokers are shown below.

    Results

    The conducted evaluations were run from a client using an e2-highcpu machine type. In total, 36 of the 54 possible tests in the evaluation plan were conducted.

    The producers and consumers exchanged a total of 7 GB of data in the conducted evaluations. The analysis was mainly performed for the single-partition Kafka, STAN, and RabbitMQ systems, but the impact of using multi-partitioned Kafka systems is described as well. Comparing the obtained results with the related test in Scenario 2, where the systems were evaluated with a different message count and message size, shows a substantial performance impact.

    All systems benefit from the message size used here, with RabbitMQ showing a significant improvement by reducing its mean latency by nearly a factor of 5. In general, the Kafka system outperforms the others in these tests. Table 1: Performance results for message queues in Scenario 1.


    Redis, Kafka or RabbitMQ: Which MicroServices Message Broker To Choose?

    Kafka is an open source distributed event streaming platform, and one of the five most active projects of the Apache Software Foundation. At its core, Kafka is designed as a replicated, distributed, persistent commit log that is used to power event-driven microservices or large-scale stream processing applications. Pulsar, which started out as a distributed pub/sub messaging system, recently added event streaming functionality as well. Pulsar is not the only system of its kind: other messaging systems such as Apache DistributedLog and Pravega have also been created on top of BookKeeper and aim to provide some Kafka-like event streaming functionality.

    BookKeeper provides persistent storage of messages in ledgers across server instances called bookies. Each bookie synchronously writes each message to a local journal log for recovery purposes and then asynchronously into its local indexed ledger storage. RabbitMQ is an open source traditional messaging middleware that implements the AMQP messaging standard, catering to low-latency queuing use cases. Availability and durability are properties of the various queue types it offers.

    Classic queues offer the weakest availability guarantees. Classic mirrored queues replicate messages to other brokers and improve availability. Stronger durability is provided through the more recently introduced quorum queues, but at the cost of performance. Since this is a performance-oriented blog post, we restricted our evaluation to classic and mirrored queues.
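    To make the queue-type distinction concrete, here is a minimal, hypothetical Java sketch (not from the benchmark code) using the RabbitMQ Java client: it declares a classic queue and a quorum queue, assuming a local broker. Classic mirroring itself is enabled through a broker policy rather than a queue argument.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

import java.util.Map;

public class QueueTypesSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // Classic queue: durable metadata, but the weakest availability guarantees.
            channel.queueDeclare("orders.classic", true, false, false, null);

            // Quorum queue: replicated across brokers for stronger durability,
            // selected via the x-queue-type argument (at a performance cost).
            channel.queueDeclare("orders.quorum", true, false, false,
                    Map.<String, Object>of("x-queue-type", "quorum"));

            // Classic mirrored queues are configured with a broker policy instead,
            // e.g. `rabbitmqctl set_policy ...` with an ha-mode definition.
        }
    }
}
```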

    Durability in distributed systems

    Single-node storage systems typically depend on fsyncing writes to disk to ensure durability. But in distributed systems, durability typically comes from replication, with multiple copies of the data that fail independently. Fsyncing data is then just a way of reducing the impact of a failure when it does occur. Conversely, if enough replicas fail, a distributed system may be unusable regardless of whether it fsyncs. Hence, whether we fsync or not is just a matter of which guarantees each system chooses to depend on in its replication design.

    While some depend closely on never losing data written to disk, thus requiring fsync on every write, others handle this scenario in their design. Kafka is also able to leverage the OS for batching writes to the disk for better performance. We have not been able to ascertain categorically whether BookKeeper offers the same consistency guarantees without fsyncing each write—specifically, whether it can rely on replication for fault tolerance in the absence of synchronous disk persistence.
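    As an illustration of relying on replication for durability, here is a minimal, hypothetical Java sketch (not the benchmark driver) of a Kafka producer that waits for acknowledgement from all in-sync replicas instead of forcing an fsync per write; the broker address and topic name are assumptions.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ReplicationDurabilitySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Durability from replication rather than per-write fsync: acks=all means the
        // write is only acknowledged once all in-sync replicas have it. Pair this with
        // a min.insync.replicas setting on the topic to bound how few replicas suffice.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("benchmark-topic", "key", "value"));
        }
    }
}
```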

    But of note, the Kafka and RabbitMQ benchmark driver implementations did have some significant shortcomings that affected the fairness and reproducibility of these tests.

    The resulting benchmarking code, including the fixes described in more detail below, is available as open source. This was critical for being able to not just report results but explain them.

    Fixes to the OMB Kafka driver

    We fixed a critical bug in the Kafka driver that starved Kafka producers of TCP connections, bottlenecking them on a single connection from each worker instance. The fix makes the Kafka numbers fair compared to the other systems: all of them now use the same number of TCP connections to talk to their respective brokers.
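    The gist of that fix, sketched very loosely in Java (this is not the actual OMB patch, and the class and naming are hypothetical): give each producer task its own KafkaProducer instance, so each maintains its own set of broker connections instead of funneling everything through one shared client.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class ProducerConnectionSketch {
    // One KafkaProducer per producer task: each client maintains its own
    // TCP connections to the brokers, so tasks no longer contend for a
    // single shared connection per worker.
    static List<KafkaProducer<byte[], byte[]>> createProducers(int tasks, String bootstrap) {
        List<KafkaProducer<byte[], byte[]>> producers = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.CLIENT_ID_CONFIG, "worker-producer-" + i); // assumed naming
            producers.add(new KafkaProducer<>(props));
        }
        return producers;
    }
}
```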

    We also fixed a critical bug in the Kafka benchmark consumer driver, where offsets were being committed too frequently and synchronously, causing degradation, whereas this was done asynchronously for the other systems.

    We also tuned the Kafka consumer fetch size and replication threads to eliminate bottlenecks in message fetching at high throughputs and to configure the brokers equivalently to the other systems.
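    A minimal, hypothetical consumer sketch in Java combining the two ideas above, infrequent asynchronous offset commits and larger fetches; the broker address, group id, topic, and the specific byte values are illustrative assumptions, not the benchmark's settings.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ConsumerTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "bench-consumers");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        // Commit offsets asynchronously, not synchronously on every poll.
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // Larger fetches avoid bottlenecking message fetching at high throughput
        // (values are illustrative).
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, String.valueOf(64 * 1024 * 1024));
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, String.valueOf(8 * 1024 * 1024));

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("benchmark-topic"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
                // ... process records ...
                consumer.commitAsync(); // non-blocking commit
            }
        }
    }
}
```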

    Routing keys were introduced to mimic the concept of partitions per topic, equivalent to the setup on Kafka and Pulsar. We added a TimeSync workflow for the RabbitMQ deployment to synchronize time across client instances for precise end-to-end latency measurements.

    In addition, we fixed another bug in the RabbitMQ driver to ensure accurate end-to-end latency measurement.

    Fixes to the OMB Pulsar driver

    For the OMB Pulsar driver, we added the ability to specify a maximum batch size for the Pulsar producer and turned off any global limits that could artificially cap throughput at higher target rates for producer queues across partitions.

    We did not need to make any other major changes to the Pulsar benchmark driver. In the sections below, we call out any changes we made to these baseline configurations along the way for different tests.

    Disks

    Specifically, we went with the i3en instance type, whose network bandwidth is high relative to its disk throughput. This means that the tests measure the respective maximum server performance, not simply how fast the network is.

    See the full instance type definition for details. Per the general recommendation and also per the original OMB setup, Pulsar uses one of the disks for journaling and one for ledger storage.

    No changes were made to the disk setups of Kafka and RabbitMQ. Figure 1. Establishing the maximum disk bandwidth of the i3en instances.

    Finally, it also tunes the power management quality of service (QoS) in the kernel for performance over power savings.

    Memory

    The i3en instances have less physical memory than the machines in the original OMB setup, but tuning Kafka and RabbitMQ to be compatible with the test instances was simple. Specifically, we halved the heap size from 24 GB each in the original OMB configuration to 12 GB each, proportionately dividing the available physical memory between the two processes and the OS.

    In our testing, we encountered java.lang.OutOfMemoryError: Direct buffer memory errors at high target throughputs, causing bookies to crash completely if the heap size was any lower. This is typical of the memory tuning problems faced by systems that employ off-heap memory. While direct byte buffers are an attractive choice for avoiding Java GC, taming them at high scale is a challenging exercise.

    Throughput test

    The first thing we set out to measure was the peak stable throughput each system could achieve, given the same amount of network, disk, CPU, and memory resources. We define peak stable throughput as the highest average producer throughput at which consumers can keep up without an ever-growing backlog.
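    That definition can be expressed as a small check over backlog samples. The following Java sketch is purely illustrative (the method names, the tolerance parameter, and the way rates are probed are assumptions, not how OMB implements it).

```java
import java.util.List;
import java.util.function.LongFunction;

public class BacklogCheckSketch {
    // A rate is "stable" if the consumer backlog (messages produced minus messages
    // consumed) does not keep growing over the sampled measurement window.
    static boolean isStable(List<Long> backlogSamples, long toleranceMessages) {
        long first = backlogSamples.get(0);
        long last = backlogSamples.get(backlogSamples.size() - 1);
        return (last - first) <= toleranceMessages;
    }

    // Peak stable throughput = the highest tested producer rate for which the
    // backlog stays bounded; candidateRates is assumed to be sorted ascending,
    // and runAtRate returns the backlog samples observed at that rate.
    static long peakStableThroughput(List<Long> candidateRates,
                                     LongFunction<List<Long>> runAtRate,
                                     long toleranceMessages) {
        long peak = 0;
        for (long rate : candidateRates) {
            if (isStable(runAtRate.apply(rate), toleranceMessages)) {
                peak = rate;
            }
        }
        return peak;
    }
}
```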

    Fundamentally, this provides a simple and effective way to amortize the cost of fsync calls across the different batch sizes employed by Kafka producers, so as to achieve maximum possible throughput under all conditions. If Kafka were configured to fsync on each write, we would just be artificially impeding performance by forcing fsync system calls, without any additional gains. That said, it may still be worthwhile to understand the impact of fsyncing on each write in Kafka, given that we are going to discuss results for both cases.

    The effect of various producer batch sizes on Kafka throughput is shown below. Fsyncing every message to disk on Kafka (orange bars in Figure 2) yields comparable results for higher batch sizes.

    Note that these results have been verified only on the SSDs in the described testbed. Kafka does make full use of the underlying disks across all batch sizes, either maximizing IOPS at lower batch sizes or disk throughput at higher batch sizes, even when forced to fsync every message.

    Figure 2. With larger batches (in the KB to 1 MB range), however, the cost of fsyncing is amortized and the throughput is comparable to the default fsync settings. Pulsar implements similar batching on the producers and does quorum-style replication of produced messages across bookies.

    However, we encountered large latencies and instability on the BookKeeper bookies, indicating queueing related to flushing. We also verified the same behavior using the pulsar-perf tool that ships with Pulsar. As far as we can tell after consulting the Pulsar community, this appears to be a bug, so we chose to exclude it from our tests. Figure 3. From a durability standpoint, our benchmarks indicated that the consumer kept up with the producer, and thus we did not notice any writes to the disk.

    We also set up RabbitMQ to deliver the same availability guarantees as Kafka and Pulsar by using mirrored queues in a cluster of three brokers.

    Test setup

    The experiment was designed according to the following principles and expected guarantees: messages are replicated 3x for fault tolerance (see below for specific configs).

    We enable batching in all three systems to optimize for throughput, batching up to 1 MB of data for a maximum of 10 ms. Pulsar and Kafka were configured with a fixed number of partitions on one topic.

    RabbitMQ does not support partitions within a topic. To match the Kafka and Pulsar setup, we declared a single direct exchange (equivalent to a topic) and linked queues (equivalent to partitions). More specifics on this setup can be found below.
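    As an illustration of that mapping, here is a hypothetical Java sketch using the RabbitMQ client (not the OMB driver code); the exchange name, queue names, partition count, and broker address are all assumptions.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class PartitionEmulationSketch {
    public static void main(String[] args) throws Exception {
        int partitions = 4; // illustrative; one queue plays the role of one partition
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // A single direct exchange plays the role of a topic.
            channel.exchangeDeclare("benchmark-topic", "direct", true);

            // One durable queue per routing key plays the role of a partition.
            for (int p = 0; p < partitions; p++) {
                String queue = "benchmark-topic-" + p;
                channel.queueDeclare(queue, true, false, false, null);
                channel.queueBind(queue, "benchmark-topic", String.valueOf(p));
            }

            // Publishing with routing key = hash(key) % partitions mimics the
            // key-based partitioner used by Kafka and Pulsar producers.
            String key = "some-key";
            int partition = Math.abs(key.hashCode()) % partitions;
            channel.basicPublish("benchmark-topic", String.valueOf(partition), null, new byte[1024]);
        }
    }
}
```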

    OMB uses an auto-rate discovery algorithm that derives the target producer throughput dynamically by probing the backlog at several rates. We saw wild swings in the determined rate, which hurt the repeatability and fidelity of the experiments significantly. In our experiments, we therefore explicitly configured the target throughput without using this feature and steadily increased it in steps, from 10K and 50K up to 1 million producer messages per second, with four producers and four consumers using 1 KB messages.

    We then observed the maximum rate at which each system offered stable end-to-end performance for different configurations.

    Throughput results

    We found Kafka delivered the highest throughput of the systems we compared. Given its design, each byte produced was written just once onto disk, on a code path that has been optimized for almost a decade by thousands of organizations across the world. We will delve into these results in more detail for each system below.

    Figure 5. Comparison of peak stable throughput for all three systems: topic partitions with 1 KB messages, using four producers and four consumers. We configured Kafka's producer batching via batch.size and linger.ms, matching the batching policy described in the test setup above. We observed that Kafka was able to efficiently max out both disks on each of the brokers, the ideal outcome for a storage system.
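    For reference, a minimal, hypothetical Java helper showing how such producer batching is configured; the 1 MB / 10 ms values mirror the batching policy stated in the test setup, the bootstrap address is an assumption, and this is not the OMB driver code.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

import java.util.Properties;

public class ProducerBatchingSketch {
    static KafkaProducer<byte[], byte[]> newBatchingProducer(String bootstrap) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrap);
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        // Batch up to 1 MB of data for at most 10 ms, matching the test setup
        // described above ("batch up to 1 MB of data for a maximum of 10 ms").
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, String.valueOf(1024 * 1024));
        props.put(ProducerConfig.LINGER_MS_CONFIG, "10");

        return new KafkaProducer<>(props);
    }
}
```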

    Figure 6. Kafka performance using the default, recommended fsync settings. See the raw results for details. We also benchmarked Kafka with the alternative configuration of fsyncing every message to disk on all replicas using the broker-side flush settings (flush.messages and flush.ms).
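    These flush settings can also be applied as topic-level overrides. The following hypothetical Java sketch creates such a topic with the Kafka AdminClient; the topic name, partition and replica counts, and the exact override values are assumptions for illustration, not the benchmark's configuration.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Map;
import java.util.Properties;

public class FsyncEveryMessageSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // assumed address

        try (AdminClient admin = AdminClient.create(props)) {
            // Topic-level overrides that force a flush (fsync) for every message
            // instead of relying on replication plus OS page-cache flushing.
            NewTopic topic = new NewTopic("benchmark-topic-fsync", 4, (short) 3) // counts are illustrative
                    .configs(Map.of(
                            "flush.messages", "1", // flush after every message
                            "flush.ms", "0"));     // do not delay time-based flushes
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```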

    The results are shown in the following graph and are quite close to the default configuration. Figure 7. The Pulsar client, specifically, has per-partition producer queues internally, as well as limits on these queue sizes that place an upper bound on the number of messages across all partitions from a given producer.

    To keep the Pulsar producer from bottlenecking on the number of messages being sent out, we set both the per-partition and global limits to infinity, while matching them with a 1 MB byte-based batching limit. We arrived at this value by continuously increasing it to the point where it had no measurable effect on the peak stable throughput that Pulsar ultimately achieved. See the Pulsar benchmark driver configs for details.
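    A hypothetical Java sketch of a Pulsar producer configured along those lines; the service URL, topic name, and the use of Integer.MAX_VALUE to approximate "infinity" are assumptions, so consult the actual OMB Pulsar driver configs for the real values.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

import java.util.concurrent.TimeUnit;

public class PulsarProducerLimitsSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://broker1:6650") // assumed address
                .build();

        // Raise the per-partition and cross-partition pending-message limits so the
        // producer is bounded by the 1 MB byte-based batching limit rather than by
        // message-count queue limits (values are illustrative).
        Producer<byte[]> producer = client.newProducer()
                .topic("benchmark-topic")
                .enableBatching(true)
                .batchingMaxBytes(1024 * 1024)
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                .maxPendingMessages(Integer.MAX_VALUE)
                .maxPendingMessagesAcrossPartitions(Integer.MAX_VALUE)
                .blockIfQueueFull(false)
                .create();

        producer.close();
        client.close();
    }
}
```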

    We found that this fundamental design choice has a profound negative impact on throughput, which directly affects cost. Once the journal disk was fully saturated on the BookKeeper bookies, the producer rate of Pulsar was capped at that point. Figure 8. Prometheus node metrics showing BookKeeper journal disk maxed out for Pulsar and the resulting throughput measured at the BookKeeper bookies.

    To further validate this, we also configured BookKeeper to use both disks in a RAID 0 configuration, which provides BookKeeper the opportunity to stripe journal and ledger writes across both disks.

    Figure 9.

    RabbitMQ is one of the first common message brokers to have been created. RabbitMQ supports all major languages, including Python and Java. Expect some performance issues when running in persistent mode. Scale: can send up to a million messages per second. Persistence: yes.
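    For illustration, a minimal, hypothetical Java snippet showing what "persistent mode" means in RabbitMQ terms: a durable queue plus messages published with the persistent delivery mode, which is the path where the performance cost appears. The queue name and broker address are assumptions.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.MessageProperties;

import java.nio.charset.StandardCharsets;

public class PersistentPublishSketch {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker address

        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            // A durable queue survives a broker restart; persistent messages are
            // written to disk, which is where the performance impact shows up.
            channel.queueDeclare("tasks", true, false, false, null);
            channel.basicPublish("", "tasks",
                    MessageProperties.PERSISTENT_TEXT_PLAIN,
                    "hello".getBytes(StandardCharsets.UTF_8));

            // Transient alternative (faster, but lost on broker restart):
            // channel.basicPublish("", "tasks", null, payload);
        }
    }
}
```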

    One-to-one vs. one-to-many consumers: only one-to-many (which seems strange at first glance, right?!). Kafka was created by LinkedIn to handle high-throughput, low-latency processing. As a distributed streaming platform, Kafka replicates a publish-subscribe service.

    It provides data persistence and stores streams of records, which renders it capable of exchanging quality messages. They are all the creators and main contributors of the Kafka project. Scale: can send up to a million messages per second.

    Benchmarking Apache Kafka, Apache Pulsar, and RabbitMQ: Which is the Fastest?

    Redis is a bit different from the other message brokers. At its core, Redis is an in-memory data store that can be used as either a high-performance key-value store or as a message broker.

    Originally, Redis was not one-to-one and one-to-many.
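    For completeness, a hypothetical Java sketch of Redis's publish/subscribe model using the Jedis client (the client library choice, channel name, and address are assumptions); note that plain Redis pub/sub is fire-and-forget fan-out with no persistence, unlike the brokers above.

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class RedisPubSubSketch {
    public static void main(String[] args) {
        // Subscriber: blocks and receives every message published to "events"
        // while it is connected (one-to-many fan-out, no replay of missed messages).
        new Thread(() -> {
            try (Jedis subscriber = new Jedis("localhost", 6379)) { // assumed address
                subscriber.subscribe(new JedisPubSub() {
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println(channel + ": " + message);
                    }
                }, "events");
            }
        }).start();

        // Publisher: fire-and-forget; slow or offline subscribers simply miss messages.
        try (Jedis publisher = new Jedis("localhost", 6379)) {
            publisher.publish("events", "hello");
        }
    }
}
```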


    Modern Open Source Messaging: Apache Kafka, RabbitMQ and NATS in Action


