I’m pleased to announce the availability of Apache Apollo 1.0. Apollo is a faster, more reliable, easier to maintain messaging broker built from the foundations of the Apache ActiveMQ project but with a radically different threading architecture which lets it scale to large number of concurrent connections and destinations while using a constant number of threads.
- Stomp 1.0 Protocol Support
- Stomp 1.1 Protocol Support
- Topics and Queues
- Queue Browsers
- Durable Subscriptions on Topics
- Persistent/Reliable Messaging
- Message Expiration
- Message Swapping
- Message Selectors
- JAAS Authentication
- ACL based Authorization
- SSL/TLS Support and Certificate based Authentication
- REST Management API
I’ve re-run the STOMP benchmarks I covered in a previous post against the 1.0 release. The benchmark uses the STOMP protocol to build an apples to apples performance comparison between all the major STOMP servers. I’ve also built a new JMS benchmark which uses the JMS API build the same kind of performance comparisons. Both benchmarks are open source and available to be tried out on your choice of hardware. Or better yet, follow the simple instructions found in each project’s readme to run the benchmarks on EC2:
For those of you wondering why the ActiveMQ project create new message broker as a subproject, it’s because we wanted to address the shift in the processor market to multi core for the next major release. ActiveMQ currently employs complex thread locking which starts to become bottleneck as you increase the core count and you don’t end up fully exploiting the potential on large core count machines. By developing the solution as a subproject, it’s easier to explore the best options without impacting current main line development of ActiveMQ 5.x.
Now that the Apollo sub project has proven itself, I think it’s time to start integrating it into ActiveMQ. Ideally, ActiveMQ 6 switches Apollo’s messaging engine and adds/ports all the existing features we’ve come to expect out of ActiveMQ like networks of brokers and priority queues. But we should really be talking about this on the Apache mailing lists.. join me there!
I’ve recently run STOMP benchmarks against the lastest releases of the 4 most feature packed STOMP servers:
- Apache ActiveMQ 5.5.1
- Apache ActiveMQ Apollo 1.0 Beta 6 (an ActiveMQ Subproject)
- HornetQ 2.2.5
- RabbitMQ 2.7.0
STOMP is an asynchronous messaging protocol with design roots are based on HTTP protocol. It’s simplicity has made the protocol tremendously popular since it reduces the complexity of integrating different platforms and languages. There are a multitude of client libraries for your language of choice to interface with STOMP servers.
The benchmark focuses on finding the maximum producer and consumer throughput for a variety of messaging scenarios. For example, it benchmarks a combination of all the following scenario options:
- Queues or Topics
- 1, 5, or 10 Producers
- 1, 5, or 10 Consumers
- 1, 5, or 10 Destinations
- 20 byte, 1k, or 256k message bodys
The benchmark warms up each scenario for 3 seconds. Then for 15 seconds it samples the total number of messages that were produced/consumed every second. Finally, the destination gets drained of any messages before benchmarking the next scenario. The benchmark generates a little HTML report with a graph for each scenario run where it displays messaging rate for each of the servers over the 15 second sampling interval.
I’ve run the benchmarks on a couple if different machines and I’ve posted the results at: http://hiramchirino.com/stomp-benchmark/
Since anyone can get access to an EC2 instance to reproduce those results, the rest of this article will focus on the results of the obtained on the EC2 High-CPU Extra Large Instance. If you want to reproduce, just spin up a new Amazon Linux 64 bit AMI and then run the following commands in it:
sudo yum install -y screen curl https://nodeload.github.com/chirino/stomp-benchmark/tarball/master | tar -zxv mv chirino-stomp-benchmark-* stomp-benchmark screen ./stomp-benchmark/bin/benchmark-all
Note: RabbitMQ 2.7.0 sometimes dies midway through the benchmark. It seems RabbitMQ does not enforce very strict flow control and you can get into situations where it runs out of memory if you place too much load on it. It seems that crash becomes more likely as you increase the core speed of the cpu or reduce the amount of physical memory on the box. Luckily, the RabbitMQ folks are aware of the issue and hopefully will fix it by the next release.
The ‘Throughput to an Unsubscribed Topic’ benchmarking scenario is interesting to just get a idea/base line what the fastest possible rate a producer can send to server. Since there are not attached consumers, the broker should be doing very little work since it’s just dropping all the messages that get sent to it.
The Queue Load/Unload scenarios a very important to look at if your application uses queues. You often times run into situations where messages start accumulating in a queue with either no consumers or with not enough consumers to keep up with the producer load. This benchmark first runs a producer for 30 seconds enqueuing non-persistant messages and then runs a producer enqueuing persistant messages for 30 seconds. Finally, it runs a consumer to dequeue the messages for 30 seconds. An interesting observation in this scenario is that Apollo was the only sever which could dequeue at around the same maximum enqueue rates which is important if you ever want your consumers to catch up with fast producers.
The Fan In/Out Load Scenarios help you look at cases where you have either multiple producers or multiple consumers running against a single destination. It helps you see how performance will be affects as you scale up the producers and consumers. You should follow the “10 Producers” columns and “10 Consumers” rows to really get a sense of which servers do well as the you increase the number of clients on a single destination.
The Partitioned Load Scenarios look at how well the server scales as you start to increase load on multiple destinations at different message sizes.
I’ve tried to make the benchmark as fair as possible to all the contenders, all the source code to the benchmark is available on github. Please open an issue or send me pull request if you think of ways to improve it!
ActiveMQ Apollo is a new generation of messaging broker built from the foundations of the ActiveMQ messaging broker, but using a radically different threading and message dispatching architecture. In it’s current incarnation, Apollo only supports the STOMP protocol but just like the original ActiveMQ, it’s been designed to be a multi protocol broker and in future iterations it should get OpenWire support so it can be compatible with ActiveMQ 5.x JMS clients.
The BIGGEST change in the broker architecture was a switch to a non-blocking threading model for message processing using the HawtDispatch library. This meant changing many of the APIs to using a continuation passing style so that caller never blocks on a request. The upside of the new architecture is that the most complex and brittle areas of multi-threaded code in ActiveMQ 5.x have now been tremendously simplified. Those areas are using actor style coarse grained synchronization thanks to HawtDispatch.
It is impressive how well Apollo performs. Using a little benchmarking tool, I compared the performance of Apollo and three other STOMP server implementations. The benchmark tests different combinations of consumers, destinations, producers, persistence options, message sizes for a total of 74 common usage scenarios. I ran the benchmarks on two different boxes. If you want to peek at the complete benchmark reports, see Ubuntu 4 Core report and OS X 8 Core report.
Apollo ends up doing really well in comparison to other servers in most cases but in some cases it’s just mind blowing. For example:
The scenario above is the case where you have 10 producer and 10 consumers on a single topic moving non persistent STOMP messages with a 20 byte payload. The graph shows that Apollo can easily sustain a total consumer processing rate of 1.2 Million messages per second while the closest contender could only reach about 105,000 thousand messages per second.
I am pleased to announce the availability of RestyGWT 1.0.
RestyGWT is a GWT generator for REST services and JSON encoded data transfer objects. What I really like about it is that it allows you to write rich JAX-RS based services and access those services from at GWT client reusing all the same DTO objects that you use on the server side.
In other words, it just as easy to use as GWT-RPC except it’s using simple RESTful URLs and JSON encoding of data when it access server side resources. This has the nice benefit that you can implement the server side in something other than JAVA easily.
Further information see:
My previous post promised a follow up to explain how network IO events are handled by HawtDispatch. Before I get into the details, I urge you to read Mark McGranaghan’s post on Threaded vs Evented Servers. He does an excellent job describing how event driven servers scale in comparison to threaded servers. This post will try to highlight how HawtDispatch provides an excellent framework for the implementation of event based servers.
When implementing event based servers, there are generally 2 patterns used, the reactor pattern and the proactor pattern. The reactor pattern can be though of as being a synchronous version of the proactor pattern. In a reactor pattern IO events are serviced by the thread in the IO handling loop. In a proactor pattern the thread processing the IO event loop passes off the the IO event to another thread for processing. HawtDispatch can support both styles of IO processing.
HawtDispatch uses a fixed sized thread pool sized to match the number of cores on your system. Each thread in the pool runs an IO handling loop. When a NIO event source is created, it gets assigned to one of the threads. When network events occur the source causes callbacks to occur against on the dispatch queue targeted in the event source. Typically that target dispatch queue is set to a serial queue which the application uses to handle the network protocol. Since it’s a serial queue, the handling of the event can be done in a thread safe way. The proactor pattern is being used since the serial queue can execute in any of the threads in the thread pool .
To use the reactor pattern, HawtDispatch supports ‘pinning’ the serial queue to a thread. When a dispatch source is created on a pinned dispatch queue, then the event source gets registered against the same ‘pinned’ thread. The benefit of the reactor pattern is that it avoids some of cross thread synchronization needed for the proactor pattern and provides cheaper GCs. The down side to the reactor pattern is that you may have to manage reblanacing network sources across all the available thread. Lucky HawtDispatch does support moving pinned dispatch queues and sources to different threads.