Messaging
Wow, it’s been a while since I’ve posted, but several exciting events have been happening in the ActiveMQ arena.
- Apache ActiveMQ has GRADUATED out of the incubator and is an official Apache project!
- The ActiveMQ website has moved to http://activemq.apache.org and has received a face lift thanks to yours truly.
- ActiveMQ 5.0 development is making huge progress and Rob has put up an excellent post outlining the upcoming features in the next version of ActiveMQ.
If you want to get a preview of those features, get the latest from the ActiveMQ trunk and kick it around. Feedback would be appreciated.
Check out this server side thread. Folks are starting to think about using JavaScript in the browser to access an “Internet Messaging Bus”. They want to have things like:
- Guaranteed delivery
- Once and only once delivery
- Guaranteed order of delivery
- Server push and client pull
Funny thing is that most of that is available today with ActiveMQ! And to get really great performance, use ActiveMQ with Jetty! ActiveMQ comes with a simple little JavaScript API that allows you to access the ActiveMQ message bus using Comet style HTTP polling. And ActiveMQ provides you all the above guarantees, like all good message brokers.
Other news on this front is that all the Ajax tool kits are starting to look at being able to inter-operate. At a minimum, a page with multiple tool kits will need to share its connections back to the server. So they will need to share an API to broker requests back to the server. I’ll keep an eye out for this API because it sounds like something that could be easily tied into ActiveMQ.
A few days ago the AMQP spec was announced on TSS. I quickly downloaded the spec and I have some initial impressions.
#1: I think it’s unfortunate that “AMQP” sounds so much like it has something to do with ActiveMQ, which most folks abbreviate to AMQ, as in, “Have you downloaded AMQ 4?”. It does not help that AMQ and AMQP are related technologies: the first is a MOM provider and the second is a wire protocol for MOM providers.
#2: This is a nice spec. I like the client-specified “binding” concepts introduced in this spec. Perhaps these concepts can be introduced in higher level APIs like JMS one day.
#3: It seems that one of the goals of the spec is for vendors to be able to interoperate with each other. I have got a feeling that specs like WS-Notification or STOMP have a better chance at accomplishing this. Binary wire protocols are hard to implement and I doubt there will be many implementations of it. If a good open source reference implementation becomes available then this would become more plausible.
#4: It does not go into details of how the content of the messages should be encoded. This effectively means that JMS implementations will not be able to interoperate using AMQP, since different implementations would encode the content of a JMS Message differently. Even for simple things like TextMessage, would they use UTF-8 or ASCII? And it gets even more complex for messages like StreamMessage and MapMessage.
#5: I may be wrong, but it seems like messages are always sent asynchronously. In some cases the JMS spec requires messages to be sent synchronously. Perhaps transactions can be used to simulate a synchronous send, but wouldn’t that add substantial overhead?
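To make the TextMessage point concrete, here is a minimal sketch in plain Java (no JMS or AMQP involved; the class and method names are made up for illustration). It only shows that the same text body turns into different bytes depending on which charset a provider happens to pick, so a receiver assuming the other charset would not get the original text back.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class TextEncodingDemo {
    // One hypothetical provider encodes text bodies as UTF-8.
    static byte[] encodeUtf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // Another hypothetical provider encodes them as US-ASCII.
    // String.getBytes replaces characters it cannot map with '?'.
    static byte[] encodeAscii(String body) {
        return body.getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String body = "caf\u00e9"; // "café": one character outside US-ASCII
        byte[] utf8 = encodeUtf8(body);
        byte[] ascii = encodeAscii(body);
        System.out.println("UTF-8 length:    " + utf8.length);
        System.out.println("US-ASCII length: " + ascii.length);
        System.out.println("same bytes on the wire? " + Arrays.equals(utf8, ascii));
    }
}
```

For pure ASCII text the two encodings agree byte for byte, which is exactly why this kind of mismatch tends to go unnoticed until the first accented character shows up.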
Rob Davies and I spent some time last week looking at his Kaha message store implementation. In a way, it’s similar to an experimental QuickJournal implementation that I had committed previously. The idea of the QuickJournal was that Journal log files were not deleted and that messages could be easily retrieved from the Journal. The journal would only checkpoint to the long term store the location of where the messages are located in the journal.
In a way, the long term store (JDBC in most cases) is being used like an index into the Journal files. This increases the performance of the journal since the amount of data that needs to be stored in the long term store is drastically reduced and is generally of a small size, which works better with JDBC batch operations. Also, messages do not need to be batched up in memory (for batch insertion into the DB), thus reducing the memory impact of the message store.
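The “long term store as an index into the journal” idea can be sketched roughly like this. This is not the QuickJournal code; it is a toy under stated assumptions: an in-memory byte buffer stands in for the journal log files, a HashMap stands in for the JDBC store, and all the names (JournalIndexSketch, Location, add, get) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class JournalIndexSketch {
    // Where a message lives in the journal: byte offset plus length.
    static final class Location {
        final int offset;
        final int length;
        Location(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for the append-only journal log files (a real journal writes to disk).
    private final ByteArrayOutputStream journal = new ByteArrayOutputStream();

    // Stand-in for the long term store: it only ever sees small Location records.
    private final Map<String, Location> longTermIndex = new HashMap<>();

    // The full message body goes to the journal; only its location is checkpointed.
    void add(String messageId, byte[] body) {
        int offset = journal.size();
        journal.write(body, 0, body.length);
        longTermIndex.put(messageId, new Location(offset, body.length));
    }

    // Look up the location in the index, then read the body back out of the journal.
    byte[] get(String messageId) {
        Location loc = longTermIndex.get(messageId);
        if (loc == null) {
            return null;
        }
        byte[] all = journal.toByteArray();
        return Arrays.copyOfRange(all, loc.offset, loc.offset + loc.length);
    }

    int indexSize() {
        return longTermIndex.size();
    }
}
```

The point of the sketch is the size difference: each index entry is two ints regardless of how big the message is, which is the kind of small fixed-size record that batches well.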
The funny part is that at some time last week, somehow a wager got started about who could build the fastest message store implementation that could stay under 64 megs of memory usage when a queue was loaded up with 10,000,00 1k messages. Kaha at the current time keeps its indexes fully in memory. So Rob started looking into a way to optimize the indexes down so that they could fit in 64 megs (Rob is the optimization King BTW). I went down the route of only loading parts of the index into memory. I even shared my algorithm concept with him as long as he did not use it, LOL. In the end we realized it was not going to be a weekend deal to implement this stuff and it would be best to work on a single solution together.
Kaha IS a nice API and is much more general purpose than the MessageStore APIs. Some of the problems that Kaha currently has are that it does not guarantee consistency of the indexes and it does not support transactional operations. Those are 2 things that the journal can do today, and which Kaha could do if we modified its DataManager so that it journaled operations instead of just storing data items. So I’m going to try to integrate many of the Journal concepts into the DataManager so that:
- The data file acts as a redo log that is ‘replayed’ on startup to bring the indexes to a consistent state
- Use async batch writes for increased throughput: micro benchmarks showed that the journal can write at about 21 megs/s while the current DataManager maxes out at 8 megs/s
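A rough sketch of the redo log idea from the first bullet, assuming a much simpler world than Kaha’s DataManager: operations (not just data items) are appended to a log, and the index is rebuilt by replaying them in order. The names here (RedoLogSketch, Record, replay) are hypothetical, and an in-memory list stands in for the data file.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RedoLogSketch {
    enum Op { PUT, REMOVE }

    // One journaled operation against the container.
    static final class Record {
        final Op op;
        final String key;
        final String value; // null for REMOVE
        Record(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the data file: an append-only sequence of operations.
    private final List<Record> log = new ArrayList<>();

    void put(String key, String value) {
        log.add(new Record(Op.PUT, key, value));
    }

    void remove(String key) {
        log.add(new Record(Op.REMOVE, key, null));
    }

    // On startup, replay every operation in order to bring the index
    // back to the state it had when the last record was written.
    Map<String, String> replay() {
        Map<String, String> index = new HashMap<>();
        for (Record r : log) {
            if (r.op == Op.PUT) {
                index.put(r.key, r.value);
            } else {
                index.remove(r.key);
            }
        }
        return index;
    }
}
```

Because the index is derived entirely from the log, a crash that loses the in-memory index loses nothing that a replay cannot reconstruct.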
Another thing to consider is that since the interfaces to the Kaha APIs are based on the List and Map interfaces, there is no easy way to:
- switch between doing async and sync operations against the data files. Currently Kaha has a force() method on the store that syncs up any pending writes, but this is not optimal when using async batched writes (you end up syncing on a subsequent write).
- associate a transaction with an operation against a list or a map
An idea I’ve been floating in my head is the ability to have multiple proxies to a single physical container. Each proxy could be enlisted in a different transaction or have its sync vs. async flag changed independently.
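Here is one way the proxy idea might look, purely as a sketch of the concept and not of any Kaha API. Everything in it is hypothetical: PhysicalContainer, Proxy, and the pending-write bookkeeping are invented to show that two handles on the same data can carry different sync settings. Transaction enlistment is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ContainerProxySketch {
    // The single physical container that every proxy points at.
    static final class PhysicalContainer {
        final Map<String, String> data = new HashMap<>();
        final List<String> pendingWrites = new ArrayList<>();
        int syncCount = 0;

        void write(String key, String value, boolean sync) {
            data.put(key, value);
            pendingWrites.add(key);
            if (sync) {
                force();
            }
        }

        // Pretend to flush all pending writes to disk.
        void force() {
            pendingWrites.clear();
            syncCount++;
        }
    }

    // A lightweight handle carrying its own sync/async flag.
    static final class Proxy {
        private final PhysicalContainer target;
        private boolean sync;

        Proxy(PhysicalContainer target, boolean sync) {
            this.target = target;
            this.sync = sync;
        }

        void setSync(boolean sync) {
            this.sync = sync;
        }

        void put(String key, String value) {
            target.write(key, value, sync);
        }

        String get(String key) {
            return target.data.get(key);
        }
    }
}
```

A caller that needs durability would take a sync proxy, while a bulk loader could take an async one on the same container and call force() once at the end.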
As you might be able to tell by now, I’m on the Kaha crack now… bless Rob.
I exposed the gigantic destination issues that ActiveMQ has in a previous blog post. I’ll take a little time to expand on the issue and why it’s not simple to solve, and what ActiveMQ 4.0 does today.
It’s obvious that we need to swap messages to disk when a queue needs to hold more messages than it could hold in RAM. We sometimes also call that spooling messages to disk. The issues that make this hard to implement are:
- Writing a message to disk slows you down a little, so avoid it if possible. Sometimes you have no choice because the message was marked as persistent.
- Sometimes we may need to swap out even non-persistent messages.
- Avoid chucking a message out of RAM if possible, since loading it back from disk is REALLY slow.
- When a consumer is ready to consume a message, that message should already be in memory; waiting for it to load from disk will lead to consumer starvation.
- Even keeping lists of message references to where messages are on disk can use up too much memory. 10,000,000 disk locations in a linked list where every node in the list used only 100 bytes would still chew up about 1,000 megs (roughly a gig) of memory.
ActiveMQ 4.0 takes a simple approach: when sending a persistent message to a Queue, it uses a MessageReference when moving the message through the Broker message dispatch process. That process could take a while, as a message goes from producer to consumer and finally to message acknowledgement. The MessageReference starts out being direct, in that it holds a reference to the message to keep it in RAM, but if the reference count drops below 1, then the direct reference is dropped. The reference count is allowed to drop to 0 when the message is just sitting in the Queue’s message list or in a consumer’s pending list. The reference count is > 0 while it’s being dispatched to a consumer. The MessageReference knows how to reload a Message from the persistence store when its reference count goes up above 0.
This is a quick and dirty fix and it works, but it obviously does not fix all the issues outlined initially. The shortcomings of the current solution are that:
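The reference counting scheme described above can be sketched as follows. This is not the ActiveMQ 4.0 class, just a minimal illustration of the behavior in the paragraph; the Store stand-in, the String message body, and the method names increment/decrement are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class MessageReferenceSketch {
    // Stand-in for the persistence store (hypothetical).
    static final class Store {
        final Map<String, String> persisted = new HashMap<>();
        int loads = 0;

        String load(String id) {
            loads++;
            return persisted.get(id);
        }
    }

    static final class MessageReference {
        private final String id;
        private final Store store;
        private String message;      // the direct reference; null once dropped
        private int referenceCount;

        // Starts out direct: the message is persisted and also held in RAM.
        MessageReference(String id, String message, Store store) {
            this.id = id;
            this.store = store;
            this.message = message;
            this.referenceCount = 1;
            store.persisted.put(id, message);
        }

        // Called when the message is needed again, e.g. for dispatch to a consumer.
        synchronized String increment() {
            referenceCount++;
            if (message == null) {
                message = store.load(id); // reload from the persistence store
            }
            return message;
        }

        // Called when the message goes back to just sitting in a list.
        synchronized void decrement() {
            if (referenceCount > 0) {
                referenceCount--;
            }
            if (referenceCount < 1) {
                message = null; // drop the direct reference so the body can be collected
            }
        }

        synchronized boolean isInMemory() {
            return message != null;
        }
    }
}
```

Note that the reload in increment() is synchronous, and that the MessageReference object itself never goes away.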
- it is only implemented for Queues
- the consumer starvation problem can still exist, since it does not load persisted messages asynchronously
- it keeps a list of MessageReference objects which can still exhaust JVM memory