activemq
Checkout this server side thread. Folks are starting to think about using JavaScript on the browser to access an “Internet Messaging Bus”. They want to have thing like:
- Guaranteed delivery
- Once and only once deliveryGuaranteed
- order of deliveryServer
- push and client pullFunny
thing is that most of all that is available today with ActiveMQ! And to get really great performance, use ActiveMQ with Jetty! ActiveMQ comes with a simple little JavaScript API that allows you to access the ActiveMQ message bus using Comet style http polling. And ActiveMQ provides you all the above guarantees like all good message brokers.
Other news on this front is all the Ajax tool kits are starting to look at being able to inter-operate. At a minimum, a page with multiple tool kits will need to share it’s connections back to the server. So they will need to share an API to broker requests back to the server. I’ll keep an eye out for this API because it sounds like something that could be easily tied into ActiveMQ.
Rob Davies and I spent some time last week looking at his Kaha message store implementation. In a way, it’s similar to a experimental QuickJournal implementation that I had committed previously. The idea of the QuickJournal was that Journal log files were not deleted and that messages could be easily retrieved from the Journal. The journal would only checkpoint to the long term store the location of where the messages are located in the journal.
In a way, the long term store (JDBC in most cases) is being used like an index into the Journal files. This increases the performance of the journal since the amount of data that needs to be stored in long term store is drastically reduced and generally of a small size which works better with JDBC batch operations. Also messages do not need to be batched up in memory (for batch insertion into the DB) thus reducing the memory impact of the message store.
The funny part is that at some time last week, somehow a wager got started about who could build the fastest message store implementation that could stay under 64 megs of memory usage when a queue was loaded up with 10,000,00 1k messages. Kaha at the current time keeps it’s indexes fully in memory. So Rob started looking into a way to optimize the down so that they could fit in 64 megs (Rob is the optimization King BTW). I went down the route of we need to only load parts of the index into memory. I even shared my algorithm concept with him as long as he did not use it, LOL. In the end we realized, it was not going to be a weekend deal to implement this stuff and it would be best to work on a single solution together.
Kaha, IS a nice API and is much more general purpose than the MessageStore APIs. Some of the problems that Kaha currently has is that it does not guarantee constancy of the indexes and it does not support transactional operations. Those are 2 things that the journal can do today, and which Kaha could do if we modified it’s DataManager so that it journaled operations instead of just storing data items. So I’m going to try to integrate many of the Journal concepts into the DataManager so that:
- The data file acts as a redo log that is ‘replayed’ on startup to bring the indexes to a consistent state
- Use async batch writes for increased throughput: micro benchmarks showed that the journal can write at about 21 megs/s while the current DataManager maxes out at 8 megs/s
Other things to consider is that since the interfaces to the Kaha APIs are based on the List and Map interfaces, there is no easy way to:
- switch between doing async and sync operations against the data files. Currently Kaha has a force() method on the store that does syncs up any pending write but this is not optimal when using async batched writes (you end up syncing on a subsequent write).
- associate a transaction with a operation against a list or a map
An idea I’ve been floating in my head is the ability to have multiple proxies to a single physical container. Each proxy could be enlisted in a different transaction or it’s flag to do sync vs. async actions changed.
As you might be able to tell by now, I’m on the Kaha crack now… bless Rob.
I exposed the gigantic destination issues that ActiveMQ has in a previous blog post. I’ll take a little time to expand on the issue and why it’s not simple to solve, and what ActiveMQ 4.0 does today.
It’s obvious that we need to swap messages to disk when a queue needs to hold more messages than it could hold in RAM. We sometimes also call that spooling messages to disk. The issues that make this hard to implement are:
- Writing a message to disk slows you down a little, avoid it if possible. Sometimes you have no choice if the message was marked a persistent.
- Sometimes we may need to swap out even non-persistent messages.
- Avoid chucking a message out of ram if possible since loading it back from disk is REALLY slow.
- When a consumer is ready to consume a message, that message should already be in memory, waiting for it to load from disk will lead to consumer starvation.
- Even keeping lists of message references to where messages are on disk can use up too much memory. 10,000,000 disk locations in a linked list where every node in the list used only 100 bytes would still chew up about 100 megs of memory.
ActiveMQ 4.0 takes a simple approach and when sending persistent message to a Queue, it uses a MessageReference when moving a message though the Broker message dispatch process. A process that could take a while for a message to go from producer to a consumer and finally message acknowledgement. The MessageReference starts out being direct, in that it hold a reference to the message to keep it in RAM, but if the reference count drops below 1, then the direct reference is dropped. The reference count is allowed to drop to 0 when the message is just sitting in the Queue’s message list or in a consumer’s pending list. The reference count is > 0 while it’s being dispatched to a consumer. The MessageReference knows how to reload a Message from the peristence store when it’s reference count goes up above 0.
This is a quick and dirty fix and it works, but it obviously does not fix all the issues outlined initial. The shortcomings of the current solution is that:
- it is only implemented for Queues
- consumer starvation problem can exist since it does not persisted load messages asynchronously
- it keeps a list of MessageReference objects which can still exhaust JVM memory
One of the current issues with ActiveMQ is that it’s an uber fast message broker while consumers are online and consuming messages, but things start to kinda not works so great when you have a use case where you want to queue up ‘work/messages’ for a consumer that will be offline for days.
In ActiveMQ 4.0, we have hacked in some initial support for loading up a queue with a huge number of messages without blowing up the memory usage of the JVM, but it’s a bit hacky and it may fail work right if a consumer comes back online and the consumer recovery process kicks in.
All in all 4.0 is a solid broker with a ton of new and exciting features, but personally, I would like to focus on getting 4.1 to be the broker that can handle Gigantic Queues and Topics. I’ll post some more messages on this topic in the next few hours as I recap what I’ve discovered in the last few weeks.