In a previous blog post I described the gigantic-destination issues that ActiveMQ has. I'll take a little time here to expand on the problem, explain why it's not simple to solve, and describe what ActiveMQ 4.0 does today.
It's obvious that we need to swap messages to disk when a queue needs to hold more messages than can fit in RAM. We sometimes also call that spooling messages to disk. The issues that make this hard to implement are:
- Writing a message to disk slows you down a little, so avoid it if possible. Sometimes you have no choice, because the message was marked as persistent.
- Sometimes we may need to swap out even non-persistent messages.
- Avoid evicting a message from RAM if possible, since loading it back from disk is REALLY slow.
- When a consumer is ready to consume a message, that message should already be in memory; making the consumer wait for a disk load leads to consumer starvation.
- Even keeping lists of message references that point to where messages are on disk can use up too much memory. 10,000,000 disk locations in a linked list, where every node in the list uses only 100 bytes, would still chew up about 1 GB of memory.
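The back-of-the-envelope math on that last point is worth spelling out. A quick sketch (the 100-bytes-per-node figure is an assumption; real per-node overhead depends on the JVM's object headers, pointer size, and padding):

```java
// Estimate the memory used by a list of message references alone,
// before any actual message bodies are kept in RAM.
public class ReferenceListMemory {

    // messages * bytesPerNode, with ~100 bytes/node as an assumed overhead
    public static long estimateBytes(long messages, long bytesPerNode) {
        return messages * bytesPerNode;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000_000L, 100L);
        // 10,000,000 nodes * 100 bytes = 1,000,000,000 bytes
        System.out.println(bytes / (1024 * 1024) + " MB"); // prints "953 MB"
    }
}
```

So even the bookkeeping for a swapped-out queue, with no message payloads in memory at all, can approach a gigabyte at that scale.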
ActiveMQ 4.0 takes a simple approach: when a persistent message is sent to a Queue, it uses a MessageReference to move the message through the Broker's dispatch process, a process that can take a while as the message travels from producer to consumer and finally to message acknowledgement. The MessageReference starts out direct, in that it holds a reference to the message to keep it in RAM, but when the reference count drops below 1 the direct reference is dropped. The reference count is allowed to drop to 0 while the message is just sitting in the Queue's message list or in a consumer's pending list; it is greater than 0 while the message is being dispatched to a consumer. The MessageReference knows how to reload the Message from the persistence store when its reference count rises above 0.
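The reference-counting idea can be sketched roughly as follows. This is a simplified, hypothetical illustration, not ActiveMQ's actual MessageReference interface; `Message` and `MessageStore` here are stand-ins for the broker's real message and persistence types.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a reference-counted, swappable message holder.
public class SwappableMessageReference {
    private final String messageId;
    private final MessageStore store;                      // assumed persistence API
    private final AtomicInteger refCount = new AtomicInteger(0);
    private volatile Message message;                      // direct reference while in RAM

    public SwappableMessageReference(Message message, MessageStore store) {
        this.message = message;                            // starts out direct
        this.messageId = message.getId();
        this.store = store;
    }

    /** Called when dispatch to a consumer begins: count goes above 0. */
    public synchronized Message reference() {
        if (refCount.incrementAndGet() > 0 && message == null) {
            message = store.load(messageId);               // slow: reload from disk
        }
        return message;
    }

    /** Called when the message goes back to just sitting in a list. */
    public synchronized void release() {
        if (refCount.decrementAndGet() <= 0) {
            message = null;                                // drop direct ref; GC can reclaim
        }
    }
}

// Stand-ins for the broker's real types.
interface MessageStore { Message load(String id); }
interface Message { String getId(); }
```

The key property is that a message parked in a queue or pending list holds no direct reference, so the garbage collector can reclaim it, while a message actively being dispatched is pinned in RAM.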
This is a quick and dirty fix, and it works, but it obviously does not address all the issues outlined above. The shortcomings of the current solution are:
- it is only implemented for Queues
- the consumer starvation problem can still occur, since persisted messages are not loaded asynchronously
- it keeps a list of MessageReference objects which can still exhaust JVM memory
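On the starvation point: one possible remedy would be to kick off the disk read before a consumer asks for the message, so dispatch never blocks on I/O. A minimal sketch of that idea, with `MessageStore` again standing in for the broker's persistence API; this is not something ActiveMQ 4.0 does today.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical asynchronous prefetcher: the disk read runs on a background
// thread while the dispatcher keeps working.
public class Prefetcher {
    private final ExecutorService loader = Executors.newSingleThreadExecutor();

    /** Start loading a message from disk ahead of the consumer's demand. */
    public CompletableFuture<String> prefetch(String messageId, MessageStore store) {
        return CompletableFuture.supplyAsync(() -> store.load(messageId), loader);
    }

    public void shutdown() {
        loader.shutdown();
    }
}

// Stand-in for the broker's persistence API.
interface MessageStore {
    String load(String id);
}
```

By the time the consumer is ready, the future has (ideally) already completed, so the dispatch path sees an in-memory message instead of a blocking disk read.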