Thoughts of Hiram Chirino

Tuesday, May 30, 2006

Blog URL Moved

Since I hate to be dependent on 3rd party services and URLs, I moved my Blog URL to a URL that I own: http://hiramchirino.com

The funny thing is that blogspot wasted no time when I changed how I published my blog, and someone else snatched up my old blogbucket.blogspot.com URL.

Monday, May 22, 2006

Beefing up Kaha

Rob Davies and I spent some time last week looking at his Kaha message store implementation. In a way, it's similar to an experimental QuickJournal implementation that I had committed previously. The idea of the QuickJournal was that Journal log files were not deleted and that messages could be easily retrieved from the Journal. The journal would only checkpoint the locations of the messages in the journal to the long term store.

In a way, the long term store (JDBC in most cases) is being used like an index into the Journal files. This increases the performance of the journal since the amount of data that needs to be stored in the long term store is drastically reduced and the records are generally small, which works better with JDBC batch operations. Also, messages do not need to be batched up in memory (for batch insertion into the DB), thus reducing the memory impact of the message store.
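
To make the "long term store as an index" idea concrete, here is a minimal sketch. It is not the QuickJournal code: the class and method names are made up for illustration, an in-memory byte buffer stands in for the journal log files, and a plain map stands in for the JDBC table. The point is only that the checkpoint moves small location records, never message bodies.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from ActiveMQ.
final class JournalIndexSketch {

    /** Where a message body lives in the journal: byte offset and length. */
    static final class Location {
        final int offset;
        final int length;

        Location(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Stand-in for the append-only journal log files (never truncated here).
    private final ByteArrayOutputStream journal = new ByteArrayOutputStream();
    // Locations written since the last checkpoint.
    private final Map<String, Location> pending = new HashMap<>();
    // Stand-in for the long term store (a JDBC table in most cases).
    private final Map<String, Location> longTermIndex = new HashMap<>();

    /** Appends the body to the journal and remembers only its location. */
    Location append(String messageId, byte[] body) {
        Location loc = new Location(journal.size(), body.length);
        journal.write(body, 0, body.length);
        pending.put(messageId, loc);
        return loc;
    }

    /** Moves the small location records to the long term store; returns how many. */
    int checkpoint() {
        int count = pending.size();
        longTermIndex.putAll(pending);
        pending.clear();
        return count;
    }

    /** Looks the location up in the index, then reads the body back from the journal. */
    byte[] read(String messageId) {
        Location loc = longTermIndex.get(messageId);
        if (loc == null) {
            loc = pending.get(messageId);
        }
        if (loc == null) {
            return null;
        }
        return Arrays.copyOfRange(journal.toByteArray(), loc.offset, loc.offset + loc.length);
    }
}
```

In a real store the checkpoint would be a JDBC batch insert of (message id, file, offset) rows, which is why small fixed-size records suit it so well.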

The funny part is that at some point last week, a wager somehow got started about who could build the fastest message store implementation that could stay under 64 megs of memory usage when a queue was loaded up with 10,000,000 1k messages. Kaha currently keeps its indexes fully in memory, so Rob started looking into a way to optimize them down so that they could fit in 64 megs (Rob is the optimization King BTW). I went down the route of only loading parts of the index into memory. I even shared my algorithm concept with him on the condition that he did not use it, LOL. In the end we realized it was not going to be a weekend deal to implement this stuff and that it would be best to work on a single solution together.

Kaha IS a nice API and is much more general purpose than the MessageStore APIs. Some of the problems that Kaha currently has are that it does not guarantee consistency of the indexes and it does not support transactional operations. Those are 2 things that the journal can do today, and which Kaha could do if we modified its DataManager so that it journaled operations instead of just storing data items. So I'm going to try to integrate many of the Journal concepts into the DataManager so that:
  • The data file acts as a redo log that is 'replayed' on startup to bring the indexes to a consistent state
  • Use async batch writes for increased throughput: micro benchmarks showed that the journal can write at about 21 megs/s while the current DataManager maxes out at 8 megs/s
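
The redo log idea can be sketched in a few lines. This is an illustration under assumptions, not Kaha's DataManager: the names are hypothetical and an in-memory list stands in for the data file. Every operation is journaled rather than just the data item, so the index can always be rebuilt by replaying the log from the start.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names come from Kaha.
final class RedoLogSketch {

    enum Op { PUT, REMOVE }

    /** One journaled operation, as it would be appended to the data file. */
    static final class Entry {
        final Op op;
        final String key;
        final String value; // null for REMOVE

        Entry(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Stand-in for the on-disk data file; entries are only ever appended.
    private final List<Entry> log = new ArrayList<>();

    void put(String key, String value) {
        log.add(new Entry(Op.PUT, key, value));
    }

    void remove(String key) {
        log.add(new Entry(Op.REMOVE, key, null));
    }

    /** What startup would do: replay every operation in order to rebuild the index. */
    Map<String, String> replay() {
        Map<String, String> index = new HashMap<>();
        for (Entry e : log) {
            if (e.op == Op.PUT) {
                index.put(e.key, e.value);
            } else {
                index.remove(e.key);
            }
        }
        return index;
    }
}
```

Because the index is derived entirely from the log, a crash between writes can at worst lose the tail of the log; it cannot leave the index disagreeing with the data.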
Another thing to consider is that since the interfaces to the Kaha APIs are based on the List and Map interfaces, there is no easy way to:
  • switch between doing async and sync operations against the data files. Currently Kaha has a force() method on the store that syncs any pending writes, but this is not optimal when using async batched writes (you end up syncing on a subsequent write).
  • associate a transaction with an operation against a list or a map
An idea I've been floating in my head is the ability to have multiple proxies to a single physical container. Each proxy could be enlisted in a different transaction or have its flag for sync vs. async actions changed.
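
One way the proxy idea might look, purely as a sketch: every name below is hypothetical and nothing here is Kaha API. Each proxy carries its own sync flag and buffers its own uncommitted writes, while all proxies share one physical container.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these names come from Kaha.
final class ContainerProxySketch {

    /** The single physical container shared by every proxy. */
    static final class PhysicalContainer {
        final Map<String, String> data = new HashMap<>();
        int syncWrites;  // writes that would force the data file to disk
        int asyncWrites; // writes that would be left to a later batch

        void write(String key, String value, boolean sync) {
            data.put(key, value);
            if (sync) {
                syncWrites++;
            } else {
                asyncWrites++;
            }
        }
    }

    /** A lightweight view with its own sync flag and its own uncommitted writes. */
    static final class Proxy {
        private final PhysicalContainer target;
        private final boolean sync;
        private final Map<String, String> uncommitted = new LinkedHashMap<>();

        Proxy(PhysicalContainer target, boolean sync) {
            this.target = target;
            this.sync = sync;
        }

        /** Buffers the write; other proxies cannot see it until commit(). */
        void put(String key, String value) {
            uncommitted.put(key, value);
        }

        /** Reads this proxy's own pending write first, then the shared container. */
        String get(String key) {
            return uncommitted.containsKey(key) ? uncommitted.get(key) : target.data.get(key);
        }

        /** Applies the buffered writes using this proxy's sync setting. */
        void commit() {
            for (Map.Entry<String, String> e : uncommitted.entrySet()) {
                target.write(e.getKey(), e.getValue(), sync);
            }
            uncommitted.clear();
        }

        /** Discards the buffered writes. */
        void rollback() {
            uncommitted.clear();
        }
    }
}
```

A proxy like this could still implement java.util.Map, so callers would keep the List and Map style API while the transaction and sync choices live on the proxy instead of on the container.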

As you might be able to tell by now, I'm on the Kaha crack now... bless Rob.

A Closer Look at the Gigantic Destination Nut

I exposed the gigantic destination issues that ActiveMQ has in a previous blog post. I'll take a little time to expand on the issue and why it's not simple to solve, and what ActiveMQ 4.0 does today.

It's obvious that we need to swap messages to disk when a queue needs to hold more messages than it could hold in RAM. We sometimes also call that spooling messages to disk. The issues that make this hard to implement are:
  • Writing a message to disk slows you down a little, so avoid it if possible. Sometimes you have no choice if the message was marked as persistent.
  • Sometimes we may need to swap out even non-persistent messages.
  • Avoid chucking a message out of ram if possible since loading it back from disk is REALLY slow.
  • When a consumer is ready to consume a message, that message should already be in memory, waiting for it to load from disk will lead to consumer starvation.
  • Even keeping lists of message references to where messages are on disk can use up too much memory. 10,000,000 disk locations in a linked list where every node in the list used only 100 bytes would still chew up about 1,000 megs (roughly a gig) of memory.
ActiveMQ 4.0 takes a simple approach: when sending a persistent message to a Queue, it uses a MessageReference when moving the message through the Broker's message dispatch process, a process that could take a while as a message goes from producer to consumer and finally to message acknowledgement. The MessageReference starts out being direct, in that it holds a reference to the message to keep it in RAM, but if the reference count drops below 1, then the direct reference is dropped. The reference count is allowed to drop to 0 when the message is just sitting in the Queue's message list or in a consumer's pending list. The reference count is > 0 while the message is being dispatched to a consumer. The MessageReference knows how to reload a Message from the persistence store when its reference count goes back above 0.
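
The reference counting described above can be sketched roughly as follows. This is not the ActiveMQ 4.0 MessageReference class: the names, the Store interface, and the choice to start the count at 1 are all assumptions made for illustration, and a String stands in for the message.

```java
// Hypothetical sketch: not the ActiveMQ MessageReference implementation.
final class MessageReferenceSketch {

    /** Stand-in for the persistence store that can reload a message by id. */
    interface Store {
        String load(String messageId);
    }

    private final String messageId;
    private final Store store;
    private String message; // the direct reference; null once swapped out
    private int referenceCount;

    /** Starts out direct. Assumption: the sender holds the first reference. */
    MessageReferenceSketch(String messageId, String message, Store store) {
        this.messageId = messageId;
        this.message = message;
        this.store = store;
        this.referenceCount = 1;
    }

    /** Called when dispatch begins; reloads the message if it was dropped. */
    synchronized String increment() {
        referenceCount++;
        if (message == null) {
            message = store.load(messageId);
        }
        return message;
    }

    /** Called when dispatch is done; at a count of 0 the message may leave RAM. */
    synchronized void decrement() {
        if (referenceCount == 0) {
            throw new IllegalStateException("reference count is already 0");
        }
        referenceCount--;
        if (referenceCount == 0) {
            message = null; // only the id stays in memory
        }
    }

    synchronized boolean isInMemory() {
        return message != null;
    }
}
```

Note that the reload in increment() is synchronous, which is exactly where a consumer would sit waiting on the disk.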

This is a quick and dirty fix and it works, but it obviously does not fix all the issues outlined initially. The shortcomings of the current solution are that:
  • it is only implemented for Queues
  • the consumer starvation problem can still exist since it does not load persisted messages asynchronously
  • it keeps a list of MessageReference objects which can still exhaust JVM memory

Mapping Beans to REST

I'm one of those guys that thinks that REST is great technology. Sure, REST does not do everything that SOAP can do, but I think that REST is built to be SIMPLE, something that SOAP and its WS-* buddies forgot about.

What we are missing is a good standard way to map REST to the simple POJO programming model that most of the Java industry has been quickly adopting. Seems SeXFire Dan has a good start on a way of doing that!

Dan, if you read this: I would make it even simpler by having the HTTP method for each service method determined, by default, by a naming convention. For example, for a given XService, the methods:

getX(...): is automatically mapped to HttpMethod.GET
deleteX(...): is automatically mapped to HttpMethod.DELETE
addX(...): is automatically mapped to HttpMethod.POST
updateX(...): is automatically mapped to HttpMethod.PUT
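
A quick sketch of what that convention could look like in code. The HttpMethod enum here is a local stand-in, not the type from Dan's annotations, and CustomerService is a made-up example service; the only thing taken from the list above is the prefix to HTTP method mapping.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of the naming convention; not an existing API.
final class RestConventionSketch {

    /** Local stand-in for whatever HttpMethod type the real framework uses. */
    enum HttpMethod { GET, DELETE, POST, PUT }

    /** Maps a Java method name to an HTTP method by its prefix, if any applies. */
    static Optional<HttpMethod> httpMethodFor(String methodName) {
        if (methodName.startsWith("get")) {
            return Optional.of(HttpMethod.GET);
        }
        if (methodName.startsWith("delete")) {
            return Optional.of(HttpMethod.DELETE);
        }
        if (methodName.startsWith("add")) {
            return Optional.of(HttpMethod.POST);
        }
        if (methodName.startsWith("update")) {
            return Optional.of(HttpMethod.PUT);
        }
        return Optional.empty(); // no convention applies; would need explicit config
    }

    /** Applies the convention to every method declared on a service type. */
    static Map<String, HttpMethod> mapService(Class<?> service) {
        Map<String, HttpMethod> mapping = new TreeMap<>();
        for (Method m : service.getDeclaredMethods()) {
            httpMethodFor(m.getName()).ifPresent(http -> mapping.put(m.getName(), http));
        }
        return mapping;
    }

    /** Made-up service used only to exercise the mapping. */
    interface CustomerService {
        String getCustomer(String id);
        void deleteCustomer(String id);
        void addCustomer(String name);
        void updateCustomer(String id, String name);
        void ping(); // matches no prefix, so it is left unmapped
    }
}
```

Calling RestConventionSketch.mapService(RestConventionSketch.CustomerService.class) maps the four prefixed methods and skips ping(), which would still need an explicit mapping.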

Scaling to Gigantic Queues and Topics

One of the current issues with ActiveMQ is that it's an uber fast message broker while consumers are online and consuming messages, but things start to kinda not work so great when you have a use case where you want to queue up 'work/messages' for a consumer that will be offline for days.

In ActiveMQ 4.0, we have hacked in some initial support for loading up a queue with a huge number of messages without blowing up the memory usage of the JVM, but it's a bit hacky and it may fail to work right if a consumer comes back online and the consumer recovery process kicks in.

All in all 4.0 is a solid broker with a ton of new and exciting features, but personally, I would like to focus on getting 4.1 to be the broker that can handle Gigantic Queues and Topics. I'll post some more messages on this topic in the next few hours as I recap what I've discovered in the last few weeks.

<start-of-blog>

This is just a quick post on why I'm going to start blogging.

I think that I get involved in many small projects, and sometimes I don't get back to some of the smaller ones. I may be deluding myself, but I'm hoping that if I post about them, one day I will re-read my blog and get back to projects that I had found interesting before. I also hope other folks may find some of these projects interesting and will help with them.