Hiram Chirino

Rob Davies and I spent some time last week looking at his Kaha message store implementation. In a way, it's similar to an experimental QuickJournal implementation that I had committed previously. The idea behind the QuickJournal was that journal log files were not deleted and that messages could be easily retrieved from the journal. The journal would only checkpoint to the long-term store the locations of the messages within the journal.

In a way, the long-term store (JDBC in most cases) is being used as an index into the journal files. This increases the performance of the journal since the amount of data that needs to be stored in the long-term store is drastically reduced and is generally small, which works better with JDBC batch operations. Also, messages do not need to be batched up in memory (for batch insertion into the DB), which reduces the memory impact of the message store.
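
To make that concrete, here's a rough sketch of the idea (the names are made up for illustration, not the actual ActiveMQ/Kaha API): the journal keeps the full message payload, and the long-term store only checkpoints a small location record pointing back into the journal.

```java
// Illustrative sketch only -- these names are hypothetical, not the real ActiveMQ/Kaha API.

// A tiny record pointing back into the journal: which data file and what offset.
class JournalLocation {
    final int fileId;   // journal data file that holds the message
    final long offset;  // byte offset of the record within that file

    JournalLocation(int fileId, long offset) {
        this.fileId = fileId;
        this.offset = offset;
    }
}

// The long-term store (JDBC in most cases) only persists these small location
// records -- effectively acting as an index into the journal -- instead of the
// full message payloads, which batches much better with JDBC.
interface LongTermIndex {
    void checkpoint(String messageId, JournalLocation location);
    JournalLocation lookup(String messageId);
}
```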

The funny part is that at some point last week, somehow a wager got started about who could build the fastest message store implementation that could stay under 64 megs of memory usage when a queue was loaded up with 10,000,000 1k messages. Kaha currently keeps its indexes fully in memory, so Rob started looking into ways to optimize the indexes down so that they could fit in 64 megs (Rob is the optimization King, BTW). I went down the route of only loading parts of the index into memory. I even shared my algorithm concept with him, as long as he did not use it, LOL. In the end we realized it was not going to be a weekend deal to implement this stuff and it would be best to work on a single solution together.
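
For the "only load parts of the index into memory" idea, here's a minimal sketch of the general direction I had in mind (purely illustrative, not Kaha code): a bounded page cache that keeps a limited number of index pages resident and evicts the least recently used ones.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Purely illustrative: a bounded LRU cache of index pages so that only part of
// the index needs to live in memory at any time. A real implementation would
// flush dirty pages to disk before evicting them and read them back on demand.
public class PagedIndexCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxPages;

    public PagedIndexCache(int maxPages) {
        super(16, 0.75f, true); // access-order iteration gives LRU behavior
        this.maxPages = maxPages;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // When the cache is full, drop the least recently used page.
        return size() > maxPages;
    }
}
```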

Kaha IS a nice API and is much more general purpose than the MessageStore APIs. Some of the problems Kaha currently has are that it does not guarantee consistency of the indexes and it does not support transactional operations. Those are 2 things that the journal can do today, and which Kaha could do if we modified its DataManager so that it journaled operations instead of just storing data items. So I'm going to try to integrate many of the journal concepts into the DataManager so that:

  • The data file acts as a redo log that is ‘replayed’ on startup to bring the indexes to a consistent state (a rough sketch of this replay idea follows the list)
  • Async batched writes are used for increased throughput: micro benchmarks showed that the journal can write at about 21 megs/s while the current DataManager maxes out at 8 megs/s
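
Here's the replay idea sketched out (again, names are mine and hypothetical, not the actual DataManager code): on startup the data file is scanned sequentially and every logged operation is re-applied to the indexes, so the indexes end up consistent without having to sync them on every write.

```java
// Hedged sketch of the redo-log / replay idea, not the actual Kaha DataManager code.
public class RedoLogReplayer {

    // One logged operation; replaying it re-applies the operation to the indexes.
    public interface RedoLogRecord {
        void replay(IndexState indexes);
    }

    // Whatever index structures need to be rebuilt (names here are made up).
    public interface IndexState {
        void put(String key, long journalLocation);
        void remove(String key);
    }

    // On startup, scan the data file sequentially and re-apply every operation,
    // bringing the indexes back to a consistent state.
    public void recover(Iterable<RedoLogRecord> dataFile, IndexState indexes) {
        for (RedoLogRecord record : dataFile) {
            record.replay(indexes);
        }
    }
}
```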

Other things to consider are that, since the interfaces to the Kaha APIs are based on the List and Map interfaces, there is no easy way to:

  • switch between doing async and sync operations against the data files. Currently Kaha has a force() method on the store that syncs up any pending writes, but this is not optimal when using async batched writes (you end up syncing on a subsequent write).
  • associate a transaction with an operation against a list or a map

An idea I’ve been floating in my head is the ability to have multiple proxies to a single physical container. Each proxy could be enlisted in a different transaction or have its own sync vs. async flag, independently of the other proxies.
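
Something along these lines (a sketch of the shape of it, with hypothetical names, not the Kaha API):

```java
// Rough sketch of the "multiple proxies to one physical container" idea.
// All names here are hypothetical -- this is not the Kaha API.

// The single shared physical container that all proxies write through.
interface PhysicalContainer<K, V> {
    void write(K key, V value, Object transactionId, boolean sync);
}

// Each proxy carries its own transaction context and its own sync/async flag,
// so callers get per-proxy behavior while sharing the same backing data.
class ContainerProxy<K, V> {
    private final PhysicalContainer<K, V> container;
    private final Object transactionId; // transaction this proxy is enlisted in (may be null)
    private final boolean syncWrites;   // sync each write to disk, or batch asynchronously

    ContainerProxy(PhysicalContainer<K, V> container, Object transactionId, boolean syncWrites) {
        this.container = container;
        this.transactionId = transactionId;
        this.syncWrites = syncWrites;
    }

    void put(K key, V value) {
        container.write(key, value, transactionId, syncWrites);
    }
}
```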

As you might be able to tell by now, I’m on the Kaha crack now… bless Rob.
