Thoughts of Hiram Chirino

Monday, September 21, 2009

STOMP Clarification

I just saw a tweet which demonstrates that the STOMP spec still needs more clarification. I think Brian McCallister, the founding architect of the protocol, will agree that one of the tenets of the protocol was for it to be simple enough to be used even by a user who connects directly to a server via telnet.

And to support that use case, newlines after the frame terminator are a natural occurrence. But it might be easier to describe it as:
  • A STOMP frame may have zero or more newlines preceding its command verb.
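
To make the rule concrete, here is a minimal sketch of a reader that tolerates leading newlines before the command verb. The class and method names are hypothetical and not taken from any STOMP library. Treating '\r' as ignorable (for telnet clients that send CRLF) and ending a frame at the first NUL byte are simplifying assumptions.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of any STOMP library.
class StompFrameScanner {

    /**
     * Returns the command verb of the next frame, skipping any newlines
     * that precede it. Returns null if the input is exhausted first.
     */
    static String readCommand(ByteBuffer in) {
        // Zero or more newlines may precede the command verb.
        // Assumption: '\r' is skipped too, for telnet clients sending CRLF.
        while (in.hasRemaining()) {
            byte b = in.get(in.position()); // peek without consuming
            if (b != '\n' && b != '\r') {
                break;
            }
            in.get(); // consume the newline
        }
        if (!in.hasRemaining()) {
            return null;
        }
        // The command verb runs up to the end of the line.
        StringBuilder command = new StringBuilder();
        while (in.hasRemaining()) {
            byte b = in.get();
            if (b == '\n') {
                break;
            }
            if (b != '\r') {
                command.append((char) b);
            }
        }
        return command.toString();
    }

    /**
     * Consumes the rest of the current frame, up to and including the
     * NUL terminator. Simplification: bodies containing NUL bytes
     * are not handled.
     */
    static void skipToFrameEnd(ByteBuffer in) {
        while (in.hasRemaining() && in.get() != 0) {
            // keep consuming until the terminator
        }
    }
}
```

With this approach, a telnet user who hits Enter a few times between frames does no harm: the extra newlines after one frame's terminator are simply read as newlines preceding the next frame's command.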

Friday, September 18, 2009

ActiveMQ Protobuf Implementation Features

I promised I would follow up on my previous post, "The ActiveMQ Protobuf Implementation Rocks!".

So you might be asking yourself, what's the secret sauce? Well before I get into that, let me first explain the class model that our proto compiler generates.

For every message definition in the '.proto' file, the compiler will generate 3 classes:
  • the message interface: is implemented by the bean and buffer classes. It has all the 'getters'.
  • the bean class: has all the 'setters' and 'merge' methods
  • the buffer class: has all the encoding and decoding methods. It does not allow mutation.
The message interface also defines the freeze(), frozen(), and copy() methods, which allow you to make an instance immutable, check whether an instance is immutable, and create a mutable copy. Buffer classes are always immutable. Bean classes can transition to being immutable via freeze(). freeze() naturally returns a buffer object. copy() naturally returns a bean object.

This bean model gives substantial flexibility. Besides making it easy to transition from immutable to mutable and back, the message interface lets you implement business methods that operate against either type of instance. You could use the bean class purely in a builder style to always generate a buffer instance, or you could just use them like traditional Java bean objects.

Once a bean instance is frozen, any attempt to modify the instance will throw an assertion error if assertions are enabled in your JVM. So the CPU cost of validating program correctness can be disabled at run time.
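
To illustrate the shape of this class model, here is a hand-written sketch for a hypothetical message with a single 'name' field. None of these class names come from the real generated code, and the actual compiler output will differ. The sketch only mirrors the contract described above: getters on the interface, setters on the bean, and freeze()/frozen()/copy() for moving between mutable and immutable.

```java
// Hypothetical stand-ins for generated classes; illustration only.
interface Person {
    String getName();
    boolean frozen();       // is this instance immutable?
    PersonBuffer freeze();  // make immutable, returning a buffer
    PersonBean copy();      // create a mutable copy
}

final class PersonBean implements Person {
    private String name;
    private boolean frozen;

    PersonBean setName(String name) {
        // Checked with an assert, so the validation costs nothing
        // when assertions are disabled in the JVM.
        assert !frozen : "attempt to modify a frozen bean";
        this.name = name;
        return this;
    }

    public String getName() { return name; }
    public boolean frozen() { return frozen; }

    public PersonBuffer freeze() {
        frozen = true;
        return new PersonBuffer(this);
    }

    public PersonBean copy() {
        return new PersonBean().setName(name);
    }
}

final class PersonBuffer implements Person {
    private final PersonBean bean; // always a frozen bean

    PersonBuffer(PersonBean frozenBean) { this.bean = frozenBean; }

    public String getName() { return bean.getName(); }
    public boolean frozen() { return true; }
    public PersonBuffer freeze() { return this; }
    public PersonBean copy() { return bean.copy(); }
}

final class Greeter {
    // A business method written against the interface accepts
    // either a bean or a buffer.
    static String greet(Person p) {
        return "Hello, " + p.getName();
    }
}
```

In builder style this reads as `new PersonBean().setName("Hiram").freeze()`, while `buffer.copy().setName(...)` gets you back to a mutable instance without touching the frozen one.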

Finally, the most important feature of the buffer class is that it holds on to either the byte array that it was created from or the frozen bean that created it, and sometimes both, after it builds one from the other. This has several implications. Firstly, once a buffer is encoded to a byte[], subsequent encoding passes are free. This is also true when a buffer is decoded, as the next encoding is free since it still retains the original encoding of the message. The other benefit this provides, which the benchmark highlighted, is that deferred decoding is possible. A newly created buffer will not decode the data until a field is accessed. This is also true of the nested messages that are encoded in a buffer. While the outer message may get decoded, the nested message will not be decoded until its fields are accessed.
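
The caching idea can be sketched in a few lines. This is not the real implementation: the class below is hypothetical, it carries a single string field, and UTF-8 bytes stand in for the protobuf wire format. The two counters exist only to make the laziness observable.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of "keep the bytes, keep the value, build
// one from the other only on demand". UTF-8 stands in for real encoding.
final class LazyTextBuffer {
    private byte[] encoded; // null until the first encode
    private String decoded; // null until the first field access

    int encodeCount; // how many times encoding actually ran
    int decodeCount; // how many times decoding actually ran

    private LazyTextBuffer(byte[] encoded, String decoded) {
        this.encoded = encoded;
        this.decoded = decoded;
    }

    /** Wraps received bytes without decoding them. */
    static LazyTextBuffer fromBytes(byte[] data) {
        return new LazyTextBuffer(data, null);
    }

    /** Wraps an in-memory value without encoding it. */
    static LazyTextBuffer fromValue(String value) {
        return new LazyTextBuffer(null, value);
    }

    /** Field access: decodes on first use, then reuses the result. */
    String getText() {
        if (decoded == null) {
            decoded = new String(encoded, StandardCharsets.UTF_8);
            decodeCount++;
        }
        return decoded;
    }

    /**
     * Encoding: free if the original bytes are still held or a previous
     * encoding pass already ran. Returns the internal array for brevity;
     * callers must not modify it.
     */
    byte[] toByteArray() {
        if (encoded == null) {
            encoded = decoded.getBytes(StandardCharsets.UTF_8);
            encodeCount++;
        }
        return encoded;
    }
}
```

For a message broker this pattern matters because a message is often received and forwarded without its fields ever being read, in which case neither the decode nor the re-encode has to happen at all.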

The ActiveMQ Protobuf Implementation Rocks!

While reading Comparing the Java Serialization Options I ran across a cool Google Code project which has done an excellent job benchmarking a wide variety of serialization options for Java.

I had been researching the protobuf encoding format for a while and really liked it. But I did not really like the Java implementation that Google had published. It was kinda clunky to use, and I saw several missing optimizations that could be applied. Optimizations that could create huge performance wins when applied to the usage patterns of an enterprise messaging system like Apache ActiveMQ. So I created a new protobuf implementation in the ActiveMQ project.

Naturally, I was curious to see how the ActiveMQ protobuf implementation stacked up against similar technologies. So I grabbed the V1 benchmark source code and added our implementation to it. If you want to do the same, apply this patch.

Once I ran the benchmark, I was very pleased with the results. I'm including the performance graphs of our impl and standard protobuf and thrift for comparison.

As it turns out, our implementation looks awesome in the benchmark! How about that decoding speed!

It's getting late here, so I'll have to do a follow-up post explaining how we did so much better.

Tuesday, July 07, 2009

Openwire Python Client for ActiveMQ

Wow, I can't believe I missed it. Python lovers rejoice! Seems some good folks have created a Python client for ActiveMQ which uses the very robust ActiveMQ C++ client.

And for those of you on Ubuntu, Dejan Bosanac has put together an excellent guide on how to build it there.

Jansi - Bringing ANSI Support to Java on Windows

Last weekend I got a little spare time and threw together a small library which should help with the problem of boring Java console applications on Windows.


It's called Jansi and it provides support for using ANSI escape sequences in your Java console applications on Windows.


With ANSI escape sequences, you can fully control the cursor positioning and the foreground and background color of the console text output. Here is a quick example of what's possible:
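
As a rough sketch of the kind of output involved, the snippet below writes raw ANSI escape sequences from plain Java. It deliberately does not use the Jansi API. The class and helper names are made up for illustration. On a terminal that already understands ANSI the output is colored as-is, while on a Windows console without ANSI support a library like Jansi is what would interpret these sequences.

```java
// Hypothetical demo class using raw escape sequences, not the Jansi API.
class AnsiDemo {
    static final String ESC = "\u001b[";      // Control Sequence Introducer
    static final String RESET = ESC + "0m";   // reset all text attributes

    // Standard color indexes: 0 black, 1 red, 2 green, 3 yellow,
    // 4 blue, 5 magenta, 6 cyan, 7 white.
    // Codes 30-37 select the foreground, 40-47 the background.
    static String color(int fg, int bg, String text) {
        return ESC + (30 + fg) + ";" + (40 + bg) + "m" + text + RESET;
    }

    // Moves the cursor to a 1-based row and column.
    static String moveTo(int row, int col) {
        return ESC + row + ";" + col + "H";
    }

    public static void main(String[] args) {
        System.out.print(ESC + "2J");  // clear the screen
        System.out.print(moveTo(1, 1));
        System.out.print(color(1, 0, "Hello"));   // red on black
        System.out.print(" ");
        System.out.println(color(2, 0, "World")); // green on black
    }
}
```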


Saturday, August 02, 2008

New Checksum Plugin

So in my last post I was suggesting making it easier to include dependency checksums as part of a maven build. I decided that it should be simple enough to implement this as a Maven Plugin. For those of you interested, you can get the source to the new Checksum Plugin here.

The basic problem the plugin is trying to solve is that central repositories could get hacked and the artifacts/dependencies of our builds replaced with malicious versions. Right now we have no way to easily detect that, and we could potentially create a release build of a project which bundles one of those malicious dependencies. In practice this rarely occurs, but it's not out of the realm of possibility.

Basically the plugin supports generating a checksum.txt file that is included as part of the project build. This file holds all the checksums for the dependencies (including the checksums of the dependencies' poms). Generating or updating the file is triggered via a Maven profile. This is only done when dependencies get updated.

In a normal build the plugin just validates the checksums of the downloaded dependencies against those stored in the checksum.txt file.
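
The validation step boils down to hashing each downloaded artifact and comparing the result against the recorded value. The sketch below is not the plugin's code. The class name, the "group:artifact:version" key format, and the choice of SHA-1 are all assumptions made for illustration.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;

// Hypothetical illustration; not the Checksum Plugin's actual code.
class ChecksumCheck {

    /** Returns the lowercase hex SHA-1 digest of the given bytes. */
    static String sha1Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException(e);
        }
    }

    /**
     * Fails the build (here: throws) if the artifact has no recorded
     * checksum or if its content does not match the recorded one.
     * 'expected' maps an artifact key (assumed format) to a hex digest.
     */
    static void verify(Map<String, String> expected, String artifact, byte[] content) {
        String want = expected.get(artifact);
        if (want == null) {
            throw new IllegalStateException("No checksum recorded for " + artifact);
        }
        String got = sha1Hex(content);
        if (!want.equalsIgnoreCase(got)) {
            throw new IllegalStateException(
                "Checksum mismatch for " + artifact + ": expected " + want + " but got " + got);
        }
    }
}
```

Treating a missing entry as a failure, rather than silently accepting the artifact, is the design choice that makes the check meaningful: a newly introduced dependency has to be recorded deliberately before the build will pass.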

I wish I could move the validation of the dependencies earlier than its current Maven life cycle location, but it seems you can't get the list of dependencies if it gets moved up any further. Do any Maven mojo hackers have workarounds for that?

Monday, July 28, 2008

Comments on the Maven Repository Security Proposal

For those of you who don't know, Maven is an awesome build tool. It uses centralized repositories to share build artifacts. Right now there is a problem: if a repository is hacked, malicious code could be injected into those artifacts and distributed by other builds. Lots of folks object to using Maven solely due to this possibility. It's a good thing that the Maven team seems to be working on fixing those problems.

First off, I love the Maven Repository Security Proposal. I think that the 'Specified Checksums' idea is awesome. I think it needs to be made so easy to use that folks always use it. Right now it's a little ugly because it makes the dependency declaration much more verbose. Plus it does not seem to cover transitive dependencies that are being used during the build, and I think that those checksums NEED to be included too.

I think it would be better if Maven provided the tools to update the checksum information in the pom.

Let's say that a build for a module is set up in some strict mode where only artifacts with known checksums are allowed. If the pom is updated to add a new dependency, I think there should be some Maven command which automatically adds the checksum for the new dependency (and its transitive dependencies). Artifacts that are signed with a trusted key get added without prompting, and a confirmation prompt would be given for artifacts that are not GPG trusted.

So the question is, why go through all that trouble? So that folks who get a trusted source distribution (out of SCM or a signed tarball) can do a build and have a high level of assurance that the dependencies being used in the source build match what was intended by the developers of the source distribution. Furthermore, it will not matter whether the transitive dependencies are signed and have keys in the end user's keyring, since all the checksums are included in the build.

Now, since there could be lots of dependencies in a build, due to the use of build plugins and transitive dependencies, it might be worth storing the checksum data in a file external to pom.xml, or at least in a different xml section from the dependencies declaration.

Things to think about: Having SNAPSHOT dependencies in the build could complicate things, as the build would be tied to a particular SNAPSHOT/checksum, but maybe that's a good thing.

Thursday, July 17, 2008

Keep an eye out for ZooKeeper

Wow, I love the simplicity that ZooKeeper brings to a really hard set of distributed problems. Check out this Introductory Video that explains it more in depth. Basically, group leadership/coordination and cluster-wide configuration issues are taken care of if you use ZooKeeper.

Oh, and it's an Apache Project now. Yay! Seems like the project website is still not fully set up since they are migrating from SourceForge to Apache, but here's a link to the source tree.

TODO: Double Write Buffers

Note to self: investigate implementing the Double Write Buffers idea in ActiveMQ. ActiveMQ keeps several indexes into the persistent messages that it's holding, and when ActiveMQ is shut down ungracefully, we rebuild the indexes from the data logs because they are in an inconsistent state. If you're queueing up millions of messages, rebuilding those indexes can take a long time.

Double buffering may allow us to fix inconsistencies in those indexes and get us running faster.

Monday, June 02, 2008

ActiveMQ/SpecJMS/Camel Webinar

Whoa, time flies by, and I forgot to post about the upcoming webinar that I will be co-hosting with Rob Davies on June 10th. We will be covering some messaging basics and introducing Apache ActiveMQ and Apache Camel to the audience. Most interesting, I think, will be the section where Rob covers the results that IONA has been seeing benchmarking ActiveMQ against the SpecJMS2007 test suite. I totally agree with Rob's comment that "An independent benchmark is important, because it negates the chance to skew home groan tests to a vendor's strengths."