HawtDispatch Event Based IO

My previous post promised a follow-up explaining how network IO events are handled by HawtDispatch.  Before I get into the details, I urge you to read Mark McGranaghan’s post on Threaded vs Evented Servers.  He does an excellent job describing how event driven servers scale in comparison to threaded servers.  This post highlights how HawtDispatch provides an excellent framework for implementing event based servers.

When implementing event based servers, there are generally two patterns used: the reactor pattern and the proactor pattern.  The reactor pattern can be thought of as a synchronous version of the proactor pattern.  In the reactor pattern, IO events are serviced by the thread running the IO handling loop.  In the proactor pattern, the thread running the IO event loop passes the IO event off to another thread for processing.  HawtDispatch can support both styles of IO processing.
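To make the distinction concrete, here is a minimal reactor-style event loop sketched with plain Java NIO.  This is illustrative only, not HawtDispatch code, and the class and method names are made up.  The selector thread services the read event itself; a proactor would instead submit that work to another executor, as noted in the comments.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.CompletableFuture;

public class MiniReactor {

    static String run() throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Blocking client on a helper thread: send one message, read the echo.
        CompletableFuture<String> echoed = new CompletableFuture<>();
        Thread client = new Thread(() -> {
            try (SocketChannel ch = SocketChannel.open(new InetSocketAddress("127.0.0.1", port))) {
                ch.write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
                ByteBuffer buf = ByteBuffer.allocate(64);
                ch.read(buf);
                buf.flip();
                echoed.complete(StandardCharsets.UTF_8.decode(buf).toString());
            } catch (IOException e) {
                echoed.completeExceptionally(e);
            }
        });
        client.start();

        while (!echoed.isDone()) {                    // the IO handling loop
            selector.select(100);
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (key.isValid() && key.isAcceptable()) {
                    SocketChannel ch = server.accept();
                    ch.configureBlocking(false);
                    ch.register(selector, SelectionKey.OP_READ);
                } else if (key.isValid() && key.isReadable()) {
                    // Reactor: service the IO right here on the selector thread.
                    // A proactor would hand this work off to another executor.
                    SocketChannel ch = (SocketChannel) key.channel();
                    ByteBuffer buf = ByteBuffer.allocate(64);
                    if (ch.read(buf) > 0) {
                        buf.flip();
                        ch.write(buf);                // echo the bytes back
                    }
                }
            }
        }
        client.join();
        selector.close();
        server.close();
        return echoed.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());  // prints "ping"
    }
}
```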

HawtDispatch uses a fixed-size thread pool sized to match the number of cores on your system.  Each thread in the pool runs an IO handling loop.  When a NIO event source is created, it gets assigned to one of the threads.  When network events occur, the source triggers callbacks on the dispatch queue targeted by the event source.  Typically that target is a serial queue which the application uses to handle the network protocol.  Since it’s a serial queue, the event can be handled in a thread safe way.  This is the proactor pattern, since the serial queue can execute on any of the threads in the thread pool.

To use the reactor pattern, HawtDispatch supports ‘pinning’ a serial queue to a thread.  When a dispatch source is created on a pinned dispatch queue, the event source gets registered against that same ‘pinned’ thread.  The benefit of the reactor pattern is that it avoids some of the cross-thread synchronization needed for the proactor pattern and produces cheaper GCs.  The downside of the reactor pattern is that you may have to manage rebalancing network sources across all the available threads.  Luckily, HawtDispatch does support moving pinned dispatch queues and sources to different threads.

Scaling Up with HawtDispatch

I just spotted an excellent article on how reducing the number of cores used by a multi-threaded application actually increased its performance.  This seems counterintuitive at first, but it is a sad reality.  It is very easy to create contention across threads in a multi-threaded app, which in turn lowers performance.

A few months ago, I experienced similar results while hacking on ActiveMQ.  I noticed that passing messages from producer connections to consumer connections was dramatically faster if the producer and consumer were being serviced by the same thread.  I decided that the next version of the broker would need to be built on a thread management framework which could optimize itself so that those connections could be collocated onto one thread when possible.

Then I saw the libdispatch API (it forms the foundation of the Grand Central Dispatch technology in OS X) and fell in love with its simplicity and power.  I realized that an implementation of that API could, in theory, provide the threading optimizations I was looking for.  So I started hacking on HawtDispatch, a Java/Scala clone of libdispatch.

The central concepts in libdispatch and HawtDispatch are global and serial queues.  Global queues are executors which execute tasks concurrently using a fixed-size thread pool.  Serial queues are executors without an assigned thread which execute tasks in FIFO order.  When tasks are added to a serial queue, the serial queue gets added to a global queue so that the serial queue can execute its tasks.  Multiple serial queues execute concurrently on the global queue.
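A toy version of this serial-on-top-of-global idea can be sketched in plain java.util.concurrent.  This is a simplification for illustration, not HawtDispatch’s actual implementation, and the class name is invented:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// A toy serial queue layered on a shared "global" pool (a sketch, not
// HawtDispatch's real code).  Tasks run in FIFO order, one at a time,
// but on whichever pool thread happens to pick the queue up.
public class SerialQueue {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean();
    private final ExecutorService global;

    public SerialQueue(ExecutorService global) { this.global = global; }

    public void execute(Runnable task) {
        tasks.add(task);
        // Schedule ourselves on the global pool only if we aren't already
        // scheduled: this is what makes execution serial.
        if (scheduled.compareAndSet(false, true)) {
            global.execute(this::drain);
        }
    }

    private void drain() {
        Runnable task;
        while ((task = tasks.poll()) != null) {
            task.run();
        }
        scheduled.set(false);
        // A task may have arrived after poll() returned null but before the
        // flag was cleared, so re-check.
        if (!tasks.isEmpty() && scheduled.compareAndSet(false, true)) {
            global.execute(this::drain);
        }
    }

    static String demo() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // the "global queue"
        SerialQueue q = new SerialQueue(pool);
        StringBuilder order = new StringBuilder();  // safe: tasks never overlap
        CountDownLatch done = new CountDownLatch(3);
        for (int i = 1; i <= 3; i++) {
            int n = i;
            q.execute(() -> { order.append(n); done.countDown(); });
        }
        done.await();
        pool.shutdown();
        return order.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints "123" -- FIFO, never concurrent
    }
}
```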

The overhead of a serial queue is very small: it’s just several counters and a couple of linked lists.  You can use them like lightweight threads.  Feel free to create thousands of them.  If you squint at it just right, they let you use an Erlang-style actor threading model.

Now that you have an idea how HawtDispatch is used, let’s get back to what kinds of optimizations it can do to help with cross-thread contention.  HawtDispatch generally uses a concurrent linked list to enqueue a task on a serial queue, but there are times when it can avoid the synchronization of the concurrent linked list.  For example, if the serial queue is currently executing on the current thread, then an enqueue can just add the task to an unsynchronized linked list.  HawtDispatch also supports ‘pinning’ a serial queue to one of the threads in the global queue’s thread pool.  This lets you force serial queues to collocate onto one thread so that when they do need to communicate, there is no thread contention involved.
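The effect of pinning can be shown with a tiny sketch in plain java.util.concurrent.  This is a conceptual illustration only, not the HawtDispatch pinning API, and the class name is invented: when producer and consumer work are both bound to the same single thread, the handoff between them never leaves that thread.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustrates collocation (a conceptual sketch, not HawtDispatch's API):
// when producer and consumer work are "pinned" to the same thread, handing
// tasks between them involves no cross-thread wakeup or contention.
public class PinnedHandoff {

    static boolean demo() throws Exception {
        ExecutorService pinned = Executors.newSingleThreadExecutor();
        CompletableFuture<Boolean> sameThread = new CompletableFuture<>();
        pinned.execute(() -> {
            Thread producer = Thread.currentThread();
            // Hand off to the "other queue": since it is pinned to the same
            // thread, the consumer task runs on the producer's thread too.
            pinned.execute(() ->
                sameThread.complete(Thread.currentThread() == producer));
        });
        boolean result = sameThread.get();
        pinned.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());  // prints "true"
    }
}
```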

But you still run into cases where you need to move tons of events from one serial queue to another which is executing on a different thread.  For these cases, you use a custom event source.  It allows you to coalesce a bunch of events generated on one thread into a single event delivered to another queue.  HawtDispatch aggregates custom events into a thread local (to avoid contention), and once the current thread has drained all its execution queues, it delivers those custom events to their target queues.
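A simplified coalescing source might look like the following.  This is a sketch, not the real HawtDispatch custom-source API: it uses a single atomic counter rather than HawtDispatch’s thread-local aggregation, and the class and method names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

// A simplified coalescing event source (a sketch, not the real HawtDispatch
// API).  Producers merge many cheap events locally; a single aggregated
// event is delivered to the target queue when flushed.
public class CoalescingSource {
    private final AtomicLong pending = new AtomicLong();
    private final Executor target;
    private final LongConsumer handler;

    public CoalescingSource(Executor target, LongConsumer handler) {
        this.target = target;
        this.handler = handler;
    }

    // Producer side: just bump a counter -- no cross-thread task per event.
    public void merge(long delta) {
        pending.addAndGet(delta);
    }

    // Called once the producing thread has drained its own work: deliver
    // everything accumulated so far as one task on the target queue.
    public void flush() {
        long batch = pending.getAndSet(0);
        if (batch != 0) {
            target.execute(() -> handler.accept(batch));
        }
    }

    static long demo() throws Exception {
        ExecutorService targetQueue = Executors.newSingleThreadExecutor();
        CompletableFuture<Long> delivered = new CompletableFuture<>();
        CoalescingSource source = new CoalescingSource(targetQueue, delivered::complete);
        for (int i = 0; i < 1000; i++) {
            source.merge(1);          // 1000 cheap local events
        }
        source.flush();               // one cross-thread delivery
        long result = delivered.get();
        targetQueue.shutdown();
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());   // prints "1000"
    }
}
```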

This post is already getting kind of long, so I’ll have to do a follow-up post on how all of that interacts with network IO events.  But the general idea is: yes, keeping stuff on one core is fast, but it won’t scale once you’re CPU bound, so a framework like HawtDispatch can help minimize cross-thread contention while still providing the ability to scale up to multiple cores as load increases.