Tuesday, October 1, 2013

Persistence Adaptors and ActiveMQ Options

Where do you find ActiveMQ options that aren't listed on the ActiveMQ site pages?  And how do you configure the newer, more pluggable persistence and lock adapters?

Recently, we've had problems with ActiveMQ again - less with the brokers themselves than with the clients opening ever-increasing numbers of connections, which eventually crashes the broker.  While we try to understand the issue with the clients (it's unlikely to be caused by ActiveMQ), we tried to reconfigure the ActiveMQ brokers to cope with the load.  One change mentioned previously was turning off Network of Brokers: it fails too easily and ends up in a split-brain state.  Restarting a broker fixes it, but can push the clients (consumers and producers) into opening ever more connections.  In other words, we were trapped: increasing connections cause ActiveMQ problems, which cause increasing connections, which cause ...

While we were experimenting, we went back to a SQL persistence layer rather than KahaDB.  In pseudo-random testing, it was about 60-70% as fast as KahaDB, but under real workloads it seemed noticeably slower.  The consumers and producers seemed to think so as well: the connection spiral was much worse with SQL-backed storage, presumably because slower performance caused more connection issues, which caused slower performance, which caused ... you get the picture.

We're tempted by a solution based entirely on NFS shared storage, but some colleagues are nervous about KahaDB stale file issues when the files live on NFS and are shared between brokers.  If we end up with KahaDB files that won't clear and are shared between two brokers, that seems even more problematic than our current setup.

A few ideas: ActiveMQ supports mKahaDB, which lets you have separate KahaDB stores for different queues or topics (with "catch all" defaults available).  That way slow and fast consumers can be separated from each other, and file sizes can be controlled better.  It might also help with the stale file issues.  See this page for more: http://activemq.apache.org/kahadb.html
Another idea is to switch to LevelDB, which according to the ActiveMQ site should be faster than KahaDB and also might not suffer from the stale file problems.  Or maybe it does.
Yet another option is to use the ActiveMQ pluggable storage lockers http://activemq.apache.org/pluggable-storage-lockers.html and keep the storage in a local directory on each broker while putting the lock file in a shared directory.**  OK, this option isn't great because messages could be stranded on a failed broker, especially if you have some slow consumers, but we might prefer a little manual work after a failover to risking downtime (well, some of the other guys might).
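To make the mKahaDB idea more concrete, here is a rough sketch of an mKahaDB store following the pattern on the KahaDB page linked above. We haven't run this exact configuration, and the `SLOW.>` queue prefix and 32mb journal size are hypothetical placeholders rather than our settings: queues matching the prefix get their own KahaDB, and a filter with no destination acts as the catch-all for everything else.

```xml
<persistenceAdapter>
  <mKahaDB directory="${activemq.base}/data/kahadb">
    <filteredPersistenceAdapters>
      <!-- hypothetical: queues named SLOW.* (slow consumers) get their own KahaDB -->
      <filteredKahaDB queue="SLOW.>">
        <persistenceAdapter>
          <kahaDB journalMaxFileLength="32mb"/>
        </persistenceAdapter>
      </filteredKahaDB>
      <!-- no destination filter: acts as the catch-all for remaining queues and topics -->
      <filteredKahaDB>
        <persistenceAdapter>
          <kahaDB journalMaxFileLength="32mb"/>
        </persistenceAdapter>
      </filteredKahaDB>
    </filteredPersistenceAdapters>
  </mKahaDB>
</persistenceAdapter>
```

For the LevelDB idea, the equivalent change would roughly be replacing the whole `<mKahaDB>` element with a single `<levelDB directory="activemq-data"/>` (the LevelDB store is available from ActiveMQ 5.8; the directory name is just an example).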

Regardless of approach, the pluggable storage can separate locks from data if needed and set some options.**  Here's a quick example of setting the lock acquire sleep interval:
  <kahaDB directory="activemq-data">
    <locker>
      <shared-file-locker lockAcquireSleepInterval="100000"/>
    </locker>
  </kahaDB>
For setting the lock directory, use:
  <shared-file-locker directory="activemq-lock-directory"/>
or similar.  A few of the options for the pluggable storage and lockers are on the main ActiveMQ site, but there are more that aren't.

Finding all the ActiveMQ XML configuration options, including the storage locking ones, is easiest by looking directly at the ActiveMQ XML schema.  Here's the link to the kahaDB store:
and to the shared-file-locker, which is a possible child element of that store:
Be aware of version numbers in the two links above (both are for 5.8.0).
The ActiveMQ developers are in the process of finishing the separation of storage from locks, but not all of it made it into 5.8 - mixing KahaDB with SQL lease locks might have to wait until 5.9 (and 5.9 is now out with the SQL lease locks):

*** Update - a little delayed in mentioning this, but ActiveMQ 5.9 appears to have all that is needed to use shared locking with individual KahaDB stores - we'll test that and report back.
You can find the options for the <statements/> section by looking at the code, which also has useful pieces such as the SQL create statements.  Hopefully, we can now drop the workaround below!
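For reference, a KahaDB store guarded by an SQL lease lock on 5.9 might look roughly like this. It's a sketch we haven't tested yet: the `#mysql-ds` data source bean, the `activemq_lock` table name, and the intervals are assumptions, and the data source bean would need to be defined elsewhere in activemq.xml.

```xml
<persistenceAdapter>
  <!-- each broker keeps its own KahaDB; only the lock is shared via the database -->
  <kahaDB directory="activemq-data" lockKeepAlivePeriod="5000">
    <locker>
      <!-- "#mysql-ds" is a hypothetical bean id for a JDBC DataSource defined elsewhere -->
      <lease-database-locker lockAcquireSleepInterval="10000" dataSource="#mysql-ds">
        <statements>
          <!-- hypothetical lock table name -->
          <statements lockTableName="activemq_lock"/>
        </statements>
      </lease-database-locker>
    </locker>
  </kahaDB>
</persistenceAdapter>
```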

**OK, one problem: the XML supports the shared-file-locker directory="..." syntax, but the code does NOT do anything with it!  After trying this, we realized the lock file was still being put in the same location as the data files.  Reviewing the ActiveMQ code (search for SharedFileLocker.java) made it clear that this hasn't been finished yet.  So how can you get the same effect?  Use Linux file locking: if a script either locks the 'lock' file or changes its ownership, then you control ActiveMQ start-up.  Detecting and setting the lock requires a little work with NFS, flock, or perhaps something like Python; actually, just changing the ownership is easier.  Since you're trying to detect whether the other ActiveMQ is running, looking for a lock is nice, but you could also just curl one of the standard URLs on the other broker to see if it is running - not foolproof, but perhaps workable with the right supplemental checks.

One alternative for a shared lock is the DB shared locker, but in the spirit of avoiding the DB (although it would be fine to use it just for a lock!), here's a little script that flags the lock by leaving a marker file on the NFS mount (msg_dir).  It's set to look at /proc/kmsg as a test; change it to the ActiveMQ lock file instead.

#!/bin/bash
# Check if a file is locked (or at least opened) and indicate the result with a marker file,
#  using this as flock across NFS didn't initially work for me.

# Placeholder settings: these paths are examples only, adjust for your hosts
amq_lock_file=/path/to/activemq/data/lock  # ActiveMQ's lock file (placeholder path)
fn=/proc/kmsg                              # test only; use fn="$amq_lock_file" for real
msg_dir=/path/to/nfs/msg_dir               # shared NFS directory for marker files (placeholder)
standby_server=no                          # set to "yes" on the standby broker
hn=$(hostname)

# lsof exits 0 and prints a process id if the file is open;
# otherwise the file isn't open and therefore isn't in use by activemq
if [ -e "$fn" ]; then
  fopened=$(lsof -wt "$fn")
  exit_value=$?
else
  echo "no file to check!"
  fopened=""
  exit_value=1
fi

if [ -n "$fopened" ] && [ "$exit_value" -eq 0 ]; then
  echo "file is locked"
  echo "$fopened" > "$msg_dir/file_is_locked_on_$hn"
  rm -f "$msg_dir/file_NOT_or_unknown"
  if [ "$standby_server" = "yes" ]; then
    sleep 5  # make the standby server wait, in case it is racing another server for the lock
    remote_lock=$(ls -1 "$msg_dir" | grep locked | grep -v "$hn" | wc -l)
    if [ "$remote_lock" -gt 0 ]; then
      # another host already holds the lock, so back off
      touch "$msg_dir/file_NOT_or_unknown"
      rm -f "$msg_dir/file_is_locked_on_$hn"
    else
      echo "still not locked remotely so setting for local startup"
    fi
  fi
else
  echo "not locked or unknown"
  touch "$msg_dir/file_NOT_or_unknown"
  rm -f "$msg_dir/file_is_locked_on_$hn"
fi

# Have left a marker that one instance is up or not, now use that to control activemq

remote_lock=$(ls -1 "$msg_dir" | grep locked | grep -v "$hn" | wc -l)
local_lock=$(ls -1 "$msg_dir" | grep locked | grep "$hn" | wc -l)
echo "remote lock: $remote_lock; locally locked: $local_lock"
if [ "$remote_lock" -gt 0 ]; then
  chown root:root "$amq_lock_file"          # prevent activemq from locking and starting
elif [ "$local_lock" -eq 1 ]; then
  chown activemq:activemq "$amq_lock_file"  # allow activemq to lock and start
  # could just start activemq at this point
fi



  1. Hi JM,
    It was a very nice blog post. Currently, we're trying to set up an HA ActiveMQ cluster and I'm trying to choose the correct option for our system. Have you checked newer versions of AMQ, and how do you compare Master/Slave options such as Shared Disk and Replicated LevelDB Store?

    1. Hi - we've not yet tried the LevelDB configuration, but would like to see how it works. If it works well, then LevelDB would probably be our preferred option, as support for it will probably be better and we use similar DBs with other tools. The only downside I can see to LevelDB vs Shared Disk is the extra overhead of running a separate DB cluster. We already use a DB cluster with our SQL-backed ActiveMQ cluster, which has been good but has much more limited performance than our KahaDB-backed cluster; in that instance we already had SQL running for other applications, so it added no overhead.
      Shared Disk clusters could be very fast (using the latest NFS) and simple, but require getting the locking correct - that's very doable, but takes some effort in code and configuration. I'd be interested to know if ActiveMQ 5.10 makes locking easier.

  2. Hi, thanks for sharing your experiences with AMQ.
    One thing that concerns me about your setup, which I assume is multiple KahaDBs + DB locking, is failover. What will happen if one of your KahaDB stores fails? Do you somehow prevent losing messages from a failed KahaDB instance?

  3. Sorry for the late reply! With separate KahaDBs and a single DB lock, there is definitely the possibility that messages will be stranded on the unavailable ActiveMQ. We've handled this by bringing that ActiveMQ up on a different port (to prevent clients from connecting) and moving the messages to the active instance. It's not ideal in terms of manual recovery work, although if your messages are short-lived you don't need to do this. We would use shared storage or replicated storage as our main setup if possible. We're trying replicated LevelDB now, so we'll report back shortly on that experience.