Tuesday, October 1, 2013

Persistence Adaptors and ActiveMQ Options

Where do you find ActiveMQ options that aren't listed on the ActiveMQ site pages?  And how do you configure the newer, more pluggable persistence and lock adapters?

Recently, we've had problems around ActiveMQ again - less with the brokers themselves than with clients opening ever increasing numbers of connections until the broker eventually crashes.  While we try to understand the issue with the clients (it is unlikely to be caused by ActiveMQ), we tried to reconfigure the ActiveMQ brokers to cope with the load.  One thing mentioned previously was turning off Network of Brokers - it fails too easily and ends up in a split brain.  Restarting a broker fixes it, but can push the clients - consumers and producers - into opening even more connections.  In other words, we were trapped - increasing connections causing ActiveMQ problems causing increasing connections causing ...

While we were experimenting, we went back to a SQL persistence layer rather than KahaDB.  In pseudo random testing, it was about 60-70% as performant as KahaDB, but under real work loads, it seemed noticeably slower.  The consumers and producers seemed to think so as well, as the spiraling connections were much worse with SQL backed storage, presumably due to slower performance causing more connection issues causing slower performance causing ... you get the picture.

We're tempted by a solution that is based completely on NFS shared storage, but some colleagues are nervous about KahaDB stale file issues over NFS when the files are shared between brokers.  If we have KahaDB files that won't clear and are shared between two brokers, it seems even more problematic than our current set up.

A few ideas: ActiveMQ supports mKahaDB which allows you to have different KahaDB files for different queues or topics (with "catch all" defaults available).  That way the slow and fast consumers can be separated from each other and the file size can be controlled better.  It might also help with stale file issues.  See this page for more: http://activemq.apache.org/kahadb.html
Another idea is to switch to LevelDB which should be faster than KahaDB according to the ActiveMQ site and also might not suffer from the stale file problems.  Or maybe it does.
Yet another option is to use the ActiveMQ pluggable storage lockers http://activemq.apache.org/pluggable-storage-lockers.html and set the storage to be a local directory for each broker while the lock file lives in a shared directory.**  Ok, this option isn't great because messages could be stranded on the failed broker, especially if you have some slow consumers, but we might prefer a little manual work after a failover instead of risking down time - well, some of the other guys would.

Regardless of approach, the pluggable storage has the ability to separate locks from data if needed and to set some options.**  Here is a quick example of setting a locker option:
<persistenceAdapter>
  <kahaDB directory="activemq-data">
    <locker>
       <shared-file-locker lockAcquireSleepInterval="100000"/>
    </locker>
  </kahaDB>
</persistenceAdapter>
For setting the lock directory, use:
 <shared-file-locker directory="activemq-lock-directory"/>
or similar.  A few of the options for the pluggable storage and lockers are on the main ActiveMQ site, but there are more that aren't.

Finding all the ActiveMQ XML configuration options, including the storage locking ones, is easiest by looking directly at the ActiveMQ XML schema reference.  Here's the link to the kahaDB store:
activemq.apache.org/schema/core/activemq-core-5.8.0-schema.html#kahaDB
and to the shared-file-locker which is a possible element of that store:
http://activemq.apache.org/schema/core/activemq-core-5.8.0-schema.html#shared-file-locker
Be aware of version numbers in the two links above (both are for 5.8.0).
The ActiveMQ guys are in the process of finishing the separation of storage vs locks, but not all of it made it into 5.8 - mixing kaha with SQL lease locks might have to wait until 5.9 (and 5.9 is out with the SQL lease locks):
https://issues.apache.org/jira/browse/AMQ-4365

*** Update - a little delayed in mentioning this, but ActiveMQ 5.9 appears to have all that is needed to use shared locking with individual kahadb stores - we'll test that and report back.
The options in the <statements/> section are available by looking at the code which also has useful pieces like the SQL create statements.  Hopefully, we can drop the work below now!

**Ok, one problem - the XML supports the shared-file-locker directory="..." syntax, but the code does NOT do anything with it!  After trying this, we realized the lock file was still being put in the same location as the data files.  Reviewing the ActiveMQ code (search for sharedfilelocker.java) made it clear that it hasn't been finished yet.  So, how do you use a feature like this?  With Linux based file locking - if you set up a script that either locks or changes ownership of the 'lock' file, then you'll control ActiveMQ start up.  Detecting and setting the lock requires a little work with NFS, flock, or perhaps something like python - actually just changing the ownership is easier.  Since you're trying to detect if the other ActiveMQ is running, looking for a lock is nice, but you could just curl one of the standard URLs on the other broker to see if it is running - not foolproof, but perhaps workable with the right supplemental checks.
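
As a rough sketch of the curl idea (not something we have run in production): the function below treats the other broker as running if anything answers HTTP at the given URL within a few seconds, even with an error or a login prompt.  The function name and the OTHER_BROKER_URL variable are made up for illustration, and the example URL assumes the other broker has its web console enabled on port 8161.

```shell
#!/bin/sh
# Sketch only: returns 0 if any HTTP response comes back from the URL within
# 5 seconds, non-zero otherwise (connection refused, timeout, curl missing).
broker_is_up() {
  curl --silent --max-time 5 --output /dev/null "$1"
}

# OTHER_BROKER_URL is a hypothetical variable, e.g.
#   OTHER_BROKER_URL=http://broker2.example.com:8161/ (made-up host, assumed port)
if [ -n "${OTHER_BROKER_URL:-}" ]
 then
  if broker_is_up "$OTHER_BROKER_URL"
   then
    echo "other broker answered, staying passive"
   else
    echo "no answer from other broker, candidate for startup"
  fi
fi
```

A hung broker can still answer HTTP and a firewall can hide a healthy one, so this only makes sense alongside the lock file check, not instead of it.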

One way of getting a shared lock is to try the DB shared locker, but in the spirit of avoiding the DB (though it would be fine to use it just for a lock!), here's a little script that flags the lock by leaving a file on the NFS mount (msg_dir).  It's set to look at /proc/kmsg as a test, but change it to the activemq/lock file instead.

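#check if file is locked or at least opened and indicate with another file
#  using this as flock across NFS didn't initially work for me
fn=/proc/kmsg
#fn=$activemqdir/lock
hn=`hostname`
msg_dir=/tmp  #test value; point this at the NFS mount
#the marker file written below is $msg_dir/file_is_locked_on_$hn
#  (do not assign it to fn, or the file being checked is replaced by the marker)
standby_server="no"
amq_lock_file=/tmp/activemq_home_lock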

#exit value should be 0 for an opened file and fopened should have a process id
# else file isn't opened and therefore shouldn't be in use by activemq on this host

if [ -e $fn ]
 then
  fopened=`lsof -wt $fn`
  exit_value=$?
 else
  echo "no file to check!"
  fopened=""
  exit_value=1
fi


if [ "$fopened" != "" -a $exit_value -eq 0 ]
 then
   echo "file is locked"
   echo $fopened > $msg_dir/file_is_locked_on_$hn
   rm -f $msg_dir/file_NOT_or_unknown
   if [ "$standby_server" = "yes" ]
     then
       sleep 5 #make the standby server sleep waiting to see if race for lock with another server
       remote_lock=`ls -1 $msg_dir | grep locked | grep -v $hn | wc -l`
       if [ $remote_lock -gt 0 ]
         then
          touch $msg_dir/file_NOT_or_unknown
          rm -f $msg_dir/file_is_locked_on_$hn
       else
         echo "still not locked remotely so setting for local startup"
       fi
   fi
 else
   echo "not locked or unknown"
   touch $msg_dir/file_NOT_or_unknown
   rm -f $msg_dir/file_is_locked_on_$hn
fi

# Have left a marker that one instance is up or not, now use that to control activemq

remote_lock=`ls -1 $msg_dir | grep locked | grep -v $hn | wc -l`
local_lock=`ls -1 $msg_dir | grep locked | grep $hn | wc -l`
echo "remote lock: $remote_lock; locally locked: $local_lock"
if [ $remote_lock -gt 0 ]
 then
 chown root:root $amq_lock_file #prevent activemq from locking and starting
elif [ $local_lock -eq 1 ]
 then
 chown activemq:activemq $amq_lock_file #allow activemq to lock and start
 # could just start activemq at this point
fi
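
One thing to watch in the script above: the `ls | grep | wc` counting matches on substrings, so a host called mq1 would also match a marker left by mq10.  A possible tidier variant, as a sketch only, counts the marker files with a shell glob and compares the host part exactly.  The function names are made up for illustration; the marker naming (file_is_locked_on_ plus the hostname) is the one used by the script.

```shell
#!/bin/sh
# count_remote_locks DIR HOST: print how many file_is_locked_on_* markers
# in DIR belong to hosts other than HOST.
count_remote_locks() {
  dir=$1
  host=$2
  count=0
  for f in "$dir"/file_is_locked_on_*
  do
    [ -e "$f" ] || continue   # the glob matched nothing
    if [ "${f##*/}" != "file_is_locked_on_$host" ]
     then
      count=$((count + 1))
    fi
  done
  echo "$count"
}

# has_local_lock DIR HOST: succeed if this host's own marker exists in DIR.
has_local_lock() {
  [ -e "$1/file_is_locked_on_$2" ]
}
```

In the script this would be used as remote_lock=`count_remote_locks $msg_dir $hn`, with has_local_lock $msg_dir $hn replacing the local_lock count.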