Graphite Clusters

Graphite is a great time series tool. It's losing momentum to newer players in the field like Prometheus (more on that in another blog post), InfluxDB, and others. However, for a few hundred thousand to a few million metrics, Graphite can be an easy choice. For some use cases, such as business metrics, Graphite is also the better choice: it allows you to update old records (say you found an error and the business needs to know the right story), while most of the other solutions only let you write data once - what you knew at the time is all you get.

Graphite installation is fairly straightforward on Red Hat and Ubuntu setups, although a few tweaks may be necessary (see this post). We use Apache HTTP Server as the web server for the Python front end; make sure mod_wsgi (or whatever Python gateway you use) is built against the same major version of Python as the Graphite install (i.e. Python 2 or 3).
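
For reference, the mod_wsgi wiring is only a few lines. Here's a sketch of an Apache vhost, assuming the default /opt/graphite install prefix (adjust the paths to wherever your install put graphite.wsgi):
# illustrative vhost; remember that mod_wsgi itself must be built against the
# same major Python version as the Graphite install
<VirtualHost *:80>
    WSGIDaemonProcess graphite-web processes=2 threads=4
    WSGIProcessGroup graphite-web
    WSGIApplicationGroup %{GLOBAL}
    WSGIScriptAlias / /opt/graphite/conf/graphite.wsgi
</VirtualHost>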

To use Graphite, send metrics to port 2003 and start viewing them via the UI on port 80. To send metrics to the plaintext port 2003, do something like this from a Linux/Mac command line:
echo "mymetric.somestat.count 342 1489003770" | nc graphite_host 2003
This creates or adds a value for the metric "count" under "somestat", which is a subset of "mymetric" (the application, for example). Graphite also listens on port 2004 for data in pickle format, which is meant for sending bulk data and is faster as well.
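
The pickle port expects a pickled list of (path, (timestamp, value)) tuples prefixed with a 4-byte length header. Here's a minimal Python sketch for sending a small batch ("graphite_host" and the second metric name are just placeholders):
import pickle
import socket
import struct
import time

# a batch of (metric_path, (timestamp, value)) tuples
now = int(time.time())
batch = [
    ("mymetric.somestat.count", (now, 342)),
    ("mymetric.otherstat.count", (now, 17)),
]

payload = pickle.dumps(batch, protocol=2)   # protocol 2 is readable by carbon on Python 2 or 3
header = struct.pack("!L", len(payload))    # 4-byte big-endian length prefix

sock = socket.create_connection(("graphite_host", 2004))
sock.sendall(header + payload)
sock.close()
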
Here's a flow of data for the simple Graphite setup:
data ingestion: data -> carbon-cache (plaintext: port 2003; pickle: port 2004) -> stored in its cache -> written out to whisper files
graphite-web requests pull from: the whisper files & carbon-cache's cache query port (CACHE_QUERY_PORT, 7002 by default) for datapoints still sitting in the cache


That's great, and with reasonable hardware (a few cores, 4-8 GB of RAM, decent disks) you can scale it up to about 200k-300k metrics without any real effort. To see how the metric writer, carbon-cache, is performing, keep an eye on the carbon.* metrics in the Graphite web front end. Also pay attention to Apache's memory demands and the I/O and CPU load.
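
One easy way to watch carbon's self-reported metrics is the render API. A quick Python 3 sketch, assuming graphite-web answers on port 80 of graphite_host and carbon's own counters live under the default carbon.* prefix:
import json
import urllib.request

# pull the last hour of carbon's throughput counters as JSON
url = ("http://graphite_host/render"
       "?target=carbon.agents.*.metricsReceived"
       "&from=-1h&format=json")

with urllib.request.urlopen(url) as resp:
    series = json.loads(resp.read().decode("utf-8"))

for s in series:
    # datapoints are [value, timestamp] pairs; value is None where nothing was recorded
    values = [v for v, t in s["datapoints"] if v is not None]
    print(s["target"], values[-1] if values else "no data yet")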

When more metrics or more dashboards on those metrics are needed, it's time to cluster Graphite, or at least scale out. Here's a quick overview:
Use one or more carbon-relays to send metrics to more than one carbon-cache instance, which will store them. This can be done in replicated mode (every metric to multiple hosts) or in a load-sharing fashion (metric 1 to host A, metric 2 to host B, etc). Use graphite-web to read local metrics from the whisper files written by carbon-cache and from the data still sitting in carbon-cache's cache. Then, if wanted, use a global graphite-web to read from the local graphite-webs (see the local_settings.py sketch below). Add memcached to speed things up by caching the images and data coming out of the graphite-webs. There's one extra piece, carbon-aggregator, which can take a group of related metrics and write them out as an aggregate (sum, average, etc) to reduce the original storage needs (or add a little to them).
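
On the read side, the federation comes down to a handful of settings in graphite-web's local_settings.py. A minimal sketch, assuming two storage nodes behind a global graphite-web and one local carbon-cache instance (host names and ports here are illustrative):
# graphite-web local_settings.py (illustrative values)

# other graphite-webs this instance fans read requests out to
CLUSTER_SERVERS = ["graphite-node1:80", "graphite-node2:80"]

# local carbon-cache instances to query for datapoints not yet written to whisper
# format: "host:cache_query_port:instance"
CARBONLINK_HOSTS = ["127.0.0.1:7002:a"]

# memcached instances used to cache rendered graphs and fetched data
MEMCACHE_HOSTS = ["127.0.0.1:11211"]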

Here's a flow for feeding data into a cluster:
data ingestion (plaintext: port 2013; pickle: port 2014): 1 or more carbon-relays -> using consistent hashing, rules, or a replication setting -> multiple carbon-caches (pickle format on port 2004, or multiple ports for multiple instances on one host) -> stored in cache -> whisper files
Viewing/retrieving data:
graphite-web -> other graphite-webs in the cluster -> each reads its local whisper files and its local carbon-cache's cached data

The configuration for these cluster settings lives in carbon.conf (for the carbon-relay, carbon-cache, and carbon-aggregator daemons) and in graphite-web's local_settings.py (for the web tier).

It can get a little more involved when setting up the various components and making sure each file points at the right hosts, ports, and instance names.
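
On the ingestion side, those pieces map to the [relay] section (and per-instance [cache] sections) of carbon.conf. A minimal sketch with illustrative host names, using consistent hashing, two copies of every metric, and two cache instances per storage node:
[relay]
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 2
# host:pickle_port:instance for every carbon-cache the relay feeds
DESTINATIONS = store1:2004:a, store1:2104:b, store2:2004:a, store2:2104:b

# a second cache instance on each storage node, with offset ports so they don't collide
[cache:b]
LINE_RECEIVER_PORT = 2103
PICKLE_RECEIVER_PORT = 2104
CACHE_QUERY_PORT = 7102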

Helpful links:
http://bitprophet.org/blog/2013/03/07/graphite/
https://rcrowley.org/articles/federated-graphite.html
http://allegro.tech/2015/09/scaling-graphite.html
https://grey-boundary.io/the-architecture-of-clustering-graphite/
