Saturday, August 17, 2013

Using Graphite - metrics data

Graphite is a tool for time series data storage and graphing.  The main page is here: Graphite.

Graphite (with its Carbon receiver and Whisper data store) stores a value against a specific time stamp for each metric that you want to track.  In other words, if you want to know the number of requests per minute that a web server is receiving, you'd send the data to your Graphite setup with a command like this (from a Unix/Linux/Mac command line):

echo "webserver1.requests_per_minute 201 1376748060" | nc graphite_svr.mycomp.com 2003

Here, webserver1.requests_per_minute is the metric to be saved, 201 is the value of the metric, and 1376748060 is the time in seconds since Jan 1, 1970 (UTC).  This simple set of values (metric name, metric value, metric timestamp in seconds) is all that Graphite needs.
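To make the three fields concrete, here is a minimal Python sketch that builds the same line and converts its timestamp back to a readable date. The function names are my own for illustration and are not part of Graphite.

```python
from datetime import datetime, timezone


def format_metric_line(name, value, timestamp):
    """Build one line in the form '<name> <value> <timestamp>' plus a newline."""
    return f"{name} {value} {int(timestamp)}\n"


def parse_metric_line(line):
    """Split a line back into (name, value, timestamp)."""
    name, value, timestamp = line.split()
    return name, float(value), int(timestamp)


line = format_metric_line("webserver1.requests_per_minute", 201, 1376748060)
name, value, ts = parse_metric_line(line)

# Interpret the timestamp as seconds since Jan 1, 1970 UTC.
when = datetime.fromtimestamp(ts, tz=timezone.utc)

print(line, end="")
print(when.isoformat())  # 2013-08-17T14:01:00+00:00
```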
In the command-line example above, this information is piped through the netcat command (nc) to the Graphite server at port 2003, the default port for Carbon's plaintext listener.  There are other ways to get information into Graphite: Python's pickle format, for example (there's also apparently AMQP support).
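The same submission can be sketched from Python instead of netcat. This is a rough sketch under some assumptions: the pickle listener is taken to be on port 2004 (Carbon's usual default, not mentioned above), and the pickle payload is taken to be a list of (name, (timestamp, value)) tuples prefixed with a 4-byte length header. The function names are hypothetical.

```python
import pickle
import socket
import struct


def build_pickle_payload(datapoints):
    """datapoints: list of (metric_name, (timestamp, value)) tuples.

    Returns the pickled list prefixed with its length as a 4-byte
    big-endian unsigned integer.
    """
    body = pickle.dumps(datapoints, protocol=2)
    header = struct.pack("!L", len(body))
    return header + body


def send_bytes(host, port, payload, timeout=5.0):
    """Open a TCP connection, send the payload, and close the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)


def send_plaintext(host, name, value, timestamp, port=2003):
    """Equivalent of: echo "<name> <value> <timestamp>" | nc host 2003"""
    line = f"{name} {value} {int(timestamp)}\n"
    send_bytes(host, port, line.encode("ascii"))


def send_pickle(host, datapoints, port=2004):
    """Send a batch of datapoints to the (assumed) pickle listener port."""
    send_bytes(host, port, build_pickle_payload(datapoints))
```

For example, send_plaintext("graphite_svr.mycomp.com", "webserver1.requests_per_minute", 201, 1376748060) should behave like the netcat command above. The pickle route is mainly worth it when sending many data points in one batch.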

Graphite will store this data in a Whisper directory as set up in the Graphite installation. Whisper is the database Graphite uses to store data. It is a round-robin database very similar to rrdtool's storage (rrd meaning round robin database), which is what Graphite originally used; however, limitations in the version of rrd available at the time led to the creation of Whisper.

In the Graphite/Whisper storage area, the metric data will be stored in a hierarchy based on the metric name given. In the example above, the metric name is webserver1.requests_per_minute which will lead to a directory called webserver1 in which there will be a file called requests_per_minute.wsp.  Graphite uses the "." as a delimiter to create the hierarchy.
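The name-to-file mapping can be sketched in a few lines of Python. The storage root used here (/opt/graphite/storage/whisper) is an assumption about a typical install; use whatever directory your installation is configured with.

```python
from pathlib import PurePosixPath


def metric_to_wsp_path(metric_name, whisper_root="/opt/graphite/storage/whisper"):
    """Map a dotted metric name to the .wsp file path it would be stored under.

    Every dot-separated part except the last becomes a directory;
    the last part becomes the file name with a .wsp extension.
    """
    parts = metric_name.split(".")
    return PurePosixPath(whisper_root, *parts[:-1], parts[-1] + ".wsp")


print(metric_to_wsp_path("webserver1.requests_per_minute"))
# /opt/graphite/storage/whisper/webserver1/requests_per_minute.wsp
```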

To view the data, use a web browser to go to the graphite front end (for example, graphite_svr.mycomp.com) where you can browse the metrics that Graphite is storing and create graphs and dashboards of graphs for viewing.  Individual charts can be viewed by creating the right URL as in:

http://graphite_svr.mycomp.com/render?width=400&from=-24hours&until=now&height=250&target=webserver1.requests_per_minute&target=webserver2.requests_per_minute

This URL will cause Graphite to draw a graph of the requests_per_minute from webserver1 and webserver2 for the last 24 hours until now and with a chart size of 400x250 pixels.
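If you build these URLs from a script, something like the following sketch keeps the parameters tidy. The helper name is my own. The until parameter is left out on the assumption that the render endpoint then ends the window at the current time.

```python
from urllib.parse import urlencode


def render_url(base, targets, width=400, height=250, frm="-24hours", **extra):
    """Build a /render URL with one 'target' parameter per metric."""
    params = [("width", width), ("height", height), ("from", frm)]
    params += [("target", target) for target in targets]
    params += list(extra.items())  # e.g. format="json"
    return f"{base}/render?{urlencode(params)}"


url = render_url(
    "http://graphite_svr.mycomp.com",
    ["webserver1.requests_per_minute", "webserver2.requests_per_minute"],
)
print(url)
```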

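To see the raw data, add "&format=raw" to the end of a request; it will print the values per time slot, preceded by the series start time, end time, and step, but it won't show a time stamp for each value. To view JSON data, add "&format=json" instead; each data point is then returned as a [value, timestamp] pair. To inspect what is actually stored on disk, time stamps included, use the Whisper command-line tools.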
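Since the raw output has no time stamp per value, the time stamps have to be rebuilt from the series header. The sketch below assumes each raw line looks like name,start,end,step|v1,v2,... with None for empty slots, which is my understanding of the raw format. The sample line is made up for illustration.

```python
def parse_raw_series(line):
    """Turn one raw-format line into (name, [(timestamp, value), ...])."""
    header, _, data = line.strip().partition("|")
    # Split from the right so a target that itself contains commas stays intact.
    name, start, _end, step = header.rsplit(",", 3)
    start, step = int(start), int(step)
    values = [None if v == "None" else float(v) for v in data.split(",")]
    # Slot i belongs to time start + i * step.
    return name, [(start + i * step, v) for i, v in enumerate(values)]


sample = "webserver1.requests_per_minute,1376747940,1376748120,60|198.0,None,201.0"
name, points = parse_raw_series(sample)
for timestamp, value in points:
    print(timestamp, value)
```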

whisper-fetch.py requests_per_minute.wsp will show the data with the time stamps.
whisper-info.py requests_per_minute.wsp will show basic information about the wsp file, such as the expected time intervals.
whisper-resize.py requests_per_minute.wsp 5m:1y 30m:3y will resize the Whisper data file to store data every 5 minutes for a year and then aggregate values to 30 minutes for 3 years.
whisper-update.py requests_per_minute.wsp 1376748060:199 will overwrite the currently stored value at time 1376748060 (201) with the new value (199).  I've had trouble getting this command to work, but have had success resubmitting the information via the netcat command as above.
whisper-dump.py will show a mix of whisper-info.py and whisper-fetch.py data, including unfilled slots.
whisper-create.py will create a new metric file.  This usually isn't needed, as sending the data to Graphite will cause it to create the file with the defaults that match the metric.
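Retention strings such as 5m:1y can be read as "one point per 5 minutes, kept for 1 year". The sketch below works out how many points each archive holds. It assumes a year counts as 365 days and that each stored point takes about 12 bytes on disk; both are my assumptions about Whisper, and the helper names are hypothetical.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(text):
    """Convert a value such as '5m' or '1y' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def parse_retention(spec):
    """Convert 'precision:duration' to (seconds_per_point, number_of_points)."""
    precision, duration = spec.split(":")
    step = to_seconds(precision)
    return step, to_seconds(duration) // step


total_points = 0
for spec in ["5m:1y", "30m:3y"]:
    step, points = parse_retention(spec)
    total_points += points
    print(f"{spec}: one point every {step}s, {points} points")

# Rough size estimate, ignoring the small file header (assumed 12 bytes per point).
print(f"approximately {total_points * 12} bytes")
```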

Whisper files are created with default values set in the storage-schemas.conf file which has entries like:
[webserver_metrics]
pattern = ^webserver
retentions = 60s:90d,5m:3y

which stores a value every 60 seconds for 90 days and then aggregates to 5-minute values kept for 3 years.  The pattern is a regular expression matched against the metric name. The stock default is one value per minute for 24 hours/1 day, so make sure you set your schema before the metric file is first created, or resize the file afterwards.  When Whisper aggregates data into a coarser archive, it requires a certain fraction of the underlying data points to be present. The default is an xFilesFactor of 0.5 (set in storage-aggregation.conf), which means that at least 50% of the data points must exist for an aggregated value to be created. If your data injection is flaky, you might want to reduce this value.
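The 50% rule is easier to see with numbers. This is a simplified sketch of the idea, not Whisper's actual code: it averages the known points in a bucket (averaging is assumed as the aggregation method) and returns nothing when too few points are present.

```python
def aggregate(points, x_files_factor=0.5):
    """Average the known points, or return None if too few are present.

    points: values for one coarse time slot, with None for missing data.
    """
    known = [p for p in points if p is not None]
    if not points or len(known) / len(points) < x_files_factor:
        return None
    return sum(known) / len(known)


# Five 60-second slots rolling up into one 5-minute slot.
print(aggregate([201, None, 199, None, None]))  # 2 of 5 known -> None
print(aggregate([201, None, 199, 200, None]))   # 3 of 5 known -> 200.0
print(aggregate([201, None, 199, None, None], x_files_factor=0.2))  # 200.0
```

With a flaky data feed, the first case is the one that bites: the fine-grained points exist, but the aggregated archive stays empty until the threshold is lowered.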

To stop and start carbon:
carbon-cache.py stop
carbon-cache.py start
For more info on this, look at: https://testing.readthedocs.org/en/latest/monitoring_201.html

Scaling a Graphite system
Graphite can be clustered to provide storage capacity, performance, and even up-time at a scale that isn't possible on a single system.  The clustering available allows spreading data out across servers and/or duplicating data for availability.  See more here: http://www.aosabook.org/en/graphite.html
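To illustrate what "spreading out and/or duplicating" means in practice, here is a toy sketch that picks which servers should hold a given metric. It is not the algorithm Graphite's own relay uses; it is a simple hash-ranking scheme, and the node names are invented for the example.

```python
import hashlib


def pick_nodes(metric_name, nodes, replication=1):
    """Choose 'replication' distinct nodes for a metric, deterministically.

    Each node is ranked by a hash of (node, metric name); the lowest-ranked
    nodes win. Different metrics therefore land on different nodes, and the
    same metric always lands on the same ones.
    """
    def rank(node):
        return hashlib.md5(f"{node}:{metric_name}".encode("utf-8")).hexdigest()

    return sorted(nodes, key=rank)[:replication]


nodes = ["graphite-a", "graphite-b", "graphite-c"]  # hypothetical servers
for metric in ["webserver1.requests_per_minute", "webserver2.requests_per_minute"]:
    print(metric, "->", pick_nodes(metric, nodes, replication=2))
```

With replication=1 the data is only spread out; with replication=2 each metric is also stored twice, so one server can be down without losing access to it.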
