
Developing and maintaining any system requires many decisions about what to do and where to invest time and energy to get the most benefit quickly. Making correct decisions requires accurate information. It is not a valid argument (although it may be a common one) to say "I feel our DB interaction is slow, let's optimize our data access layer", or "To make our website super fast, I think we should start using CDN services". These changes could well improve performance, but they might not be the best places to start. There is a difference between "feeling" and "actually knowing" something. Only valid, accurate measurement helps to detect bottlenecks and their effects on the whole system.

System statistics can be measured either from outside or from inside. The former seems easier to implement because it is decoupled from the system being measured, but the accuracy of the data can be compromised by network performance. It is also difficult, and sometimes impossible, to profile the internals of the system from outside. Collecting measurements from within the system is more accurate and can be done in any layer, so the internals can be measured as well. However, it is undesirable to spend the system's own resources on measuring the system itself. It is well known that profiling a system can affect its performance.

A simple solution to this problem is "statsd". By sending small chunks of data from within the system to an external service over a lightweight, non-blocking protocol, you can collect accurate data with minimal effect on performance. Because the system pushes the data out, network or remote service performance issues have little effect on its accuracy, and the system never waits on a blocking call to store its statistics. This simplicity reminds me of syslog, but instead of arbitrary log messages, the data sent is the measured statistics.
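The "send it out and move on" idea can be sketched in a few lines of Python. This is only an illustration, not a real client library: the function name `send_stat` is hypothetical, and the address `127.0.0.1:8125` is an assumption about where a collector might be listening.

```python
import socket


def send_stat(payload: str, host: str = "127.0.0.1", port: int = 8125) -> bool:
    """Send one small datagram and return immediately.

    `host` and `port` are assumptions for this sketch; point them at
    wherever your collector listens.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Never let measuring block the application itself.
        sock.setblocking(False)
        sock.sendto(payload.encode("utf-8"), (host, port))
        return True
    except OSError:
        # A lost statistic is acceptable; a failed request because of
        # statistics is not, so errors are swallowed here.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    # Example payload; the text is whatever the collector expects.
    send_stat("page.views:1|c")
```

UDP has no handshake and no acknowledgement, so the caller has no delivery guarantee. That is the trade-off that keeps the measurement cheap.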

It is not a standard, there is no RFC (yet), and it could be implemented in a totally ad hoc way. However, the "statsd" proposal by Etsy is simple and practical enough that many have chosen it as a reference implementation. There are many servers, written in different languages, that are compatible with it.

Statsd uses UDP to send data buckets to the collector service. Each data bucket is a text message with a name, a type, and a value. The statsd server receives the buckets and, at each configured interval, flushes the results to the available backends, which could be a database, another statsd service (relaying), or a custom backend.
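To make the message shape and the flush step concrete, here is a rough sketch in Python. The Etsy implementation writes each message as `name:value|type`, with `c` for counters, `g` for gauges and `ms` for timers. Everything else below (`format_metric`, `parse_metric`, `ToyAggregator`) is a hypothetical name invented for this illustration, and the aggregation is deliberately simplified: sample rates and other extensions are ignored, and `flush` is called by hand instead of on a timer.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def format_metric(name: str, value, metric_type: str) -> str:
    """Build one text message: name, value and type."""
    return f"{name}:{value}|{metric_type}"


def parse_metric(line: str) -> Tuple[str, float, str]:
    """Split a message back into (name, value, type).

    Any extra '|'-separated fields (such as a sample rate) are ignored
    in this sketch.
    """
    name, rest = line.strip().split(":", 1)
    fields = rest.split("|")
    return name, float(fields[0]), fields[1]


class ToyAggregator:
    """In-memory stand-in for a server: collect, then flush to backends."""

    def __init__(self, backends: List[Callable[[Dict], None]]):
        self.backends = backends  # each backend is just a callable here
        self.counters = defaultdict(float)
        self.gauges: Dict[str, float] = {}
        self.timers = defaultdict(list)

    def receive(self, line: str) -> None:
        name, value, metric_type = parse_metric(line)
        if metric_type == "c":
            self.counters[name] += value      # counters are summed
        elif metric_type == "g":
            self.gauges[name] = value         # gauges keep the last value
        elif metric_type == "ms":
            self.timers[name].append(value)   # timers keep every sample
        # Unknown types are silently dropped in this sketch.

    def flush(self) -> Dict:
        snapshot = {
            "counters": dict(self.counters),
            "gauges": dict(self.gauges),
            "timers": {k: list(v) for k, v in self.timers.items()},
        }
        for backend in self.backends:
            backend(snapshot)
        # Start the next interval empty; gauges are kept (a toy choice).
        self.counters.clear()
        self.timers.clear()
        return snapshot


if __name__ == "__main__":
    agg = ToyAggregator(backends=[print])
    for msg in ("page.views:1|c", "page.views:2|c", "db.query:320|ms"):
        agg.receive(msg)
    agg.flush()
```

A backend here is any callable that accepts the flushed snapshot, so a database writer, a relay to another service, or a custom handler would all plug in the same way.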

A very common usage is to store the data in the time series database provided by the Graphite project. I'll introduce Graphite, and how to measure a system with statsd, in another article.