[ninux-dev] Graphite:
nemesis
nemesis at ninux.org
Sat Mar 29 15:10:00 CET 2014
Hi Nino and everyone else,
the other day I was telling you about Graphite (python/django), which,
after a fair amount of research and study, I believe is the most suitable
tool for storing and visualizing network and application metrics.
For data collection I am leaning towards
statsd (javascript/nodejs): https://github.com/etsy/statsd/
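To give an idea of how little an application needs in order to report to statsd, here is a minimal sketch. It assumes the usual statsd datagram format (name:value|type, sent over UDP) and the default port 8125; the metric names are made up for illustration and do not come from any real ninux setup.

```python
import socket


def statsd_line(name, value, metric_type):
    """Format one statsd datagram as <name>:<value>|<type>."""
    return "{}:{}|{}".format(name, value, metric_type)


def send_statsd(line, host="127.0.0.1", port=8125):
    """Send one datagram over UDP. Delivery is not confirmed."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# Hypothetical metric names, only to show the wire format.
print(statsd_line("ninux.nodes.online", 1, "c"))   # a counter increment
print(statsd_line("ninux.ping.rtt", 23, "ms"))     # a timing in milliseconds
```

Calling send_statsd(statsd_line("ninux.nodes.online", 1, "c")) would emit the counter to a local daemon, assuming one is listening on that port.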
Here is some essential information:
First of all, who uses Graphite?
Orbitz
Sears Holdings
Etsy (see http://codeascraft.etsy.com/2010/12/08/track-every-release/)
Google (opensource Rocksteady project)
Media Temple
Canonical
Brightcove (see http://opensource.brightcove.com/project/Diamond/)
Vimeo
SocialTwist
Douban
https://graphite.readthedocs.org/en/latest/who-is-using.html
How did statsd come about?
http://codeascraft.com/2011/02/15/measure-anything-measure-everything/
Who uses statsd?
I haven't found a complete list, but apparently most of those who use
Graphite use it together with statsd.
And apparently Instagram is among them:
http://instagram-engineering.tumblr.com/post/20541814340/keeping-instagram-up-with-over-a-million-new-users-in
A few assorted FAQs that you asked me about the other evening:
What is Graphite?
Graphite is a highly scalable real-time graphing system. As a user, you
write an application that collects numeric time-series data that you are
interested in graphing, and send it to Graphite's processing backend,
carbon, which stores the data in Graphite's specialized database. The
data can then be visualized through Graphite's web interfaces.
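As a rough illustration of "send it to carbon", here is a minimal sketch. It assumes carbon's plaintext line protocol (metric path, value and Unix timestamp separated by spaces, one metric per line) on the default TCP port 2003; the host, port and metric path are placeholders, not values taken from the FAQ.

```python
import socket
import time


def carbon_line(path, value, timestamp=None):
    """Format one plaintext line: path, value, timestamp, then a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


def send_to_carbon(lines, host="127.0.0.1", port=2003):
    """Open a TCP connection to carbon and send the given lines."""
    payload = "".join(lines).encode("ascii")
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall(payload)
    finally:
        sock.close()


# Hypothetical metric path, only to show the wire format.
print(carbon_line("ninux.roma.node1.load", 0.42, 1396102200), end="")
```

With a carbon daemon running locally, send_to_carbon([carbon_line("ninux.roma.node1.load", 0.42)]) would store one data point stamped with the current time.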
How scalable is Graphite?
From a CPU perspective, Graphite scales horizontally on both the
frontend and the backend, meaning you can simply add more machines to
the mix to get more throughput. It is also fault tolerant in the sense
that losing a backend machine will cause a minimal amount of data loss
(whatever that machine had cached in memory) and will not disrupt the
system if you have sufficient capacity remaining to handle the load.
From an I/O perspective, under load Graphite performs lots of tiny I/O
operations on lots of different files very rapidly. This is because each
distinct metric sent to Graphite is stored in its own database file,
similar to how many tools (drraw, Cacti, Centreon, etc) built on top of
RRD work. In fact, Graphite originally did use RRD for storage until
fundamental limitations arose that required a new storage engine.
What is whisper?
Whisper is a fixed-size database, similar in design to RRD
(round-robin-database). It provides fast, reliable storage of numeric
data over time.
Why don't you just use RRD?
RRD is great, and initially Graphite did use RRD for storage. Over time
though, we ran into several issues inherent to RRD's design.
RRD can't take updates for a timestamp prior to its most recent update.
So for example, if you miss an update for some reason you have no simple
way of back-filling your RRD file by telling rrdtool to apply an update
to the past. Whisper does not have this limitation, and this makes
importing historical data into Graphite way way easier.
At the time whisper was written, RRD did not support compacting
multiple updates into a single operation. This feature is critical to
Graphite's scalability.
RRD doesn't like irregular updates. If you update an RRD but don't
follow up with another update soon, your original update will be lost.
This was the straw that broke the camel's back. Since Graphite is used for
various operational metrics, some of which do not occur regularly
(randomly occurring errors for instance), we started to notice that
Graphite sometimes wouldn't display data points which we knew existed
because we'd received alarms on them from other tools. The problem
turned out to be that RRD was dropping the data points because they were
irregular. Whisper had to be written to ensure that all data was
reliably stored and accessible.
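The two properties described above (updates in the past are accepted, and an isolated update is kept) can be pictured with a toy fixed-size round-robin store. This is only a sketch for intuition: the class name and the in-memory layout are invented here and are not whisper's actual file format or API.

```python
class ToyArchive:
    """Fixed-size round-robin archive: one slot per `step` seconds."""

    def __init__(self, step, points):
        self.step = step
        self.points = points
        # Each slot holds (interval_start, value), or None when empty.
        self.slots = [None] * points

    def update(self, timestamp, value):
        """Store a value at any timestamp, including one in the past."""
        interval = timestamp - (timestamp % self.step)
        index = (interval // self.step) % self.points
        self.slots[index] = (interval, value)

    def fetch(self, start, end):
        """Return one (interval, value) pair per interval from start up to end."""
        result = []
        first = start - (start % self.step)
        for interval in range(first, end, self.step):
            slot = self.slots[(interval // self.step) % self.points]
            # A slot counts only if it was written for this exact interval;
            # otherwise it is empty or belongs to another lap of the ring.
            if slot is not None and slot[0] == interval:
                result.append(slot)
            else:
                result.append((interval, None))
        return result


archive = ToyArchive(step=60, points=5)
archive.update(300, 1.0)   # most recent point
archive.update(180, 7.0)   # back-fill an older interval afterwards
archive.update(125, 9.0)   # a lone, irregular update is simply kept
print(archive.fetch(120, 360))
```

The point of the design is that the slot is computed from the timestamp alone, so the order in which updates arrive does not matter.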
Why did you totally rewrite RRD? Couldn't you just submit a patch?
I didn't totally rewrite it, I rewrote only a small subset of what RRD
does, its basic storage mechanism. Patching RRD would mean hundreds of
lines of C code, whereas Whisper is under 500 lines of simple python.
Seriously though, the real reason I didn't simply submit a patch for
rrdtool is that whisper's design is incompatible with RRD's feature set.
RRD provides the ability to specify an arbitrary update interval, that
is you could say that you intend to update your RRD file once every
minute, every 10 minutes, whatever. And rrdtool also allows you to
configure your RRAs (round-robin-archives) independently of this update
interval, so you could have a 1-minute precision archive but an update
interval of say, 10 seconds. In this case, RRD will store your updates
in a temporary workspace area and after the minute has passed, aggregate
them and store them in the archive. Whisper on the other hand mandates
that your update interval must be the same as the finest precision
archive you configure. So for instance, if your archive configuration is
1-minute precision for 2 hours, then 5-minute precision for a day, your
update interval *must* be 1-minute. The reason for this is that whisper
inserts your updates *immediately* into your finest precision archive,
so another update within the same interval would overwrite the previous
value. Basically this just means that the onus of aggregating values to
fit in the finest precision archive is on the user, not the database.
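To make the example above concrete: 1-minute precision for 2 hours needs 120 slots, and 5-minute precision for a day needs 288. The sketch below does that arithmetic and shows one way a sender could aggregate raw samples into one value per interval before updating. The function names are invented for illustration, and averaging is an assumption; a counter would more naturally be summed.

```python
from collections import defaultdict


def points_needed(precision_seconds, retention_seconds):
    """Number of slots an archive needs for a given precision and retention."""
    return retention_seconds // precision_seconds


def aggregate(samples, step):
    """Average raw (timestamp, value) samples into one value per interval."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        buckets[timestamp - (timestamp % step)].append(value)
    return [(interval, sum(values) / len(values))
            for interval, values in sorted(buckets.items())]


print(points_needed(60, 2 * 3600))      # 1-minute precision for 2 hours
print(points_needed(300, 24 * 3600))    # 5-minute precision for a day

# Samples taken every 10 seconds, reduced to one point per minute.
raw = [(0, 1.0), (10, 3.0), (70, 5.0)]
print(aggregate(raw, 60))
```

This per-interval reduction is the kind of work a collector sitting in front of Graphite has to do, since a second raw update inside the same minute would otherwise overwrite the first.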
How fast is whisper?
Whisper is fast enough. It is slower than rrdtool because whisper is
written in Python while rrdtool is written in C, go figure. However the
differences in speed are quite small. I spent a lot of time optimizing
whisper to get as close to rrdtool's performance as I could. Currently
update operations take anywhere from 2 to 3 times as long as rrdtool,
and fetch operations take anywhere from 2 to 5 times as long. This
sounds a lot worse than it is (especially considering it was originally
20x slower for each operation) because in practice the actual difference
is measured in hundreds of microseconds (10^-4), so less than a
millisecond difference for simple cases.
How does whisper work?
Pretty simplistically. See for yourself, visit
http://bazaar.launchpad.net/~graphite-dev/graphite/main/files and click
lib, graphite, then whisper.py.
--------------
Further reading:
https://graphite.readthedocs.org/en/latest/overview.html
http://graphite.wikidot.com/faq
http://graphite.wikidot.com/whisper
--------------
I will stand on the shoulders of giants; I am certainly not going to
rewrite something that seriously capable people have already built, and
apparently built well, since it is used by thousands of people who
contribute to improving it.
Any reference to existing things or persons is purely coincidental... like
https://github.com/wlanslovenija/datastream, this one is totally
coincidental too, just to show you that it isn't true that every
community network reinvents the wheel, noooooo
Federico