Date Tags Devops

Recently I've been playing around with Prometheus. For now I think it is the best open source solution for monitoring (in the same way that chlamydia is probably the best STD). Previously I was a fan of Sensu, but honestly there are just too many moving parts to go wrong with Sensu, which meant they inevitably did.

So, why do I like Prometheus? It stays pretty close to the UNIX philosophy of doing one thing and doing it well: at its core it is just a time-series database. Alerting is a separate module, for example, and graphing is pretty much left to Grafana. Initially I was not taken by it for one simple reason:

  • All its configuration is central.

Unlike with Sensu, a node cannot announce itself to the Prometheus server and then be automatically monitored. In this day and age, that sucks. However, while browsing the docs I discovered that it supports service discovery.

So the process:

  • Use Puppet to configure Prometheus
  • Individual nodes announce to Consul what services they have
  • Prometheus collects its endpoints from Consul

This looks something like this:

Here is a network of 5 machines:

  • Prometheus (also the Consul server)
  • Puppet
  • 3 that will be monitored

This is a very simple Consul cluster. Normally one would have at least 3 masters (ideally more) spread across different datacentres. It works for this demo though.

Right, let's jump straight into the Puppet code. I am using the classic 'Roles and Profiles' pattern. You can find my control repo here. There are a few Puppet modules necessary, so your Puppetfile will contain:

forge 'http://forge.puppetlabs.com'

mod 'KyleAnderson/consul', '2.1.0'
mod 'puppet/archive', '1.3.0'
mod 'puppetlabs/stdlib', '4.15.0'
mod 'puppetlabs/firewall', '1.8.2'

mod 'prometheus',
    :git => 'https://github.com/voxpupuli/puppet-prometheus.git'

To begin with, let's install Node Exporter everywhere. This will collect basic system stats and make them available to Prometheus.

In common.yaml:

---
prometheus::node_exporter::version: '0.13.0'

and in your profile::base:

class profile::base {
  include ::prometheus::node_exporter
  firewall {'102 node exporter':
    dport  => 9100,
    proto  => tcp,
    action => accept,
  }
}

Consul needs to be everywhere, and you need to announce to it that the node exporter is there, so add this to your base profile as well:

class profile::base {
  include ::consul
  firewall { '103 Consul':
    dport  => [8400, 8500],
    proto  => tcp,
    action => accept,
  }
}

And in common.yaml:

---
consul::version: 0.7.4
consul::config_hash:
  data_dir: '/opt/consul'
  datacenter: 'homelab'
  log_level: 'INFO'
  node_name: "%{::hostname}"
  retry_join:
    - 192.168.1.89
consul::services:
  node_exporter:
    address: "%{::fqdn}"
    checks:
      - http: http://localhost:9100
        interval: 10s
    port: 9100
    tags:
      - monitoring

Obviously modify the retry_join to suit your infrastructure. If you are doing the right thing and have a cluster, just add the other servers to the array.
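For a cluster, the retry_join part of common.yaml might look something like the sketch below. Only the first address comes from my setup; the other two are made-up placeholders, and the rest of consul::config_hash stays as shown above.

```yaml
consul::config_hash:
  # ...data_dir, datacenter, log_level and node_name as before...
  retry_join:
    - 192.168.1.89   # the single server used in this demo
    - 192.168.1.90   # hypothetical second server
    - 192.168.1.91   # hypothetical third server
```

The agent keeps retrying these addresses until it manages to join through one of them, so the order should not matter much.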

For the Consul master, create a profile that contains:

class profile::consulmaster {
  firewall { '102 consul inbound':
    dport  => [8300, 8301, 8302, 8600],
    proto  => tcp,
    action => accept,
  }
}

You need the following in Hiera, applied to that node (or nodes):

---
consul::version: 0.7.4
consul::config_hash:
  bootstrap_expect: 1
  data_dir: '/opt/consul'
  datacenter: 'homelab'
  log_level: 'INFO'
  server: true
  node_name: "%{::hostname}"

Change bootstrap_expect to match the number of Consul servers in your cluster.
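As a rough sketch, the server Hiera for a three-server cluster could look like this. The extra addresses are hypothetical, and I am assuming the servers find each other with the same retry_join mechanism the agents use.

```yaml
---
consul::version: 0.7.4
consul::config_hash:
  bootstrap_expect: 3          # wait for three servers before electing a leader
  data_dir: '/opt/consul'
  datacenter: 'homelab'
  log_level: 'INFO'
  server: true
  node_name: "%{::hostname}"
  retry_join:                  # hypothetical addresses of the three servers
    - 192.168.1.89
    - 192.168.1.90
    - 192.168.1.91
```

With bootstrap_expect set to 3, the servers hold off on bootstrapping the cluster until all three have found each other.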

To configure the Prometheus server itself, create profile::prometheus:

class profile::prometheus {
  firewall { '100 Prometheus inbound':
    dport  => [9090, 9093],
    proto  => tcp,
    action => accept,
  }

  class { 'prometheus':
    scrape_configs => [
      {
        'job_name'          => 'consul',
        'consul_sd_configs' => [
          {
            'server'   => 'localhost:8500',
            'services' => [
              'node_exporter',
            ],
          },
        ],
      },
    ],
  }
}

This will create a scrape config that queries Consul for all services named 'node_exporter'.
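For reference, the equivalent section in Prometheus's own configuration file should look roughly like the fragment below. This is a sketch of the native format rather than a copy of what the module writes out, so the rendered file will probably differ in key order and will contain other settings too.

```yaml
scrape_configs:
  - job_name: consul
    consul_sd_configs:
      - server: localhost:8500   # local Consul agent's HTTP API
        services:
          - node_exporter        # only scrape services registered under this name
```

Any node that registers a node_exporter service in Consul then shows up as a target without touching the Prometheus config again.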

Finally, the Hiera for your Prometheus node will look like:

---
classes:
  - profile::prometheus
prometheus::version: '1.5.0'

That is it!

As an aside, the basic ideas here are based on Gareth Rushgrove's excellent presentation about having two different speeds of configuration management. Puppet is the slow, stable speed, while Consul runs in parallel and gives another path that is much more reactive.

