Kubernetes Monitoring

Monitoring dynamic environments

Monitoring Approaches

  • Blackbox monitoring
  • Whitebox monitoring

Blackbox Monitoring

Monitoring externally visible behavior

  • Load
  • Memory
  • Disk space
  • Processes

Whitebox Monitoring

Monitoring application behavior

  • Metrics representation of internal state
  • Logs
  • ...


Blackbox monitoring can't provide detailed information about the application

Whitebox information is only application-specific

Big picture requires aggregation


Prometheus

  • De facto standard for Kubernetes monitoring
  • Time series database
  • Alerting toolkit
  • Query language to access time series data
  • Service discovery for Kubernetes services

Prometheus Limitations

  • Availability over accuracy
  • No built-in high availability
  • No distributed storage
  • Pull-based monitoring
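
Pull-based means the monitored process only exposes its current values over HTTP, and Prometheus fetches them on its own schedule. A minimal sketch of such a target, using only the Python standard library; the metric name demo_requests_total, the counter variable, and port 8000 are assumptions made for illustration, and a real service would normally use a Prometheus client library instead:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process counter; a real application would update it while serving traffic.
REQUESTS_TOTAL = {"value": 0}


def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP demo_requests_total Requests seen by this demo process.\n"
        "# TYPE demo_requests_total counter\n"
        f"demo_requests_total {REQUESTS_TOTAL['value']}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values on GET /metrics and nothing else."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port (assumed to be free)."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the endpoint. The process never pushes anything: if nothing scrapes /metrics, no samples are recorded, which is why short-lived jobs are awkward in a pull model.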

Prometheus Architecture

Prometheus Data Format

Simple text-based format

metric_name{label="value"} value timestamp

Each combination of metric_name and labels represents a time series.
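
As an illustration of the line format and of what makes two samples belong to the same series, here is a small sketch. The helper names format_sample and series_key are hypothetical and not part of any Prometheus library:

```python
def format_sample(name, labels, value, timestamp_ms=None):
    """Render one sample as a line of the text exposition format."""

    def escape(text):
        # Label values escape backslash, double quote, and newline.
        return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    label_part = ""
    if labels:
        pairs = ",".join(f'{key}="{escape(val)}"' for key, val in labels.items())
        label_part = "{" + pairs + "}"
    line = f"{name}{label_part} {value}"
    if timestamp_ms is not None:
        line += f" {timestamp_ms}"  # the timestamp is optional in the format
    return line


def series_key(name, labels):
    """Identity of a time series: metric name plus the full label set, in any order."""
    return (name, frozenset(labels.items()))


print(format_sample("http_requests_total", {"method": "post", "code": "200"}, 1027, 1395066363000))
```

Two samples with the same series_key are points of one time series. Changing any label value, such as code="400" instead of code="200", creates a separate series.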

# HELP http_requests_total The total number of HTTP requests.
# TYPE http_requests_total counter
http_requests_total{method="post",code="200"} 1027 1395066363000
http_requests_total{method="post",code="400"}    3 1395066363000

# A histogram, which has a pretty complex representation in the text format:
# HELP http_request_duration_seconds A histogram of the request duration.
# TYPE http_request_duration_seconds histogram
http_request_duration_seconds_bucket{le="0.1"} 33444
http_request_duration_seconds_bucket{le="0.5"} 129389
http_request_duration_seconds_bucket{le="1"} 133988
http_request_duration_seconds_bucket{le="+Inf"} 144320
http_request_duration_seconds_sum 53423
http_request_duration_seconds_count 144320
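
The histogram buckets are cumulative: each le bucket counts every observation at or below its upper bound, so the +Inf bucket equals the total count. A short sketch of how the numbers in the example above can be read; the variable and function names are illustrative only:

```python
# Cumulative bucket counts keyed by upper bound "le", copied from the example above.
BUCKETS = [(0.1, 33444), (0.5, 129389), (1.0, 133988), (float("inf"), 144320)]
SUM_SECONDS = 53423
COUNT = 144320


def per_bucket_counts(buckets):
    """Convert cumulative counts into the number of observations inside each bucket."""
    counts, previous = [], 0
    for upper_bound, cumulative in buckets:
        counts.append((upper_bound, cumulative - previous))
        previous = cumulative
    return counts


def fraction_at_or_below(buckets, upper_bound):
    """Share of observations at or below a bound that is an existing bucket boundary."""
    total = buckets[-1][1]  # the +Inf bucket holds the total count
    for bound, cumulative in buckets:
        if bound == upper_bound:
            return cumulative / total
    raise ValueError("upper_bound is not a bucket boundary")


average_seconds = SUM_SECONDS / COUNT  # mean request duration over all observations

for bound, count in per_bucket_counts(BUCKETS):
    print(f"le={bound}: {count} requests")
print(f"average duration: {average_seconds:.3f}s")
```

Only bucket boundaries can be answered exactly. Anything between two boundaries, such as a 95th percentile, has to be estimated.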

Prometheus Queries

How to query Prometheus time series

Simple queries are similar to the data format


The result is called an instant vector: one sample per matching time series, all at the same evaluation timestamp.
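
Instant queries can also be sent to the Prometheus HTTP API at /api/v1/query. A minimal sketch that only builds the request URL; the server address http://localhost:9090 is an assumption, and the metric name is taken from the data format example:

```python
from urllib.parse import urlencode

# Assumption: a Prometheus server is reachable at this address.
PROMETHEUS_URL = "http://localhost:9090"


def instant_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the URL for an instant query; the expression is passed as the 'query' parameter."""
    return f"{base_url}/api/v1/query?{urlencode({'query': expr})}"


print(instant_query_url("http_requests_total"))
```

Fetching that URL returns JSON. For a selector like this one, the data.resultType field is "vector" and data.result holds one entry per matching series.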

Time series can be selected by labels
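
Label selection can be pictured as filtering label sets with the four matcher operators =, !=, =~ and !~. The sketch below imitates that over a hypothetical in-memory list of series. It uses Python's re module, whose regex syntax differs in some details from the RE2 syntax Prometheus uses:

```python
import re

# Hypothetical series; the metric name is stored under the reserved label "__name__".
SERIES = [
    {"__name__": "http_requests_total", "method": "post", "code": "200"},
    {"__name__": "http_requests_total", "method": "post", "code": "400"},
    {"__name__": "http_requests_total", "method": "get", "code": "500"},
]


def matches(labels, matchers):
    """Return True if a label set satisfies every (label, operator, value) matcher."""
    for name, op, value in matchers:
        actual = labels.get(name, "")  # a missing label behaves like an empty value
        if op == "=":
            ok = actual == value
        elif op == "!=":
            ok = actual != value
        elif op == "=~":
            ok = re.fullmatch(value, actual) is not None  # regex must match the whole value
        elif op == "!~":
            ok = re.fullmatch(value, actual) is None
        else:
            raise ValueError(f"unknown operator: {op}")
        if not ok:
            return False
    return True


def select(series, matchers):
    """Keep only the series whose labels satisfy all matchers."""
    return [labels for labels in series if matches(labels, matchers)]


# Roughly what http_requests_total{method="post"} selects.
print(select(SERIES, [("__name__", "=", "http_requests_total"), ("method", "=", "post")]))
```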


Range vectors

A query can also request all values within a time range
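
The difference between an instant value and a range of values can be sketched over a hypothetical list of samples for one series. The sample data and function names are assumptions, and boundary and staleness handling are simplified compared with Prometheus itself:

```python
# Hypothetical samples for one series as (timestamp_seconds, value), scraped every 60 seconds.
SAMPLES = [(0, 10.0), (60, 12.0), (120, 15.0), (180, 15.0), (240, 21.0), (300, 24.0), (360, 30.0)]


def instant_value(samples, eval_time):
    """Return the most recent sample at or before eval_time, or None if there is none."""
    older = [(t, v) for t, v in samples if t <= eval_time]
    return older[-1] if older else None


def range_values(samples, eval_time, window):
    """Return every sample inside the window that ends at eval_time."""
    start = eval_time - window
    return [(t, v) for t, v in samples if start <= t <= eval_time]


print(instant_value(SAMPLES, 360))      # one sample
print(range_values(SAMPLES, 360, 120))  # all samples of the last two minutes
```

In PromQL the window is written in square brackets after the selector, as in the [5m] of the predict_linear example.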



Functions can be applied to vectors. This example extrapolates the last 5 minutes of samples 24 hours (24*3600 seconds) ahead.

predict_linear(http_requests_total{namespace="default",service="wp-wordpress"}[5m], 24*3600)
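
predict_linear fits a straight line through the samples of the range and extrapolates it. The sketch below shows the idea with an ordinary least-squares fit. It is a simplified stand-in with a hypothetical function name and made-up sample data, not the Prometheus implementation:

```python
def extrapolate_linear(samples, eval_time, horizon_seconds):
    """Fit a least-squares line through (timestamp, value) samples and
    return its value at eval_time + horizon_seconds."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must have distinct timestamps")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return intercept + slope * (eval_time + horizon_seconds)


# Hypothetical 5-minute window in which the value grows by one per second.
window = [(0, 100.0), (60, 160.0), (120, 220.0), (180, 280.0), (240, 340.0), (300, 400.0)]
print(extrapolate_linear(window, eval_time=300, horizon_seconds=24 * 3600))
```

The prediction is only as good as the assumption that the trend of the last few minutes continues unchanged for a whole day.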


Alerting Rules in Prometheus

Alert configuration

  • Alerts are configured in Prometheus
  • Prometheus sends alerts to Alertmanager
  • Alertmanager triggers notifications

Alert definition

groups:
- name: example
  rules:
  - alert: HighErrorRate
    expr: http_requests_total{code=~"5[0-9]{2}"} > 0
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "High error rate for {{ $ }}"
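
The for: 10m clause means the expression has to stay true for ten minutes before the alert fires. Until then the alert is pending. A minimal sketch of that state logic, with a hypothetical function name and made-up evaluation results:

```python
def alert_states(evaluations, for_seconds):
    """Map (timestamp_seconds, condition_true) evaluations to alert states."""
    states = []
    active_since = None  # timestamp at which the condition first became true
    for timestamp, condition_true in evaluations:
        if not condition_true:
            active_since = None  # any false evaluation resets the timer
            states.append((timestamp, "inactive"))
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append((timestamp, "firing" if held_for >= for_seconds else "pending"))
    return states


# Hypothetical rule evaluations every 5 minutes with a 10 minute "for" duration.
evaluations = [(0, False), (300, True), (600, True), (900, True), (1200, False)]
for timestamp, state in alert_states(evaluations, for_seconds=600):
    print(timestamp, state)
```

Only alerts in the firing state are sent on to Alertmanager, so a brief spike that clears before the duration elapses never triggers a notification.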

