Prerequisites
- A Kubernetes cluster
- A Prometheus pod
Deploying blackbox-exporter
blackbox-exporter lets you monitor your external services and generate metrics for them.
It can probe over:
- HTTP
- TCP
- DNS
- ICMP
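As a rough illustration of the non-HTTP probers listed above, a blackbox-exporter configuration could declare one module per probe type. The module names (tcp_connect, dns_lookup, icmp_ping) and the queried name example.com are placeholders and not part of this setup. ICMP probing also typically needs extra privileges in the container.

```yaml
# Sketch only: module names and example.com are illustrative assumptions.
modules:
  tcp_connect:
    prober: tcp        # succeeds if a TCP connection can be opened
    timeout: 5s
  dns_lookup:
    prober: dns        # sends a DNS query to the probed server
    timeout: 5s
    dns:
      query_name: "example.com"
      query_type: "A"
  icmp_ping:
    prober: icmp       # sends an ICMP echo request
    timeout: 5s
```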
Here we focus on monitoring this NodeBB server over HTTP. I used this deployment (deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: blackbox-exporter
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      containers:
      - image: prom/blackbox-exporter:v0.12.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9115
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        name: blackbox-exporter
        ports:
        - containerPort: 9115
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /metrics
            port: 9115
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 10
        resources:
          limits:
            cpu: 500m
            memory: 128Mi
          requests:
            cpu: 500m
            memory: 128Mi
        volumeMounts:
        - mountPath: /etc/blackbox_exporter
          name: blackbox
      restartPolicy: Always
      volumes:
      - name: blackbox
        configMap:
          name: blackbox-config
I manage the blackbox-exporter configuration in a ConfigMap (configmap.yaml):
apiVersion: v1
data:
  config.yml: |
    modules:
      nodebb:
        prober: http
        timeout: 5s
        http:
          method: GET
          fail_if_not_ssl: true
          fail_if_not_matches_regexp:
          - "healthy"
kind: ConfigMap
metadata:
  name: blackbox-config
  namespace: prometheus
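The nodebb module above issues a GET request and fails the probe if the final response was not served over TLS or if the body does not match "healthy". As a sketch of how such a module could be tightened, the variant below also pins the accepted status code. The module name nodebb_strict is hypothetical. Later blackbox-exporter releases renamed the body-regex option to fail_if_body_not_matches_regexp, so check the documentation for the version you run.

```yaml
# Sketch only: nodebb_strict is a hypothetical module name.
modules:
  nodebb_strict:
    prober: http
    timeout: 5s
    http:
      method: GET
      valid_status_codes: [200]   # when omitted, any 2xx status is accepted
      fail_if_not_ssl: true
      fail_if_not_matches_regexp:
      - "healthy"
```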
I deploy everything to my Kubernetes cluster:
$ kubectl apply -f configmap.yaml,deployment.yaml
$ kubectl -n prometheus expose deployment blackbox-exporter --port=9115 --target-port=9115
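If you prefer to keep everything declarative, the kubectl expose command above can be replaced by a manifest. The following is a sketch of a roughly equivalent Service, assuming the same name, namespace and pod label as the deployment:

```yaml
# Sketch of a Service roughly equivalent to the `kubectl expose` command above.
apiVersion: v1
kind: Service
metadata:
  name: blackbox-exporter
  namespace: prometheus
spec:
  type: ClusterIP
  selector:
    app: blackbox-exporter   # matches the pod label set in deployment.yaml
  ports:
  - port: 9115
    targetPort: 9115
    protocol: TCP
```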
I can now retrieve the HTTP metrics for binbash.fr:
$ curl "http://$(kubectl -n prometheus get svc blackbox-exporter -o jsonpath='{.spec.clusterIP}'):9115/probe?target=binbash.fr/sping&module=nodebb"
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.071831404
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.292811509
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 7
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.057645064
probe_http_duration_seconds{phase="processing"} 0.076625814
probe_http_duration_seconds{phase="resolve"} 0.084556566
probe_http_duration_seconds{phase="tls"} 0.10175307
probe_http_duration_seconds{phase="transfer"} 0.000189276
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 1
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.530091576e+09
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
Configuring Prometheus
There are several ways to do this: either a static configuration, like the example given in the blackbox-exporter project README, or a dynamic configuration that relies on Kubernetes services. This section details the dynamic approach.
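For reference, the static approach could look like the sketch below, modeled on the pattern from the blackbox-exporter README. The exporter address blackbox-exporter.prometheus.svc:9115 is an assumption based on the Service created earlier with kubectl expose; adjust it to your environment.

```yaml
# Sketch only: the exporter address below is an assumption.
scrape_configs:
- job_name: 'blackbox-static'
  metrics_path: /probe
  params:
    module: [nodebb]              # module defined in the ConfigMap
  static_configs:
  - targets:
    - binbash.fr/sping            # URL to probe
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target  # pass the URL as ?target=...
  - source_labels: [__param_target]
    target_label: instance        # keep the URL as the instance label
  - target_label: __address__
    replacement: blackbox-exporter.prometheus.svc:9115  # scrape the exporter itself
```

Each new URL then has to be added to the targets list by hand, which is what the dynamic approach avoids.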
In my Prometheus configuration, I added:
- job_name: 'blackbox'
  metrics_path: /probe
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
    action: keep
    regex: true
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: (.+)(?::\d+);(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_service_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_target]
    action: replace
    target_label: __param_target
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_target]
    action: replace
    target_label: url
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_module]
    action: replace
    target_label: __param_module
The annotations on the Kubernetes service are therefore used to build the Prometheus target. All that remains is to declare a Kubernetes service so that Prometheus scrapes my blackbox-exporter (binbash-nodebb.yaml):
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/module: nodebb
    prometheus.io/probe: "true"
    prometheus.io/target: binbash.fr/sping
  name: binbash-nodebb
  namespace: prometheus
spec:
  ports:
  - port: 9115
    protocol: TCP
    targetPort: 9115
  selector:
    app: blackbox-exporter
  sessionAffinity: None
  type: ClusterIP
The annotations are:
- prometheus.io/module: the blackbox-exporter module configured above.
- prometheus.io/probe: tells Prometheus that it should handle this service.
- prometheus.io/target: the URL of my site.
I deploy the service:
$ kubectl apply -f binbash-nodebb.yaml
A new entry then shows up in my Prometheus targets, under the blackbox service discovery job.
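With this scheme, probing another site should only require one more annotated Service pointing at the same blackbox-exporter pods. The sketch below assumes a hypothetical site example.com and a hypothetical http_2xx module, which would have to be declared in the blackbox-exporter ConfigMap first:

```yaml
# Sketch only: example-site, example.com and http_2xx are placeholders.
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/module: http_2xx     # must exist in blackbox-exporter's config.yml
    prometheus.io/probe: "true"        # matched by the `keep` relabel rule
    prometheus.io/target: example.com  # becomes the ?target= parameter and the url label
  name: example-site
  namespace: prometheus
spec:
  ports:
  - port: 9115
    protocol: TCP
    targetPort: 9115
  selector:
    app: blackbox-exporter
  type: ClusterIP
```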
Grafana
Here is a very simple example of a Grafana dashboard:
Alerting
You can now alert when the probe fails or when response times are too high.
Example rules for Prometheus:
groups:
- name: blackbox.rules
  rules:
  - alert: UrlProbeFailed
    expr: probe_success != 1
    for: 5m
    labels:
      scope: blackbox
      severity: critical
    annotations:
      description: '{{ $labels.url }} probe failed'
      summary: URL Probe Failed
  - alert: UrlProbeSlow
    expr: probe_http_duration_seconds > 1
    for: 5m
    labels:
      scope: blackbox
      severity: critical
    annotations:
      description: '{{ $labels.url }} probe slow ({{ $value }}s)'
      summary: URL Probe Slow
  - alert: SSLCertExpiringSoon
    expr: probe_ssl_earliest_cert_expiry - time() < 86400 * 5
    for: 5m
    labels:
      scope: blackbox
      severity: critical
    annotations:
      description: 'Renew SSL cert of {{ $labels.url }} !'
      summary: SSL Cert Expiring Soon
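Note that probe_http_duration_seconds is exposed per phase (connect, processing, resolve, tls, transfer), so a threshold on it compares each phase separately. If you would rather alert on the total probe time, a variant based on probe_duration_seconds could look like the sketch below. The group name, alert name and warning severity are illustrative choices, not part of the original rules.

```yaml
# Sketch only: group name, alert name and severity are assumptions.
groups:
- name: blackbox-total-duration.rules
  rules:
  - alert: UrlProbeSlowTotal
    expr: probe_duration_seconds > 1   # whole probe, all phases included
    for: 5m
    labels:
      scope: blackbox
      severity: warning
    annotations:
      description: '{{ $labels.url }} probe took {{ $value }}s in total'
      summary: URL Probe Slow (total duration)
```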