==Collecting metrics from the access logs==
Necessary steps:
* We have to build a docker image from grok-exporter that also contains rsyslog. (The container must be able to run the rsyslog server as root, which requires extra OpenShift configuration.)
* The grok-exporter configuration will be placed into an OpenShift ConfigMap, and the rsyslog workspace must be an OpenShift volume (writing into a container's file system at runtime is really inefficient).
* We have to create a ClusterIP-type service that can perform load-balancing between the grok-exporter pods.
* The HAProxy routers should be configured to write access logs in debug mode and to send them to the remote rsyslog server running next to grok-exporter.
* The rsyslog server running in the grok-exporter pod will both write the received HAProxy access logs to the file '''/var/log/messages''' (an emptyDir-type volume) and send them to '''stdout''' for central log processing (see the rsyslog sketch after this list).
* Logs written to stdout will be picked up by the docker-log-driver and forwarded to the centralized log architecture (log retention)
* The grok-exporter program reads '''/var/log/messages''' and generates Prometheus metrics from the HAProxy access logs.
* The Prometheus scrape config has to be extended with a '''kubernetes_sd_configs''' section. Prometheus must collect the metrics directly from the grok-exporter pods, not through the Kubernetes service, in order to bypass its load-balancing (see the scrape config sketch below).
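A minimal sketch of the rsyslog configuration assumed by these steps, receiving the HAProxy logs over UDP (the port number is an assumption):
<pre>
# load the UDP input module and listen for the HAProxy access logs
module(load="imudp")
input(type="imudp" port="514")

# write every received message to the file read by grok-exporter
*.* action(type="omfile" file="/var/log/messages")

# also emit a copy on stdout for the docker log driver
module(load="omstdout")
*.* action(type="omstdout")
</pre>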
<br>
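The last step assumes a Prometheus scrape configuration along these lines (a sketch; the job name and the pod label are assumptions):
<source lang="yaml">
scrape_configs:
  - job_name: grok-exporter            # assumed job name
    kubernetes_sd_configs:
      - role: pod                      # discover the pods directly, bypassing the service
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: grok-exporter           # assumed pod label
        action: keep
</source>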
==Introduction of grok-exporter==
Grok-exporter is a tool that can process logs based on regular expressions and convert them into one of the four basic Prometheus metric types:
* counter
* gauge
* histogram
* summary (quantiles)
Grok-exporter is based on the implementation of '''logstash-grok''', and it uses the patterns and functions defined for logstash.
Detailed documentation: <br>
The grok-exporter can read from three types of input sources:
* '''file''': we will stick to this
* '''webhook''': This solution could also be used with logstash acting as the rsyslog server. Logstash can send the logs to the grok-exporter webhook with the "http-output" logstash plugin.
* '''stdin''': With rsyslog, stdin can also be used. This requires the '''omprog''' program, which can read data from sockets and pass the content on through stdin: https://www.rsyslog.com/doc/v8-stable/configuration/modules/omprog.html
=== Alternative Solutions ===
'''Fluentd''' <br>
Logs could also be processed with Fluentd and converted to Prometheus metrics using the following plugins:
* fluent-plugin-rewrite-tag-filter
* fluent-plugin-prometheus
'''mtail''':<br>
The other alternative solution would be Google's '''mtail''', which is said to be more efficient at processing logs than the grok engine.<br>
https://github.com/google/mtail
The grok-exporter configuration file consists of the following main sections:
* '''global''': General settings, such as the version of the configuration format.
* '''input''': Tells where and how to retrieve the logs. It can be stdin, file or webhook. We will use the file input.
* '''grok''': Location of the grok patterns. Pattern definitions are stored in the /grok/patterns folder by default.
* '''metrics''': This is the most important part. Here you need to define the metrics and the associated regular expressions.
* '''server''': Contains the port of the HTTP metrics server.
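Put together, the top-level layout of the configuration file looks like this (all values here are illustrative):
<source lang="yaml">
global:
    config_version: 2
input:
    type: file
    path: /var/log/messages
grok:
    patterns_dir: ./patterns
metrics:
    - type: counter
      name: example_lines_total      # illustrative metric definition
      help: Example metric.
      match: '%{DATA:example}'
server:
    port: 9144
</source>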
Each definition contains 4 parts:
* '''type''': The type of the metric (counter, gauge, histogram or summary).
* '''name''': This will be the name of the metric.
* '''help''': This is the help text for the metric.
* '''match''': Describes the structure of the log line as a regular expression. Here you can use pre-defined grok patterns:
** '''BASIC grok patterns''': https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/grok-patterns
** '''HAPROXY patterns''': https://github.com/logstash-plugins/logstash-patterns-core/blob/master/patterns/haproxy
<br>
==== Match definition ====
Grok assumes that each element is separated by a single space in the source log files. In the match section, you have to write a regular expression using grok building blocks. Each building block has the format '''%{PATTERN_NAME}''', where PATTERN_NAME must be an existing pre-defined grok pattern. The most common type is '''%{DATA}''', which refers to an arbitrary data structure that contains no whitespace. There are several compound patterns that are built up from basic grok patterns. We can assign the regular-expression result groups to named variables, which can be used as the value of the Prometheus metric or as label values. The variable name must be placed inside the curly brackets of the pattern, separated from the pattern name by a colon:
<pre>
%{DATA:this_is_the_name}
</pre>
Let's assume that the value of the 'this_is_the_name' variable is 'myvalue'. If a label definition assigns this variable to the label 'mylabel', the metric will receive the following label: '''{mylabel="myvalue"}''' <br>
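In the configuration, this variable-to-label assignment goes into the metric's labels section, in go-template style; a small sketch ('mylabel' is an illustrative label name):
<source lang="yaml">
match: '%{DATA:this_is_the_name}'
labels:
    mylabel: '{{.this_is_the_name}}'
</source>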
We are going to demonstrate this with a full, end-to-end metric definition example in the following section. <br>
The following log line is given:
<pre>
7/30/2016 2:37:03 PM adam 1.5
</pre>
And the following metric rule definition is given in the grok config:
<source lang="C++">
metrics:
</source>
The resulting metric on the /metrics endpoint will look like this:
<pre>
# HELP grok_example_lines_total Example counter metric with labels.
# TYPE grok_example_lines_total counter
grok_example_lines_total{user="adam"} 1
</pre>
<br>
==== Value of the metric ====
For a counter-type metric, we don't need to determine the value of the metric, as it simply counts the number of matches of the regular expression. In contrast, for all the other types we have to specify the value. It has to be defined in the '''value''' section of the metric definition. Variables defined in the match section can be referenced the same way as we saw in the label definition, in go-template style. Here is an example. The following two log lines are given:
<pre>
7/30/2016 2:37:03 PM adam 1
7/30/2016 2:37:03 PM adam 3
</pre>
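A gauge definition that captures the trailing number could look like the following sketch, based on the standard grok-exporter documentation example (the metric name and the second log line above are assumptions, as the original example was truncated here):
<source lang="yaml">
metrics:
    - type: gauge
      name: grok_example_values   # assumed name, not from the original text
      help: Example gauge metric.
      match: '%{DATE} %{TIME} %{WORD} %{USER:user} %{NUMBER:val}'
      value: '{{.val}}'
</source>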
<br>
==== Functions ====
The following arithmetic functions can be used when composing a metric value:
* add
* subtract
* multiply
* divide
Functions have the following syntax: <pre> {{FUNCTION_NAME ATTR1 ATTR2}} </pre> where ATTR1 and ATTR2 can each be either a natural number or a variable name. Variable names must start with a dot. Here is an example using the multiply function on the 'grok_example_lines' metric definition from the example above:
<source lang = "C ++">
value: "{{multiply .val 1000}}"
</source>
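For reference, the complete grok-exporter configuration for the HAProxy use case could look like the sketch below, reconstructed from the settings explained afterwards. The metric name and the '''%{HAPROXYHTTP}''' match are assumptions, and '''.Tt''' presumes a pattern capture named Tt holding the total response time:
<source lang="yaml">
global:
    config_version: 2
input:
    type: file
    path: /var/log/messages
    readall: true                           # testing only; set to false in production
grok:
    patterns_dir: ./patterns
metrics:
    - type: gauge
      name: haproxy_response_time_seconds   # assumed metric name
      help: HAProxy total response time in seconds.
      match: '%{HAPROXYHTTP}'               # illustrative; a Tt capture is assumed
      value: "{{divide .Tt 1000}}"
server:
    port: 9144
</source>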
Explanation:
* '''type:file''' -> read logs from file
* '''path: /var/log/messages''' -> The rsyslog server writes logs to /var/log/messages by default
* '''readall: true''' -> always reads the entire log file. This should only be used for testing; in a live environment, it always has to be set to false.
* '''patterns_dir: ./patterns''' -> Base directory of the pattern definitions in the docker image.
* <pre> value: "{{divide .Tt 1000}}" </pre> The response time in the HAProxy log is in milliseconds, so we convert it to seconds.
* '''port: 9144''' -> The http port of the /metrics endpoint
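Once the exporter is running, the generated metrics can be checked manually from inside the cluster (the pod IP is a placeholder):
<pre>
curl http://<grok-exporter-pod-ip>:9144/metrics
</pre>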