IT Cloud. Eugeny Shtoltc
Data: nested objects;
** Lucene engine;
** JSON join;
** Scalable: SolrCloud (setting) && ZooKeeper (setting);
** Documentation since 2004.
Microservice architecture is used more and more often today: the loose coupling between components
and the simplicity of each component make development, testing, and debugging easier.
But the system as a whole becomes harder to analyze because it is distributed. To analyze its overall state,
logs are collected in a centralized place and converted into a readable form. There is also
a need to analyze other data, for example the NGINX access_log to collect traffic metrics, or the mail server's
mail log to detect password-guessing attempts, and so on. Take ELK as an example of such a solution. ELK is
a bundle of three products: Logstash, Elasticsearch and Kibana; the first and the last are built around the central one and
make it convenient to use. More generally, ELK is called the Elastic Stack, because the log preparation tool Logstash
can be replaced by analogs such as Fluentd or Rsyslog, and the Kibana visualizer can be replaced by Grafana. For example, although
Kibana offers rich analysis capabilities, Grafana provides notifications when events occur and
can be used together with other products, for example cAdvisor, which analyzes the state of the system and of individual containers.
ELK products can be installed separately, downloaded as self-contained containers between which you need to configure
communication, or run as a single container.
For Elasticsearch to work properly, the data should arrive in JSON format. If the data is submitted in
text format (each log entry is written on one line, separated from the previous one by a line break), then Elasticsearch can
offer only full-text search, because each entry is interpreted as a single string. There are two ways to deliver
logs in JSON format. The first is to configure the product under study to write its logs in this format;
NGINX, for example, has this option. Often, however, this is impossible, because there is already
an accumulated archive of logs, and traditionally they are written in text format. Such cases call for the second way:
post-processing the logs from text format into JSON, which is handled by Logstash. It is important to note that if
it is possible to send data in a structured form (JSON, XML and others) right away, then this should be
done: with detailed parsing, any deviation from the expected format
breaks processing, and with superficial parsing we lose valuable information. In any case, parsing is
a bottleneck in this system, although it can be scaled to a limited extent, per service or per log
file. Fortunately, more and more products are starting to support structured logging; for example,
recent versions of NGINX support logs in JSON format.
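The trade-off described above can be illustrated with a minimal Python sketch. It is not how Logstash works internally; it only shows the idea of turning a text log line into a JSON document. The regular expression assumes the widely used "combined" access log layout, and the field names and the sample line are illustrative assumptions.

```python
import json
import re

# Assumed layout: the common NGINX "combined" access log format.
ACCESS_RE = re.compile(
    r'(?P<remote_addr>\S+) \S+ (?P<remote_user>\S+) '
    r'\[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes_sent>\d+)'
)

def access_line_to_json(line: str) -> str:
    """Convert one text log line to a JSON document, or wrap it unparsed."""
    m = ACCESS_RE.match(line)
    if m is None:
        # Any deviation from the expected layout ends up here:
        # only the raw text survives, so only full-text search is possible.
        return json.dumps({"message": line, "parse_error": True})
    doc = m.groupdict()
    doc["status"] = int(doc["status"])          # numeric fields become numbers
    doc["bytes_sent"] = int(doc["bytes_sent"])
    return json.dumps(doc)

# Hypothetical sample line, not taken from a real server.
line = '203.0.113.7 - - [01/Jan/2021:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 612'
print(access_line_to_json(line))
```

A line that does not match the pattern is kept only as raw text, which is exactly the fragility of detailed parsing that the paragraph warns about.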
For systems that do not support this format, you can convert logs to it with
programs such as Logstash, Filebeat and Fluentd. The first one is included in the standard Elastic Stack distribution from the vendor
and can be installed together with the rest of ELK in a single Docker container. It supports reading data from files, the network and
standard streams, both on input and on output, and, most importantly, the native Elasticsearch protocol.
Logstash monitors log files by modification date or receives data over the network (telnet) from distributed
systems, for example containers, and after transformation sends it to the output, usually to Elasticsearch. It is simple and
comes standard with the Elastic Stack, which makes it easy to configure. But because of
the Java machine inside, it is heavy and not very functional, although it supports plugins, for example synchronization with MySQL
to send new data. Filebeat provides slightly more options. Fluentd can serve as an enterprise tool for
all occasions thanks to its rich functionality (reading log files, system logs, and so on),
scalability and the ability to roll it out across Kubernetes clusters using a Helm chart, and to monitor an entire
data center with the standard package; this is covered in the relevant section.
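The idea of watching a log file by its modification date can be sketched in Python as follows. This is a simplified illustration, not the actual Logstash file input: the FileState class and the read_new_lines function are hypothetical names, and a real collector also has to deal with partially written lines and log rotation in more detail.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import List

@dataclass
class FileState:
    mtime_ns: int = 0   # modification time seen at the last poll
    offset: int = 0     # number of bytes already read

def read_new_lines(path: str, state: FileState) -> List[str]:
    """Return the lines appended since the previous call and update state."""
    st = os.stat(path)
    if st.st_mtime_ns == state.mtime_ns and st.st_size == state.offset:
        return []                     # nothing changed since the last poll
    if st.st_size < state.offset:
        state.offset = 0              # file was truncated, start over
    with open(path, "rb") as f:
        f.seek(state.offset)
        data = f.read()
        state.offset = f.tell()
    state.mtime_ns = st.st_mtime_ns
    return data.decode("utf-8").splitlines()

# Demonstration on a temporary file (hypothetical log content).
with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "app.log")
    with open(log, "w") as f:
        f.write("first\n")
    state = FileState()
    batch1 = read_new_lines(log, state)   # picks up the initial content
    with open(log, "a") as f:
        f.write("second\n")
    batch2 = read_new_lines(log, state)   # picks up only the appended line
    batch3 = read_new_lines(log, state)   # no changes, nothing returned
```

Keeping the byte offset next to the modification time means an unchanged file costs a single stat call per poll, and an appended file is read only from where the previous poll stopped.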
To manage logs, you can use Curator, which can archive old logs from Elasticsearch
or delete them, making Elasticsearch work more efficiently.
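The retention logic behind such cleanup can be sketched in Python. This is not Curator's API, only an illustration of selecting old logs by age. It assumes daily indices named with a prefix followed by a date, such as logstash-2021.01.01; the function name and the sample index names are hypothetical.

```python
from datetime import date, datetime, timedelta
from typing import List

def indices_older_than(names: List[str], days: int, today: date,
                       prefix: str = "logstash-") -> List[str]:
    """Pick date-stamped indices (prefix + YYYY.MM.DD) older than `days`."""
    cutoff = today - timedelta(days=days)
    old = []
    for name in names:
        if not name.startswith(prefix):
            continue                  # not a daily log index
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue                  # suffix is not a date, skip it
        if day < cutoff:
            old.append(name)
    return old

# Hypothetical index names for illustration.
names = ["logstash-2021.01.01", "logstash-2021.01.20", "kibana-config"]
print(indices_older_than(names, days=14, today=date(2021, 1, 25)))
```

The selected names are the candidates for archiving or deletion; the actual operation would then be performed against Elasticsearch by the cleanup tool.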
Logs are gathered by dedicated collectors: Logstash, Fluentd, Filebeat or
others.
Fluentd is a less demanding and simpler analogue of Logstash. It is configured
in /etc/td-agent/td-agent.conf, which contains four blocks:
** match – contains settings for transferring received data;
** include – contains information about file types;
** system – contains system settings.
Logstash provides a much more functional configuration language. The Logstash agent daemon (logstash) monitors
changes in files. If the logs are located not locally but on a distributed system, then Logstash is installed on each server and
run in agent mode: bin/logstash agent -f /env/conf/my.conf. Since running
Logstash only as an agent for sending logs is wasteful, you can instead use a product from
the same developers, Logstash Forwarder (formerly Lumberjack), which forwards logs via the lumberjack protocol to
the Logstash server. You can use the Packetbeat agent to track and retrieve data from MySQL
(https://www.8host.com/blog/sbor-metrik-infrastruktury-s-pomoshhyu-packetbeat-i-elk-v-ubuntu-14-04/).
Logstash also lets you transform data with filters of different types:
** grok – define regular expressions to extract fields from a string, often to convert logs from text format to JSON;
** date – for archived logs, set the event date not to the current date but to the date taken from the log itself;
** kv – for logs of the form key=value;
** mutate – keep only the required fields and change data in the fields, for example, replace the "/" character with "_";
** multiline – for multi-line logs with delimiters.
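What the kv and mutate filters do to an event can be sketched in Python. This is an illustration of the transformations only, not the Logstash implementation; the functions kv and replace_in_field and the sample message are hypothetical.

```python
import re
from typing import Dict

def kv(message: str) -> Dict[str, str]:
    """Split 'key=value key2=value2' text into a dict, in the spirit of kv."""
    return dict(re.findall(r'(\w+)=(\S+)', message))

def replace_in_field(event: Dict[str, str], field: str,
                     old: str, new: str) -> Dict[str, str]:
    """Return a copy of the event with `old` replaced by `new` in one field."""
    out = dict(event)
    if field in out:
        out[field] = out[field].replace(old, new)
    return out

# Hypothetical log message for illustration.
event = kv("user=alice path=/var/log/app status=ok")
event = replace_in_field(event, "path", "/", "_")
print(event)  # {'user': 'alice', 'path': '_var_log_app', 'status': 'ok'}
```

Each filter takes an event and returns a changed event, which is why filters can be chained one after another in the filter block.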
For example, a log in the format "date type number", such as "01.01.2021 INFO 1", can be decomposed from the "message" field into a hash of separate fields:
filter {
  grok {
    type => "my_log"
    # MYDATE and ID are assumed to be custom patterns defined elsewhere
    match => ["message", "%{MYDATE:date} %{WORD:loglevel} %{ID:id:int}"]
  }
}
The %{ID:id:int} template takes the pattern class ID, puts the matched value into the id field, and converts the string value to the int type.
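The same decomposition can be reproduced in Python with named groups, which play the role of the field names in the grok template. This is only a sketch of the idea: the regular expressions chosen for the date and the identifier are assumptions, since the definitions of the MYDATE and ID patterns are not shown in the text.

```python
import re
from typing import Dict, Optional, Union

# Assumed shapes: date as DD.MM.YYYY, log level as one word, id as digits.
LINE_RE = re.compile(
    r'(?P<date>\d{2}\.\d{2}\.\d{4}) (?P<loglevel>\w+) (?P<id>\d+)$'
)

def parse(message: str) -> Optional[Dict[str, Union[str, int]]]:
    """Split a 'date type number' line into fields, or return None."""
    m = LINE_RE.match(message)
    if m is None:
        return None                   # the line does not follow the format
    fields: Dict[str, Union[str, int]] = dict(m.groupdict())
    fields["id"] = int(m.group("id")) # same role as the :int suffix in grok
    return fields

print(parse("01.01.2021 INFO 1"))
# {'date': '01.01.2021', 'loglevel': 'INFO', 'id': 1}
```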
In the output block we can specify where to send the data: to the console with the stdout block, to a file with file, over HTTP via the JSON REST API with elasticsearch, or by mail with email. You can also set conditions on the fields obtained in the filter block. For instance:
output {
if [type] == "Info" {
elasticsearch