⏱ Estimated reading time: 10 minutes

CNCF time

Elasticsearch, Kibana and APM on Kubernetes

By Michael Morello, Principal Software Engineer @ Elastic

Elastic Cloud on Kubernetes

Elastic has a cloud offering for its services. Its promise is to host Elasticsearch, Kibana and other Elastic products in environments managed by Elastic. This offering is now complemented by the ability to deploy such infrastructures on Kubernetes. This product is called Elastic Cloud on Kubernetes (ECK).

The recent evolution of Kubernetes motivated Elastic to release a set of Operators, that is, extensions to the Kubernetes API that add new features and simplify the deployment and management of more complex resources.

ECK supports vanilla Kubernetes, OpenShift, Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (Amazon EKS), Azure Kubernetes Service (AKS) and many others. It deploys Elasticsearch, Kibana and the APM Server, and brings integration at the operations level (scaling, upgrades), in cluster topology (hot, warm, cold, dedicated masters) and in tooling (kubectl).

The operator supports Elasticsearch versions 6.8+ and 7.1+. In the Elasticsearch Custom Resource Definition, spec.nodeSets[].config maps to the Elasticsearch configuration, that is elasticsearch.yml (documentation). In Kibana's Custom Resource Definition, it is possible to reference an Elasticsearch cluster using spec.elasticsearchRef (documentation). This creates the required user and permissions for Kibana to access Elasticsearch, as well as the Services that ease access to the Pods. ECK also manages TLS certificates by default, but it is possible to override this behavior and use another certificate authority, for instance thanks to cert-manager with Let's Encrypt.
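As a rough sketch of how these fields fit together (the names, versions and node counts below are arbitrary placeholders, not values from the talk), a minimal pair of manifests could look like this:

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: quickstart                  # placeholder cluster name
    spec:
      version: 7.6.0                    # any version supported by the operator
      nodeSets:
        - name: default
          count: 3
          config:                       # maps directly to elasticsearch.yml
            node.store.allow_mmap: false
    ---
    apiVersion: kibana.k8s.elastic.co/v1
    kind: Kibana
    metadata:
      name: quickstart                  # placeholder Kibana name
    spec:
      version: 7.6.0
      count: 1
      elasticsearchRef:
        name: quickstart                # points Kibana at the cluster above

Applying both manifests with kubectl lets the operator create the users, Secrets and Services mentioned above.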

Kubernetes Operators

An Operator is like an administrator whose knowledge can be automated and expressed as an algorithm. Kubernetes ships with native resources such as Pods. Operators are controllers that embed the necessary knowledge about a resource, tool or solution, and make it possible to implement it as a service through Custom Resources.

When a given Custom Resource gets created, its input is first type checked, then handled by the matching operator, which fetches the current state of the cluster and runs the reconciliation loop to converge toward the requested state.

ECK creates a dedicated StatefulSet, then generates the certificates used for communication between Elasticsearch nodes, which are configurable in the API. The default user's password is then randomly generated and stored in a Kubernetes Secret, and secure settings can be loaded into the Elasticsearch keystore. Finally, ECK creates the Kubernetes Services.

When scaling down, ECK notifies the Pods that are about to stop so that they can migrate their data before gracefully disappearing. ECK proceeds similarly for rolling upgrades, except that it notifies Elasticsearch beforehand so that the cluster does not try to replace the restarting nodes during the upgrade.

Power users

It is possible to override the default behavior of ECK using podTemplate and volumeClaimTemplates. The former sets the resources required by the components, while the latter defines how volumes should be provisioned, in order to fine-tune performance, resilience, availability and capacity.
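As an illustration, and assuming a hypothetical storage class named fast-ssd, a nodeSet combining both overrides might be declared as follows:

    apiVersion: elasticsearch.k8s.elastic.co/v1
    kind: Elasticsearch
    metadata:
      name: tuned-cluster               # placeholder name
    spec:
      version: 7.6.0
      nodeSets:
        - name: hot
          count: 3
          podTemplate:                  # resources requested by the Elasticsearch container
            spec:
              containers:
                - name: elasticsearch
                  resources:
                    requests:
                      memory: 4Gi
                      cpu: 1
                    limits:
                      memory: 4Gi
          volumeClaimTemplates:         # how the data volumes are provisioned
            - metadata:
                name: elasticsearch-data
              spec:
                accessModes:
                  - ReadWriteOnce
                resources:
                  requests:
                    storage: 100Gi
                storageClassName: fast-ssd   # placeholder storage class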

To get started with ECK, you can run the following command: kubectl apply -f https://download.elastic.co/downloads/eck/1.0.0/all-in-one.yaml. This Operator also manages Role-Based Access Control (RBAC) and can be found on GitHub (source). ECK is also available on the Kubernetes Operator Hub (source).

ECK has been designed and developed by engineers who are experienced with both the Elastic Stack and Kubernetes, and it only automates tasks when necessary. To send feedback to the team, go to their Discuss forum or open an issue on their GitHub.

Questions and Answers session

Do all issuers work with cert-manager?

All certificate issuers should work with cert-manager.

Is it possible to add Kibana dashboards with a CRD?

This is not yet possible, but it is a frequently requested feature.

Is ECK production ready?

From version 1.0 onwards, and with the General Availability announcement, it is safe to say that the solution is production ready.

How many devs are there in the team?

Eight remote software engineers; around ten people in total, including three from France and others from Poland, elsewhere in the European Union, the United Kingdom, etc.

Is ECK officially under support?

Yes, from version 1.0 onwards.

How are scaling rules managed?

Scaling rules are currently unopinionated and managed by the users. Self-healing capabilities are planned, though.

What was the most difficult part during this project, development-wise?

There were multiple:

  • Elasticsearch and Kubernetes are both distributed systems, so you end up managing one distributed system with another;
  • Operators live in the past, in the sense that the state they read and process might differ from the actual current state. During an iteration of the reconciliation loop, it is crucial not to re-apply non-idempotent mutations, such as Pod creations.

Log collection and processing on my Kubernetes cluster

By Paul Boutes, Software Engineer @ Elastic

Introduction to logging

When developing an application, it is wise to add logs to understand what is happening. Logging too much makes traces unreadable, but logging too little gives scarce information. It is therefore necessary to strike the right balance between a sensible logging output and an efficient ingestion of the entries. In the Elastic ecosystem, we often talk about ELK, which stands for Elasticsearch, Logstash, Kibana. ELK has recently been renamed the Elastic Stack, as it has been joined by Beats, a collection of specialized agents. Today, Logstash mostly plays the role of an ETL or data-enrichment layer in such systems. This enrichment is done through metadata in the form of Mapped Diagnostic Contexts (MDCs).

There are two main anti-patterns to avoid when logging: writing logs to the standard output or error stream using the language's built-in print function, and strongly coupling the logging technology with the code so that it becomes less and less replaceable.

Beats

To avoid these patterns, Beats agents listen for log files and send the data to Logstash. The configuration is very simple, as it only requires specifying the collection of file paths to listen to and setting the input type to “log”. But there might be an issue: when a log entry spans multiple lines, like a stack trace, this configuration creates as many events as there are lines in the output!
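As a sketch (the file path and the Logstash endpoint are placeholders), such a configuration boils down to a few lines of filebeat.yml:

    filebeat.inputs:
      - type: log                       # the "log" input type mentioned above
        paths:
          - /var/log/my-app/*.log       # placeholder path to the application logs
    output.logstash:
      hosts: ["logstash:5044"]          # placeholder Logstash endpoint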

A simple solution to the multiline problem is to match a pattern: as long as a line does not match the expected format, it is appended to the current event instead of starting a new one. To simplify pattern declaration and matching, Beats uses Grok patterns, and Kibana even ships with its own Grok debugger (source). This solution is not very different from the previous one, but it requires imposing a format on the logs and relies on regular expression patterns, which is impractical and difficult to maintain.
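As an example, assuming log entries start with a date (the pattern and paths below are placeholders), a multiline block in filebeat.yml could look like this:

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/my-app/*.log       # placeholder path
        multiline:
          pattern: '^\d{4}-\d{2}-\d{2}' # a new event starts with a date
          negate: true
          match: after                  # non-matching lines (e.g. stack trace frames) are appended to the previous event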

What if logs were sent by the application itself, at its own pace? The application would not have to store logs or traces, could use custom formats and would still centralize entries in Logstash. Unfortunately, as the number of applications scales, you would not be protected against Logstash downtime, which would cause an unrecoverable loss of data points. This solution also couples every application to the central logging instance, its formatting and its location.

A few paragraphs ago, it was mentioned that Logstash would serve as an ETL, but not more. Since Logstash can become a real single point of failure, we could bypass it entirely if the data is well formatted from the start, by using JSON as the input format together with the “log” input type, the defined log file paths and a message key. To go further, it is recommended to add a hash of the errors and stack traces in order to prioritize the errors that happen most often. Nevertheless, this solution adds some overhead and requires serializing JSON on the logger side.
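A sketch of what this could look like in filebeat.yml, assuming the application writes one JSON object per line (paths and hosts are placeholders):

    filebeat.inputs:
      - type: log
        paths:
          - /var/log/my-app/*.json      # placeholder path to the JSON log files
        json:
          keys_under_root: true         # promote the JSON fields to top-level event fields
          add_error_key: true           # flag events whose JSON could not be parsed
          message_key: message          # the field holding the actual log message
    output.elasticsearch:
      hosts: ["https://elasticsearch:9200"]   # placeholder; Logstash is bypassed entirely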

Note that Beats agents wait for a newline character before sending a log entry.

Beats in Kubernetes

The naive way to run Beats agents in Kubernetes would be a sidecar, but even though the agents are light, this solution is inefficient. A DaemonSet is more appropriate, as it schedules one Beats Pod per Kubernetes node.
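A trimmed-down DaemonSet for Filebeat might look like the sketch below (image version, namespace and mounted paths are indicative; a real deployment also needs a ServiceAccount and a ConfigMap for filebeat.yml):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: filebeat
      namespace: kube-system            # placeholder namespace
    spec:
      selector:
        matchLabels:
          app: filebeat
      template:
        metadata:
          labels:
            app: filebeat
        spec:
          containers:
            - name: filebeat
              image: docker.elastic.co/beats/filebeat:7.6.0   # placeholder version
              volumeMounts:
                - name: varlogcontainers
                  mountPath: /var/log/containers
                  readOnly: true        # Filebeat only reads the container log files
          volumes:
            - name: varlogcontainers
              hostPath:
                path: /var/log/containers   # where the kubelet/runtime writes container logs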

Filebeat can detect new workloads on Kubernetes by listening to the Kubernetes API. Hosting Beats in Kubernetes brings additional advantages: it adds Kubernetes metadata to what is already provided by the application, and it becomes possible to conditionally process events using a filter syntax. By using annotations, the log processing logic is reversed: the application does not declare where it is logging anymore, it just declares that it logs, and the orchestrator handles the underlying logic. This requires some configuration with hints to explain how inputs should be handled.
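A sketch of this hints-based setup: Filebeat enables the Kubernetes autodiscover provider, while the application Pod only carries annotations describing how its logs should be parsed (the multiline pattern below is a placeholder):

    # filebeat.yml on the DaemonSet side
    filebeat.autodiscover:
      providers:
        - type: kubernetes
          hints.enabled: true           # read co.elastic.logs/* annotations on Pods

    # annotations on the application Pod
    metadata:
      annotations:
        co.elastic.logs/multiline.pattern: '^\d{4}-\d{2}-\d{2}'
        co.elastic.logs/multiline.negate: "true"
        co.elastic.logs/multiline.match: after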

For a tutorial with Beats, it is recommended to start here.

Beats components can also perform health checks with Heartbeat, while system metrics can be handled by Metricbeat. The Beats family becomes more complete every day, but it nonetheless remains quite complex to navigate.

Questions and Answers session

Is it possible to log more than just system metrics with Metricbeat?

This should not be possible. It is worth trying, though.

How does it look from the performance perspective, considering log size?

In production, this system would be fronted by a Kafka cluster and different types of Beats agents would be made available to process most of the incoming log types efficiently.