Deploying a headless Logstash on Kubernetes

August 2022

Top 2 reasons to deploy Logstash on Kubernetes:

  • You can easily scale the number of instances up and down to match your throughput
  • You can easily deploy ETL pipelines to hundreds of instances with a single click

A “headless” Logstash is one that doesn’t contain any pipeline logic itself; instead, it fetches the pipeline definitions from a centralized location (Elasticsearch), so we can create our ETL pipelines through Kibana and deploy them automatically to several Logstash instances. This feature is called Centralized Pipeline Management.

TLDR

Jump straight to the all-in-one manifest at the end of this post.

We are going to create a Secret to store credentials, a ConfigMap to configure Logstash, a Deployment that exposes a few container ports, and finally a Service so Logstash is reachable from the outside.

Elasticsearch API Key

We first need to create an API Key so Logstash can communicate with Elasticsearch and fetch the pipeline definitions.

Let’s define two roles: my_role, which allows Logstash to fetch the pipeline definitions from Elasticsearch, and my_write_role, which allows us to write to a data stream from our output. These could be two separate API Keys, but we are using just one to keep things simple.

Open up Kibana’s Dev Tools and run the following request:

POST /_security/api_key
{
  "name": "logstash",   
  "role_descriptors": { 
    "my_role": {
      "cluster": ["monitor", "manage_logstash_pipelines"]
    },
    "my_write_role": {
      "index": [
        {
          "names": ["logs-*"],
          "privileges": ["all"]
        }
      ]
    }
  }
}

The response will contain an encoded field; copy its value:

{
  ...
  "encoded": "RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw=="
}

Secret

Next, we create a Kubernetes Secret holding our API key: paste the value of the encoded field into a key we are calling ELASTICSEARCH_API_KEY. Conveniently, the encoded field is already base64-encoded, which is exactly what the data section of a Secret expects:

apiVersion: v1
kind: Secret
metadata:
  name: logstash-secrets
type: Opaque
data:
  ELASTICSEARCH_API_KEY: RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw==
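One subtlety worth spelling out: Kubernetes base64-decodes the values under data before injecting them as environment variables, and the encoded field returned by Elasticsearch is itself the base64 form of id:api_key — so the variable Logstash ultimately sees is exactly the id:api_key format it expects. A quick sketch of that round trip in Python (using the example value from above, which is not a real credential):

```python
import base64

# The "encoded" field returned by the Elasticsearch API key endpoint
# (example value from this post, not a real credential).
encoded = "RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw=="

# Kubernetes decodes Secret `data` values before injecting them as env vars,
# so Logstash receives the decoded "id:api_key" form.
decoded = base64.b64decode(encoded).decode("utf-8")
api_key_id, api_key_secret = decoded.split(":", 1)
print(decoded)  # -> FSWhE4MBWAemFyGSzzwJ:I93TOr_TRO-7Lgb0u9aiYg
```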

ConfigMap

The ELASTICSEARCH_API_KEY variable is referenced in a ConfigMap that represents our logstash.yml, a very simple configuration that tells Logstash:

1) Where to fetch the pipeline configs from (xpack.management.elasticsearch.hosts)

2) Which pipeline IDs to fetch (xpack.management.pipeline.id), in our case anything starting with my-pipeline-*

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    log.level: info
    xpack.management.enabled: true
    xpack.management.elasticsearch.hosts: ["https://my-cluster.es.us-east-1.aws.found.io:443"]  
    xpack.management.elasticsearch.api_key: "${ELASTICSEARCH_API_KEY}"
    xpack.management.logstash.poll_interval: 5s
    xpack.management.pipeline.id: ["my-pipeline-*"]
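The ${ELASTICSEARCH_API_KEY} placeholder is expanded by Logstash itself at startup, reading from the container's environment (Logstash also supports ${VAR:default} fallbacks). Conceptually the expansion works like this rough Python sketch — an illustration of the behavior, not Logstash's actual implementation:

```python
import os
import re

def expand(value: str, env=os.environ) -> str:
    """Rough sketch of Logstash's ${VAR} / ${VAR:default} expansion.

    Illustration only; not Logstash's actual code."""
    def repl(match):
        name, sep, default = match.group(1).partition(":")
        if name in env:
            return env[name]
        if sep:  # a ":default" part was given
            return default
        raise ValueError(f"Cannot evaluate ${{{name}}}: undefined variable")
    return re.sub(r"\$\{([^}]+)\}", repl, value)

os.environ["ELASTICSEARCH_API_KEY"] = "id:secret"  # as injected by the Secret
print(expand("${ELASTICSEARCH_API_KEY}"))  # -> id:secret
```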

Deployment

Next, we create a Deployment that exposes a couple of containerPorts, consumes ELASTICSEARCH_API_KEY from the logstash-secrets Secret as an environment variable, and mounts our ConfigMap.

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash-deployment
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:8.3.3
        ports:
        - containerPort: 5044
        - containerPort: 5045
        resources:
            limits:
              memory: "2Gi"
              cpu: "2500m"
            requests: 
              memory: "1Gi"
              cpu: "300m"
        env: 
          - name: ELASTICSEARCH_API_KEY
            valueFrom:
              secretKeyRef:
                name: logstash-secrets
                key: ELASTICSEARCH_API_KEY
        volumeMounts:
          - name: config-volume
            mountPath: /usr/share/logstash/config
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml

Service

The Service exposes ports 5044 and 5045 to the outside through a LoadBalancer. Depending on the provider, that is enough: on GKE, for example, a public external IP is automatically assigned.

kind: Service
apiVersion: v1
metadata:
  name: logstash-service
  labels: 
    app: logstash
spec:
  type: LoadBalancer
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
    name: my-pipeline-1
  - protocol: TCP
    port: 5045
    targetPort: 5045
    name: my-pipeline-2

Logstash Pipeline

Our pipeline can open an HTTP input, a Beats input, or even a Syslog input on the available ports. For instance:

input {
  http {
    port => 5045
    codec => json
    user => "webhook_admin"
    password => "verysecretpassword"
  }
}
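With that input deployed, any HTTP client can ship JSON events to port 5045 using Basic authentication (the credentials above are placeholders). A minimal Python client sketch — the hostname is an assumption, so substitute the external address your LoadBalancer was given:

```python
import base64
import json
import urllib.request

# Hypothetical endpoint; replace with your Service's external address.
# Credentials match the placeholder values in the pipeline's http input.
url = "http://logstash.example.com:5045"
credentials = base64.b64encode(b"webhook_admin:verysecretpassword").decode("ascii")

event = {"message": "hello from the webhook", "service": "demo"}
request = urllib.request.Request(
    url,
    data=json.dumps(event).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {credentials}",
    },
)
# urllib.request.urlopen(request)  # uncomment to actually send the event
```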

filter { }

output {
    elasticsearch {
        hosts => "https://my-cluster.es.us-east-1.aws.found.io:443"
        ssl => true
        api_key => "${ELASTICSEARCH_API_KEY}"
        data_stream => true
        data_stream_type => "logs"
        data_stream_dataset => "my-datastream"
        data_stream_namespace => "dev"
    }
}
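With data_stream enabled, the elasticsearch output writes to a data stream named after the {type}-{dataset}-{namespace} convention — here that resolves to logs-my-datastream-dev, which falls under the logs-* pattern we granted to my_write_role earlier. The naming is simply:

```python
# Data stream naming convention used by the elasticsearch output:
# {type}-{dataset}-{namespace}
data_stream_type = "logs"
data_stream_dataset = "my-datastream"
data_stream_namespace = "dev"

data_stream = f"{data_stream_type}-{data_stream_dataset}-{data_stream_namespace}"
print(data_stream)  # -> logs-my-datastream-dev
```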

All-in-one manifest

---
apiVersion: v1
kind: Secret
metadata:
  name: logstash-secrets
type: Opaque
data:
  ELASTICSEARCH_API_KEY: RlNXaEU0TUJXQWVtRnlHU3p6d0o6STkzVE9yX1RSTy03TGdiMHU5YWlZZw==

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: logstash-deployment
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: logstash
  template:
    metadata:
      labels:
        app: logstash
    spec:
      containers:
      - name: logstash
        image: docker.elastic.co/logstash/logstash:8.3.3
        ports:
        - containerPort: 5044
        - containerPort: 5045
        resources:
            limits:
              memory: "2Gi"
              cpu: "2500m"
            requests: 
              memory: "1Gi"
              cpu: "300m"
        env: 
          - name: ELASTICSEARCH_API_KEY
            valueFrom:
              secretKeyRef:
                name: logstash-secrets
                key: ELASTICSEARCH_API_KEY
        volumeMounts:
          - name: config-volume
            mountPath: /usr/share/logstash/config
      volumes:
      - name: config-volume
        configMap:
          name: logstash-configmap
          items:
            - key: logstash.yml
              path: logstash.yml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: logstash-configmap
data:
  logstash.yml: |
    http.host: "0.0.0.0"
    log.level: info
    xpack.management.enabled: true
    xpack.management.elasticsearch.hosts: ["https://my-cluster.es.us-east-1.aws.found.io:443"]  
    xpack.management.elasticsearch.api_key: "${ELASTICSEARCH_API_KEY}"
    xpack.management.logstash.poll_interval: 5s
    xpack.management.pipeline.id: ["my-pipeline-*"]

---
kind: Service
apiVersion: v1
metadata:
  name: logstash-service
  labels: 
    app: logstash
spec:
  type: LoadBalancer
  selector:
    app: logstash
  ports:
  - protocol: TCP
    port: 5044
    targetPort: 5044
    name: my-pipeline-1
  - protocol: TCP
    port: 5045
    targetPort: 5045
    name: my-pipeline-2