Logstash 8.x - Deploying, Ingesting, and Testing the Right Way
The L in ELK, Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite "stash". More than 50 outputs are currently supported, from Elasticsearch to Syslog.
Logstash is also currently the only option within the Elastic Stack if you want to fetch data that lives in a relational database (through the JDBC input), or if you want to write the same event to multiple outputs. All that power comes at a cost, though: a Logstash instance requires considerably more resources than an Elastic Agent running Filebeat, for instance.
Here are some best practices for deploying Logstash 8.x in production, split into three areas: Deployment, Data Ingestion, and Testing.
Deploying Logstash
We have several options for deploying Logstash:
As a standalone service
When running standalone, always deploy Logstash as a system service, using the official packages for YUM- or APT-based distributions:
Installing Logstash from Package Repositories
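On an APT-based system, for instance, the flow looks roughly like this (a sketch following the official package documentation; the repository and signing-key URLs are Elastic's published ones):

    # Add Elastic's signing key and the 8.x APT repository
    wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | \
      sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
    echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | \
      sudo tee /etc/apt/sources.list.d/elastic-8.x.list

    # Install and run Logstash under systemd
    sudo apt-get update && sudo apt-get install logstash
    sudo systemctl enable --now logstash

The package registers a systemd unit, so restarts and log collection are handled like any other service on the host.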
In containers
The main advantage here is that you can run several Logstash instances side by side, even on different versions. You also get better process isolation at the system level.
- Bind-mount your config files, so pipelines live outside the image and can be edited and version-controlled without rebuilding it (see the sketch below)
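A minimal sketch with the official image (the version tag and local paths are illustrative):

    docker run --rm -d --name logstash \
      -v "$(pwd)/pipeline/:/usr/share/logstash/pipeline/" \
      -v "$(pwd)/config/logstash.yml:/usr/share/logstash/config/logstash.yml" \
      docker.elastic.co/logstash/logstash:8.13.4

Every *.conf file mounted under /usr/share/logstash/pipeline/ becomes part of the image's default main pipeline.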
In Kubernetes
The same container image works in Kubernetes: run Logstash as a Deployment (or a StatefulSet if you rely on persistent queues) and mount the pipeline from a ConfigMap, so a configuration change is just a rollout away.
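A minimal, hypothetical manifest (resource names, replica count, and image tag are illustrative; resource limits and persistent queues are left out):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: logstash-pipeline
    data:
      logstash.conf: |
        input { beats { port => 5044 } }
        output { stdout { codec => rubydebug } }
    ---
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: logstash
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: logstash
      template:
        metadata:
          labels:
            app: logstash
        spec:
          containers:
            - name: logstash
              image: docker.elastic.co/logstash/logstash:8.13.4
              ports:
                - containerPort: 5044
              volumeMounts:
                # Replaces the image's default pipeline with the ConfigMap's
                - name: pipeline
                  mountPath: /usr/share/logstash/pipeline
          volumes:
            - name: pipeline
              configMap:
                name: logstash-pipeline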
Data Ingestion
- Use data streams instead of regular indices
  - A special type of index abstraction, data streams are well suited for logs, events, metrics, and other continuously generated data that is rarely, if ever, updated
  - Standardized names, following the type-dataset-namespace naming scheme
  - A common set of best practices for mappings
    elasticsearch {
      hosts => "elasticsearch:9200"
      user => "elastic"
      password => "..."
      data_stream => true
      data_stream_type => "logs"
      data_stream_dataset => "hasura"
      data_stream_namespace => "%{log_type}"
    }
- Pipelines
  - Maintain order? If event ordering does not matter for your data, disable it (`pipeline.ordered: false`): order is only preserved with a single pipeline worker, so dropping the requirement lets you run multiple workers and raise throughput
  - Use Dissect instead of Grok whenever possible: Dissect splits a field on fixed delimiters instead of running regular expressions, so it is faster and cheaper on CPU; keep Grok for lines whose structure actually varies (see the sketch below)
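A minimal sketch for a hypothetical space-delimited log line (field names are illustrative):

    filter {
      # "2024-05-01T12:00:00Z INFO payments Request completed"
      dissect {
        mapping => {
          "message" => "%{ts} %{log_level} %{service} %{msg}"
        }
      }
      # The equivalent Grok pattern runs the regex engine on every event:
      # grok {
      #   match => { "message" => "%{TIMESTAMP_ISO8601:ts} %{LOGLEVEL:log_level} %{WORD:service} %{GREEDYDATA:msg}" }
      # }
      date { match => ["ts", "ISO8601"] }
    }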
Testing
- Use a load generator for testing: before going to production, benchmark the pipeline with synthetic events, for example with Logstash's own `generator` input, and watch the throughput through the monitoring API
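A minimal sketch of such a benchmark pipeline (the sample line and event count are arbitrary):

    input {
      generator {
        # Emit one million copies of a representative event, then stop
        lines => ["2024-05-01T12:00:00Z INFO payments Request completed"]
        count => 1000000
      }
    }
    filter {
      dissect {
        mapping => { "message" => "%{ts} %{log_level} %{service} %{msg}" }
      }
    }
    output {
      # One dot per event; cheap enough not to distort the measurement
      stdout { codec => dots }
    }

Run it with `bin/logstash -f benchmark.conf` and read the events-per-second figures from the node stats API at `localhost:9600/_node/stats/pipelines`.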