Loki Log Aggregation

What is Loki?

Loki is a log aggregation system developed by Grafana Labs that works like Prometheus, but for logs. Key features:

Label-based indexing: Indexes a small set of labels per log stream instead of the full text, which keeps the index small and reduces storage costs
LogQL: Query language modeled on Prometheus PromQL
Scalability: Runs as a single binary for small setups and scales horizontally for multi-terabyte log volumes
Grafana integration: Native datasource support for visualization
Multiple clients: Accepts logs from Promtail, Fluentd, Fluent Bit, and Logstash
Cost-effective: Lower resource usage than full-text-indexing log stacks

Docker Compose Installation

Create a complete Loki stack with Promtail and Grafana:

version: '3.8'

services:
  loki:
    image: grafana/loki:latest
    container_name: loki
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki_data:/loki
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - logging

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    volumes:
      - ./promtail-config.yaml:/etc/promtail/config.yml
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock
    command: -config.file=/etc/promtail/config.yml
    depends_on:
      - loki
    networks:
      - logging

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
    volumes:
      - grafana_data:/var/lib/grafana
    depends_on:
      - loki
    networks:
      - logging

volumes:
  loki_data:
  grafana_data:

networks:
  logging:

Deploy the stack:

docker-compose up -d
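
Containers are reported as started before Loki is ready to serve requests, so it can help to poll the /ready endpoint before sending queries. The helper below is a minimal sketch: wait_until is a hypothetical name, and the retry count and delay in the commented usage line are arbitrary assumptions.

```shell
# wait_until CMD RETRIES DELAY
# Runs CMD until it succeeds, trying at most RETRIES times and
# sleeping DELAY seconds after each failed attempt.
wait_until() {
  local cmd="$1" retries="$2" delay="$3" attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if sh -c "$cmd" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}

# Example (assumes port 3100 is published on localhost, as in the Compose file):
# wait_until "curl -fsS http://localhost:3100/ready" 30 2 && echo "Loki is ready"
```

Taking the probe as a command string keeps the retry loop reusable for Grafana or Promtail health checks as well.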

Loki Configuration

Create loki-config.yaml:

auth_enabled: false

ingester:
  chunk_idle_period: 3m
  max_chunk_age: 1h
  lifecycler:
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1

# This layout follows Loki 2.x. Newer releases have changed some of these
# keys (for example shared_store), so check the upgrade notes for the
# image version you run.
schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  filesystem:
    directory: /loki/chunks
  boltdb_shipper:
    active_index_directory: /loki/boltdb-shipper-active
    shared_store: filesystem

limits_config:
  ingestion_rate_mb: 128
  ingestion_burst_size_mb: 256
  max_line_length: 262144
  max_streams_per_user: 10000  # per-tenant limit; belongs here, not under ingester
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  retention_period: 720h  # 30 days; enforced only when the compactor has retention_enabled: true
server:
  http_listen_port: 3100
  log_level: info

Promtail Configuration

Create promtail-config.yaml:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Scrape system logs
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: system
          __path__: /var/log/*.log
    pipeline_stages:
      - multiline:
          line_start_pattern: '^\d{4}-\d{2}-\d{2}'
      - regex:
          expression: '(?P<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2})\s+(?P<level>\w+)\s+(?P<message>.*)'
      - timestamp:
          source: timestamp
          format: "2006-01-02 15:04:05"
      - labels:
          level:

  # Scrape nginx access logs
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          __path__: /var/log/nginx/access.log
    pipeline_stages:
      - regex:
          expression: '(?P<ip>\S+) - (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+) (?P<protocol>\S+)" (?P<status>\d+) (?P<size>\d+)'
      - timestamp:
          source: timestamp
          format: "02/Jan/2006:15:04:05 -0700"
      - labels:
          status:
          method:

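  # Docker container logs (discovered through the Docker socket)
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Set a fixed job label on every discovered container
      - target_label: job
        replacement: docker
      # Container names are reported with a leading slash, e.g. "/grafana"
      - source_labels: ['__meta_docker_container_name']
        regex: '/(.*)'
        target_label: container_name
    pipeline_stages:
      - json:
          expressions:
            level: level
            message: msg
      - labels:
          level: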
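
Promtail delivers entries to the push endpoint listed under clients. To check that endpoint independently of Promtail, a hand-built request can be sent with curl. The sketch below only builds the JSON body; build_push_payload and the smoke-test job label are hypothetical names, and the commented usage assumes GNU date and Loki published on localhost:3100.

```shell
# build_push_payload JOB LINE TS_NS
# Prints a JSON body for POST /loki/api/v1/push with one stream and one entry.
# TS_NS is a Unix timestamp in nanoseconds, passed as a string.
build_push_payload() {
  local job="$1" ts_ns="$3" escaped
  # Escape backslashes and double quotes so the line is a valid JSON string.
  # Assumption: the line has no control characters such as tabs or newlines.
  escaped=$(printf '%s' "$2" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"streams":[{"stream":{"job":"%s"},"values":[["%s","%s"]]}]}\n' \
    "$job" "$ts_ns" "$escaped"
}

# Example (assumes GNU date for %N and Loki reachable on localhost:3100):
# build_push_payload smoke-test "hello from curl" "$(date +%s%N)" |
#   curl -sS -H "Content-Type: application/json" -X POST \
#     --data-binary @- http://localhost:3100/loki/api/v1/push
```

If the request succeeds, the entry should show up under {job="smoke-test"} in Grafana, which separates ingestion problems from Promtail scraping problems.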

Add Loki Datasource to Grafana

Open Grafana: http://localhost:3000 (default: admin/admin)
Navigate to Connections → Add new connection
Search for Loki
Click Create a Loki data source
Configure:
- Name: Loki
- URL: http://loki:3100
- Skip TLS Verify: not needed here (the URL is plain HTTP; enable it only for HTTPS endpoints with self-signed certificates in local testing)
Click Save & test
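
The same datasource can also be provisioned from a file, so it exists as soon as Grafana starts. The sketch below writes such a file. It assumes you mount the target directory into the grafana container at /etc/grafana/provisioning/datasources; write_loki_datasource and the loki.yaml file name are hypothetical choices.

```shell
# write_loki_datasource DIR
# Writes a Grafana datasource provisioning file for Loki into DIR.
write_loki_datasource() {
  local dir="$1"
  mkdir -p "$dir"
  cat > "$dir/loki.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
EOF
}

# Example (the directory name is an assumption; mount it into the container):
# write_loki_datasource ./grafana/provisioning/datasources
```

The URL uses the Compose service name because Grafana reaches Loki over the shared logging network.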

LogQL Query Language

Basic Queries

# Select all logs from a job
{job="nginx"}

# Filter by label value
{job="nginx", status="500"}

# Line contains a string
{job="nginx"} |= "error"

# Regex matching (backticks avoid double-escaping \d)
{job="nginx"} |~ `5\d\d`

# Exclude lines containing a string
{job="nginx"} != "200"
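
These queries can also be run outside Grafana through Loki's HTTP API, which is handy for scripting. The function below is a sketch around the query_range endpoint: loki_query and the DRY_RUN switch are hypothetical names, and the base URL in the usage line assumes port 3100 is published on localhost.

```shell
# loki_query BASE_URL QUERY [LIMIT]
# Sends a LogQL query to /loki/api/v1/query_range (default range: last hour).
# With DRY_RUN=1 the curl command is printed instead of executed.
loki_query() {
  local base_url="$1" query="$2" limit="${3:-10}"
  # Reuse the positional parameters as the command line to run
  set -- curl -G -sS "${base_url}/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=${limit}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Example:
# loki_query http://localhost:3100 '{job="nginx"} |= "error"' 20
```

Passing the query through --data-urlencode avoids hand-escaping braces and quotes in the URL.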

Metric Queries

Convert logs to metrics:

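# Log lines per second, averaged over 1 minute
rate({job="nginx"} [1m])

# Per-second rate of 5xx responses over 5 minutes, grouped by method
sum by (method) (rate({job="nginx", status=~"5.."} [5m]))

# Bytes per second (parses the access log line, then unwraps the size field)
sum by (job) (rate({job="nginx"} | pattern `<_> - <_> [<_>] "<_> <_> <_>" <_> <size> <_>` | unwrap size [1m]))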

Log Parsing

Extract fields from logs:

# JSON parsing
{job="app"} | json | level="error"

# Regex parsing with extraction (backticks avoid double-escaping \d)
{job="nginx"} | regexp `status=(?P<status>\d+)` | status="500"

# Pattern parsing
{job="app"} | pattern "<_> - <_> [<_>] \"<method> <path> <_>\" <status> <size>"

Advanced Queries

# Top 10 status codes by request count
topk(10, sum by (status) (count_over_time({job="nginx"} [5m])))

# Error rate percentage
(
  sum(rate({job="app"} |= "error" [5m]))
  /
  sum(rate({job="app"} [5m]))
) * 100

# Parse JSON, keep only errors, then reformat the line
{job="app"}
| json
| level="ERROR"
| line_format "{{.timestamp}} [{{.level}}] {{.message}}"
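
The error rate percentage query above divides the rate of matching lines by the rate of all lines and multiplies by 100. The same arithmetic, as a small sketch for sanity-checking a dashboard value by hand (error_rate_pct is a hypothetical helper, and returning 0.00 for an empty window is an assumption of this sketch, not LogQL behavior):

```shell
# error_rate_pct ERRORS TOTAL
# Prints ERRORS / TOTAL as a percentage with two decimals.
error_rate_pct() {
  awk -v e="$1" -v t="$2" 'BEGIN {
    # Avoid dividing by zero when no lines were logged in the window
    if (t == 0) { print "0.00"; exit }
    printf "%.2f\n", (e / t) * 100
  }'
}

# Example: 5 error lines out of 200 total lines
# error_rate_pct 5 200
```

Because both rates in the LogQL query use the same 5m window, the per-second scaling cancels out and the ratio equals the ratio of raw line counts.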

Docker Logging Driver

Send container logs straight to Loki with the Loki Docker logging driver. Docker does not ship this driver, so install it as a plugin first:

# Install the plugin on each Docker host
docker plugin install grafana/loki-docker-driver:latest --alias loki --grant-all-permissions

Configure in docker-compose.yml:

services:
  myapp:
    image: myapp:latest
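    logging:
      driver: loki
      options:
        # The driver runs inside the Docker daemon on the host, so use an
        # address the host can reach, not the Compose service name
        loki-url: "http://localhost:3100/loki/api/v1/push"
        loki-batch-size: "400"
        loki-external-labels: "job=myapp,env=production"
        # max-buffer-size only takes effect in non-blocking mode
        mode: "non-blocking"
        max-buffer-size: "4m"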

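Or set it as the default driver in /etc/docker/daemon.json. Restart the Docker daemon afterwards; the setting applies only to containers created after the restart:

{
  "log-driver": "loki",
  "log-opts": {
    "loki-url": "http://localhost:3100/loki/api/v1/push",
    "loki-external-labels": "job=docker"
  }
}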

Nginx Access Log Pipeline

Advanced Promtail pipeline for parsing Nginx logs:

- job_name: nginx
  static_configs:
    - targets:
        - localhost
      labels:
        job: nginx
        __path__: /var/log/nginx/access.log
  pipeline_stages:
    # Parse nginx combined log format. Keep the expression on one line:
    # a line break inside it would become part of the regex.
    - regex:
        expression: '^(?P<remote_addr>[\w\.\-]+) - (?P<remote_user>[\w\.\-]+|-) \[(?P<time_local>[^\]]+)\] "(?P<method>\w+) (?P<uri>[^ ]+) (?P<protocol>[^ ]+)" (?P<status>\d+) (?P<bytes_sent>\d+|-) "(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'

    # Extract timestamp
    - timestamp:
        source: time_local
        format: "02/Jan/2006:15:04:05 -0700"

    # Expose bytes_sent as a counter on Promtail's /metrics endpoint
    - metrics:
        bytes_sent:
          type: Counter
          description: "Total bytes sent"
          source: bytes_sent
          config:
            action: add

    # Add labels. Keep them low-cardinality: remote_addr stays out of the
    # label set because one stream per client IP would bloat the index.
    - labels:
        status:
        method:
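
Before reloading Promtail, it can save time to check the field layout against a real log line. The sketch below is a POSIX extended-regex translation of the combined-format expression (sed has no named groups, so fields are numbered). parse_nginx_line is a hypothetical helper and the sample line in the usage comment is made up.

```shell
# parse_nginx_line LINE
# Prints method, status and bytes_sent from an nginx combined-format line.
# Returns 1 if the line does not match the expected layout.
parse_nginx_line() {
  local out
  # Groups: 1 addr, 2 user, 3 time, 4 method, 5 uri, 6 protocol, 7 status, 8 bytes
  out=$(printf '%s\n' "$1" | sed -nE \
    's/^([^ ]+) - ([^ ]+) \[([^]]+)\] "([A-Z]+) ([^ ]+) ([^"]+)" ([0-9]+) ([0-9]+|-).*$/method=\4 status=\7 bytes_sent=\8/p')
  [ -n "$out" ] || return 1
  printf '%s\n' "$out"
}

# Example with a made-up log line:
# parse_nginx_line '203.0.113.7 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/8.0"'
```

A line that fails here would most likely also fail the Promtail regex stage, in which case the entry is still shipped but without the status and method labels.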

Retention Policy Configuration

Automatically clean up old logs:

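In loki-config.yaml, set the retention period under limits_config (merge it into the existing block) and let the compactor enforce it:

limits_config:
  retention_period: 720h  # 30 days

compactor:
  working_directory: /loki/boltdb-shipper-compactor
  shared_store: filesystem  # Loki 2.x key; Loki 3.x uses delete_request_store instead
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 10m
  retention_delete_worker_count: 10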
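
The retention period is written in hours, so a value given in days has to be converted first (30 days is 720h). A tiny sketch of that conversion, with days_to_loki_duration as a hypothetical helper name:

```shell
# days_to_loki_duration DAYS
# Prints DAYS as an hour-based duration string such as "720h".
days_to_loki_duration() {
  case "$1" in
    ''|*[!0-9]*)
      echo "days must be a non-negative integer" >&2
      return 1
      ;;
  esac
  echo "$(( $1 * 24 ))h"
}

# Example:
# days_to_loki_duration 30
```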

Scaling Considerations

For production deployments:

# Increase ingestion limits
limits_config:
  ingestion_rate_mb: 512
  ingestion_burst_size_mb: 1024
  max_streams_per_user: 50000

# Tune chunk settings
ingester:
  chunk_idle_period: 1h
  max_chunk_age: 2h
  chunk_encoding: snappy

# Configure persistent storage (also set object_store: s3 in schema_config)
storage_config:
  aws:
    bucketnames: bucket-name
    endpoint: s3.amazonaws.com
    region: us-east-1

Grafana Alloy

Alloy is Grafana's successor to Promtail. It is an open-source distribution of the OpenTelemetry Collector:

Better performance and resource efficiency
Support for metrics, traces, and logs (not just logs)
More flexible pipeline configuration
Can take over the role of Promtail and, in many setups, of Fluentd or Telegraf

Install: docker run grafana/alloy:latest

Troubleshooting

Check Loki Health

# From a container on the logging network (from the host, use http://localhost:3100)
curl http://loki:3100/ready

# Check metrics
curl http://loki:3100/metrics

Verify Promtail Connection

# Check Promtail logs
docker logs promtail

# Verify position file is updating
docker exec promtail cat /tmp/positions.yaml

Query Empty Results

Verify labels are set correctly in scrape config
Check Promtail is scraping files: look at /tmp/positions.yaml
Test LogQL query with simpler pattern: {job="nginx"}
Check that retention hasn't deleted the logs: widen the time range and run count_over_time({job="nginx"}[24h])
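
To see at a glance which files Promtail is tracking, the positions file can be reduced to one line per file. This sketch assumes the usual layout (a top-level positions: key, then indented entries of the form path: "offset"); positions_summary is a hypothetical helper name.

```shell
# positions_summary FILE
# Prints "path offset" for every tracked file in a Promtail positions file.
positions_summary() {
  awk '/^ +[^ ]/ {
    line = $0
    sub(/^ +/, "", line)                     # drop the indentation
    idx = match(line, /: "?[0-9]+"?$/)       # find the trailing offset
    if (idx == 0) next
    path = substr(line, 1, idx - 1)
    offset = substr(line, idx + 2)
    gsub(/"/, "", offset)                    # strip the quotes around the offset
    print path, offset
  }' "$1"
}

# Example (copy the file out of the container first):
# docker cp promtail:/tmp/positions.yaml ./positions.yaml
# positions_summary ./positions.yaml
```

Running it twice a few seconds apart shows whether offsets are advancing; a file that is missing from the output is not being scraped at all.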

Next Steps

Set up alerting rules in Grafana for error logs
Create dashboards for application-specific metrics
Implement log sampling for high-volume applications
Configure S3-compatible storage for long-term retention
Migrate from ELK stack to Loki for cost savings
