Skill

Docker Compose

name: docker-compose-skill
description: Best practices for production-grade Docker Compose multi-service architectures. Use when building docker-compose.yml files with (1) GPU passthrough for AI workloads, (2) health checks and dependency management, (3) secrets injection without environment variables, (4) resource limits and reservations, (5) network isolation, or (6) persistent volume strategies. Triggers on Docker Compose, container orchestration, service dependencies, GPU containers.

Docker Compose Production Patterns

Core Principles

Never put secrets in environment variables: use Docker secrets mounted as files
Always define health checks: each service should prove it is ready
Set explicit resource limits: they prevent runaway containers
Isolate networks: each service sees only what it needs

Service Template

services:
  service-name:
    image: image:tag
    container_name: service-name
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]  # requires curl in the image
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    networks:
      - internal
    depends_on:
      dependency:
        condition: service_healthy

GPU Passthrough (NVIDIA)

services:
  gpu-worker:
    image: ai-worker:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1  # or 'all'
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility

Verify GPU access: docker compose exec gpu-worker nvidia-smi
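
The manual check above can also be scripted, for example as a deployment smoke test. The following is a minimal sketch, assuming the image ships nvidia-smi and the service is named gpu-worker as in the example; the helper names parse_gpu_list and list_container_gpus are hypothetical.

```python
import subprocess


def parse_gpu_list(csv_text: str) -> list[tuple[str, str]]:
    """Parse `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` output.

    Returns one (name, memory_total) tuple per well-formed line.
    """
    gpus = []
    for line in csv_text.splitlines():
        # Split on the last comma so a GPU name containing commas stays intact.
        name, sep, memory = line.strip().rpartition(",")
        if not sep:
            continue  # not a "name, memory" row (blank line or unexpected text)
        gpus.append((name.strip(), memory.strip()))
    return gpus


def list_container_gpus(service: str = "gpu-worker") -> list[tuple[str, str]]:
    """Run nvidia-smi inside a Compose service and return the GPUs it can see.

    Raises subprocess.CalledProcessError if the command fails, for example
    when the container has no GPU access.
    """
    result = subprocess.run(
        [
            "docker", "compose", "exec", "-T", service,
            "nvidia-smi", "--query-gpu=name,memory.total",
            "--format=csv,noheader",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_list(result.stdout)
```

An empty list from list_container_gpus() would suggest the container started but sees no devices, which is worth treating as a failed check.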

Secrets Management

# Define secrets at top level
secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt

services:
  api:
    secrets:
      - postgres_password
      - jwt_secret
    # Secrets appear as files in /run/secrets/

Read in application:

def read_secret(name: str) -> str:
    with open(f"/run/secrets/{name}") as f:
        return f.read().strip()
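
A variant of the reader with a clearer failure mode may be useful when a secret is missing from the service definition. This is a sketch under stated assumptions: the secrets_dir parameter, the build_postgres_dsn helper, and the host and database names ("postgres", "app") are all hypothetical placeholders, not part of the original setup.

```python
from pathlib import Path
from urllib.parse import quote


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Return the contents of a mounted secret with surrounding whitespace removed."""
    path = Path(secrets_dir) / name
    try:
        return path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        # Name the missing secret in the error, never its value.
        raise RuntimeError(f"secret {name!r} is not mounted at {path}") from None


def build_postgres_dsn(password: str, user: str = "postgres",
                       host: str = "postgres", database: str = "app") -> str:
    """Build a connection URL; the host and database defaults are placeholders."""
    # Percent-encode the password so characters like '@' or '/' keep the URL valid.
    return f"postgresql://{user}:{quote(password, safe='')}@{host}/{database}"
```

Typical use would be build_postgres_dsn(read_secret("postgres_password")) at application startup, so the password never passes through the container's environment.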

Health Check Patterns

Service Type	Health Check Command
HTTP API	`curl -f http://localhost:PORT/health`
PostgreSQL	`pg_isready -U postgres`
Redis	`redis-cli ping`
TCP Service	`nc -z localhost PORT`
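
Slim images often ship without curl or nc. Where the image already contains Python, a standard-library probe can play the same role. The sketch below is one possible approach, not a drop-in replacement: tcp_ready and http_ready are hypothetical names, and the behaviour only roughly matches `nc -z` and `curl -f`.

```python
import http.client
import socket
import urllib.error
import urllib.request


def tcp_ready(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection succeeds (rough equivalent of `nc -z`)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True on a 2xx response (rough equivalent of `curl -f`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # urlopen raises HTTPError (a URLError) for 4xx and 5xx responses.
        return False
```

A healthcheck script built on these would exit 0 when ready and 1 otherwise, e.g. `sys.exit(0 if http_ready("http://localhost:8000/health") else 1)`, and be invoked from the service's `test:` entry.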

Dependency Management

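services:
  api:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  # Every service_healthy dependency needs its own healthcheck
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5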
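
To see what a depends_on graph implies for startup order, the sketch below topologically sorts a dependency map with Python's standard graphlib module (Python 3.9+). It illustrates the ordering only and is not how Compose itself is implemented; startup_order is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def startup_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Return service names ordered so each one starts after its dependencies.

    `depends_on` maps a service to the services it waits for.
    Raises ValueError if the dependencies form a cycle.
    """
    # TopologicalSorter expects {node: predecessors}, which matches depends_on.
    graph = {name: sorted(deps) for name, deps in sorted(depends_on.items())}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that make up the cycle.
        raise ValueError(f"circular depends_on: {exc.args[1]}") from exc


# Dependency map mirroring the example: api waits for postgres and redis.
services = {"api": ["postgres", "redis"], "postgres": [], "redis": []}
print(startup_order(services))  # api is always last
```

With `condition: service_healthy`, each step in this order additionally waits for the dependency's health check to pass, not just for the container to start.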

Network Isolation

networks:
  frontend:    # Public-facing services
  backend:     # API and workers
  data:        # Database access only

services:
  caddy:
    networks: [frontend, backend]
  api:
    networks: [backend, data]
  postgres:
    networks: [data]  # Never on frontend
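
The isolation rule behind this layout is that two services can talk to each other only if they share at least one network. A minimal sketch of that rule, using the memberships from the example (can_reach and reachable_pairs are hypothetical helpers):

```python
from itertools import combinations


def can_reach(service_networks: dict[str, set[str]], a: str, b: str) -> bool:
    """Return True if services a and b are attached to a common network."""
    return bool(service_networks[a] & service_networks[b])


def reachable_pairs(service_networks: dict[str, set[str]]) -> set[frozenset[str]]:
    """Return every unordered pair of services that share at least one network."""
    return {
        frozenset((a, b))
        for a, b in combinations(service_networks, 2)
        if can_reach(service_networks, a, b)
    }


# Network membership copied from the example layout.
layout = {
    "caddy": {"frontend", "backend"},
    "api": {"backend", "data"},
    "postgres": {"data"},
}
assert can_reach(layout, "caddy", "api")
assert can_reach(layout, "api", "postgres")
assert not can_reach(layout, "caddy", "postgres")  # the proxy cannot reach the database
```

A check like this can catch a service accidentally added to the data network before the change is deployed.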

Volume Best Practices

volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local

services:
  postgres:
    volumes:
      - postgres_data:/var/lib/postgresql/data  # Named volume
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro  # Read-only bind

Logging Configuration

services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "service"
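
With rotation enabled, the approximate disk ceiling per container is max-size multiplied by max-file. The sketch below works through that arithmetic. It assumes decimal size multipliers (1m = 1,000,000 bytes), which is an assumption worth confirming against the logging driver documentation; parse_size and max_log_bytes are hypothetical helpers.

```python
# Assumed decimal multipliers; confirm against the logging driver documentation.
_UNITS = {"k": 1000, "m": 1000**2, "g": 1000**3}


def parse_size(value: str) -> int:
    """Convert a size such as "10m" to bytes; a bare number is taken as bytes."""
    text = value.strip().lower()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def max_log_bytes(max_size: str, max_file: str, containers: int = 1) -> int:
    """Approximate ceiling on rotated log usage across `containers` containers."""
    return parse_size(max_size) * int(max_file) * containers
```

For the settings above, max_log_bytes("10m", "3") gives roughly 30 MB per container, so ten such containers would need about 300 MB of log space.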

Multi-File Composition

# Base + GPU services + monitoring
docker compose -f docker-compose.yml \
               -f docker-compose.gpu.yml \
               -f docker-compose.monitoring.yml \
               up -d

Common Anti-Patterns

Anti-Pattern	Better Approach
`privileged: true`	Grant only the specific capabilities needed with `cap_add`
Secrets in `environment:`	Use `secrets:`
`depends_on:` without condition	Use `condition: service_healthy`
No resource limits	Always set limits and reservations
`network_mode: host`	Define specific networks
`restart: always`	Use `unless-stopped`

Production Checklist

All secrets use Docker secrets, not env vars
Health checks defined for all services
Resource limits set for all services
Networks isolate data tier from frontend
Logging configured with rotation
GPU passthrough tested (nvidia-smi in container)
depends_on uses the service_healthy condition
No privileged: true without justification
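
Several of these items can be checked mechanically. The sketch below lints a dict shaped like a Compose file, for example the JSON produced by `docker compose config --format json`. It covers only health checks, resource limits, privileged mode, logging, and depends_on conditions, and it cannot see a health check inherited from the image. lint_service and lint_compose are hypothetical helpers.

```python
def lint_service(name: str, service: dict) -> list[str]:
    """Return checklist violations for one service definition."""
    problems = []
    if "healthcheck" not in service:
        problems.append(f"{name}: no healthcheck")
    limits = service.get("deploy", {}).get("resources", {}).get("limits")
    if not limits:
        problems.append(f"{name}: no resource limits")
    if service.get("privileged"):
        problems.append(f"{name}: privileged: true")
    if "logging" not in service:
        problems.append(f"{name}: no logging configuration")
    depends_on = service.get("depends_on", {})
    if isinstance(depends_on, list):
        # The short list form of depends_on carries no condition at all.
        for dep in depends_on:
            problems.append(f"{name}: depends_on {dep} has no condition")
    else:
        for dep, spec in depends_on.items():
            if spec.get("condition") != "service_healthy":
                problems.append(f"{name}: depends_on {dep} is not service_healthy")
    return problems


def lint_compose(compose: dict) -> list[str]:
    """Run lint_service over every entry under the top-level `services` key."""
    problems = []
    for name, service in compose.get("services", {}).items():
        problems.extend(lint_service(name, service))
    return problems
```

An empty result means none of the covered items failed; the remaining checklist entries (secrets, network layout, GPU testing) still need manual review.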

ProYaro AI Infrastructure Documentation • Version 1.2