Skill

Docker Compose


name: docker-compose-skill
description: Best practices for production-grade Docker Compose multi-service architectures. Use when building docker-compose.yml files with (1) GPU passthrough for AI workloads, (2) health checks and dependency management, (3) secrets injection without environment variables, (4) resource limits and reservations, (5) network isolation, or (6) persistent volume strategies. Triggers on Docker Compose, container orchestration, service dependencies, GPU containers.

Docker Compose Production Patterns

Core Principles

  1. Never put secrets in environment variables - Use Docker secrets with file mounts
  2. Always define health checks - Services should prove they're ready
  3. Explicit resource limits - Prevent runaway containers
  4. Network isolation - Services only see what they need

Service Template

services:
  service-name:
    image: image:tag
    container_name: service-name
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    networks:
      - internal
    depends_on:
      dependency:
        condition: service_healthy

GPU Passthrough (NVIDIA)

services:
  gpu-worker:
    image: ai-worker:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1  # or 'all'
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility

Verify GPU access: docker compose exec gpu-worker nvidia-smi

Secrets Management

# Define secrets at top level
secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt

services:
  api:
    secrets:
      - postgres_password
      - jwt_secret
    # Secrets appear as files in /run/secrets/

Read in application:

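def read_secret(name: str) -> str:
    """Return the contents of a Docker secret mounted under /run/secrets/."""
    with open(f"/run/secrets/{name}", encoding="utf-8") as f:
        # Strip the trailing newline that secret files usually end with
        return f.read().strip()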
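
Some images can read a secret file directly, which avoids custom code in the application. The sketch below assumes the official postgres image, which documents a POSTGRES_PASSWORD_FILE variable; the image tag and service layout are assumptions for illustration.

```yaml
secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt

services:
  postgres:
    image: postgres:16          # tag is an assumption
    environment:
      # Only the path is in the environment; the password itself stays in the file
      POSTGRES_PASSWORD_FILE: /run/secrets/postgres_password
    secrets:
      - postgres_password
```

The environment then holds a path rather than the secret value, which keeps the password out of `docker inspect` output and process listings.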

Health Check Patterns

Service Type    Health Check Command
HTTP API        curl -f http://localhost:PORT/health
PostgreSQL      pg_isready -U postgres
Redis           redis-cli ping
TCP Service     nc -z localhost PORT
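
As a rough illustration of how one of these commands fits into a healthcheck block, here is the TCP variant. The service name, image, port, and timing values are hypothetical, and the check only works if nc is installed in the image.

```yaml
services:
  queue:                        # hypothetical TCP service
    image: queue:tag            # placeholder image
    healthcheck:
      # CMD-SHELL runs the string through the container's shell
      test: ["CMD-SHELL", "nc -z localhost 5672"]
      interval: 10s
      timeout: 3s
      retries: 5
      start_period: 15s
```

The same caveat applies to every row in the table: the command runs inside the container, so curl, pg_isready, redis-cli, or nc must exist in that image.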

Dependency Management

services:
  api:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      
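  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5

  # redis needs its own healthcheck, or service_healthy can never be satisfied
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5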

Network Isolation

networks:
  frontend:    # Public-facing services
  backend:     # API and workers
  data:        # Database access only

services:
  caddy:
    networks: [frontend, backend]
  api:
    networks: [backend, data]
  postgres:
    networks: [data]  # Never on frontend
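
One possible way to tighten the data tier further is to mark its network as internal, which cuts it off from external connectivity. This is a sketch using the network names above; whether it suits a given stack depends on whether the database ever needs outbound access.

```yaml
networks:
  frontend:
  backend:
  data:
    internal: true   # containers on this network get no route to the outside
```

With this setting, postgres stays reachable from api over the data network but cannot make outbound connections itself.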

Volume Best Practices

volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local

services:
  postgres:
    volumes:
      - postgres_data:/var/lib/postgresql/data  # Named volume
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro  # Read-only bind

Logging Configuration

services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "service"

Multi-File Composition

# Base + GPU services + monitoring
docker compose -f docker-compose.yml \
               -f docker-compose.gpu.yml \
               -f docker-compose.monitoring.yml \
               up -d
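
Later files are merged over earlier ones, so an override file only needs the keys it adds or changes. The contents below are a hypothetical docker-compose.gpu.yml, assuming the base file already defines a gpu-worker service with its image.

```yaml
# docker-compose.gpu.yml (hypothetical contents)
services:
  gpu-worker:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

Running the same -f list with `config` instead of `up -d` prints the merged result, which is a convenient way to check the override before starting anything.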

Common Anti-Patterns

Anti-Pattern                      Better Approach
privileged: true                  Specific capabilities only
Secrets in environment:           Use secrets:
depends_on: without condition     Use condition: service_healthy
No resource limits                Always set limits and reservations
network_mode: host                Define specific networks
restart: always                   Use unless-stopped
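
For the first row, "specific capabilities only" might look like the sketch below. The service name and image are hypothetical, and the single capability shown is an assumption; the capabilities a real service needs have to be determined per image.

```yaml
services:
  web:                          # hypothetical service
    image: web:tag              # placeholder image
    cap_drop:
      - ALL                     # start from no capabilities
    cap_add:
      - NET_BIND_SERVICE        # allow binding to ports below 1024
```

Dropping everything first and adding back only what is needed keeps the grant explicit and easy to review.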

Production Checklist

  • All secrets use Docker secrets, not env vars
  • Health checks defined for all services
  • Resource limits set for all services
  • Networks isolate data tier from frontend
  • Logging configured with rotation
  • GPU passthrough tested (nvidia-smi in container)
  • depends_on uses the service_healthy condition
  • No privileged: true without justification

ProYaro AI Infrastructure Documentation • Version 1.2