name: docker-compose-skill
description: Best practices for production-grade Docker Compose multi-service architectures. Use when building docker-compose.yml files with (1) GPU passthrough for AI workloads, (2) health checks and dependency management, (3) secrets injection without environment variables, (4) resource limits and reservations, (5) network isolation, or (6) persistent volume strategies. Triggers on Docker Compose, container orchestration, service dependencies, GPU containers.
Docker Compose Production Patterns
Core Principles
- Never put secrets in environment variables: use Docker secrets mounted as files.
- Always define health checks: services should prove they are ready.
- Set explicit resource limits: prevent runaway containers.
- Isolate networks: each service sees only what it needs.
Service Template
services:
  service-name:
    image: image:tag
    container_name: service-name
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
    networks:
      - internal
    depends_on:
      dependency:
        condition: service_healthy
GPU Passthrough (NVIDIA)
services:
  gpu-worker:
    image: ai-worker:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1 # or 'all'
              capabilities: [gpu]
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
Verify GPU access: docker compose exec gpu-worker nvidia-smi
Secrets Management
# Define secrets at top level
secrets:
  postgres_password:
    file: ./secrets/postgres_password.txt
  jwt_secret:
    file: ./secrets/jwt_secret.txt
services:
  api:
    secrets:
      - postgres_password
      - jwt_secret
    # Secrets appear as files in /run/secrets/
Read in application:
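def read_secret(name: str) -> str:
    """Return the contents of the Docker secret mounted at /run/secrets/<name>."""
    with open(f"/run/secrets/{name}") as f:
        # Strip the trailing newline that secret files usually end with.
        return f.read().strip()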
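When a service needs several secrets, it can help to read them all at startup and report every missing file at once rather than failing on the first. The following is a minimal sketch of that idea, not part of the original skill: `load_secrets` and `MissingSecretsError` are hypothetical names, and the `secrets_dir` parameter is an assumption added so the helper can be exercised outside a container.

```python
from pathlib import Path


class MissingSecretsError(RuntimeError):
    """Raised when one or more expected secret files are absent."""


def load_secrets(names, secrets_dir="/run/secrets"):
    """Read each named secret file and return a {name: value} dict.

    Every name is checked before raising, so a misconfigured service
    reports all missing secrets in a single error.
    """
    base = Path(secrets_dir)
    values = {}
    missing = []
    for name in names:
        path = base / name
        if path.is_file():
            # Trailing newlines are common in hand-edited secret files.
            values[name] = path.read_text(encoding="utf-8").strip()
        else:
            missing.append(name)
    if missing:
        raise MissingSecretsError(
            f"missing secrets in {base}: {', '.join(sorted(missing))}"
        )
    return values
```

With the Compose file above, `load_secrets(["postgres_password", "jwt_secret"])` would return both values, or raise naming whichever files are not mounted.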
Health Check Patterns
| Service Type | Health Check Command |
|---|---|
| HTTP API | curl -f http://localhost:PORT/health |
| PostgreSQL | pg_isready -U postgres |
| Redis | redis-cli ping |
| TCP Service | nc -z localhost PORT |
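The commands in this table assume the image ships `curl` or `nc`, which slim images often do not. For images that include a Python interpreter, the HTTP and TCP probes could be approximated with the standard library alone. This is a sketch under that assumption; `http_ok` and `tcp_open` are hypothetical helper names, and unlike `curl -f` the HTTP probe follows redirects before judging the status.

```python
import http.client
import socket
import urllib.error
import urllib.request

# The probe targets the container itself, so ignore any proxy settings
# inherited from the environment.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` ends in a 2xx response."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Covers HTTP 4xx/5xx, refused connections, timeouts and bad URLs.
        return False


def tcp_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A health check script would then end with something like `sys.exit(0 if http_ok("http://localhost:8000/health") else 1)`, since Docker treats exit code 0 as healthy and 1 as unhealthy.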
Dependency Management
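services:
  api:
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
  postgres:
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 5s
      retries: 5
  # service_healthy requires the dependency to define a healthcheck,
  # so redis needs one too (timings here mirror postgres; tune as needed)
  redis:
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 5s
      retries: 5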
Network Isolation
networks:
  frontend: # Public-facing services
  backend: # API and workers
  data: # Database access only
services:
  caddy:
    networks: [frontend, backend]
  api:
    networks: [backend, data]
  postgres:
    networks: [data] # Never on frontend
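Two services can talk to each other only if they share at least one network, so the layout above can be checked mechanically. The sketch below is illustrative: `shared_networks` is a hypothetical helper, and it assumes each service uses the short `networks: [a, b]` list form shown here.

```python
from itertools import combinations


def shared_networks(services):
    """Map each pair of services to the set of networks they share.

    `services` maps a service name to the list of networks it joins.
    Pairs with no network in common are omitted from the result.
    """
    pairs = {}
    for (a, nets_a), (b, nets_b) in combinations(sorted(services.items()), 2):
        common = set(nets_a) & set(nets_b)
        if common:
            pairs[(a, b)] = common
    return pairs


# The three-tier layout from the Compose snippet above.
layout = {
    "caddy": ["frontend", "backend"],
    "api": ["backend", "data"],
    "postgres": ["data"],
}
links = shared_networks(layout)
# links contains ("api", "caddy") and ("api", "postgres"),
# but no ("caddy", "postgres") entry: the proxy cannot reach the database.
```

A check like this could run in CI to assert that no public-facing service ever shares a network with the data tier.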
Volume Best Practices
volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local
services:
  postgres:
    volumes:
      - postgres_data:/var/lib/postgresql/data # Named volume
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro # Read-only bind
Logging Configuration
services:
  api:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
        labels: "service"
Multi-File Composition
# Base + GPU services + monitoring
docker compose -f docker-compose.yml \
  -f docker-compose.gpu.yml \
  -f docker-compose.monitoring.yml \
  up -d
Common Anti-Patterns
| Anti-Pattern | Better Approach |
|---|---|
| privileged: true | Specific capabilities only |
| Secrets in environment: | Use secrets: |
| depends_on: without condition | Use condition: service_healthy |
| No resource limits | Always set limits and reservations |
| network_mode: host | Define specific networks |
| restart: always | Use unless-stopped |
Production Checklist
- All secrets use Docker secrets, not env vars
- Health checks defined for all services
- Resource limits set for all services
- Networks isolate data tier from frontend
- Logging configured with rotation
- GPU passthrough tested (nvidia-smi in container)
- depends_on uses service_healthy condition
- No privileged: true without justification
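Several of these checklist items can be verified automatically against a parsed Compose file. The following is a rough sketch rather than a complete linter: `audit_compose`, `_env_keys`, and the `SENSITIVE_MARKERS` heuristic are all assumptions made for illustration, and the network-isolation and GPU items still need manual review.

```python
# Heuristic substrings that suggest a variable carries a secret value.
SENSITIVE_MARKERS = ("PASSWORD", "SECRET", "TOKEN", "API_KEY")


def _env_keys(environment):
    """Return variable names from either the list or the mapping form."""
    if isinstance(environment, dict):
        return list(environment)
    return [str(item).split("=", 1)[0] for item in environment or []]


def audit_compose(config):
    """Return a list of checklist findings for a parsed Compose mapping."""
    findings = []
    for name, service in (config.get("services") or {}).items():
        service = service or {}
        for key in _env_keys(service.get("environment")):
            upper = key.upper()
            # *_FILE variables hold a path to a secret, not the secret itself.
            if not upper.endswith("_FILE") and any(m in upper for m in SENSITIVE_MARKERS):
                findings.append(f"{name}: possible secret in environment ({key})")
        if "healthcheck" not in service:
            findings.append(f"{name}: no healthcheck")
        resources = (service.get("deploy") or {}).get("resources") or {}
        if not resources.get("limits"):
            findings.append(f"{name}: no resource limits")
        if "logging" not in service:
            findings.append(f"{name}: no logging configuration")
        if service.get("privileged") is True:
            findings.append(f"{name}: privileged: true")
        depends_on = service.get("depends_on") or {}
        if isinstance(depends_on, list):
            # The short list form cannot express a condition.
            for dep in depends_on:
                findings.append(f"{name}: depends_on {dep} has no condition")
        else:
            for dep, spec in depends_on.items():
                if (spec or {}).get("condition") != "service_healthy":
                    findings.append(f"{name}: depends_on {dep} is not service_healthy")
    return findings
```

The input mapping could come from a YAML parser such as the third-party PyYAML package, or from feeding the output of `docker compose config --format json` into `json.load`; an empty result means none of the automated checks fired.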
ProYaro AI Infrastructure Documentation • Version 1.2