Fenil Sonani

Docker Production Optimization: Advanced Security, Performance & Best Practices

Taking Docker containers from development to production requires careful consideration of security, performance, and operational concerns. This comprehensive guide covers advanced techniques for optimizing Docker deployments in enterprise environments, ensuring your containerized applications are secure, efficient, and maintainable at scale.

Table of Contents

  1. Production-Ready Container Architecture
  2. Advanced Multi-Stage Build Optimization
  3. Container Security Hardening
  4. Performance Optimization Strategies
  5. Resource Management and Limits
  6. Image Optimization and Registry Management
  7. Monitoring and Observability
  8. CI/CD Pipeline Integration
  9. High Availability and Scaling
  10. Troubleshooting Production Issues

Production-Ready Container Architecture

Designing for Production

Production containers require a fundamentally different approach than development environments. Here's how to architect containers for enterprise use:

Twelve-Factor App Principles for Containers:

# Production-optimized Node.js application
FROM node:18.19.0-alpine AS base

# Install security updates
RUN apk update && apk upgrade && \
    apk add --no-cache dumb-init && \
    rm -rf /var/cache/apk/*

# Create application directory with proper permissions
WORKDIR /usr/src/app

# Create non-privileged user
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

# Dependencies stage
FROM base AS deps
COPY package*.json ./
RUN npm ci --omit=dev --ignore-scripts && \
    npm cache clean --force

# Build stage
FROM base AS builder
COPY package*.json ./
RUN npm ci --ignore-scripts
COPY . .
RUN npm run build

# Production stage
FROM base AS runner

# Copy built application
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/dist ./dist
COPY --from=deps --chown=nextjs:nodejs /usr/src/app/node_modules ./node_modules
COPY --from=builder --chown=nextjs:nodejs /usr/src/app/package.json ./package.json

# Switch to non-root user
USER nextjs

# Health check with proper timeout and retries
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

# Use dumb-init for proper signal handling
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/server.js"]

# Metadata labels
LABEL org.opencontainers.image.title="My Production App" \
      org.opencontainers.image.description="Production-ready Node.js application" \
      org.opencontainers.image.version="1.0.0" \
      org.opencontainers.image.created="2025-01-24T00:00:00Z" \
      org.opencontainers.image.source="https://github.com/user/repo"

Environment-Specific Configuration

# Production environment variables
cat > .env.production << EOF
NODE_ENV=production
PORT=3000
LOG_LEVEL=info
DATABASE_SSL=true
REDIS_TLS=true
METRICS_ENABLED=true
CORS_ORIGIN=https://yourdomain.com
SESSION_SECURE=true
HELMET_ENABLED=true
EOF

# Development environment variables
cat > .env.development << EOF
NODE_ENV=development
PORT=3000
LOG_LEVEL=debug
DATABASE_SSL=false
REDIS_TLS=false
METRICS_ENABLED=false
CORS_ORIGIN=*
SESSION_SECURE=false
HELMET_ENABLED=false
EOF
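
To keep these values out of the image layers (and out of `docker history`), pass the file at runtime rather than COPYing it in. A minimal sketch, assuming the image is tagged my-app:latest:

# Inject environment at runtime; nothing is baked into the image
docker run -d \
  --name my-app \
  --env-file .env.production \
  -p 3000:3000 \
  my-app:latest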

Advanced Multi-Stage Build Optimization

Sophisticated Build Patterns

1. Parallel Build Stages

# Multi-language application with parallel builds: the frontend and backend
# stage chains share no edges, so BuildKit builds them concurrently
FROM node:18-alpine AS frontend-deps
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci

FROM frontend-deps AS frontend-build
COPY frontend/ .
RUN npm run build

FROM golang:1.21-alpine AS backend-deps
WORKDIR /app/backend
COPY backend/go.mod backend/go.sum ./
RUN go mod download

FROM backend-deps AS backend-build
COPY backend/ .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o main .

# Final production image
FROM alpine:3.18
RUN apk --no-cache add ca-certificates tzdata
WORKDIR /app

# Copy built artifacts
COPY --from=backend-build /app/backend/main .
COPY --from=frontend-build /app/frontend/dist ./static

# Create non-root user and make the app directory readable by it
RUN adduser -D -s /bin/sh appuser && \
    chown -R appuser:appuser /app
USER appuser

CMD ["./main"]

2. Cached Dependency Layers

# Advanced dependency caching strategy
FROM node:18-alpine AS base
WORKDIR /app

# Copy package files for caching
COPY package*.json ./
COPY packages/shared/package*.json ./packages/shared/
COPY packages/api/package*.json ./packages/api/
COPY packages/web/package*.json ./packages/web/

# Install dependencies with cache mount
RUN --mount=type=cache,target=/root/.npm \
    npm ci --omit=dev

# Development dependencies stage
FROM base AS dev-deps
RUN --mount=type=cache,target=/root/.npm \
    npm ci

# Build stage
FROM dev-deps AS builder
COPY . .
RUN npm run build:all

# Production runtime
FROM node:18-alpine AS runtime
WORKDIR /app

# Copy production files
COPY --from=base /app/node_modules ./node_modules
COPY --from=builder /app/dist ./dist
COPY package.json ./

# Security updates
RUN apk update && apk upgrade && rm -rf /var/cache/apk/*

USER node
CMD ["npm", "start"]

Build Optimization Techniques

1. BuildKit Features

# syntax=docker/dockerfile:1.6

FROM python:3.11-slim AS base

# Use BuildKit cache mounts
RUN --mount=type=cache,target=/var/cache/apt \
    --mount=type=cache,target=/var/lib/apt \
    apt-get update && \
    apt-get install -y --no-install-recommends \
        build-essential \
        gcc && \
    rm -rf /var/lib/apt/lists/*

# Cache pip dependencies
FROM base AS deps
WORKDIR /app
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Runtime stage without the build toolchain
FROM python:3.11-slim AS runner
WORKDIR /app
COPY --from=deps /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=deps /usr/local/bin /usr/local/bin
COPY . .

CMD ["python", "app.py"]

2. Build Context Optimization

# .dockerignore for minimal build context
node_modules
npm-debug.log
.git
.gitignore
README.md
.env
.nyc_output
coverage
.cache
.pytest_cache
__pycache__
*.pyc
*.pyo
*.pyd
.vscode
.idea
*.swp
*.swo
.DS_Store

Container Security Hardening

Security-First Container Design

1. Minimal Base Images and Distroless

# Using distroless images for maximum security
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -ldflags '-extldflags "-static"' -o main .

# Distroless runtime
FROM gcr.io/distroless/static-debian11:nonroot
COPY --from=builder /app/main /
USER nonroot:nonroot
ENTRYPOINT ["/main"]

2. Security Scanning Integration

#!/bin/bash
# Security scanning script for CI/CD

set -e

IMAGE_NAME=$1
SEVERITY_THRESHOLD="HIGH"

echo "🔍 Scanning image for vulnerabilities..."

# Trivy security scan (--exit-code 1 makes findings fail the command)
trivy image \
  --exit-code 1 \
  --severity ${SEVERITY_THRESHOLD},CRITICAL \
  --no-progress \
  --format json \
  --output trivy-report.json \
  ${IMAGE_NAME} || SCAN_FAILED=1

# Snyk container scan
snyk container test ${IMAGE_NAME} \
  --severity-threshold=high \
  --json > snyk-report.json || SCAN_FAILED=1

# Docker Scout scan (report only)
docker scout cves ${IMAGE_NAME} \
  --format json \
  --output scout-report.json || true

echo "✅ Security scan completed"

# Fail the pipeline if any scanner found vulnerabilities at or above the
# threshold (checking $? here would only test the preceding echo, and set -e
# would have aborted earlier without the || guards above)
if [ -z "$SCAN_FAILED" ]; then
    echo "✅ No critical vulnerabilities found"
else
    echo "❌ Critical vulnerabilities detected"
    exit 1
fi

3. Runtime Security Configuration

# Production container with security hardening
docker run -d \
  --name secure-app \
  --user 1001:1001 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=100m \
  --tmpfs /var/run:rw,noexec,nosuid,size=50m \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --security-opt=no-new-privileges:true \
  --security-opt=apparmor=docker-default \
  --ulimit nofile=65536:65536 \
  --ulimit nproc=4096:4096 \
  --memory=512m \
  --memory-swap=512m \
  --cpus="1.0" \
  --restart=unless-stopped \
  --health-cmd="curl -f http://localhost:3000/health || exit 1" \
  --health-interval=30s \
  --health-timeout=10s \
  --health-retries=3 \
  -p 3000:3000 \
  my-secure-app:latest
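
A quick way to confirm the hardening actually took effect is to read back the recorded HostConfig. A short sketch using docker inspect:

# Verify the applied security settings
docker inspect secure-app --format '{{.HostConfig.ReadonlyRootfs}}'   # expect: true
docker inspect secure-app --format '{{.HostConfig.CapDrop}}'          # expect: [ALL]
docker inspect secure-app --format '{{.HostConfig.SecurityOpt}}'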

Secrets Management

1. External Secrets Integration

# docker-compose.prod.yml with external secrets
version: '3.8'

services:
  app:
    image: my-app:latest
    environment:
      - NODE_ENV=production
    secrets:
      - db_password
      - api_key
      - jwt_secret
    deploy:
      resources:
        limits:
          memory: 512M
          cpus: '1.0'
        reservations:
          memory: 256M
          cpus: '0.5'

secrets:
  db_password:
    external: true
    name: app_db_password_v1
  api_key:
    external: true
    name: app_api_key_v1
  jwt_secret:
    external: true
    name: app_jwt_secret_v1
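
The external secrets must exist in the cluster before the stack deploys. A Docker Swarm sketch, assuming the values arrive in environment variables from your secrets manager rather than being typed on the command line:

# Create the external secrets referenced above, then deploy the stack
printf '%s' "$DB_PASSWORD" | docker secret create app_db_password_v1 -
printf '%s' "$API_KEY"     | docker secret create app_api_key_v1 -
printf '%s' "$JWT_SECRET"  | docker secret create app_jwt_secret_v1 -

docker stack deploy -c docker-compose.prod.yml my-app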

2. Runtime Secrets Injection

# Using init containers for secrets
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  initContainers:
  - name: secret-fetcher
    image: hashicorp/vault:latest
    command: ['sh', '-c']
    args:
    - |
      vault login -method=kubernetes
      vault kv get -field=password secret/myapp/db > /shared/db_password
      vault kv get -field=key secret/myapp/api > /shared/api_key
    volumeMounts:
    - name: shared-secrets
      mountPath: /shared
  containers:
  - name: app
    image: my-app:latest
    volumeMounts:
    - name: shared-secrets
      mountPath: /etc/secrets
      readOnly: true
  volumes:
  - name: shared-secrets
    emptyDir:
      medium: Memory
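
The application container then reads the files the init container wrote. A minimal entrypoint sketch; the file names match the example above, while the final exec line is app-specific:

#!/bin/sh
# Export the mounted secrets as environment variables, then hand off to the app
export DB_PASSWORD="$(cat /etc/secrets/db_password)"
export API_KEY="$(cat /etc/secrets/api_key)"
exec node dist/server.js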

Performance Optimization Strategies

Image Size Optimization

1. Layer Optimization Techniques

# Optimized Python application
FROM python:3.11-slim AS base

# Install system dependencies in single layer
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    gcc \
    && rm -rf /var/lib/apt/lists/* \
    && apt-get clean

# Multi-stage with wheels
FROM base AS wheels
COPY requirements.txt .
RUN pip wheel --no-cache-dir --no-deps --wheel-dir /usr/src/app/wheels -r requirements.txt

# Final stage
FROM python:3.11-slim
RUN apt-get update && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*

COPY --from=wheels /usr/src/app/wheels /wheels
COPY requirements.txt .

RUN pip install --no-cache-dir /wheels/* \
    && rm -rf /wheels \
    && rm requirements.txt

# Create non-root user
RUN useradd --create-home --shell /bin/bash app
USER app
WORKDIR /home/app

COPY --chown=app:app . .

CMD ["python", "app.py"]

2. Binary Optimization

# Optimized Go binary with UPX compression
FROM golang:1.21-alpine AS builder

WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

COPY . .

# Build optimized binary
RUN CGO_ENABLED=0 GOOS=linux go build \
    -a -installsuffix cgo \
    -ldflags="-w -s -X main.version=1.0.0 -X main.buildTime=$(date -u +%Y%m%d.%H%M%S)" \
    -o main .

# Compress binary with UPX
FROM alpine:3.18 AS compressor
RUN apk add --no-cache upx
COPY --from=builder /app/main /tmp/main
RUN upx --best --lzma -o /tmp/main-compressed /tmp/main

# Final minimal image
FROM scratch
COPY --from=compressor /tmp/main-compressed /main
COPY --from=alpine:3.18 /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/

ENTRYPOINT ["/main"]

Runtime Performance

1. Memory Optimization

# Memory-optimized container configuration
docker run -d \
  --name optimized-app \
  --memory=512m \
  --memory-swap=512m \
  --memory-swappiness=0 \
  --oom-kill-disable=false \
  -e NODE_OPTIONS="--max-old-space-size=400" \
  -e MALLOC_ARENA_MAX=2 \
  my-app:latest

2. CPU Optimization

# CPU-optimized configuration
# (--cpus already implies quota/period; Docker rejects combining it
# with explicit --cpu-quota/--cpu-period)
docker run -d \
  --name cpu-optimized-app \
  --cpus="2.0" \
  --cpu-shares=1024 \
  --cpuset-cpus="0,1" \
  my-app:latest
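
Docker stores CPU limits internally as NanoCpus; reading them back is a quick sanity check that the flags parsed as intended:

# Confirm the recorded limits and live usage
docker inspect cpu-optimized-app --format '{{.HostConfig.NanoCpus}}'   # 2000000000 for --cpus=2.0
docker inspect cpu-optimized-app --format '{{.HostConfig.CpusetCpus}}' # 0,1
docker stats cpu-optimized-app --no-stream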

3. Network Performance

# syntax=docker/dockerfile:1
# Network-optimized configuration (heredoc COPY requires BuildKit)
FROM nginx:alpine

# Optimize nginx configuration
COPY <<EOF /etc/nginx/nginx.conf
worker_processes auto;
worker_cpu_affinity auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    use epoll;
    multi_accept on;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100;
    
    gzip on;
    gzip_comp_level 6;
    gzip_types text/plain text/css application/json application/javascript text/xml application/xml;
    
    # Buffer optimization
    client_body_buffer_size 128k;
    client_max_body_size 10m;
    client_header_buffer_size 1k;
    large_client_header_buffers 4 4k;
    output_buffers 1 32k;
    postpone_output 1460;
}
EOF

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]

Resource Management and Limits

Comprehensive Resource Control

1. Advanced Docker Limits

# Production resource limits, grouped by resource type. Note that comment
# lines cannot be interleaved inside a backslash-continued command, and
# --cpus conflicts with explicit --cpu-quota/--cpu-period, so only --cpus
# is used; --kernel-memory is omitted (deprecated, ignored on cgroup v2).
docker run -d \
  --name production-app \
  --memory=1g \
  --memory-swap=1g \
  --memory-reservation=512m \
  --oom-kill-disable=false \
  --cpus="2.0" \
  --cpu-shares=1024 \
  --cpuset-cpus="0-3" \
  --blkio-weight=500 \
  --device-read-bps=/dev/sda:50mb \
  --device-write-bps=/dev/sda:50mb \
  --pids-limit=100 \
  --ulimit nofile=65536:65536 \
  --ulimit nproc=4096:4096 \
  my-app:latest

2. cgroups v2 Configuration

# Advanced cgroups v2 resource management
docker run -d \
  --name cgroup-optimized \
  --cgroup-parent="/sys/fs/cgroup/production.slice" \
  --memory=512m \
  --cpus="1.5" \
  --cgroupns=private \
  my-app:latest

# Custom cgroup slice
cat > /etc/systemd/system/production.slice << EOF
[Unit]
Description=Production Application Slice
Before=slices.target

[Slice]
MemoryAccounting=yes
MemoryMax=2G
CPUAccounting=yes
CPUQuota=300%
IOAccounting=yes
IOWeight=200
EOF

systemctl daemon-reload
systemctl start production.slice
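
To verify the slice is live and that containers actually landed under it (the path assumes a cgroup v2 unified hierarchy):

# Inspect the slice and its cgroup tree
systemctl status production.slice
systemd-cgls /production.slice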

Monitoring Resource Usage

1. Real-time Resource Monitoring

#!/bin/bash
# Container resource monitoring script

CONTAINER_NAME=$1
INTERVAL=${2:-5}

echo "Monitoring container: $CONTAINER_NAME"
echo "Update interval: ${INTERVAL}s"
echo "Time,CPU%,MemUsage,MemLimit,MemPercent,NetInput,NetOutput,BlockInput,BlockOutput,Pids"

while true; do
    stats=$(docker stats $CONTAINER_NAME --no-stream --format "table {{.CPUPerc}},{{.MemUsage}},{{.MemPerc}},{{.NetIO}},{{.BlockIO}},{{.PIDs}}" | tail -n 1)
    timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    echo "$timestamp,$stats"
    sleep $INTERVAL
done
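
A usage sketch, assuming the script above was saved as monitor.sh: sample every 10 seconds and tee the CSV to disk for later analysis:

# Stream stats to the terminal while capturing them to a file
./monitor.sh production-app 10 | tee container-stats.csv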

2. Resource Alerting

#!/bin/bash
# Resource threshold alerting

CONTAINER_NAME=$1
CPU_THRESHOLD=80
MEMORY_THRESHOLD=85
WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"

check_resources() {
    local stats=$(docker stats $CONTAINER_NAME --no-stream --format "{{.CPUPerc}} {{.MemPerc}}")
    local cpu_usage=$(echo $stats | awk '{print $1}' | sed 's/%//')
    local mem_usage=$(echo $stats | awk '{print $2}' | sed 's/%//')
    
    if (( $(echo "$cpu_usage > $CPU_THRESHOLD" | bc -l) )); then
        send_alert "HIGH CPU" "CPU usage: ${cpu_usage}%"
    fi
    
    if (( $(echo "$mem_usage > $MEMORY_THRESHOLD" | bc -l) )); then
        send_alert "HIGH MEMORY" "Memory usage: ${mem_usage}%"
    fi
}

send_alert() {
    local type=$1
    local message=$2
    
    curl -X POST -H 'Content-type: application/json' \
        --data "{\"text\":\"🚨 $type Alert - Container: $CONTAINER_NAME - $message\"}" \
        $WEBHOOK_URL
}

while true; do
    check_resources
    sleep 30
done

Image Optimization and Registry Management

Advanced Image Management

1. Multi-Architecture Builds

# Build multi-architecture images
docker buildx create --name multiarch-builder --use
docker buildx inspect --bootstrap

# Build for multiple platforms
docker buildx build \
  --platform linux/amd64,linux/arm64,linux/arm/v7 \
  --tag my-registry.com/my-app:latest \
  --push \
  .

# Verify multi-arch manifest
docker buildx imagetools inspect my-registry.com/my-app:latest
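
Cross-platform builds rely on QEMU emulation for non-native targets. If the host lacks the binfmt handlers, register them once with tonistiigi's helper image (requires a privileged container):

# One-time QEMU/binfmt setup for emulated platforms
docker run --privileged --rm tonistiigi/binfmt --install all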

2. Image Signing and Verification

# Sign images with Docker Content Trust
export DOCKER_CONTENT_TRUST=1
export DOCKER_CONTENT_TRUST_SERVER=https://notary.docker.io

# Push signed image
docker push my-registry.com/my-app:v1.0.0

# Verify image signature
docker trust inspect my-registry.com/my-app:v1.0.0

# Cosign signing (alternative)
cosign generate-key-pair
cosign sign --key cosign.key my-registry.com/my-app:v1.0.0
cosign verify --key cosign.pub my-registry.com/my-app:v1.0.0

3. Registry Optimization

# Harbor registry configuration
version: '3.8'

services:
  registry:
    image: goharbor/harbor-core:v2.9.0
    environment:
      - HARBOR_ADMIN_PASSWORD=HarborPassword123
      - DATABASE_TYPE=postgresql
      - DATABASE_HOST=postgresql
      - DATABASE_PORT=5432
      - DATABASE_USERNAME=postgres
      - DATABASE_PASSWORD=password
      - DATABASE_NAME=registry
      - REDIS_HOST=redis
      - REDIS_PORT=6379
    volumes:
      - harbor-data:/data
    depends_on:
      - postgresql
      - redis

  postgresql:
    image: postgres:15-alpine
    environment:
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=registry
    volumes:
      - postgres-data:/var/lib/postgresql/data

  redis:
    image: redis:7-alpine
    volumes:
      - redis-data:/data

volumes:
  harbor-data:
  postgres-data:
  redis-data:

Image Cleanup and Lifecycle Management

#!/bin/bash
# Automated image cleanup script

# Configuration
REGISTRY="my-registry.com"
NAMESPACE="my-app"
KEEP_LATEST=5
DRY_RUN=${DRY_RUN:-true}

# Get all tags
tags=$(curl -s "https://$REGISTRY/v2/$NAMESPACE/tags/list" | jq -r '.tags[]' | sort -V)

# Keep only latest N versions
tags_to_delete=$(echo "$tags" | head -n -$KEEP_LATEST)

for tag in $tags_to_delete; do
    if [ "$DRY_RUN" = "true" ]; then
        echo "Would delete: $REGISTRY/$NAMESPACE:$tag"
    else
        echo "Deleting: $REGISTRY/$NAMESPACE:$tag"
        # Get the manifest digest from the Docker-Content-Digest response
        # header (deletion must use this digest, not the config blob digest)
        digest=$(curl -sI -H "Accept: application/vnd.docker.distribution.manifest.v2+json" \
                 "https://$REGISTRY/v2/$NAMESPACE/manifests/$tag" | \
                 grep -i '^docker-content-digest:' | awk '{print $2}' | tr -d '\r')

        # Delete manifest by digest
        curl -X DELETE "https://$REGISTRY/v2/$NAMESPACE/manifests/$digest"
    fi
done

# Garbage collection (if registry supports it)
if [ "$DRY_RUN" = "false" ]; then
    curl -X POST "https://$REGISTRY/api/v2.0/system/gc/schedule"
fi
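
A usage sketch, assuming the script is saved as image-cleanup.sh. For a self-hosted registry:2 instance, deleting manifests only unlinks them; reclaim disk with the image's built-in garbage collector (the config path shown is the registry:2 default):

DRY_RUN=true ./image-cleanup.sh    # preview what would be deleted
DRY_RUN=false ./image-cleanup.sh   # delete for real

# registry:2 only; Harbor schedules GC through its API as shown above
docker exec registry /bin/registry garbage-collect /etc/docker/registry/config.yml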

Monitoring and Observability

Comprehensive Container Monitoring

1. Prometheus Metrics Integration

# Application with Prometheus metrics
FROM node:18-alpine

WORKDIR /app

# Install dependencies
COPY package*.json ./
RUN npm ci --omit=dev

# Add monitoring dependencies (better: declare them in package.json)
RUN npm install prom-client express-prometheus-middleware

COPY . .

# Expose metrics endpoint
EXPOSE 3000 9090

# Health check with metrics
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1

CMD ["npm", "start"]

2. Structured Logging

// Enhanced logging configuration
const winston = require('winston');
const { ElasticsearchTransport } = require('winston-elasticsearch');

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: process.env.SERVICE_NAME || 'my-app',
    version: process.env.APP_VERSION || '1.0.0',
    environment: process.env.NODE_ENV || 'development',
    containerId: process.env.HOSTNAME,
  },
  transports: [
    new winston.transports.Console({
      format: winston.format.combine(
        winston.format.colorize(),
        winston.format.simple()
      )
    }),
    new ElasticsearchTransport({
      level: 'info',
      clientOpts: { node: process.env.ELASTICSEARCH_URL },
      index: 'application-logs'
    })
  ]
});

module.exports = logger;
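
Structured logs still land on stdout inside the container, so cap the json-file driver to keep log files from filling the host disk. Per-container flags are shown; the same keys can be set daemon-wide in /etc/docker/daemon.json:

# Rotate container logs: at most three 10 MB files per container
docker run -d \
  --name my-app \
  --log-driver=json-file \
  --log-opt max-size=10m \
  --log-opt max-file=3 \
  my-app:latest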

3. Distributed Tracing

// OpenTelemetry tracing setup
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');

const jaegerExporter = new JaegerExporter({
  endpoint: process.env.JAEGER_ENDPOINT || 'http://jaeger:14268/api/traces',
});

const sdk = new NodeSDK({
  resource: new Resource({
    [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
    [SemanticResourceAttributes.SERVICE_VERSION]: process.env.APP_VERSION || '1.0.0',
    [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV || 'development',
  }),
  traceExporter: jaegerExporter,
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
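
For local testing of the exporter above, Jaeger's all-in-one image bundles a collector and UI in one container (convenient, but not a production topology). A shared network lets the app resolve the jaeger hostname used in JAEGER_ENDPOINT:

# Jaeger UI on 16686, collector HTTP endpoint on 14268
docker network create tracing
docker run -d --name jaeger --network tracing \
  -p 16686:16686 \
  -p 14268:14268 \
  jaegertracing/all-in-one:latest

docker run -d --name my-app --network tracing \
  -e JAEGER_ENDPOINT=http://jaeger:14268/api/traces \
  my-app:latest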

CI/CD Pipeline Integration

Advanced Docker CI/CD

1. Multi-Stage Pipeline

# .github/workflows/docker-production.yml
name: Production Docker Build and Deploy

on:
  push:
    branches: [main]
    tags: ['v*']

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Run Hadolint
        uses: hadolint/hadolint-action@v3.1.0
        with:
          dockerfile: Dockerfile
          
      - name: Run Trivy vulnerability scanner
        uses: aquasecurity/trivy-action@master
        with:
          scan-type: 'fs'
          scan-ref: '.'
          format: 'sarif'
          output: 'trivy-results.sarif'
          
      - name: Upload Trivy scan results
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: 'trivy-results.sarif'

  build-and-test:
    runs-on: ubuntu-latest
    needs: security-scan
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up QEMU
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3
        
      - name: Log in to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
          
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=sha,prefix={{branch}}-
            
      - name: Build and test
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: false
          target: test
          cache-from: type=gha
          cache-to: type=gha,mode=max
          
      - name: Build and push production
        uses: docker/build-push-action@v5
        with:
          context: .
          platforms: linux/amd64,linux/arm64
          push: true
          target: production
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

  deploy:
    runs-on: ubuntu-latest
    needs: build-and-test
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Deploy to production
        run: |
          # Deploy using your preferred method (Kubernetes, Docker Swarm, etc.)
          echo "Deploying ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:main"

2. Container Scanning in Pipeline

#!/bin/bash
# comprehensive-security-scan.sh

set -e

IMAGE_NAME=$1
SEVERITY_THRESHOLD="HIGH"
SCAN_OUTPUT_DIR="security-reports"

mkdir -p $SCAN_OUTPUT_DIR

echo "🔍 Running comprehensive security scan for $IMAGE_NAME"

# Hadolint - Dockerfile linting
echo "📋 Running Hadolint..."
hadolint Dockerfile > "$SCAN_OUTPUT_DIR/hadolint-report.txt" || true

# Trivy - Vulnerability scanning
echo "🔍 Running Trivy vulnerability scan..."
trivy image \
  --format json \
  --output "$SCAN_OUTPUT_DIR/trivy-report.json" \
  --severity $SEVERITY_THRESHOLD,CRITICAL \
  $IMAGE_NAME

# Snyk - Container analysis
echo "🛡️ Running Snyk container scan..."
snyk container test $IMAGE_NAME \
  --json > "$SCAN_OUTPUT_DIR/snyk-report.json" || true

# Grype - Vulnerability scanning
echo "🔎 Running Grype scan..."
grype $IMAGE_NAME \
  -o json \
  --file "$SCAN_OUTPUT_DIR/grype-report.json"

# Syft - SBOM generation
echo "📦 Generating SBOM..."
syft $IMAGE_NAME \
  -o spdx-json \
  --file "$SCAN_OUTPUT_DIR/sbom.spdx.json"

# Generate summary report
echo "📊 Generating summary report..."
cat > "$SCAN_OUTPUT_DIR/summary.md" << EOF
# Security Scan Summary

**Image:** $IMAGE_NAME  
**Scan Date:** $(date)  
**Threshold:** $SEVERITY_THRESHOLD and above

## Scan Results

- ✅ Hadolint: Dockerfile linting completed
- ✅ Trivy: Vulnerability scan completed
- ✅ Snyk: Container analysis completed
- ✅ Grype: Vulnerability scan completed
- ✅ Syft: SBOM generation completed

## Reports Generated

- \`hadolint-report.txt\` - Dockerfile best practices
- \`trivy-report.json\` - Vulnerability details
- \`snyk-report.json\` - Security analysis
- \`grype-report.json\` - Vulnerability scan
- \`sbom.spdx.json\` - Software Bill of Materials

EOF

echo "✅ Security scan completed. Reports available in $SCAN_OUTPUT_DIR/"

# Check for critical vulnerabilities
critical_vulns=$(jq '.Results[]?.Vulnerabilities[]? | select(.Severity == "CRITICAL") | .VulnerabilityID' "$SCAN_OUTPUT_DIR/trivy-report.json" 2>/dev/null | wc -l)

if [ "$critical_vulns" -gt 0 ]; then
    echo "❌ Found $critical_vulns critical vulnerabilities"
    exit 1
else
    echo "✅ No critical vulnerabilities found"
fi

High Availability and Scaling

Container Orchestration Preparation

1. Health Check Strategies

# Advanced health checking
FROM nginx:alpine

# Install health check dependencies
RUN apk add --no-cache curl jq

# Copy health check script
COPY healthcheck.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/healthcheck.sh

# Multi-layered health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
  CMD /usr/local/bin/healthcheck.sh

# Copy application files
COPY nginx.conf /etc/nginx/nginx.conf
COPY html/ /usr/share/nginx/html/

EXPOSE 80

CMD ["nginx", "-g", "daemon off;"]
#!/bin/bash
# healthcheck.sh - Comprehensive health check

# Basic connectivity check
if ! curl -f -s http://localhost:80/health >/dev/null; then
    echo "Health endpoint unreachable"
    exit 1
fi

# Application-specific checks
health_response=$(curl -s http://localhost:80/health)

# Check response format
if ! echo "$health_response" | jq -e '.status == "healthy"' >/dev/null 2>&1; then
    echo "Health check failed: Invalid response"
    exit 1
fi

# Check dependencies
if echo "$health_response" | jq -e '.dependencies[] | select(.status != "healthy")' >/dev/null 2>&1; then
    echo "Health check failed: Dependency issues"
    exit 1
fi

echo "Health check passed"
exit 0
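
Docker keeps the recent health check results on the container object; reading them back shows exactly what the script reported. A sketch, assuming the container is named web:

# Dump the health state and last probe outputs
docker inspect --format '{{json .State.Health}}' web | jq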

2. Graceful Shutdown Handling

// graceful-shutdown.js
const express = require('express');
const app = express();

let server;
let isShuttingDown = false;

// Graceful shutdown handling
const gracefulShutdown = (signal) => {
  console.log(`Received ${signal}. Starting graceful shutdown...`);
  isShuttingDown = true;
  
  server.close((err) => {
    if (err) {
      console.error('Error during server close:', err);
      process.exit(1);
    }
    
    console.log('HTTP server closed.');
    
    // Close database connections and clean up resources
    // (closeDatabase, cleanupResources, flushLogs are app-specific helpers)
    Promise.all([
      closeDatabase(),
      cleanupResources(),
      flushLogs()
    ]).then(() => {
      console.log('Graceful shutdown completed.');
      process.exit(0);
    }).catch((err) => {
      console.error('Error during graceful shutdown:', err);
      process.exit(1);
    });
  });
  
  // Force shutdown after 30 seconds
  setTimeout(() => {
    console.error('Forced shutdown after timeout');
    process.exit(1);
  }, 30000);
};

// Health check endpoint
app.get('/health', (req, res) => {
  if (isShuttingDown) {
    return res.status(503).json({ status: 'shutting_down' });
  }
  
  res.json({ 
    status: 'healthy',
    timestamp: new Date().toISOString(),
    uptime: process.uptime()
  });
});

// Start server
server = app.listen(3000, () => {
  console.log('Server running on port 3000');
});

// Handle shutdown signals
process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
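
To exercise the shutdown path end to end, use docker stop, which delivers SIGTERM and escalates to SIGKILL after the grace period; give it slightly more than the 30 seconds the handler allows itself (the container name my-app is assumed):

# SIGTERM, then SIGKILL after 35s if the process hasn't exited
docker stop --time=35 my-app

# Or deliver SIGTERM directly, without docker stop's kill escalation
docker kill --signal=SIGTERM my-app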

Troubleshooting Production Issues

Advanced Debugging Techniques

1. Production Debugging Setup

# Debug container with enhanced capabilities
docker run -it \
  --name debug-container \
  --pid=container:production-app \
  --network=container:production-app \
  --cap-add=SYS_PTRACE \
  --security-opt apparmor=unconfined \
  nicolaka/netshoot

# Inside debug container
# Network troubleshooting
ss -tulpn
netstat -i
iptables -L
tcpdump -i any port 3000

# Process analysis
ps aux
top -p $(pidof node)
strace -p $(pidof node)

# File system analysis
lsof -p $(pidof node)
find /proc/$(pidof node)/fd -type l -exec ls -l {} \;

2. Performance Profiling

#!/bin/bash
# container-profiling.sh

CONTAINER_NAME=$1
DURATION=${2:-60}
OUTPUT_DIR="profiling-$(date +%Y%m%d-%H%M%S)"

mkdir -p $OUTPUT_DIR

echo "🔍 Profiling container $CONTAINER_NAME for ${DURATION}s"

# CPU profiling
echo "📊 CPU profiling..."
docker exec $CONTAINER_NAME top -b -n1 > "$OUTPUT_DIR/cpu-snapshot.txt"

# Memory analysis
echo "🧠 Memory analysis..."
docker exec $CONTAINER_NAME cat /proc/meminfo > "$OUTPUT_DIR/meminfo.txt"
docker exec $CONTAINER_NAME ps aux --sort=-%mem > "$OUTPUT_DIR/memory-usage.txt"

# Network statistics
echo "🌐 Network statistics..."
docker exec $CONTAINER_NAME ss -s > "$OUTPUT_DIR/network-stats.txt"
docker exec $CONTAINER_NAME netstat -i > "$OUTPUT_DIR/network-interfaces.txt"

# File descriptor usage
echo "📁 File descriptor analysis..."
docker exec $CONTAINER_NAME lsof | wc -l > "$OUTPUT_DIR/fd-count.txt"
docker exec $CONTAINER_NAME cat /proc/sys/fs/file-nr > "$OUTPUT_DIR/system-fd.txt"

# Container stats over time
echo "📈 Collecting runtime stats..."
for i in $(seq 1 $DURATION); do
    docker stats $CONTAINER_NAME --no-stream >> "$OUTPUT_DIR/runtime-stats.txt"
    sleep 1
done

echo "✅ Profiling completed. Results in $OUTPUT_DIR/"

3. Log Analysis and Debugging

#!/bin/bash
# log-analysis.sh

CONTAINER_NAME=$1
HOURS_BACK=${2:-1}

echo "📋 Analyzing logs for $CONTAINER_NAME (last ${HOURS_BACK}h)"

# Recent errors
echo "🚨 Recent errors:"
docker logs $CONTAINER_NAME --since="${HOURS_BACK}h" 2>&1 | grep -i error | tail -20

# Performance indicators
echo "⚡ Performance indicators:"
docker logs $CONTAINER_NAME --since="${HOURS_BACK}h" 2>&1 | grep -E "(slow|timeout|memory|cpu)" | tail -10

# Request patterns
echo "🔄 Request patterns:"
docker logs $CONTAINER_NAME --since="${HOURS_BACK}h" 2>&1 | grep -oE "HTTP/[0-9.]+ [0-9]+" | sort | uniq -c | sort -nr

# Memory warnings
echo "🧠 Memory warnings:"
docker logs $CONTAINER_NAME --since="${HOURS_BACK}h" 2>&1 | grep -i "memory\|oom\|heap" | tail -10

Conclusion

Optimizing Docker for production requires a holistic approach covering security, performance, monitoring, and operational concerns. The techniques and strategies outlined in this guide provide a comprehensive foundation for running containers successfully at enterprise scale.

Key Takeaways

🔒 Security First

  • Use minimal base images and distroless containers
  • Implement comprehensive vulnerability scanning
  • Apply runtime security configurations
  • Manage secrets properly

⚡ Performance Optimization

  • Optimize image sizes with multi-stage builds
  • Configure appropriate resource limits
  • Implement effective caching strategies
  • Monitor and profile continuously

🔍 Observability

  • Implement structured logging
  • Add comprehensive monitoring
  • Use distributed tracing
  • Create meaningful health checks

🚀 Operational Excellence

  • Automate security scanning in CI/CD
  • Implement graceful shutdown handling
  • Plan for high availability
  • Prepare debugging tools and procedures

Next Steps

  1. Audit Current Deployments: Apply these optimizations to existing containers
  2. Implement Monitoring: Set up comprehensive observability
  3. Security Hardening: Regular vulnerability scanning and updates
  4. Performance Testing: Load test optimized containers
  5. Documentation: Document your production practices
  6. Team Training: Share knowledge with your development team

Remember, production optimization is an ongoing process. Continuously monitor, measure, and improve your containerized applications to maintain optimal performance and security.

Happy containerizing! 🐳🚀
