Deploying LangChain to Production: Complete DevOps Guide
Deploying LangChain applications to production requires careful consideration of containerization, orchestration, monitoring, and scalability. This comprehensive guide covers enterprise-grade deployment strategies, from Docker containerization to Kubernetes orchestration, complete with CI/CD pipelines and disaster recovery planning.
Table of Contents
- Production Architecture Overview
- Docker Containerization
- Kubernetes Deployment
- CI/CD Pipeline Implementation
- Monitoring with Prometheus and Grafana
- Load Balancing and Auto-scaling
- Environment and Secrets Management
- Blue-Green Deployments
- Disaster Recovery Planning
- Production Best Practices
Production Architecture Overview
A production-ready LangChain deployment consists of multiple layers working together to ensure reliability, scalability, and maintainability. The architecture includes containerized applications, orchestration platforms, monitoring systems, and automated deployment pipelines.
Key Components
# production-architecture.yaml
components:
application:
- LangChain API service
- Vector database (Pinecone/Weaviate/Chroma)
- Redis for caching
- PostgreSQL for metadata
infrastructure:
- Docker containers
- Kubernetes cluster
- Load balancer (NGINX/HAProxy)
- Service mesh (Istio/Linkerd)
monitoring:
- Prometheus metrics collection
- Grafana dashboards
- ELK stack for logs
- Jaeger for tracing
deployment:
- GitHub Actions CI/CD
- ArgoCD for GitOps
- Helm charts
- Blue-green deployment strategy
Docker Containerization
Creating efficient Docker containers for LangChain applications requires optimizing for size, security, and performance. Here's a production-ready Dockerfile:
# Dockerfile
# Multi-stage build for optimized image size
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
libpq5 \
curl \
&& rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r langchain && useradd -r -g langchain langchain
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
LANGCHAIN_TRACING_V2=true \
LANGCHAIN_ENDPOINT="https://api.langchain.plus"
# Create app directory
WORKDIR /app
# Copy application code
COPY . .
# Switch to non-root user
USER langchain
# Health check (interval/timeout values are sensible defaults; tune for your workload)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
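The HEALTHCHECK above and the Kubernetes probes later in this guide assume the application exposes /health and /ready endpoints. A minimal sketch of the main:app module referenced in the CMD (the endpoint bodies are illustrative; a real readiness check would verify the database, Redis, and vector store connections):
# main.py (minimal sketch of the module referenced by the CMD above)
from fastapi import FastAPI

app = FastAPI(title="LangChain API")

@app.get("/health")
async def health():
    # Liveness: the process is up and able to serve requests
    return {"status": "ok"}

@app.get("/ready")
async def ready():
    # Readiness: in a real app, check DB/Redis/vector-store connectivity here
    return {"status": "ready"}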
Docker Compose for Local Development
# docker-compose.yml
version: '3.8'
services:
langchain-api:
build:
context: .
dockerfile: Dockerfile
ports:
- "8000:8000"
environment:
- DATABASE_URL=postgresql://langchain:password@postgres:5432/langchain
- REDIS_URL=redis://redis:6379
- VECTOR_DB_URL=http://weaviate:8080
depends_on:
- postgres
- redis
- weaviate
volumes:
- ./logs:/app/logs
networks:
- langchain-network
postgres:
image: postgres:15-alpine
environment:
POSTGRES_USER: langchain
POSTGRES_PASSWORD: password
POSTGRES_DB: langchain
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- langchain-network
redis:
image: redis:7-alpine
command: redis-server --appendonly yes
volumes:
- redis_data:/data
networks:
- langchain-network
weaviate:
image: semitechnologies/weaviate:latest
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' # local development only
PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
volumes:
- weaviate_data:/var/lib/weaviate
networks:
- langchain-network
volumes:
postgres_data:
redis_data:
weaviate_data:
networks:
langchain-network:
driver: bridge
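With this file in place, the local stack can be started and smoke-tested roughly as follows (assuming Docker Compose v2, which treats the top-level version key as informational):
# Build the image and start all services in the background
docker compose up --build -d
# Confirm the API container is healthy
curl -f http://localhost:8000/health
# Tail the application logs
docker compose logs -f langchain-api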
Kubernetes Deployment
Deploying LangChain on Kubernetes provides scalability, self-healing, and declarative configuration management. Here's a comprehensive Kubernetes deployment:
Namespace and ConfigMap
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: langchain-prod
labels:
name: langchain-prod
environment: production
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: langchain-config
namespace: langchain-prod
data:
APP_NAME: "langchain-api"
LOG_LEVEL: "INFO"
MAX_WORKERS: "4"
VECTOR_DB_HOST: "weaviate-service"
REDIS_HOST: "redis-service"
Secrets Management
# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: langchain-secrets
namespace: langchain-prod
type: Opaque
stringData:
DATABASE_URL: "postgresql://langchain:password@postgres-service:5432/langchain"
LANGCHAIN_API_KEY: "your-api-key-here"
OPENAI_API_KEY: "your-openai-key-here"
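In practice the real values should not be committed to Git; one simple alternative is to create the Secret imperatively (or sync it from an external store such as Vault, covered later), for example:
# Create the secret from literals so real keys never land in version control
kubectl create secret generic langchain-secrets \
  --namespace langchain-prod \
  --from-literal=DATABASE_URL='postgresql://langchain:<password>@postgres-service:5432/langchain' \
  --from-literal=LANGCHAIN_API_KEY='<your-langchain-key>' \
  --from-literal=OPENAI_API_KEY='<your-openai-key>'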
Deployment Configuration
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api
namespace: langchain-prod
labels:
app: langchain-api
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
template:
metadata:
labels:
app: langchain-api
version: v1
spec:
serviceAccountName: langchain-sa
containers:
- name: langchain-api
image: your-registry/langchain-api:latest
imagePullPolicy: Always
ports:
- containerPort: 8000
name: http
envFrom:
- configMapRef:
name: langchain-config
- secretRef:
name: langchain-secrets
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: logs
mountPath: /app/logs
volumes:
- name: logs
emptyDir: {}
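The pod spec references serviceAccountName: langchain-sa, which is not defined elsewhere in this guide; a minimal manifest for it would be:
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: langchain-sa
  namespace: langchain-prod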
Service and Ingress
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: langchain-service
namespace: langchain-prod
labels:
app: langchain-api
spec:
selector:
app: langchain-api
ports:
- port: 80
targetPort: 8000
protocol: TCP
name: http
type: ClusterIP
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: langchain-ingress
namespace: langchain-prod
annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
- hosts:
- api.langchain.example.com
secretName: langchain-tls
rules:
- host: api.langchain.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: langchain-service
port:
number: 80
Horizontal Pod Autoscaler
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: langchain-hpa
namespace: langchain-prod
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: langchain-api
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
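The HPA depends on the cluster's metrics API (typically metrics-server) being available. Once applied, you can verify that it is reading utilization and making scaling decisions with:
# Show current utilization targets and replica counts
kubectl get hpa langchain-hpa -n langchain-prod
# Inspect scaling events and conditions
kubectl describe hpa langchain-hpa -n langchain-prod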
CI/CD Pipeline Implementation
A robust CI/CD pipeline ensures consistent and reliable deployments. Here's a complete GitHub Actions workflow:
# .github/workflows/deploy.yml
name: Deploy LangChain to Production
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}/langchain-api
KUBERNETES_CLUSTER: langchain-prod
KUBERNETES_NAMESPACE: langchain-prod
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Cache dependencies
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
- name: Run tests
run: |
pytest tests/ --cov=app --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
- name: Run security scan
run: |
pip install bandit safety
bandit -r app/
safety check
build:
needs: test
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha,format=long,prefix=sha-
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64
deploy:
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/setup-kubectl@v3
with:
version: 'latest'
- name: Set up Kubeconfig
  run: |
    echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig
    # An export only lasts for this step; persist it for later steps via GITHUB_ENV
    echo "KUBECONFIG=$(pwd)/kubeconfig" >> "$GITHUB_ENV"
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/langchain-api \
langchain-api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }} \
-n ${{ env.KUBERNETES_NAMESPACE }}
kubectl rollout status deployment/langchain-api \
-n ${{ env.KUBERNETES_NAMESPACE }}
- name: Run smoke tests
  run: |
    kubectl run smoke-test --rm -i --restart=Never \
      -n ${{ env.KUBERNETES_NAMESPACE }} \
      --image=curlimages/curl:latest \
      -- curl -f http://langchain-service/health
- name: Notify deployment
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
text: 'LangChain deployed to production'
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
if: always()
Monitoring with Prometheus and Grafana
Comprehensive monitoring is crucial for production deployments. Here's how to set up Prometheus and Grafana:
Prometheus Configuration
# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: langchain-prod
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'langchain-api'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- langchain-prod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: langchain-api
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
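These relabeling rules only keep pods that carry the standard prometheus.io annotations, so the deployment's pod template must advertise its metrics endpoint. A sketch of the extra metadata (matching the /metrics route defined below):
# deployment.yaml (pod template excerpt -- annotations assumed by the scrape config)
template:
  metadata:
    labels:
      app: langchain-api
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "/metrics"
      prometheus.io/port: "8000"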
Custom Metrics in LangChain Application
# metrics.py
from prometheus_client import Counter, Histogram, Gauge, generate_latest
from functools import wraps
import time
# Define metrics
request_count = Counter(
'langchain_requests_total',
'Total number of requests',
['method', 'endpoint', 'status']
)
request_duration = Histogram(
'langchain_request_duration_seconds',
'Request duration in seconds',
['method', 'endpoint']
)
active_chains = Gauge(
'langchain_active_chains',
'Number of active LangChain instances'
)
llm_tokens_used = Counter(
'langchain_llm_tokens_total',
'Total tokens used by LLM',
['model', 'operation']
)
vector_db_operations = Counter(
'langchain_vector_db_operations_total',
'Vector database operations',
['operation', 'status']
)
# Decorator for timing requests
def track_request_metrics(endpoint):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start_time = time.time()
status = 'success'
try:
result = await func(*args, **kwargs)
return result
except Exception as e:
status = 'error'
raise
finally:
duration = time.time() - start_time
request_count.labels(
method='POST',
endpoint=endpoint,
status=status
).inc()
request_duration.labels(
method='POST',
endpoint=endpoint
).observe(duration)
return wrapper
return decorator
# FastAPI integration
from fastapi import FastAPI, Response
app = FastAPI()
@app.get("/metrics")
async def metrics():
return Response(
content=generate_latest(),
media_type="text/plain"
)
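As a usage sketch, the decorator wraps each route handler so every call is counted and timed; the /chat route and the run_chain helper below are illustrative placeholders, not part of the metrics module:
# routes.py (illustrative usage of the metrics decorator)
from metrics import app, track_request_metrics, active_chains

async def run_chain(question: str) -> str:
    # placeholder for the real LangChain chain/agent invocation
    return f"stubbed answer for: {question}"

@app.post("/chat")
@track_request_metrics(endpoint="/chat")
async def chat(payload: dict):
    active_chains.inc()
    try:
        return {"answer": await run_chain(payload["question"])}
    finally:
        active_chains.dec()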
Grafana Dashboard Configuration
{
"dashboard": {
"id": null,
"title": "LangChain Production Metrics",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(langchain_requests_total[5m])",
"legendFormat": "{{method}} {{endpoint}}"
}
],
"type": "graph"
},
{
"title": "Response Time (95th percentile)",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(langchain_request_duration_seconds_bucket[5m]))",
"legendFormat": "{{endpoint}}"
}
],
"type": "graph"
},
{
"title": "LLM Token Usage",
"targets": [
{
"expr": "rate(langchain_llm_tokens_total[1h])",
"legendFormat": "{{model}} - {{operation}}"
}
],
"type": "graph"
},
{
"title": "Active Chains",
"targets": [
{
"expr": "langchain_active_chains",
"legendFormat": "Active Chains"
}
],
"type": "stat"
}
]
}
}
Load Balancing and Auto-scaling
Implementing effective load balancing and auto-scaling ensures your LangChain application can handle varying loads:
NGINX Load Balancer Configuration
# nginx.conf
upstream langchain_backend {
least_conn;
server langchain-pod-1:8000 weight=1 max_fails=3 fail_timeout=30s;
server langchain-pod-2:8000 weight=1 max_fails=3 fail_timeout=30s;
server langchain-pod-3:8000 weight=1 max_fails=3 fail_timeout=30s;
keepalive 32;
}
# Shared-memory zones for rate and connection limiting must be declared
# at the http level, outside any server block
limit_req_zone $binary_remote_addr zone=langchain_limit:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;
server {
listen 80;
server_name api.langchain.example.com;
# Rate limiting
limit_req zone=langchain_limit burst=20 nodelay;
# Connection limiting
limit_conn addr 10;
location / {
proxy_pass http://langchain_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffering
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}
location /health {
access_log off;
proxy_pass http://langchain_backend/health;
}
}
Kubernetes Vertical Pod Autoscaler
# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: langchain-vpa
namespace: langchain-prod
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: langchain-api
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: langchain-api
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
controlledResources: ["cpu", "memory"]
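Two caveats worth noting: the VPA is an add-on installed separately from core Kubernetes, and running it in Auto mode against the same CPU/memory signals the HPA already scales on can make the two controllers fight (a common compromise is updateMode: "Off" to collect recommendations only). After applying the manifest, you can inspect its recommendations with:
# Show the resource recommendations the VPA has computed for the deployment
kubectl describe vpa langchain-vpa -n langchain-prod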
Environment and Secrets Management
Secure management of environment variables and secrets is critical for production deployments:
HashiCorp Vault Integration
# vault-injector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-vault
namespace: langchain-prod
spec:
template:
metadata:
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "langchain-role"
vault.hashicorp.com/agent-inject-secret-api-keys: "secret/data/langchain/api-keys"
vault.hashicorp.com/agent-inject-template-api-keys: |
{{- with secret "secret/data/langchain/api-keys" -}}
export OPENAI_API_KEY="{{ .Data.data.openai_key }}"
export LANGCHAIN_API_KEY="{{ .Data.data.langchain_key }}"
{{- end }}
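The injector renders that template into a file inside the pod (by default under /vault/secrets/, named after the secret annotation, so /vault/secrets/api-keys here), which means the container entrypoint has to source it before starting the server. A sketch of the corresponding container spec:
# container excerpt (sources the injected secrets file before starting uvicorn)
containers:
  - name: langchain-api
    image: your-registry/langchain-api:latest
    command: ["/bin/sh", "-c"]
    args:
      - . /vault/secrets/api-keys && exec uvicorn main:app --host 0.0.0.0 --port 8000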
Environment Configuration Management
# config.py
# Note: this uses the Pydantic v1 settings API; on Pydantic v2, BaseSettings
# moves to the pydantic-settings package and @validator becomes @field_validator.
from pydantic import BaseSettings, Field, validator
class Settings(BaseSettings):
# Application settings
app_name: str = "LangChain API"
environment: str = Field(..., env="ENVIRONMENT")
debug: bool = Field(False, env="DEBUG")
# API Keys
openai_api_key: str = Field(..., env="OPENAI_API_KEY")
langchain_api_key: str = Field(..., env="LANGCHAIN_API_KEY")
# Database
database_url: str = Field(..., env="DATABASE_URL")
redis_url: str = Field(..., env="REDIS_URL")
# Vector Database
vector_db_type: str = Field("weaviate", env="VECTOR_DB_TYPE")
vector_db_url: str = Field(..., env="VECTOR_DB_URL")
# Performance
max_workers: int = Field(4, env="MAX_WORKERS")
request_timeout: int = Field(60, env="REQUEST_TIMEOUT")
# Security
cors_origins: list[str] = Field(
["https://app.example.com"],
env="CORS_ORIGINS"
)
api_rate_limit: int = Field(100, env="API_RATE_LIMIT")
@validator("environment")
def validate_environment(cls, v):
allowed = ["development", "staging", "production"]
if v not in allowed:
raise ValueError(f"Environment must be one of {allowed}")
return v
class Config:
env_file = ".env"
case_sensitive = False
# Load settings
settings = Settings()
Blue-Green Deployments
Blue-green deployments enable zero-downtime updates by running two identical environments side by side and switching the service selector between them:
# blue-green-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: langchain-service
namespace: langchain-prod
spec:
selector:
app: langchain-api
version: green # Switch between blue and green
ports:
- port: 80
targetPort: 8000
---
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-blue
namespace: langchain-prod
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
version: blue
template:
metadata:
labels:
app: langchain-api
version: blue
spec:
containers:
- name: langchain-api
image: your-registry/langchain-api:v1.0.0
# ... rest of configuration
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-green
namespace: langchain-prod
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
version: green
template:
metadata:
labels:
app: langchain-api
version: green
spec:
containers:
- name: langchain-api
image: your-registry/langchain-api:v1.1.0
# ... rest of configuration
Blue-Green Switch Script
#!/bin/bash
# switch-deployment.sh
set -euo pipefail

NAMESPACE="langchain-prod"
SERVICE="langchain-service"
NEW_VERSION=${1:-}
if [ -z "$NEW_VERSION" ]; then
echo "Usage: ./switch-deployment.sh [blue|green]"
exit 1
fi
# Verify new deployment is ready
echo "Checking $NEW_VERSION deployment status..."
kubectl rollout status deployment/langchain-api-$NEW_VERSION -n $NAMESPACE
# Switch traffic
echo "Switching traffic to $NEW_VERSION..."
kubectl patch service $SERVICE -n $NAMESPACE -p '{"spec":{"selector":{"version":"'$NEW_VERSION'"}}}'
# Verify switch
echo "Verifying service endpoints..."
kubectl get endpoints $SERVICE -n $NAMESPACE
echo "Deployment switched to $NEW_VERSION successfully!"
Disaster Recovery Planning
A comprehensive disaster recovery plan ensures business continuity:
Backup Strategy
# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: langchain-backup
namespace: langchain-prod
spec:
schedule: "0 */6 * * *" # Every 6 hours
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: your-registry/backup-tool:latest
env:
- name: BACKUP_TARGETS
value: "postgres,redis,vector-db"
- name: S3_BUCKET
value: "langchain-backups"
command:
- /bin/bash
- -c
- |
# Backup PostgreSQL
pg_dump $DATABASE_URL | gzip > postgres-$(date +%Y%m%d-%H%M%S).sql.gz
aws s3 cp postgres-*.sql.gz s3://$S3_BUCKET/postgres/
# Backup Redis (connect to the Redis service rather than localhost)
redis-cli -h redis-service --rdb /tmp/redis-backup.rdb
gzip /tmp/redis-backup.rdb
aws s3 cp /tmp/redis-backup.rdb.gz s3://$S3_BUCKET/redis/redis-$(date +%Y%m%d-%H%M%S).rdb.gz
# Backup vector database (Weaviate's backup API expects the configured backup backend, e.g. s3 or filesystem, in the path)
curl -X POST http://weaviate:8080/v1/backups/s3 \
  -H 'Content-Type: application/json' \
  -d '{"id": "backup-'$(date +%Y%m%d-%H%M%S)'"}'
restartPolicy: OnFailure
Disaster Recovery Runbook
# LangChain Disaster Recovery Runbook
## Recovery Time Objective (RTO): 30 minutes
## Recovery Point Objective (RPO): 6 hours
### Phase 1: Assessment (5 minutes)
1. Identify the failure type:
- [ ] Application failure
- [ ] Database corruption
- [ ] Infrastructure failure
- [ ] Security breach
2. Check monitoring dashboards:
- [ ] Prometheus alerts
- [ ] Grafana metrics
- [ ] Application logs
### Phase 2: Immediate Response (10 minutes)
1. Activate incident response team
2. Switch to disaster recovery site (if available)
3. Enable maintenance mode
4. Notify stakeholders
### Phase 3: Recovery (15 minutes)
1. **Application Recovery:**
   # Scale down current deployment
   kubectl scale deployment langchain-api --replicas=0 -n langchain-prod
   # Deploy last known good version
   kubectl set image deployment/langchain-api \
     langchain-api=your-registry/langchain-api:last-known-good \
     -n langchain-prod
   # Scale up
   kubectl scale deployment langchain-api --replicas=3 -n langchain-prod
2. **Database Recovery:**
   # Restore PostgreSQL
   aws s3 cp s3://langchain-backups/postgres/latest.sql.gz .
   gunzip latest.sql.gz
   psql $DATABASE_URL < latest.sql
   # Restore Redis
   aws s3 cp s3://langchain-backups/redis/latest.rdb.gz .
   gunzip latest.rdb.gz
   redis-cli --rdb latest.rdb
3. **Vector Database Recovery:**
   curl -X POST http://weaviate:8080/v1/backups/restore \
     -d '{"id": "latest-backup"}'
### Phase 4: Validation
- Run health checks
- Execute smoke tests
- Verify data integrity
- Monitor error rates
### Phase 5: Post-Recovery
- Document incident
- Update runbook
- Schedule post-mortem
- Implement preventive measures
Production Best Practices
Security Hardening
# security-policies.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: langchain-psp
spec:
privileged: false
allowPrivilegeEscalation: false
requiredDropCapabilities:
- ALL
volumes:
- 'configMap'
- 'emptyDir'
- 'projected'
- 'secret'
- 'downwardAPI'
- 'persistentVolumeClaim'
hostNetwork: false
hostIPC: false
hostPID: false
runAsUser:
rule: 'MustRunAsNonRoot'
seLinux:
rule: 'RunAsAny'
fsGroup:
rule: 'RunAsAny'
readOnlyRootFilesystem: true
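Note that PodSecurityPolicy was removed in Kubernetes 1.25. On newer clusters the usual replacement is Pod Security Admission, enforced per namespace with labels, for example:
# Enforce the "restricted" Pod Security Standard on the production namespace
kubectl label namespace langchain-prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted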
Network Policies
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: langchain-network-policy
namespace: langchain-prod
spec:
podSelector:
matchLabels:
app: langchain-api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: langchain-prod
- podSelector:
matchLabels:
app: nginx-ingress
ports:
- protocol: TCP
port: 8000
egress:
- to:
- namespaceSelector:
matchLabels:
name: langchain-prod
ports:
- protocol: TCP
port: 5432 # PostgreSQL
- protocol: TCP
port: 6379 # Redis
- protocol: TCP
port: 8080 # Weaviate
  # External HTTPS (LLM provider APIs); a namespaceSelector only matches in-cluster
  # traffic, so external destinations need an ipBlock. Egress to kube-dns (UDP 53)
  # is usually required as well for name resolution.
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
    ports:
      - protocol: TCP
        port: 443
Performance Optimization
# performance.py
import asyncio
import json
from typing import Optional

import redis.asyncio as redis
class CacheManager:
def __init__(self, redis_url: str):
self.redis_client = redis.from_url(redis_url)
self.default_ttl = 3600 # 1 hour
async def get_or_set(self, key: str, func, ttl: Optional[int] = None):
"""Get value from cache or compute and set it"""
# Try to get from cache
cached = await self.redis_client.get(key)
if cached:
return json.loads(cached)
# Compute value
result = await func()
# Cache the result
await self.redis_client.setex(
key,
ttl or self.default_ttl,
json.dumps(result)
)
return result
# Connection pooling for vector database
class VectorDBPool:
def __init__(self, url: str, pool_size: int = 10):
self.url = url
self.pool = asyncio.Queue(maxsize=pool_size)
self.pool_size = pool_size
async def initialize(self):
for _ in range(self.pool_size):
connection = await self._create_connection()
await self.pool.put(connection)
    async def _create_connection(self):
        # create_vector_db_connection is a placeholder for your vector DB client's
        # async connect/factory call (e.g. a Weaviate or Qdrant client)
        return await create_vector_db_connection(self.url)
async def acquire(self):
return await self.pool.get()
async def release(self, connection):
await self.pool.put(connection)
# Request batching for LLM calls
class LLMBatcher:
def __init__(self, batch_size: int = 10, wait_time: float = 0.1):
self.batch_size = batch_size
self.wait_time = wait_time
self.pending_requests = []
self.results = {}
self.batch_task = None
async def add_request(self, request_id: str, prompt: str):
future = asyncio.Future()
self.pending_requests.append((request_id, prompt, future))
if len(self.pending_requests) >= self.batch_size:
await self._process_batch()
elif not self.batch_task:
self.batch_task = asyncio.create_task(self._batch_timer())
return await future
async def _batch_timer(self):
await asyncio.sleep(self.wait_time)
await self._process_batch()
self.batch_task = None
    async def _process_batch(self):
        if not self.pending_requests:
            return
        batch = self.pending_requests[:self.batch_size]
        self.pending_requests = self.pending_requests[self.batch_size:]
        prompts = [prompt for _, prompt, _ in batch]
        try:
            # process_llm_batch is a placeholder for your batched LLM call
            results = await process_llm_batch(prompts)
        except Exception as exc:
            # Propagate the failure to every waiting caller instead of hanging them
            for _, _, future in batch:
                future.set_exception(exc)
            return
        # Distribute results back to the waiting futures
        for (request_id, _, future), result in zip(batch, results):
            future.set_result(result)
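To tie these pieces together, a request handler might wrap an expensive chain invocation in the cache; the key scheme and the run_chain helper below are illustrative placeholders:
# usage sketch: cache chain results in Redis for 30 minutes
import hashlib
from config import settings

cache = CacheManager(redis_url=settings.redis_url)

async def run_chain(question: str) -> str:
    # placeholder for the real LangChain invocation
    return f"stubbed answer for: {question}"

async def answer_question(question: str) -> dict:
    key = "qa:" + hashlib.sha256(question.encode()).hexdigest()

    async def compute():
        return {"answer": await run_chain(question)}

    return await cache.get_or_set(key, compute, ttl=1800)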
Conclusion
Deploying LangChain to production requires careful consideration of containerization, orchestration, monitoring, and disaster recovery. This guide provides a comprehensive foundation for building enterprise-grade LangChain deployments that are scalable, reliable, and maintainable.
Key takeaways for production deployment:
- Containerization: Use multi-stage Docker builds for optimal image size and security
- Orchestration: Leverage Kubernetes for scalability and self-healing capabilities
- CI/CD: Implement automated pipelines with comprehensive testing and security scanning
- Monitoring: Set up detailed metrics collection and alerting with Prometheus and Grafana
- Scaling: Configure both horizontal and vertical autoscaling based on actual usage patterns
- Security: Implement proper secrets management, network policies, and security scanning
- Disaster Recovery: Maintain regular backups and tested recovery procedures
- Performance: Optimize with caching, connection pooling, and request batching
By following these practices and configurations, you can ensure your LangChain applications run reliably in production environments, handling enterprise-scale workloads while maintaining high availability and performance standards.