Deploying LangChain to Production: Complete DevOps Guide
Deploying LangChain applications to production requires careful consideration of containerization, orchestration, monitoring, and scalability. This comprehensive guide covers enterprise-grade deployment strategies, from Docker containerization to Kubernetes orchestration, complete with CI/CD pipelines and disaster recovery planning.
Table of Contents
- Production Architecture Overview
- Docker Containerization
- Kubernetes Deployment
- CI/CD Pipeline Implementation
- Monitoring with Prometheus and Grafana
- Load Balancing and Auto-scaling
- Environment and Secrets Management
- Blue-Green Deployments
- Disaster Recovery Planning
- Production Best Practices
Production Architecture Overview
A production-ready LangChain deployment consists of multiple layers working together to ensure reliability, scalability, and maintainability. The architecture includes containerized applications, orchestration platforms, monitoring systems, and automated deployment pipelines.
Key Components
# production-architecture.yaml
components:
application:
- LangChain API service
- Vector database (Pinecone/Weaviate/Chroma)
- Redis for caching
- PostgreSQL for metadata
infrastructure:
- Docker containers
- Kubernetes cluster
- Load balancer (NGINX/HAProxy)
- Service mesh (Istio/Linkerd)
monitoring:
- Prometheus metrics collection
- Grafana dashboards
- ELK stack for logs
- Jaeger for tracing
deployment:
- GitHub Actions CI/CD
- ArgoCD for GitOps
- Helm charts
- Blue-green deployment strategy
Docker Containerization
Creating efficient Docker containers for LangChain applications requires optimizing for size, security, and performance. Here's a production-ready Dockerfile:
# Dockerfile
# Multi-stage build for optimized image size
FROM python:3.11-slim AS builder
# Install build dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy and install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim
# Install runtime dependencies
RUN apt-get update && apt-get install -y \
libpq5 \
curl \
&& rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r langchain && useradd -r -g langchain langchain
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
# Set environment variables
ENV PATH="/opt/venv/bin:$PATH" \
PYTHONUNBUFFERED=1 \
PYTHONDONTWRITEBYTECODE=1 \
LANGCHAIN_TRACING_V2=true \
LANGCHAIN_ENDPOINT="https://api.langchain.plus"
# Create app directory
WORKDIR /app
# Copy application code
COPY . .
# Switch to non-root user
USER langchain
# Health check (interval/timeout values are sensible defaults; tune for your workload)
HEALTHCHECK --interval=30s --timeout=10s --start-period=15s --retries=3 \
  CMD curl -f http://localhost:8000/health || exit 1
# Expose port
EXPOSE 8000
# Run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]
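The HEALTHCHECK above and the Kubernetes probes later in this guide assume the application exposes /health and /ready endpoints. A minimal sketch of the main:app module referenced in the CMD (the endpoint bodies are illustrative; a real readiness check would verify the database, Redis, and vector store connections):
# main.py (minimal sketch of the module referenced by the CMD above)
from fastapi import FastAPI

app = FastAPI(title="LangChain API")

@app.get("/health")
async def health():
    # Liveness: the process is up and able to serve requests
    return {"status": "ok"}

@app.get("/ready")
async def ready():
    # Readiness: in a real app, check DB/Redis/vector-store connectivity here
    return {"status": "ready"}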
Docker Compose for Local Development
# docker-compose.yml
version: '3.8'
services:
langchain-api:
build:
context: .
dockerfile: Dockerfile
ports:
- "8000:8000"
environment:
- DATABASE_URL=postgresql://langchain:password@postgres:5432/langchain
- REDIS_URL=redis://redis:6379
- VECTOR_DB_URL=http://weaviate:8080
depends_on:
- postgres
- redis
- weaviate
volumes:
- ./logs:/app/logs
networks:
- langchain-network
postgres:
image: postgres:15-alpine
environment:
POSTGRES_USER: langchain
POSTGRES_PASSWORD: password
POSTGRES_DB: langchain
volumes:
- postgres_data:/var/lib/postgresql/data
networks:
- langchain-network
redis:
image: redis:7-alpine
command: redis-server --appendonly yes
volumes:
- redis_data:/data
networks:
- langchain-network
weaviate:
image: semitechnologies/weaviate:latest
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true' # local development only
PERSISTENCE_DATA_PATH: '/var/lib/weaviate'
volumes:
- weaviate_data:/var/lib/weaviate
networks:
- langchain-network
volumes:
postgres_data:
redis_data:
weaviate_data:
networks:
langchain-network:
driver: bridge
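With this file in place, the local stack can be started and smoke-tested roughly as follows (assuming Docker Compose v2, which treats the top-level version key as informational):
# Build the image and start all services in the background
docker compose up --build -d
# Confirm the API container is healthy
curl -f http://localhost:8000/health
# Tail the application logs
docker compose logs -f langchain-api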
Kubernetes Deployment
Deploying LangChain on Kubernetes provides scalability, self-healing, and declarative configuration management. Here's a comprehensive Kubernetes deployment:
Namespace and ConfigMap
# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: langchain-prod
labels:
name: langchain-prod
environment: production
---
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: langchain-config
namespace: langchain-prod
data:
APP_NAME: "langchain-api"
LOG_LEVEL: "INFO"
MAX_WORKERS: "4"
VECTOR_DB_HOST: "weaviate-service"
REDIS_HOST: "redis-service"
Secrets Management
# secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: langchain-secrets
namespace: langchain-prod
type: Opaque
stringData:
DATABASE_URL: "postgresql://langchain:password@postgres-service:5432/langchain"
LANGCHAIN_API_KEY: "your-api-key-here"
OPENAI_API_KEY: "your-openai-key-here"
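In practice the real values should not be committed to Git; one simple alternative is to create the Secret imperatively (or sync it from an external store such as Vault, covered later), for example:
# Create the secret from literals so real keys never land in version control
kubectl create secret generic langchain-secrets \
  --namespace langchain-prod \
  --from-literal=DATABASE_URL='postgresql://langchain:<password>@postgres-service:5432/langchain' \
  --from-literal=LANGCHAIN_API_KEY='<your-langchain-key>' \
  --from-literal=OPENAI_API_KEY='<your-openai-key>'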
Deployment Configuration
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api
namespace: langchain-prod
labels:
app: langchain-api
version: v1
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
template:
metadata:
labels:
app: langchain-api
version: v1
spec:
serviceAccountName: langchain-sa
containers:
- name: langchain-api
image: your-registry/langchain-api:latest
imagePullPolicy: Always
ports:
- containerPort: 8000
name: http
envFrom:
- configMapRef:
name: langchain-config
- secretRef:
name: langchain-secrets
resources:
requests:
memory: "512Mi"
cpu: "250m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /health
port: 8000
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8000
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: logs
mountPath: /app/logs
volumes:
- name: logs
emptyDir: {}
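The pod spec references serviceAccountName: langchain-sa, which is not defined elsewhere in this guide; a minimal manifest for it would be:
# serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: langchain-sa
  namespace: langchain-prod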
Service and Ingress
# service.yaml
apiVersion: v1
kind: Service
metadata:
name: langchain-service
namespace: langchain-prod
labels:
app: langchain-api
spec:
selector:
app: langchain-api
ports:
- port: 80
targetPort: 8000
protocol: TCP
name: http
type: ClusterIP
---
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: langchain-ingress
namespace: langchain-prod
annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/limit-rps: "100"
spec:
  ingressClassName: nginx
  tls:
- hosts:
- api.langchain.example.com
secretName: langchain-tls
rules:
- host: api.langchain.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: langchain-service
port:
number: 80
Horizontal Pod Autoscaler
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: langchain-hpa
namespace: langchain-prod
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: langchain-api
minReplicas: 3
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
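The HPA depends on the cluster's metrics API (typically metrics-server) being available. Once applied, you can verify that it is reading utilization and making scaling decisions with:
# Show current utilization targets and replica counts
kubectl get hpa langchain-hpa -n langchain-prod
# Inspect scaling events and conditions
kubectl describe hpa langchain-hpa -n langchain-prod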
CI/CD Pipeline Implementation
A robust CI/CD pipeline ensures consistent and reliable deployments. Here's a complete GitHub Actions workflow:
# .github/workflows/deploy.yml
name: Deploy LangChain to Production
on:
push:
branches: [main]
pull_request:
branches: [main]
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}/langchain-api
KUBERNETES_CLUSTER: langchain-prod
KUBERNETES_NAMESPACE: langchain-prod
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Set up Python
uses: actions/setup-python@v4
with:
python-version: '3.11'
- name: Cache dependencies
uses: actions/cache@v3
with:
path: ~/.cache/pip
key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
restore-keys: |
${{ runner.os }}-pip-
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install -r requirements-dev.txt
- name: Run tests
run: |
pytest tests/ --cov=app --cov-report=xml
- name: Upload coverage
uses: codecov/codecov-action@v3
with:
file: ./coverage.xml
- name: Run security scan
run: |
pip install bandit safety
bandit -r app/
safety check
build:
needs: test
runs-on: ubuntu-latest
permissions:
contents: read
packages: write
steps:
- uses: actions/checkout@v3
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v2
- name: Log in to container registry
uses: docker/login-action@v2
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
- name: Extract metadata
id: meta
uses: docker/metadata-action@v4
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
tags: |
type=ref,event=branch
type=ref,event=pr
type=semver,pattern={{version}}
type=semver,pattern={{major}}.{{minor}}
type=sha,format=long,prefix=sha-
- name: Build and push Docker image
uses: docker/build-push-action@v4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
cache-from: type=gha
cache-to: type=gha,mode=max
platforms: linux/amd64,linux/arm64
deploy:
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
needs: build
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Configure kubectl
uses: azure/setup-kubectl@v3
with:
version: 'latest'
- name: Set up Kubeconfig
  run: |
    echo "${{ secrets.KUBE_CONFIG }}" | base64 -d > kubeconfig
    # An export only lasts for this step; persist it for later steps via GITHUB_ENV
    echo "KUBECONFIG=$(pwd)/kubeconfig" >> "$GITHUB_ENV"
- name: Deploy to Kubernetes
run: |
kubectl set image deployment/langchain-api \
langchain-api=${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:sha-${{ github.sha }} \
-n ${{ env.KUBERNETES_NAMESPACE }}
kubectl rollout status deployment/langchain-api \
-n ${{ env.KUBERNETES_NAMESPACE }}
- name: Run smoke tests
  run: |
    kubectl run smoke-test --rm -i --restart=Never \
      -n ${{ env.KUBERNETES_NAMESPACE }} \
      --image=curlimages/curl:latest \
      -- curl -f http://langchain-service/health
- name: Notify deployment
uses: 8398a7/action-slack@v3
with:
status: ${{ job.status }}
text: 'LangChain deployed to production'
webhook_url: ${{ secrets.SLACK_WEBHOOK }}
if: always()
Monitoring with Prometheus and Grafana
Comprehensive monitoring is crucial for production deployments. Here's how to set up Prometheus and Grafana:
Prometheus Configuration
# prometheus-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: prometheus-config
namespace: langchain-prod
data:
prometheus.yml: |
global:
scrape_interval: 15s
evaluation_interval: 15s
scrape_configs:
- job_name: 'langchain-api'
kubernetes_sd_configs:
- role: pod
namespaces:
names:
- langchain-prod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_label_app]
action: keep
regex: langchain-api
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
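These relabeling rules only keep pods that carry the standard prometheus.io annotations, so the deployment's pod template must advertise its metrics endpoint. A sketch of the extra metadata (matching the /metrics route defined below):
# deployment.yaml (pod template excerpt -- annotations assumed by the scrape config)
template:
  metadata:
    labels:
      app: langchain-api
    annotations:
      prometheus.io/scrape: "true"
      prometheus.io/path: "/metrics"
      prometheus.io/port: "8000"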
Custom Metrics in LangChain Application
# metrics.py
from prometheus_client import Counter, Histogram, Gauge, generate_latest
from functools import wraps
import time
# Define metrics
request_count = Counter(
'langchain_requests_total',
'Total number of requests',
['method', 'endpoint', 'status']
)
request_duration = Histogram(
'langchain_request_duration_seconds',
'Request duration in seconds',
['method', 'endpoint']
)
active_chains = Gauge(
'langchain_active_chains',
'Number of active LangChain instances'
)
llm_tokens_used = Counter(
'langchain_llm_tokens_total',
'Total tokens used by LLM',
['model', 'operation']
)
vector_db_operations = Counter(
'langchain_vector_db_operations_total',
'Vector database operations',
['operation', 'status']
)
# Decorator for timing requests
def track_request_metrics(endpoint):
def decorator(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start_time = time.time()
status = 'success'
try:
result = await func(*args, **kwargs)
return result
except Exception as e:
status = 'error'
raise
finally:
duration = time.time() - start_time
request_count.labels(
method='POST',
endpoint=endpoint,
status=status
).inc()
request_duration.labels(
method='POST',
endpoint=endpoint
).observe(duration)
return wrapper
return decorator
# FastAPI integration
from fastapi import FastAPI, Response
app = FastAPI()
@app.get("/metrics")
async def metrics():
return Response(
content=generate_latest(),
media_type="text/plain"
)
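As a usage sketch, the decorator wraps each route handler so every call is counted and timed; the /chat route and the run_chain helper below are illustrative placeholders, not part of the metrics module:
# routes.py (illustrative usage of the metrics decorator)
from metrics import app, track_request_metrics, active_chains

async def run_chain(question: str) -> str:
    # placeholder for the real LangChain chain/agent invocation
    return f"stubbed answer for: {question}"

@app.post("/chat")
@track_request_metrics(endpoint="/chat")
async def chat(payload: dict):
    active_chains.inc()
    try:
        return {"answer": await run_chain(payload["question"])}
    finally:
        active_chains.dec()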
Grafana Dashboard Configuration
{
"dashboard": {
"id": null,
"title": "LangChain Production Metrics",
"panels": [
{
"title": "Request Rate",
"targets": [
{
"expr": "rate(langchain_requests_total[5m])",
"legendFormat": "{{method}} {{endpoint}}"
}
],
"type": "graph"
},
{
"title": "Response Time (95th percentile)",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(langchain_request_duration_seconds_bucket[5m]))",
"legendFormat": "{{endpoint}}"
}
],
"type": "graph"
},
{
"title": "LLM Token Usage",
"targets": [
{
"expr": "rate(langchain_llm_tokens_total[1h])",
"legendFormat": "{{model}} - {{operation}}"
}
],
"type": "graph"
},
{
"title": "Active Chains",
"targets": [
{
"expr": "langchain_active_chains",
"legendFormat": "Active Chains"
}
],
"type": "stat"
}
]
}
}
Load Balancing and Auto-scaling
Implementing effective load balancing and auto-scaling ensures your LangChain application can handle varying loads:
NGINX Load Balancer Configuration
# nginx.conf
upstream langchain_backend {
least_conn;
server langchain-pod-1:8000 weight=1 max_fails=3 fail_timeout=30s;
server langchain-pod-2:8000 weight=1 max_fails=3 fail_timeout=30s;
server langchain-pod-3:8000 weight=1 max_fails=3 fail_timeout=30s;
keepalive 32;
}
# Shared-memory zones for rate and connection limiting must be declared
# at the http level, outside any server block
limit_req_zone $binary_remote_addr zone=langchain_limit:10m rate=10r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;
server {
listen 80;
server_name api.langchain.example.com;
# Rate limiting
limit_req zone=langchain_limit burst=20 nodelay;
# Connection limiting
limit_conn addr 10;
location / {
proxy_pass http://langchain_backend;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection "upgrade";
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
# Timeouts
proxy_connect_timeout 60s;
proxy_send_timeout 60s;
proxy_read_timeout 60s;
# Buffering
proxy_buffering on;
proxy_buffer_size 4k;
proxy_buffers 8 4k;
proxy_busy_buffers_size 8k;
}
location /health {
access_log off;
proxy_pass http://langchain_backend/health;
}
}
Kubernetes Vertical Pod Autoscaler
# vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: langchain-vpa
namespace: langchain-prod
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: langchain-api
updatePolicy:
updateMode: "Auto"
resourcePolicy:
containerPolicies:
- containerName: langchain-api
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
controlledResources: ["cpu", "memory"]
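Two caveats worth noting: the VPA is an add-on installed separately from core Kubernetes, and running it in Auto mode against the same CPU/memory signals the HPA already scales on can make the two controllers fight (a common compromise is updateMode: "Off" to collect recommendations only). After applying the manifest, you can inspect its recommendations with:
# Show the resource recommendations the VPA has computed for the deployment
kubectl describe vpa langchain-vpa -n langchain-prod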
Environment and Secrets Management
Secure management of environment variables and secrets is critical for production deployments:
HashiCorp Vault Integration
# vault-injector.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-vault
namespace: langchain-prod
spec:
template:
metadata:
annotations:
vault.hashicorp.com/agent-inject: "true"
vault.hashicorp.com/role: "langchain-role"
vault.hashicorp.com/agent-inject-secret-api-keys: "secret/data/langchain/api-keys"
vault.hashicorp.com/agent-inject-template-api-keys: |
{{- with secret "secret/data/langchain/api-keys" -}}
export OPENAI_API_KEY="{{ .Data.data.openai_key }}"
export LANGCHAIN_API_KEY="{{ .Data.data.langchain_key }}"
{{- end }}
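The injector renders that template into a file inside the pod (by default under /vault/secrets/, named after the secret annotation, so /vault/secrets/api-keys here), which means the container entrypoint has to source it before starting the server. A sketch of the corresponding container spec:
# container excerpt (sources the injected secrets file before starting uvicorn)
containers:
  - name: langchain-api
    image: your-registry/langchain-api:latest
    command: ["/bin/sh", "-c"]
    args:
      - . /vault/secrets/api-keys && exec uvicorn main:app --host 0.0.0.0 --port 8000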
Environment Configuration Management
# config.py
# Note: this uses the Pydantic v1 settings API; on Pydantic v2, BaseSettings
# moves to the pydantic-settings package and @validator becomes @field_validator.
from pydantic import BaseSettings, Field, validator
class Settings(BaseSettings):
# Application settings
app_name: str = "LangChain API"
environment: str = Field(..., env="ENVIRONMENT")
debug: bool = Field(False, env="DEBUG")
# API Keys
openai_api_key: str = Field(..., env="OPENAI_API_KEY")
langchain_api_key: str = Field(..., env="LANGCHAIN_API_KEY")
# Database
database_url: str = Field(..., env="DATABASE_URL")
redis_url: str = Field(..., env="REDIS_URL")
# Vector Database
vector_db_type: str = Field("weaviate", env="VECTOR_DB_TYPE")
vector_db_url: str = Field(..., env="VECTOR_DB_URL")
# Performance
max_workers: int = Field(4, env="MAX_WORKERS")
request_timeout: int = Field(60, env="REQUEST_TIMEOUT")
# Security
cors_origins: list[str] = Field(
["https://app.example.com"],
env="CORS_ORIGINS"
)
api_rate_limit: int = Field(100, env="API_RATE_LIMIT")
@validator("environment")
def validate_environment(cls, v):
allowed = ["development", "staging", "production"]
if v not in allowed:
raise ValueError(f"Environment must be one of {allowed}")
return v
class Config:
env_file = ".env"
case_sensitive = False
# Load settings
settings = Settings()
Blue-Green Deployments
Blue-green deployments enable zero-downtime updates by running two identical environments side by side and switching the service selector between them:
# blue-green-deployment.yaml
apiVersion: v1
kind: Service
metadata:
name: langchain-service
namespace: langchain-prod
spec:
selector:
app: langchain-api
version: green # Switch between blue and green
ports:
- port: 80
targetPort: 8000
---
# Blue deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-blue
namespace: langchain-prod
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
version: blue
template:
metadata:
labels:
app: langchain-api
version: blue
spec:
containers:
- name: langchain-api
image: your-registry/langchain-api:v1.0.0
# ... rest of configuration
---
# Green deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: langchain-api-green
namespace: langchain-prod
spec:
replicas: 3
selector:
matchLabels:
app: langchain-api
version: green
template:
metadata:
labels:
app: langchain-api
version: green
spec:
containers:
- name: langchain-api
image: your-registry/langchain-api:v1.1.0
# ... rest of configuration
Blue-Green Switch Script
#!/bin/bash
# switch-deployment.sh
set -euo pipefail

NAMESPACE="langchain-prod"
SERVICE="langchain-service"
NEW_VERSION=${1:-}
if [ -z "$NEW_VERSION" ]; then
echo "Usage: ./switch-deployment.sh [blue|green]"
exit 1
fi
# Verify new deployment is ready
echo "Checking $NEW_VERSION deployment status..."
kubectl rollout status deployment/langchain-api-$NEW_VERSION -n $NAMESPACE
# Switch traffic
echo "Switching traffic to $NEW_VERSION..."
kubectl patch service $SERVICE -n $NAMESPACE -p '{"spec":{"selector":{"version":"'$NEW_VERSION'"}}}'
# Verify switch
echo "Verifying service endpoints..."
kubectl get endpoints $SERVICE -n $NAMESPACE
echo "Deployment switched to $NEW_VERSION successfully!"
Disaster Recovery Planning
A comprehensive disaster recovery plan ensures business continuity:
Backup Strategy
# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
name: langchain-backup
namespace: langchain-prod
spec:
schedule: "0 */6 * * *" # Every 6 hours
jobTemplate:
spec:
template:
spec:
containers:
- name: backup
image: your-registry/backup-tool:latest
env:
- name: BACKUP_TARGETS
value: "postgres,redis,vector-db"
- name: S3_BUCKET
value: "langchain-backups"
command:
- /bin/bash
- -c
- |
# Backup PostgreSQL
pg_dump $DATABASE_URL | gzip > postgres-$(date +%Y%m%d-%H%M%S).sql.gz
aws s3 cp postgres-*.sql.gz s3://$S3_BUCKET/postgres/
# Backup Redis (connect to the Redis service rather than localhost)
redis-cli -h redis-service --rdb /tmp/redis-backup.rdb
gzip /tmp/redis-backup.rdb
aws s3 cp /tmp/redis-backup.rdb.gz s3://$S3_BUCKET/redis/redis-$(date +%Y%m%d-%H%M%S).rdb.gz
# Backup vector database (Weaviate's backup API expects the configured backup backend, e.g. s3 or filesystem, in the path)
curl -X POST http://weaviate:8080/v1/backups/s3 \
  -H 'Content-Type: application/json' \
  -d '{"id": "backup-'$(date +%Y%m%d-%H%M%S)'"}'
restartPolicy: OnFailure
Disaster Recovery Runbook
# LangChain Disaster Recovery Runbook
## Recovery Time Objective (RTO): 30 minutes
## Recovery Point Objective (RPO): 6 hours
### Phase 1: Assessment (5 minutes)
1. Identify the failure type:
- [ ] Application failure
- [ ] Database corruption
- [ ] Infrastructure failure
- [ ] Security breach
2. Check monitoring dashboards:
- [ ] Prometheus alerts
- [ ] Grafana metrics
- [ ] Application logs
### Phase 2: Immediate Response (10 minutes)
1. Activate incident response team
2. Switch to disaster recovery site (if available)
3. Enable maintenance mode
4. Notify stakeholders
### Phase 3: Recovery (15 minutes)
1. **Application Recovery:**
   # Scale down current deployment
   kubectl scale deployment langchain-api --replicas=0 -n langchain-prod
   # Deploy last known good version
   kubectl set image deployment/langchain-api \
     langchain-api=your-registry/langchain-api:last-known-good \
     -n langchain-prod
   # Scale up
   kubectl scale deployment langchain-api --replicas=3 -n langchain-prod
2. **Database Recovery:**
   # Restore PostgreSQL
   aws s3 cp s3://langchain-backups/postgres/latest.sql.gz .
   gunzip latest.sql.gz
   psql $DATABASE_URL < latest.sql
   # Restore Redis
   aws s3 cp s3://langchain-backups/redis/latest.rdb.gz .
   gunzip latest.rdb.gz
   redis-cli --rdb latest.rdb
3. **Vector Database Recovery:**
   curl -X POST http://weaviate:8080/v1/backups/restore \
     -d '{"id": "latest-backup"}'
### Phase 4: Validation
- Run health checks
- Execute smoke tests
- Verify data integrity
- Monitor error rates
### Phase 5: Post-Recovery
- Document incident
- Update runbook
- Schedule post-mortem
- Implement preventive measures
Production Best Practices
Security Hardening
# security-policies.yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
name: langchain-psp
spec:
privileged: false
allowPrivilegeEscalation: false
requiredDropCapabilities:
- ALL
volumes:
- 'configMap'
- 'emptyDir'
- 'projected'
- 'secret'
- 'downwardAPI'
- 'persistentVolumeClaim'
hostNetwork: false
hostIPC: false
hostPID: false
runAsUser:
rule: 'MustRunAsNonRoot'
seLinux:
rule: 'RunAsAny'
fsGroup:
rule: 'RunAsAny'
readOnlyRootFilesystem: true
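Note that PodSecurityPolicy was removed in Kubernetes 1.25. On newer clusters the usual replacement is Pod Security Admission, enforced per namespace with labels, for example:
# Enforce the "restricted" Pod Security Standard on the production namespace
kubectl label namespace langchain-prod \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/warn=restricted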
Network Policies
# network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: langchain-network-policy
namespace: langchain-prod
spec:
podSelector:
matchLabels:
app: langchain-api
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: langchain-prod
- podSelector:
matchLabels:
app: nginx-ingress
ports:
- protocol: TCP
port: 8000
egress:
- to:
- namespaceSelector:
matchLabels:
name: langchain-prod
ports:
- protocol: TCP
port: 5432 # PostgreSQL
- protocol: TCP
port: 6379 # Redis
- protocol: TCP
port: 8080 # Weaviate
  # External HTTPS (LLM provider APIs); a namespaceSelector only matches in-cluster
  # traffic, so external destinations need an ipBlock. Egress to kube-dns (UDP 53)
  # is usually required as well for name resolution.
  - to:
      - ipBlock:
          cidr: 0.0.0.0/0
    ports:
      - protocol: TCP
        port: 443
Performance Optimization
# performance.py
import asyncio
import json
from typing import Optional

import redis.asyncio as redis
class CacheManager:
def __init__(self, redis_url: str):
self.redis_client = redis.from_url(redis_url)
self.default_ttl = 3600 # 1 hour
async def get_or_set(self, key: str, func, ttl: Optional[int] = None):
"""Get value from cache or compute and set it"""
# Try to get from cache
cached = await self.redis_client.get(key)
if cached:
return json.loads(cached)
# Compute value
result = await func()
# Cache the result
await self.redis_client.setex(
key,
ttl or self.default_ttl,
json.dumps(result)
)
return result
# Connection pooling for vector database
class VectorDBPool:
def __init__(self, url: str, pool_size: int = 10):
self.url = url
self.pool = asyncio.Queue(maxsize=pool_size)
self.pool_size = pool_size
async def initialize(self):
for _ in range(self.pool_size):
connection = await self._create_connection()
await self.pool.put(connection)
    async def _create_connection(self):
        # create_vector_db_connection is a placeholder for your vector DB client's
        # async connect/factory call (e.g. a Weaviate or Qdrant client)
        return await create_vector_db_connection(self.url)
async def acquire(self):
return await self.pool.get()
async def release(self, connection):
await self.pool.put(connection)
# Request batching for LLM calls
class LLMBatcher:
def __init__(self, batch_size: int = 10, wait_time: float = 0.1):
self.batch_size = batch_size
self.wait_time = wait_time
self.pending_requests = []
self.results = {}
self.batch_task = None
async def add_request(self, request_id: str, prompt: str):
future = asyncio.Future()
self.pending_requests.append((request_id, prompt, future))
if len(self.pending_requests) >= self.batch_size:
await self._process_batch()
elif not self.batch_task:
self.batch_task = asyncio.create_task(self._batch_timer())
return await future
async def _batch_timer(self):
await asyncio.sleep(self.wait_time)
await self._process_batch()
self.batch_task = None
    async def _process_batch(self):
        if not self.pending_requests:
            return
        batch = self.pending_requests[:self.batch_size]
        self.pending_requests = self.pending_requests[self.batch_size:]
        prompts = [prompt for _, prompt, _ in batch]
        try:
            # process_llm_batch is a placeholder for your batched LLM call
            results = await process_llm_batch(prompts)
        except Exception as exc:
            # Propagate the failure to every waiting caller instead of hanging them
            for _, _, future in batch:
                future.set_exception(exc)
            return
        # Distribute results back to the waiting futures
        for (request_id, _, future), result in zip(batch, results):
            future.set_result(result)
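To tie these pieces together, a request handler might wrap an expensive chain invocation in the cache; the key scheme and the run_chain helper below are illustrative placeholders:
# usage sketch: cache chain results in Redis for 30 minutes
import hashlib
from config import settings

cache = CacheManager(redis_url=settings.redis_url)

async def run_chain(question: str) -> str:
    # placeholder for the real LangChain invocation
    return f"stubbed answer for: {question}"

async def answer_question(question: str) -> dict:
    key = "qa:" + hashlib.sha256(question.encode()).hexdigest()

    async def compute():
        return {"answer": await run_chain(question)}

    return await cache.get_or_set(key, compute, ttl=1800)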
Conclusion
Deploying LangChain to production requires careful consideration of containerization, orchestration, monitoring, and disaster recovery. This guide provides a comprehensive foundation for building enterprise-grade LangChain deployments that are scalable, reliable, and maintainable.
Key takeaways for production deployment:
- Containerization: Use multi-stage Docker builds for optimal image size and security
- Orchestration: Leverage Kubernetes for scalability and self-healing capabilities
- CI/CD: Implement automated pipelines with comprehensive testing and security scanning
- Monitoring: Set up detailed metrics collection and alerting with Prometheus and Grafana
- Scaling: Configure both horizontal and vertical autoscaling based on actual usage patterns
- Security: Implement proper secrets management, network policies, and security scanning
- Disaster Recovery: Maintain regular backups and tested recovery procedures
- Performance: Optimize with caching, connection pooling, and request batching
By following these practices and configurations, you can ensure your LangChain applications run reliably in production environments, handling enterprise-scale workloads while maintaining high availability and performance standards.