---
title: 'Building a SaaS with LangChain: Architecture and Scaling'
publishedAt: '2025-01-11'
summary: 'Learn how to build a production-ready multi-tenant SaaS application with LangChain, covering architecture, scaling strategies, billing integration, and real-world challenges from 0 to 10k customers.'
keywords: ['langchain saas', 'langchain architecture', 'multi-tenant langchain', 'langchain scaling', 'langchain production', 'ai saas architecture']
author: 'Fenil Sonani'
---
Building a Software-as-a-Service (SaaS) application with LangChain presents unique challenges beyond typical web applications. You're not just dealing with user authentication and data storage – you're managing AI model costs, rate limiting, tenant isolation, and complex billing based on token usage. This guide walks through building a production-ready LangChain SaaS architecture that can scale from your first customer to 10,000 and beyond.
## Table of Contents
- Architecture Overview
- Multi-Tenant Design Patterns
- API Gateway and Rate Limiting
- Billing Integration with Stripe
- Usage Tracking and Quotas
- Tenant Isolation Strategies
- Scaling from 0 to 10k Customers
- Complete Example Application
- Deployment and Operations
- Lessons Learned and Best Practices
## Architecture Overview
A production LangChain SaaS requires careful consideration of multiple layers. Here's the high-level architecture that has proven successful for scaling AI applications:
```mermaid
graph TB
subgraph "Client Layer"
WEB[Web App]
API[API Clients]
SDK[SDKs]
end
subgraph "API Gateway"
GW[Kong/AWS API Gateway]
AUTH[Auth Service]
RATE[Rate Limiter]
end
subgraph "Application Layer"
APP1[App Server 1]
APP2[App Server 2]
APP3[App Server N]
QUEUE[Job Queue]
end
subgraph "AI Layer"
LC[LangChain Service]
CACHE[Vector Cache]
EMB[Embeddings Service]
end
subgraph "Data Layer"
PG[(PostgreSQL)]
REDIS[(Redis)]
S3[(S3/Object Storage)]
VECTOR[(Vector DB)]
end
subgraph "Monitoring"
LOG[Logging]
METRIC[Metrics]
TRACE[Tracing]
end
WEB --> GW
API --> GW
SDK --> GW
GW --> AUTH
GW --> RATE
GW --> APP1
GW --> APP2
GW --> APP3
APP1 --> LC
APP2 --> LC
APP3 --> LC
APP1 --> QUEUE
APP2 --> QUEUE
APP3 --> QUEUE
LC --> CACHE
LC --> EMB
LC --> VECTOR
APP1 --> PG
APP2 --> PG
APP3 --> PG
APP1 --> REDIS
APP2 --> REDIS
APP3 --> REDIS
LC --> S3
APP1 --> LOG
APP2 --> LOG
APP3 --> LOG
LC --> METRIC
```

### Key Components
- **API Gateway**: Central entry point handling authentication, rate limiting, and request routing
- **Application Servers**: Stateless Node.js/Python servers running your business logic
- **LangChain Service**: Dedicated service layer for AI operations
- **Data Stores**: PostgreSQL for relational data, Redis for caching, S3 for documents, Vector DB for embeddings
- **Monitoring Stack**: Comprehensive logging, metrics, and distributed tracing
## Multi-Tenant Design Patterns
Multi-tenancy is crucial for SaaS applications. With LangChain, you need to consider both data isolation and AI resource isolation.
### Database-Level Isolation

```typescript
// Database schema with tenant isolation
interface TenantSchema {
id: string;
name: string;
plan: 'starter' | 'professional' | 'enterprise';
settings: {
maxTokensPerMonth: number;
maxConcurrentRequests: number;
allowedModels: string[];
customPrompts: boolean;
dataRetentionDays: number;
};
createdAt: Date;
updatedAt: Date;
}
interface UserSchema {
id: string;
tenantId: string; // Foreign key to tenant
email: string;
role: 'admin' | 'user' | 'viewer';
apiKeys: ApiKey[];
}
interface ConversationSchema {
id: string;
tenantId: string; // Ensures data isolation
userId: string;
messages: Message[];
tokenUsage: {
promptTokens: number;
completionTokens: number;
totalCost: number;
};
metadata: Record<string, any>;
createdAt: Date;
}
```

### Application-Level Tenant Context

```typescript
// Middleware for tenant context injection
export class TenantContextMiddleware {
async use(req: Request, res: Response, next: NextFunction) {
try {
// Extract tenant from JWT or API key
const tenantId = await this.extractTenantId(req);
if (!tenantId) {
return res.status(401).json({ error: 'Invalid tenant context' });
}
// Load tenant configuration
const tenant = await this.tenantService.getTenant(tenantId);
// Inject tenant context
req.context = {
tenantId: tenant.id,
tenant: tenant,
limits: {
maxTokens: tenant.settings.maxTokensPerMonth,
remainingTokens: await this.getRemainingTokens(tenant.id),
concurrentRequests: tenant.settings.maxConcurrentRequests
}
};
next();
} catch (error) {
res.status(500).json({ error: 'Failed to establish tenant context' });
}
}
private async extractTenantId(req: Request): Promise<string | null> {
// Check API key header
const apiKey = req.headers['x-api-key'];
if (apiKey) {
return await this.tenantService.getTenantIdFromApiKey(apiKey);
}
// Check JWT token
const token = req.headers.authorization?.split(' ')[1];
if (token) {
const decoded = jwt.verify(token, process.env.JWT_SECRET!) as { tenantId: string };
return decoded.tenantId;
}
return null;
}
}
```

### LangChain Tenant Isolation

```typescript
// Tenant-aware LangChain service
export class TenantLangChainService {
private chains: Map<string, ConversationChain> = new Map();
async getChain(tenantId: string): Promise<ConversationChain> {
// Check if chain exists for tenant
if (this.chains.has(tenantId)) {
return this.chains.get(tenantId)!;
}
// Load tenant-specific configuration
const config = await this.loadTenantConfig(tenantId);
// Create tenant-specific LLM instance
const llm = new ChatOpenAI({
openAIApiKey: config.apiKey || process.env.OPENAI_API_KEY,
modelName: config.model || 'gpt-3.5-turbo',
temperature: config.temperature || 0.7,
maxTokens: config.maxTokens || 1000,
callbacks: [
new TokenUsageCallback(tenantId),
new TenantRateLimitCallback(tenantId)
]
});
// Create tenant-specific vector store
const vectorStore = await this.createTenantVectorStore(tenantId);
// Create conversation chain with memory
const memory = new BufferMemory({
memoryKey: 'chat_history',
returnMessages: true,
inputKey: 'question',
outputKey: 'answer'
});
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(),
{
memory,
returnSourceDocuments: true
}
);
this.chains.set(tenantId, chain);
return chain;
}
private async createTenantVectorStore(tenantId: string): Promise<VectorStore> {
// Create isolated vector store namespace
return new PineconeStore(
new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
{
pineconeIndex: this.pineconeIndex,
namespace: `tenant_${tenantId}`,
textKey: 'text'
}
);
}
}
```
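Consuming the service from a request handler is then a one-liner per request. Here is a minimal usage sketch (the route and the `langChainService` instance are assumptions, not part of the service above). One caveat worth noting: because chains are cached per tenant, the `BufferMemory` above is shared by every user of that tenant; the memory-isolation section below scopes memory per conversation instead.

```typescript
const langChainService = new TenantLangChainService();

// Hypothetical route showing how a tenant-scoped chain might be consumed
app.post('/api/v1/ask', async (req, res) => {
  const chain = await langChainService.getChain(req.context.tenantId);

  // The chain carries the tenant's model settings, memory, and vector store
  const result = await chain.call({ question: req.body.question });
  res.json({ answer: result.text, sources: result.sourceDocuments });
});
```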
## API Gateway and Rate Limiting
A robust API gateway is essential for managing multi-tenant traffic and enforcing limits.
### Rate Limiting Strategy

```typescript
// Rate limiting configuration per tenant tier
const rateLimitConfig = {
starter: {
windowMs: 60 * 1000, // 1 minute
maxRequests: 10,
maxTokensPerMinute: 10000,
maxConcurrent: 2
},
professional: {
windowMs: 60 * 1000,
maxRequests: 60,
maxTokensPerMinute: 50000,
maxConcurrent: 5
},
enterprise: {
windowMs: 60 * 1000,
maxRequests: 600,
maxTokensPerMinute: 500000,
maxConcurrent: 20
}
};
// Redis-based rate limiter
export class TenantRateLimiter {
constructor(private redis: Redis) {}
async checkLimit(tenantId: string, type: 'request' | 'token', amount: number = 1): Promise<RateLimitResult> {
const tenant = await this.getTenant(tenantId);
const config = rateLimitConfig[tenant.plan];
const key = `rate_limit:${tenantId}:${type}`;
const window = config.windowMs;
const limit = type === 'request' ? config.maxRequests : config.maxTokensPerMinute;
// Sliding window implementation
const now = Date.now();
const windowStart = now - window;
// Remove old entries
await this.redis.zremrangebyscore(key, '-inf', windowStart);
// Count entries in the window. Note: zcard counts one entry per call, so for
// token-type limits this is an approximation; summing the encoded amounts would be exact
const currentUsage = await this.redis.zcard(key);
if (currentUsage + amount > limit) {
return {
allowed: false,
limit,
remaining: Math.max(0, limit - currentUsage),
resetAt: new Date(now + window)
};
}
// Add new entry
await this.redis.zadd(key, now, `${now}:${amount}`);
await this.redis.expire(key, Math.ceil(window / 1000));
return {
allowed: true,
limit,
remaining: limit - currentUsage - amount,
resetAt: new Date(now + window)
};
}
async checkConcurrent(tenantId: string): Promise<boolean> {
const tenant = await this.getTenant(tenantId);
const config = rateLimitConfig[tenant.plan];
const key = `concurrent:${tenantId}`;
const current = await this.redis.get(key);
if (parseInt(current || '0') >= config.maxConcurrent) {
return false;
}
await this.redis.incr(key);
await this.redis.expire(key, 300); // 5 minute expiry
return true;
}
async releaseConcurrent(tenantId: string): Promise<void> {
const key = `concurrent:${tenantId}`;
await this.redis.decr(key);
}
}
```

### API Gateway Implementation

```typescript
// Express middleware for API gateway
export class ApiGateway {
constructor(
private rateLimiter: TenantRateLimiter,
private usageTracker: UsageTracker
) {}
async handleRequest(req: Request, res: Response, next: NextFunction) {
const tenantId = req.context.tenantId;
// Check request rate limit
const requestLimit = await this.rateLimiter.checkLimit(tenantId, 'request');
if (!requestLimit.allowed) {
return res.status(429).json({
error: 'Rate limit exceeded',
retryAfter: requestLimit.resetAt
});
}
// Check concurrent request limit
const canProceed = await this.rateLimiter.checkConcurrent(tenantId);
if (!canProceed) {
return res.status(429).json({
error: 'Concurrent request limit exceeded'
});
}
// Track request
const requestId = uuidv4();
req.context.requestId = requestId;
// Set rate limit headers
res.setHeader('X-RateLimit-Limit', requestLimit.limit);
res.setHeader('X-RateLimit-Remaining', requestLimit.remaining);
res.setHeader('X-RateLimit-Reset', requestLimit.resetAt.toISOString());
// Handle response completion
res.on('finish', async () => {
await this.rateLimiter.releaseConcurrent(tenantId);
// Track usage if LangChain was used
if (req.context.tokenUsage) {
await this.usageTracker.trackUsage({
tenantId,
requestId,
tokens: req.context.tokenUsage,
cost: req.context.cost,
timestamp: new Date()
});
}
});
next();
}
}
```
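Wiring the gateway into Express is then one registration per concern. A sketch, assuming `redis`, `db`, `billingService`, and the `tenantContext` middleware instance come from the earlier snippets:

```typescript
const rateLimiter = new TenantRateLimiter(redis);
const usageTracker = new UsageTracker(db, redis, billingService);
const gateway = new ApiGateway(rateLimiter, usageTracker);

// Tenant context must be resolved before the gateway can enforce per-tenant limits
app.use((req, res, next) => tenantContext.use(req, res, next));
app.use('/api/', (req, res, next) => gateway.handleRequest(req, res, next));
```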
## Billing Integration with Stripe
Integrating billing requires careful tracking of usage and flexible pricing models.
### Stripe Setup and Price Models

```typescript
// Stripe product and price configuration
export const stripePricing = {
products: {
starter: 'prod_starter123',
professional: 'prod_prof456',
enterprise: 'prod_ent789'
},
prices: {
starter: {
monthly: 'price_starter_monthly',
usage: {
tokens: 'price_starter_tokens', // $0.01 per 1k tokens after included
documents: 'price_starter_docs' // $0.10 per document after included
}
},
professional: {
monthly: 'price_prof_monthly',
usage: {
tokens: 'price_prof_tokens', // $0.008 per 1k tokens
documents: 'price_prof_docs' // $0.08 per document
}
},
enterprise: {
monthly: 'price_ent_monthly',
usage: {
tokens: 'price_ent_tokens', // $0.006 per 1k tokens
documents: 'price_ent_docs' // $0.06 per document
}
}
}
};
// Billing service implementation
export class BillingService {
private stripe: Stripe;
constructor() {
this.stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
apiVersion: '2023-10-16'
});
}
async createCustomer(tenant: TenantSchema): Promise<string> {
const customer = await this.stripe.customers.create({
name: tenant.name,
email: tenant.billingEmail,
metadata: {
tenantId: tenant.id,
plan: tenant.plan
}
});
return customer.id;
}
async createSubscription(tenantId: string, plan: keyof typeof stripePricing.prices): Promise<Stripe.Subscription> {
const tenant = await this.getTenant(tenantId);
// Create subscription with base plan
const subscription = await this.stripe.subscriptions.create({
customer: tenant.stripeCustomerId,
items: [
{
price: stripePricing.prices[plan].monthly
},
{
price: stripePricing.prices[plan].usage.tokens,
quantity: 0 // Usage-based, will be reported later
},
{
price: stripePricing.prices[plan].usage.documents,
quantity: 0
}
],
metadata: {
tenantId
}
});
return subscription;
}
async reportUsage(tenantId: string, usage: UsageReport): Promise<void> {
const tenant = await this.getTenant(tenantId);
const subscription = await this.getActiveSubscription(tenant.stripeCustomerId);
// Find usage-based subscription items
const tokenItem = subscription.items.data.find(
item => item.price.id === stripePricing.prices[tenant.plan].usage.tokens
);
const docItem = subscription.items.data.find(
item => item.price.id === stripePricing.prices[tenant.plan].usage.documents
);
// Report token usage
if (tokenItem && usage.tokens > 0) {
await this.stripe.subscriptionItems.createUsageRecord(
tokenItem.id,
{
quantity: Math.ceil(usage.tokens / 1000), // Billed per 1k tokens
timestamp: Math.floor(usage.timestamp.getTime() / 1000),
action: 'increment'
}
);
}
// Report document usage
if (docItem && usage.documents > 0) {
await this.stripe.subscriptionItems.createUsageRecord(
docItem.id,
{
quantity: usage.documents,
timestamp: Math.floor(usage.timestamp.getTime() / 1000),
action: 'increment'
}
);
}
}
}
```
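In practice, usage is usually flushed to Stripe on a schedule rather than per request, which cuts API calls and tolerates retries. A hedged sketch with `node-cron` (`getPendingUsageByTenant` is a hypothetical aggregation helper over the usage table):

```typescript
import cron from 'node-cron';

// Report aggregated usage to Stripe once an hour
cron.schedule('0 * * * *', async () => {
  const pending = await getPendingUsageByTenant(); // hypothetical: unbilled usage grouped by tenant
  for (const report of pending) {
    await billingService.reportUsage(report.tenantId, {
      tokens: report.tokens,
      documents: report.documents,
      timestamp: new Date()
    });
  }
});
```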
### Webhook Handling

```typescript
// Stripe webhook handler
export class StripeWebhookHandler {
async handleWebhook(req: Request, res: Response) {
const sig = req.headers['stripe-signature'] as string;
let event: Stripe.Event;
try {
event = this.stripe.webhooks.constructEvent(
req.body,
sig,
process.env.STRIPE_WEBHOOK_SECRET!
);
} catch (err) {
return res.status(400).send(`Webhook Error: ${err.message}`);
}
switch (event.type) {
case 'customer.subscription.created':
case 'customer.subscription.updated':
await this.handleSubscriptionChange(event.data.object as Stripe.Subscription);
break;
case 'customer.subscription.deleted':
await this.handleSubscriptionCancellation(event.data.object as Stripe.Subscription);
break;
case 'invoice.payment_succeeded':
await this.handlePaymentSuccess(event.data.object as Stripe.Invoice);
break;
case 'invoice.payment_failed':
await this.handlePaymentFailure(event.data.object as Stripe.Invoice);
break;
}
res.json({ received: true });
}
private async handleSubscriptionChange(subscription: Stripe.Subscription) {
const tenantId = subscription.metadata.tenantId;
// Update tenant plan based on subscription
const plan = this.extractPlanFromSubscription(subscription);
await this.tenantService.updatePlan(tenantId, plan);
// Update limits
await this.updateTenantLimits(tenantId, plan);
}
private async handlePaymentFailure(invoice: Stripe.Invoice) {
const tenantId = invoice.subscription_details?.metadata?.tenantId;
if (tenantId) {
// Suspend tenant after grace period
await this.tenantService.scheduleSuspension(tenantId, 7); // 7-day grace period
// Send notification
await this.notificationService.sendPaymentFailureNotification(tenantId);
}
}
}
```
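One gotcha worth calling out: `stripe.webhooks.constructEvent` verifies the signature against the raw request body, so the webhook route must bypass the JSON body parser. The complete example later mounts it exactly this way:

```typescript
// Mount the raw parser on the webhook route only, before express.json() applies
app.post(
  '/webhook/stripe',
  express.raw({ type: 'application/json' }),
  (req, res) => webhookHandler.handleWebhook(req, res)
);
```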
## Usage Tracking and Quotas
Accurate usage tracking is critical for billing and enforcing quotas.
### Token Usage Tracking

```typescript
// Comprehensive usage tracking system
export class UsageTracker {
constructor(
private db: Database,
private redis: Redis,
private billing: BillingService
) {}
async trackTokenUsage(params: {
tenantId: string;
userId: string;
requestId: string;
promptTokens: number;
completionTokens: number;
model: string;
cost: number;
}): Promise<void> {
const timestamp = new Date();
// Store detailed usage record
await this.db.usage.create({
...params,
totalTokens: params.promptTokens + params.completionTokens,
timestamp
});
// Update real-time counters in Redis
const dailyKey = `usage:${params.tenantId}:daily:${this.getDateKey()}`;
const monthlyKey = `usage:${params.tenantId}:monthly:${this.getMonthKey()}`;
const pipeline = this.redis.pipeline();
// Increment counters
pipeline.hincrby(dailyKey, 'tokens', params.promptTokens + params.completionTokens);
pipeline.hincrby(dailyKey, 'requests', 1);
pipeline.hincrbyfloat(dailyKey, 'cost', params.cost);
pipeline.hincrby(monthlyKey, 'tokens', params.promptTokens + params.completionTokens);
pipeline.hincrby(monthlyKey, 'requests', 1);
pipeline.hincrbyfloat(monthlyKey, 'cost', params.cost);
// Set expiry
pipeline.expire(dailyKey, 60 * 60 * 24 * 7); // 7 days
pipeline.expire(monthlyKey, 60 * 60 * 24 * 35); // 35 days
await pipeline.exec();
// Check quotas
await this.checkAndEnforceQuotas(params.tenantId);
}
async checkAndEnforceQuotas(tenantId: string): Promise<QuotaStatus> {
const tenant = await this.getTenant(tenantId);
const monthlyUsage = await this.getMonthlyUsage(tenantId);
const quotaStatus: QuotaStatus = {
tokensUsed: monthlyUsage.tokens,
tokensLimit: tenant.settings.maxTokensPerMonth,
tokensRemaining: Math.max(0, tenant.settings.maxTokensPerMonth - monthlyUsage.tokens),
percentUsed: (monthlyUsage.tokens / tenant.settings.maxTokensPerMonth) * 100,
willExceedAt: this.predictExceedance(monthlyUsage, tenant.settings.maxTokensPerMonth)
};
// Send alerts at thresholds
if (quotaStatus.percentUsed >= 80 && !tenant.alerts.sent80) {
await this.sendQuotaAlert(tenantId, 80);
}
if (quotaStatus.percentUsed >= 90 && !tenant.alerts.sent90) {
await this.sendQuotaAlert(tenantId, 90);
}
if (quotaStatus.percentUsed >= 100) {
await this.enforceQuotaLimit(tenantId);
}
return quotaStatus;
}
private async enforceQuotaLimit(tenantId: string) {
// Set quota exceeded flag
await this.redis.set(`quota_exceeded:${tenantId}`, '1', 'EX', 3600);
// Notify tenant
await this.notificationService.sendQuotaExceededNotification(tenantId);
// Log event
await this.auditLog.log({
tenantId,
event: 'quota_exceeded',
timestamp: new Date()
});
}
}
```
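The helpers referenced above are plain reads of the Redis counters. A sketch of what they might look like (names and key formats match the tracking code; the standalone-function shape is an assumption):

```typescript
import { Redis } from 'ioredis';

// Key helpers matching the formats used by UsageTracker
const getDateKey = () => new Date().toISOString().slice(0, 10); // e.g. '2025-01-11'
const getMonthKey = () => new Date().toISOString().slice(0, 7); // e.g. '2025-01'

async function getMonthlyUsage(redis: Redis, tenantId: string) {
  const data = await redis.hgetall(`usage:${tenantId}:monthly:${getMonthKey()}`);
  return {
    tokens: parseInt(data.tokens || '0', 10),
    requests: parseInt(data.requests || '0', 10),
    cost: parseFloat(data.cost || '0')
  };
}
```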
### Document and Storage Tracking

```typescript
// Document usage tracking
export class DocumentTracker {
async trackDocument(params: {
tenantId: string;
documentId: string;
size: number;
type: string;
operation: 'upload' | 'process' | 'delete';
}): Promise<void> {
// Record document operation
await this.db.documentOperations.create(params);
// Update storage metrics
if (params.operation === 'upload') {
await this.redis.hincrby(
`storage:${params.tenantId}`,
'totalBytes',
params.size
);
await this.redis.hincrby(
`storage:${params.tenantId}`,
'documentCount',
1
);
} else if (params.operation === 'delete') {
await this.redis.hincrby(
`storage:${params.tenantId}`,
'totalBytes',
-params.size
);
await this.redis.hincrby(
`storage:${params.tenantId}`,
'documentCount',
-1
);
}
// Check storage quotas
await this.checkStorageQuotas(params.tenantId);
}
async getStorageMetrics(tenantId: string): Promise<StorageMetrics> {
const data = await this.redis.hgetall(`storage:${tenantId}`);
return {
totalBytes: parseInt(data.totalBytes || '0'),
documentCount: parseInt(data.documentCount || '0'),
averageSize: data.documentCount ?
parseInt(data.totalBytes || '0') / parseInt(data.documentCount || '1') : 0
};
}
}
```

## Tenant Isolation Strategies
Ensuring complete isolation between tenants is crucial for security and compliance.
### Vector Store Isolation

```typescript
// Isolated vector stores per tenant
export class TenantVectorStoreManager {
private vectorStores: Map<string, VectorStore> = new Map();
async getVectorStore(tenantId: string): Promise<VectorStore> {
if (this.vectorStores.has(tenantId)) {
return this.vectorStores.get(tenantId)!;
}
// Create isolated namespace in Pinecone
const vectorStore = await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
{
pineconeIndex: this.pineconeIndex,
namespace: `tenant_${tenantId}`, // Isolated namespace
filter: { tenantId } // Additional filter for safety
}
);
this.vectorStores.set(tenantId, vectorStore);
return vectorStore;
}
async addDocuments(
tenantId: string,
documents: Document[]
): Promise<void> {
const vectorStore = await this.getVectorStore(tenantId);
// Add tenant metadata to all documents
const taggedDocuments = documents.map(doc => ({
...doc,
metadata: {
...doc.metadata,
tenantId,
indexedAt: new Date().toISOString()
}
}));
await vectorStore.addDocuments(taggedDocuments);
// Track document count
await this.documentTracker.trackDocuments({
tenantId,
count: documents.length,
operation: 'index'
});
}
async search(
tenantId: string,
query: string,
k: number = 4
): Promise<Document[]> {
const vectorStore = await this.getVectorStore(tenantId);
// Search with tenant filter
const results = await vectorStore.similaritySearch(
query,
k,
{ tenantId } // Ensure tenant isolation
);
return results;
}
}
```

### Memory and Cache Isolation

```typescript
// Tenant-isolated memory management
export class TenantMemoryManager {
private memories: Map<string, BaseMemory> = new Map();
getMemoryKey(tenantId: string, conversationId: string): string {
return `${tenantId}:${conversationId}`;
}
async getMemory(
tenantId: string,
conversationId: string
): Promise<BufferMemory> {
const key = this.getMemoryKey(tenantId, conversationId);
if (this.memories.has(key)) {
return this.memories.get(key) as BufferMemory;
}
// Load memory from Redis with tenant isolation
const memory = new BufferMemory({
returnMessages: true,
memoryKey: 'chat_history',
chatHistory: new RedisChatMessageHistory({
sessionId: key, // '<tenantId>:<conversationId>'
client: this.redis,
keyPrefix: 'memory:' // final Redis key: memory:<tenantId>:<conversationId>
})
});
this.memories.set(key, memory);
return memory;
}
async clearMemory(tenantId: string, conversationId: string): Promise<void> {
const key = this.getMemoryKey(tenantId, conversationId);
// Clear from cache
this.memories.delete(key);
// Clear from Redis
await this.redis.del(`memory:${tenantId}:${conversationId}`);
}
async clearAllTenantMemories(tenantId: string): Promise<void> {
// Find all memories for tenant
const keys = await this.redis.keys(`memory:${tenantId}:*`);
if (keys.length > 0) {
await this.redis.del(...keys);
}
// Clear from local cache
for (const [key, _] of this.memories) {
if (key.startsWith(`${tenantId}:`)) {
this.memories.delete(key);
}
}
}
}
```
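`KEYS` blocks Redis while it walks the entire keyspace, which bites once tenants accumulate many conversations. A non-blocking variant of the cleanup using `SCAN` (a sketch against the ioredis API):

```typescript
async function deleteByPrefix(redis: Redis, prefix: string): Promise<void> {
  let cursor = '0';
  do {
    // SCAN iterates incrementally instead of blocking like KEYS
    const [next, keys] = await redis.scan(cursor, 'MATCH', `${prefix}*`, 'COUNT', 100);
    if (keys.length > 0) {
      await redis.del(...keys);
    }
    cursor = next;
  } while (cursor !== '0');
}

// Usage: await deleteByPrefix(this.redis, `memory:${tenantId}:`);
```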
## Scaling from 0 to 10k Customers
Scaling a LangChain SaaS requires careful planning at each growth stage.
### Stage 1: 0-100 Customers (MVP)

```typescript
// Simple architecture for early stage
export class MVPArchitecture {
// Single server setup
async initialize() {
const app = express();
// Basic middleware
app.use(express.json());
app.use(cors());
// Simple in-memory rate limiting
const rateLimiter = rateLimit({
windowMs: 15 * 60 * 1000, // 15 minutes
max: 100 // limit each IP to 100 requests per windowMs
});
app.use('/api/', rateLimiter);
// Single LangChain instance
const llm = new ChatOpenAI({
modelName: 'gpt-3.5-turbo',
temperature: 0.7
});
// Single vector store
const vectorStore = await HNSWLib.fromTexts(
[''],
[{}],
new OpenAIEmbeddings()
);
// Basic API endpoints
app.post('/api/chat', async (req, res) => {
try {
const { message, tenantId } = req.body;
// Simple tenant check
const tenant = await db.tenants.findUnique({ where: { id: tenantId }});
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Process with LangChain (generate() exposes token usage via llmOutput)
const result = await llm.generate([[new HumanMessage(message)]]);
const tokenUsage = result.llmOutput?.tokenUsage;
// Track usage
await db.usage.create({
data: {
tenantId,
tokens: tokenUsage?.totalTokens || 0,
cost: calculateCost(tokenUsage)
}
});
res.json({ response: result.generations[0][0].text });
} catch (error) {
res.status(500).json({ error: error.message });
}
});
app.listen(3000);
}
}
```

### Stage 2: 100-1000 Customers (Growth)

```typescript
// Architecture for growth stage
export class GrowthArchitecture {
async initialize() {
// Load balancer with multiple app servers
const cluster = require('cluster');
const numCPUs = require('os').cpus().length;
if (cluster.isPrimary) {
// Fork workers
for (let i = 0; i < numCPUs; i++) {
cluster.fork();
}
cluster.on('exit', (worker, code, signal) => {
console.log(`Worker ${worker.process.pid} died`);
cluster.fork(); // Replace dead workers
});
} else {
// Worker process
const app = express();
// Redis for distributed rate limiting
const redisClient = new Redis({
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT || '6379', 10),
password: process.env.REDIS_PASSWORD
});
// Distributed rate limiter
const rateLimiter = new RateLimiterRedis({
storeClient: redisClient,
keyPrefix: 'rl',
points: 100,
duration: 900 // 15 minutes
});
// Connection pooling for databases
const pgPool = new Pool({
connectionString: process.env.DATABASE_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000
});
// Shared vector store with connection pooling
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY,
environment: process.env.PINECONE_ENV
});
// Queue for heavy operations
const bullQueue = new Bull('langchain-jobs', {
redis: {
host: process.env.REDIS_HOST,
port: parseInt(process.env.REDIS_PORT || '6379', 10),
password: process.env.REDIS_PASSWORD
}
});
// Process jobs in background
bullQueue.process(async (job) => {
const { type, data } = job.data;
switch (type) {
case 'process_documents':
await processDocuments(data);
break;
case 'generate_embeddings':
await generateEmbeddings(data);
break;
}
});
app.listen(3000);
}
}
}
```

### Stage 3: 1000-10k Customers (Scale)

```typescript
// Enterprise-grade architecture
export class EnterpriseArchitecture {
async initialize() {
// Kubernetes deployment configuration
const k8sDeployment = {
apiVersion: 'apps/v1',
kind: 'Deployment',
metadata: {
name: 'langchain-saas-api',
labels: {
app: 'langchain-saas'
}
},
spec: {
replicas: 10, // Start with 10 replicas
selector: {
matchLabels: {
app: 'langchain-saas'
}
},
template: {
metadata: {
labels: {
app: 'langchain-saas'
}
},
spec: {
containers: [{
name: 'api',
image: 'langchain-saas:latest',
ports: [{
containerPort: 3000
}],
resources: {
requests: {
memory: '2Gi',
cpu: '1000m'
},
limits: {
memory: '4Gi',
cpu: '2000m'
}
},
env: [
{
name: 'NODE_ENV',
value: 'production'
},
{
name: 'DATABASE_URL',
valueFrom: {
secretKeyRef: {
name: 'database-secret',
key: 'url'
}
}
}
]
}]
}
}
}
};
// Horizontal Pod Autoscaler
const hpa = {
apiVersion: 'autoscaling/v2',
kind: 'HorizontalPodAutoscaler',
metadata: {
name: 'langchain-saas-hpa'
},
spec: {
scaleTargetRef: {
apiVersion: 'apps/v1',
kind: 'Deployment',
name: 'langchain-saas-api'
},
minReplicas: 10,
maxReplicas: 100,
metrics: [
{
type: 'Resource',
resource: {
name: 'cpu',
target: {
type: 'Utilization',
averageUtilization: 70
}
}
},
{
type: 'Resource',
resource: {
name: 'memory',
target: {
type: 'Utilization',
averageUtilization: 80
}
}
}
]
}
};
// Multi-region database setup
const databaseConfig = {
primary: {
host: 'db-primary.us-east-1.rds.amazonaws.com',
database: 'langchain_saas',
max: 100,
idleTimeoutMillis: 30000
},
replicas: [
{
host: 'db-replica-1.us-west-2.rds.amazonaws.com',
database: 'langchain_saas',
max: 50,
idleTimeoutMillis: 30000
},
{
host: 'db-replica-2.eu-west-1.rds.amazonaws.com',
database: 'langchain_saas',
max: 50,
idleTimeoutMillis: 30000
}
]
};
// Global CDN for static assets
const cdnConfig = {
provider: 'cloudflare',
zones: ['us', 'eu', 'asia'],
caching: {
'api/embeddings': 3600, // 1 hour
'api/documents': 86400 // 24 hours
}
};
}
}
```

## Complete Example Application
Here's a complete example of a production-ready LangChain SaaS application:
```typescript
// Main application entry point
import express from 'express';
import { ChatOpenAI } from 'langchain/chat_models/openai';
import { ConversationalRetrievalQAChain } from 'langchain/chains';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { Document } from 'langchain/document';
import Bull from 'bull';
import Stripe from 'stripe';
import { createClient } from 'redis';
import { Pool } from 'pg';
import { PineconeClient } from '@pinecone-database/pinecone';
// Initialize services
const app = express();
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const redis = createClient({ url: process.env.REDIS_URL });
const pgPool = new Pool({ connectionString: process.env.DATABASE_URL });
const jobQueue = new Bull('langchain-jobs', process.env.REDIS_URL!);
// Middleware
app.use(express.json());
app.use(express.urlencoded({ extended: true }));
// Health check endpoint
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
version: process.env.APP_VERSION,
timestamp: new Date().toISOString()
});
});
// Main chat endpoint
app.post('/api/v1/chat', async (req, res) => {
const tenantId = req.headers['x-tenant-id'] as string;
const apiKey = req.headers['x-api-key'] as string;
try {
// Validate API key and get tenant
const tenant = await validateApiKey(apiKey, tenantId);
if (!tenant) {
return res.status(401).json({ error: 'Invalid API key' });
}
// Check rate limits
const rateLimitOk = await checkRateLimit(tenant.id);
if (!rateLimitOk) {
return res.status(429).json({ error: 'Rate limit exceeded' });
}
// Check token quota
const quotaOk = await checkTokenQuota(tenant.id);
if (!quotaOk) {
return res.status(402).json({ error: 'Token quota exceeded' });
}
// Get or create conversation chain
const chain = await getConversationChain(tenant.id);
// Process the chat request
const { question, conversationId } = req.body;
const startTime = Date.now();
// Get conversation memory
const memory = await getConversationMemory(tenant.id, conversationId);
// Execute chain
const response = await chain.call({
question,
chat_history: memory
});
// Token usage, if surfaced on the chain result (the handleLLMEnd callback in getConversationChain tracks it authoritatively for billing)
const tokenUsage = response.llmOutput?.tokenUsage || {
promptTokens: 0,
completionTokens: 0,
totalTokens: 0
};
// Track usage
await trackUsage({
tenantId: tenant.id,
conversationId,
tokenUsage,
duration: Date.now() - startTime,
timestamp: new Date()
});
// Update conversation memory
await updateConversationMemory(tenant.id, conversationId, {
question,
answer: response.text
});
// Return response
res.json({
answer: response.text,
sources: response.sourceDocuments,
usage: {
promptTokens: tokenUsage.promptTokens,
completionTokens: tokenUsage.completionTokens,
totalTokens: tokenUsage.totalTokens
},
conversationId
});
} catch (error) {
console.error('Chat error:', error);
res.status(500).json({
error: 'Internal server error',
message: process.env.NODE_ENV === 'development' ? error.message : undefined
});
}
});
// Document upload endpoint
app.post('/api/v1/documents', async (req, res) => {
const tenantId = req.headers['x-tenant-id'] as string;
try {
// Validate tenant
const tenant = await getTenant(tenantId);
if (!tenant) {
return res.status(404).json({ error: 'Tenant not found' });
}
// Check document quota
const quotaOk = await checkDocumentQuota(tenant.id);
if (!quotaOk) {
return res.status(402).json({ error: 'Document quota exceeded' });
}
// Queue document processing job
const job = await jobQueue.add('process_document', {
tenantId: tenant.id,
documentUrl: req.body.documentUrl,
metadata: req.body.metadata
});
res.json({
jobId: job.id,
status: 'processing',
message: 'Document queued for processing'
});
} catch (error) {
console.error('Document upload error:', error);
res.status(500).json({ error: 'Failed to upload document' });
}
});
// Usage analytics endpoint
app.get('/api/v1/usage', async (req, res) => {
const tenantId = req.headers['x-tenant-id'] as string;
const { startDate, endDate } = req.query;
try {
const usage = await getUsageAnalytics(tenantId, {
startDate: new Date(startDate as string),
endDate: new Date(endDate as string)
});
res.json({
period: {
start: startDate,
end: endDate
},
tokens: {
total: usage.totalTokens,
prompt: usage.promptTokens,
completion: usage.completionTokens
},
requests: usage.requestCount,
documents: usage.documentCount,
cost: usage.totalCost,
dailyBreakdown: usage.dailyBreakdown
});
} catch (error) {
console.error('Usage analytics error:', error);
res.status(500).json({ error: 'Failed to fetch usage data' });
}
});
// Billing webhook
app.post('/webhook/stripe', express.raw({ type: 'application/json' }), async (req, res) => {
const sig = req.headers['stripe-signature'] as string;
try {
const event = stripe.webhooks.constructEvent(
req.body,
sig,
process.env.STRIPE_WEBHOOK_SECRET!
);
await handleStripeWebhook(event);
res.json({ received: true });
} catch (error) {
console.error('Stripe webhook error:', error);
res.status(400).json({ error: 'Webhook error' });
}
});
// Job processing
jobQueue.process('process_document', async (job) => {
const { tenantId, documentUrl, metadata } = job.data;
try {
// Download document
const document = await downloadDocument(documentUrl);
// Split into chunks
const chunks = await splitDocument(document);
// Generate embeddings
const embeddings = new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
});
// Get tenant vector store
const vectorStore = await getTenantVectorStore(tenantId);
// Add documents with tenant metadata
const documents = chunks.map(chunk => new Document({
pageContent: chunk.text,
metadata: {
...metadata,
tenantId,
source: documentUrl,
chunkIndex: chunk.index,
processedAt: new Date().toISOString()
}
}));
await vectorStore.addDocuments(documents);
// Update document count
await incrementDocumentCount(tenantId, documents.length);
// Track completion
await job.progress(100);
} catch (error) {
console.error('Document processing error:', error);
throw error;
}
});
// Helper functions
async function getConversationChain(tenantId: string) {
const tenant = await getTenant(tenantId);
const llm = new ChatOpenAI({
openAIApiKey: process.env.OPENAI_API_KEY,
modelName: tenant.settings.model || 'gpt-3.5-turbo',
temperature: tenant.settings.temperature || 0.7,
maxTokens: tenant.settings.maxTokens || 1000,
callbacks: [
{
handleLLMEnd: async (output) => {
// Track token usage in real-time
await trackTokenUsage(tenantId, output.llmOutput?.tokenUsage);
}
}
]
});
const vectorStore = await getTenantVectorStore(tenantId);
const chain = ConversationalRetrievalQAChain.fromLLM(
llm,
vectorStore.asRetriever(),
{
returnSourceDocuments: true,
qaChainOptions: {
type: 'stuff',
prompt: tenant.settings.customPrompt || undefined
}
}
);
return chain;
}
async function getTenantVectorStore(tenantId: string) {
const pinecone = new PineconeClient();
await pinecone.init({
apiKey: process.env.PINECONE_API_KEY!,
environment: process.env.PINECONE_ENV!
});
const index = pinecone.Index(process.env.PINECONE_INDEX!);
return await PineconeStore.fromExistingIndex(
new OpenAIEmbeddings({
openAIApiKey: process.env.OPENAI_API_KEY
}),
{
pineconeIndex: index,
namespace: `tenant_${tenantId}`,
filter: { tenantId }
}
);
}
// Start server
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
console.log(`LangChain SaaS API running on port ${PORT}`);
});
// Graceful shutdown
process.on('SIGTERM', async () => {
console.log('SIGTERM received, shutting down gracefully');
// Close job queue
await jobQueue.close();
// Close database connections
await pgPool.end();
// Close Redis connection
await redis.quit();
process.exit(0);
});
```

## Deployment and Operations
For production deployment, consider these critical aspects:
### Docker Configuration

```dockerfile
# Multi-stage build for optimized image
FROM node:18-alpine AS builder
WORKDIR /app
# Copy package files
COPY package*.json ./
COPY tsconfig.json ./
# Install all dependencies (dev dependencies are needed for the TypeScript build)
RUN npm ci
# Copy source code
COPY src ./src
# Build TypeScript
RUN npm run build
# Drop dev dependencies so only production modules are copied forward
RUN npm prune --production
# Production stage
FROM node:18-alpine
WORKDIR /app
# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init
# Create non-root user
RUN addgroup -g 1001 -S nodejs
RUN adduser -S nodejs -u 1001
# Copy built application
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/package*.json ./
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 3000
# Use dumb-init to handle signals
ENTRYPOINT ["dumb-init", "--"]
CMD ["node", "dist/index.js"]
```
### Kubernetes Deployment

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  selector:
    matchLabels:
      app: langchain-saas-api
  template:
    metadata:
      labels:
        app: langchain-saas-api
    spec:
      containers:
        - name: api
          image: langchain-saas:latest
          ports:
            - containerPort: 3000
              name: http
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: database-credentials
                  key: url
            - name: REDIS_URL
              valueFrom:
                secretKeyRef:
                  name: redis-credentials
                  key: url
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openai-credentials
                  key: api-key
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: langchain-saas-api
  namespace: production
spec:
  selector:
    app: langchain-saas-api
  ports:
    - port: 80
      targetPort: http
      protocol: TCP
  type: ClusterIP
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: langchain-saas-api-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: langchain-saas-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
## Lessons Learned and Best Practices
After building and scaling multiple LangChain SaaS applications, here are key lessons learned:
### 1. Cost Management

- **Monitor token usage religiously**: Set up alerts for unusual spikes
- **Implement smart caching**: Cache embeddings and common queries (see the sketch below)
- **Use appropriate models**: Don't use GPT-4 when GPT-3.5 suffices
- **Batch operations**: Process multiple requests together when possible
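As an illustration of the caching point, here is a hedged sketch of a Redis-backed response cache keyed on a hash of the prompt (the key format and TTL are assumptions to tune per use case):

```typescript
import { createHash } from 'crypto';
import { Redis } from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);

// Serve identical prompts from cache instead of paying for repeated completions
async function cachedCompletion(
  tenantId: string,
  prompt: string,
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  const key = `llm_cache:${tenantId}:${createHash('sha256').update(prompt).digest('hex')}`;

  const cached = await redis.get(key);
  if (cached !== null) return cached;

  const answer = await generate(prompt);
  await redis.set(key, answer, 'EX', 60 * 60); // 1-hour TTL
  return answer;
}
```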
### 2. Performance Optimization

- **Vector store partitioning**: Split large indices by date or category
- **Connection pooling**: Maintain pools for all external services
- **Async processing**: Use queues for non-real-time operations
- **Edge caching**: Cache static responses at CDN level
### 3. Security Best Practices

- **API key rotation**: Implement automatic key rotation
- **Tenant data encryption**: Encrypt sensitive data at rest
- **Audit logging**: Log all data access and modifications
- **Input validation**: Sanitize all user inputs before processing (sketched below)
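For the input-validation point, one approach is schema validation at the route boundary, sketched here with `zod` (the field shapes are assumptions matching the chat endpoint above, and `app` is the Express app from the complete example):

```typescript
import { z } from 'zod';

// Reject malformed chat requests before they reach the LLM
const chatRequestSchema = z.object({
  question: z.string().min(1).max(4000), // cap length to bound token spend
  conversationId: z.string().uuid().optional()
});

app.post('/api/v1/chat', (req, res, next) => {
  const parsed = chatRequestSchema.safeParse(req.body);
  if (!parsed.success) {
    return res.status(400).json({ error: 'Invalid request', details: parsed.error.issues });
  }
  req.body = parsed.data;
  next();
});
```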
### 4. Operational Excellence

- **Comprehensive monitoring**: Track every aspect of the system
- **Automated testing**: Test tenant isolation regularly (a sample test follows this list)
- **Disaster recovery**: Regular backups and recovery drills
- **Documentation**: Keep runbooks updated
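A sketch of the isolation test mentioned above, in Jest style (the `vectorStoreManager` instance and tenant IDs are assumptions):

```typescript
import { Document } from 'langchain/document';

// Verify one tenant's documents never leak into another tenant's search results
describe('tenant isolation', () => {
  it('does not return tenant A documents for tenant B queries', async () => {
    await vectorStoreManager.addDocuments('tenant-a', [
      new Document({ pageContent: 'secret roadmap', metadata: { source: 'test' } })
    ]);

    const results = await vectorStoreManager.search('tenant-b', 'secret roadmap');

    expect(results.every(doc => doc.metadata.tenantId === 'tenant-b')).toBe(true);
  });
});
```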
### 5. Customer Success

- **Usage dashboards**: Provide real-time usage visibility
- **Cost predictability**: Offer usage alerts and projections
- **API documentation**: Maintain excellent API docs with examples
- **Support integration**: Build debugging tools for support team
## Conclusion
Building a production-ready LangChain SaaS requires careful attention to architecture, scaling, and operational concerns. By following the patterns and practices outlined in this guide, you can build a system that scales efficiently from your first customer to thousands while maintaining reliability, security, and cost-effectiveness.
Remember that every SaaS is unique – adapt these patterns to your specific use case and requirements. Start simple, measure everything, and iterate based on real customer needs.
For a complete deployment guide with infrastructure as code, monitoring setup, and CI/CD pipelines, check out our LangChain SaaS Deployment Guide.
Happy building!