4/4 LIVE

campaign-graph · autonomail

$ langgraph run --brief "Winter sale · urban shoppers · ₹1500+ AOV"

✓ brief-parser-agent · done
Product: Winter Collection · Goal: Sales Conversion · CTA extracted

✓ segmentation-agent · done
RFM scored · k=4 clusters · silhouette=0.61 · 4 segments validated

✓ strategy-agent · done
Quality gate passed · Send: Tue 10AM · Budget allocated

◉ content-gen-agent · running
Parallel fan-out · 4 segments · spam score: 0.8 · CTR pred: 3.1%

○ approval-agent · queued
interrupt() · awaiting human review

○ execution-agent · queued
Pending approval

○ monitoring-agent · queued
Event-driven · Redis Streams subscriber

○ optimization-agent · queued
Chi-square test · feature delta · memory retrieval

Not a pipeline. A graph of autonomous agents.

Autonomail deploys eight specialized agents, each equipped with real tools, that cycle on failure, run in parallel, and accumulate cross-campaign memory.


Conditional Graph Topology

Every edge has a condition.
Every failure has a cycle.

LangGraph routes based on what agents compute — not a fixed sequence. The Quality Gate cycles back to Strategy on failure. Human rejection routes back to Content Gen with feedback injected into state.

01 Brief Parser
→ 02 Segmentation
→ 03 Strategy
→ 04 Quality Gate · deterministic
→ 05 Content Gen ×N · Send API ×N
→ 06 Approval · interrupt()
→ 07 Execution
→ 08 Monitoring
→ 09 Perf Gate · deterministic
→ 10 Optimization

↺ Quality Gate → Strategy · retry on failure (max 3)

↺ Approval → Content Gen · rejected/edits with feedback injected

↺ Optimization → Content Gen · evidence-backed brief from memory
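The Quality Gate retry cycle can be sketched in plain Python as a routing function over shared state. This is a minimal illustration, not Autonomail's code and not the LangGraph API: the state keys `gate_passed` and `strategy_retries`, the `MAX_RETRIES` constant, and the `"escalate"` fallback once retries are exhausted are all assumptions.

```python
from typing import TypedDict

MAX_RETRIES = 3  # "max 3" from the cycle listed above


class CampaignState(TypedDict):
    gate_passed: bool      # hypothetical key: result of the deterministic checks
    strategy_retries: int  # hypothetical key: times Strategy has already been re-run


def route_after_quality_gate(state: CampaignState) -> str:
    """Return the name of the next node based on what the gate computed."""
    if state["gate_passed"]:
        return "content_gen"
    if state["strategy_retries"] < MAX_RETRIES:
        return "strategy"
    # Assumption: the page does not say what happens after the third retry.
    return "escalate"
```

In a LangGraph `StateGraph`, a function of this shape would typically be attached with `add_conditional_edges`, so the next node is chosen from computed state rather than from a fixed sequence.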

Agent Roster

Eight agents. Each with a real tool belt.

Tools are deterministic Python functions — not LLM calls. The LLM decides which tool to invoke; Python executes it and returns real data.

Brief Parser Agent

Extracts product, audience, goals and CTA from natural language. GPT-4 with schema validation.

Segmentation Agent

Real RFM scoring + sklearn KMeans on actual customer data. Validates segment sizes for A/B significance.

Strategy Agent

Plans send timing, budget allocation, and A/B test design. When the Quality Gate rejects a plan, Strategy revises it and resubmits.

Quality Gate Agent

Pure deterministic checks — no LLM. Validates sizes, budget math, spacing. Routes to Strategy on failure.

Content Gen Agent

Fan-out via LangGraph Send API. Parallel variants per segment, grounded in spam scores and CTR predictions.

Approval Agent

LangGraph interrupt() pauses the graph, serializes state to MongoDB, resumes from checkpoint on approval.

Execution Agent

Interfaces with Campaign & Email APIs. Logs recipient counts, timestamps, and delivery status into shared state.

Monitoring & Opt Agent

Event-driven via Redis Streams. Chi-square significance tests before routing underperformers to Content Gen.
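The chi-square check mentioned for the Monitoring & Opt Agent can be sketched with the standard library alone. This is a hedged illustration for a 2x2 table (clicked vs. not clicked, variant A vs. variant B) without continuity correction; the function names, the `alpha=0.05` default, and the rule for what counts as an underperformer are assumptions, not Autonomail's implementation.

```python
import math


def chi_square_2x2(clicks_a: int, sends_a: int, clicks_b: int, sends_b: int) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom) for two variants."""
    a, b = clicks_a, sends_a - clicks_a
    c, d = clicks_b, sends_b - clicks_b
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0  # degenerate table: no evidence of a difference
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def is_underperformer(clicks_a: int, sends_a: int, clicks_b: int, sends_b: int,
                      alpha: float = 0.05) -> bool:
    """True when variant A's CTR is lower than B's and the gap is significant."""
    if sends_a == 0 or sends_b == 0:
        return False
    _, p_value = chi_square_2x2(clicks_a, sends_a, clicks_b, sends_b)
    return p_value < alpha and clicks_a / sends_a < clicks_b / sends_b
```

Gating on significance first means a variant is only routed back to Content Gen when the difference is unlikely to be noise.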

Architecture Highlights

What makes it genuinely agentic.

Three architectural patterns that separate Autonomail from a sequential LLM pipeline — implemented with real code, not described in a prompt.

LangGraph Send API

Parallel Fan-Out / Fan-In

Instead of generating content for each segment sequentially, Autonomail uses LangGraph's Send API to fan out one Content Gen task per segment concurrently. Four segments in 15 seconds instead of 60. Running them in parallel is safe because segments are independent of one another.

Fan-out: Strategy → Send([seg-A, seg-B, seg-C, seg-D])

├─ seg-A: Content Gen ✓ spam: 0.8 CTR: 3.2%

├─ seg-B: Content Gen ✓ spam: 1.1 CTR: 2.9%

├─ seg-C: Content Gen ✓ spam: 0.6 CTR: 3.5%

└─ seg-D: Content Gen ✓ spam: 0.9 CTR: 3.1%

Fan-in: merge_variants → Human Approval
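The fan-out / fan-in shape can be illustrated with a plain-Python analogue. This sketch uses `concurrent.futures` rather than LangGraph's Send API, and `generate_variant` is a hypothetical stand-in for a Content Gen task; a real task would call an LLM and its tools.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_variant(segment: str) -> dict:
    """Hypothetical stand-in for one Content Gen task."""
    return {"segment": segment, "subject": f"Winter sale picks for {segment}"}


def fan_out_fan_in(segments: list[str]) -> dict[str, dict]:
    # Fan-out: one independent task per segment, run concurrently.
    with ThreadPoolExecutor(max_workers=len(segments) or 1) as pool:
        results = list(pool.map(generate_variant, segments))
    # Fan-in: merge the per-segment results into one mapping for review.
    return {result["segment"]: result for result in results}


variants = fan_out_fan_in(["seg-A", "seg-B", "seg-C", "seg-D"])
```

Because no task reads another task's output, the merged mapping is the same as a sequential run would produce; only the wall-clock time changes.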

MongoDB + pgvector

Two-Layer Persistent Memory

Run-level state is checkpointed to MongoDB after every node — the graph survives server restarts. Cross-campaign pgvector store captures what worked and why, queried by semantic similarity when new campaigns start.

# Layer 1 — run-level (MongoDB checkpoint)

state.checkpoint(node='strategy', data={...})

# survives server restart at any point

# Layer 2 — cross-campaign (pgvector)

memory.query(brief, top_k=5) # semantic similarity

# → learnings from 50 past campaigns
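The semantic lookup in Layer 2 amounts to ranking stored learnings by vector similarity. The sketch below shows that ranking in pure Python under stated assumptions: the `query_memory` name, the in-memory list, and the 3-dimensional toy embeddings are made up for illustration, whereas a pgvector store would do the ranking in SQL inside Postgres.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def query_memory(query_vec: list[float], memory: list[tuple[str, list[float]]],
                 top_k: int = 5) -> list[str]:
    """Return the top_k stored learnings most similar to the query embedding."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]


# Toy entries and embeddings, invented purely for illustration.
memory = [
    ("note on discount-led subject lines", [1.0, 0.0, 0.0]),
    ("note on weekend sends", [0.0, 1.0, 0.0]),
    ("note on urgency framing", [0.9, 0.1, 0.0]),
]
top = query_memory([1.0, 0.0, 0.0], memory, top_k=2)
```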

Quality Gate + interrupt()

Conditional Graph with Cycles

Every edge has a condition. Quality Gate checks segment sizes, budget math, send-time spacing — all deterministic, no LLM. Failure routes back to Strategy (max 3 retries). Human rejection routes back to Content Gen with the feedback injected into state.

Quality Gate → Strategy # fail: segment too small

Strategy → Quality Gate # retry with merged seg

Quality Gate → Content Gen # pass: all checks ✓

Content Gen → interrupt() # pause graph, save state

Human Approval → Execution # approved

Human Approval → Content Gen # rejected + feedback
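The deterministic checks themselves can be sketched as ordinary Python with no LLM involved. Everything specific here is an assumption made for illustration: the `SegmentPlan` fields, the two thresholds, and the reading of "send-time spacing" as a minimum gap between any two planned sends.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MIN_SEGMENT_SIZE = 1000                 # hypothetical threshold
MIN_SEND_SPACING = timedelta(hours=24)  # hypothetical threshold


@dataclass
class SegmentPlan:
    name: str
    size: int
    budget: float
    send_at: datetime


def quality_gate(plans: list[SegmentPlan], total_budget: float) -> list[str]:
    """Return failure reasons; an empty list means the gate passes."""
    failures = []
    for plan in plans:
        if plan.size < MIN_SEGMENT_SIZE:
            failures.append(f"{plan.name}: segment too small ({plan.size})")
    allocated = sum(plan.budget for plan in plans)
    if allocated > total_budget:
        failures.append(f"budget over-allocated: {allocated} > {total_budget}")
    times = sorted(plan.send_at for plan in plans)
    for earlier, later in zip(times, times[1:]):
        if later - earlier < MIN_SEND_SPACING:
            failures.append(f"sends at {earlier} and {later} are too close")
    return failures
```

Returning the reasons rather than a bare boolean gives Strategy something concrete to act on when the graph cycles back, for example merging a segment that was flagged as too small.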

Full pipeline tracing

LangSmith Observability

Every node, every LLM call, every tool call traced with latencies and token counts. Detects prompt injection, measures per-tenant costs, and surfaces pipeline bottlenecks. Makes "how do you know it's working?" answerable with data.

run: campaign-47 · duration: 94s · $0.38

├─ segmentation-agent 12s ✓ silhouette=0.61

├─ strategy-agent 8s ✓ quality gate pass

├─ content-gen ×4 18s ✓ parallel fan-out

├─ approval 14h · human pending

└─ hallucination check 0 unverified claims

cost breakdown: content-gen 61% · strategy 18%

Phase 2 — Enterprise Platform

Six additions that make it a platform.

Multi-tenancy, federated learning, fine-tuned models, adversarial debate, autonomous monitoring — problems that only exist at scale. Each one a genuine architectural addition, not a feature flag.

Multi-Tenancy + Federated Memory

Complete data isolation per tenant. Abstracted patterns — normalized metrics, structural content patterns, anonymized audience tier signals — are aggregated nightly into a shared knowledge base. New tenants benefit from platform-wide learnings without accessing any existing customer's data.

Event-Driven Autonomous Monitoring

Continuous Redis Streams subscriber — not a polling job. Reacts to unsubscribe spikes, out-of-stock events, competitor campaign detections, and viral moments in real time. Each signal type has a response playbook with ordered actions by urgency and reversibility.
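One way to picture a response playbook is as a list of actions sorted before execution. The sketch below is hypothetical throughout: the `Action` fields, the urgency scale, the action names, and the tie-break that prefers reversible actions are assumptions about how "ordered by urgency and reversibility" might be encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    urgency: int      # higher means act sooner (hypothetical scale)
    reversible: bool


# Hypothetical playbook keyed by signal type; action names are illustrative.
PLAYBOOKS = {
    "unsubscribe_spike": [
        Action("notify_owner", urgency=2, reversible=True),
        Action("pause_sends", urgency=3, reversible=True),
        Action("cancel_campaign", urgency=3, reversible=False),
    ],
}


def plan_response(signal_type: str) -> list[str]:
    """Order a signal's actions: most urgent first, reversible before irreversible."""
    actions = PLAYBOOKS.get(signal_type, [])
    ordered = sorted(actions, key=lambda a: (-a.urgency, not a.reversible))
    return [a.name for a in ordered]
```

Preferring reversible actions at equal urgency means the system tries steps it can undo before committing to ones it cannot.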

Outcome-Based Fine-Tuning (CCIM)

Llama 3.1 8B fine-tuned with LoRA adapters on actual campaign open rates and CTRs — not human preference labels. Deployed after 20+ campaigns, versioned in MLflow. Training labels are real behavioral outcomes, not a human rater's opinion on what sounds good.

Multi-Agent Debate Protocol

High-stakes campaigns (large budget, large audience, new category) route through a four-phase deliberation: Strategist proposes → Devil's Advocate critiques with evidence → Strategist revises → Risk Assessment Agent scores across brand safety, audience risk, financial risk, and compliance.

Natural Language Supervisor Agent

Top-level agent that accepts plain-language instructions and orchestrates any combination of agents, database operations, and scheduled tasks. Uses confidence thresholds for ambiguity resolution — proceeds autonomously above 0.85, asks one clarifying question below 0.60.
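The two thresholds can be expressed as a small routing function. This is a sketch only: the function and constant names are invented, and the page does not say what happens for confidence between 0.60 and 0.85, so that band is returned as an explicit placeholder rather than guessed.

```python
PROCEED_THRESHOLD = 0.85  # from the text: proceed autonomously above this
CLARIFY_THRESHOLD = 0.60  # from the text: ask one question below this


def resolve_ambiguity(confidence: float) -> str:
    """Map a confidence score to the supervisor's next step."""
    if confidence > PROCEED_THRESHOLD:
        return "proceed"
    if confidence < CLARIFY_THRESHOLD:
        return "ask_clarifying_question"
    # Not described in the source; placeholder for the 0.60 to 0.85 band.
    return "unspecified"
```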

AgentEval Framework

Continuously measures four dimensions: task completion rate, decision accuracy (predicted vs actual CTR), hallucination rate (unverified factual claims from memory), and cost efficiency per node. Generates weekly reports. Answers "how do you know it's working?" with data.
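Two of these dimensions reduce to simple arithmetic. The sketch below assumes that decision accuracy is reported as the mean absolute error between predicted and actual CTR, and that hallucination rate is unverified claims divided by total claims; the page does not specify either formula, and the function names are hypothetical.

```python
def mean_absolute_error(predicted: list[float], actual: list[float]) -> float:
    """Average absolute gap between predicted and actual CTR (same units as the inputs)."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rate(count: int, total: int) -> float:
    """Fraction of items matching a condition, e.g. unverified claims over all claims."""
    return count / total if total else 0.0
```

For example, `mean_absolute_error([3.0, 2.0], [2.0, 4.0])` averages gaps of 1.0 and 2.0, giving 1.5 percentage points.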

System is live

Submit a brief.
Watch eight agents work.

The full pipeline — RFM clustering, quality gates, parallel content generation, human approval, statistical optimization — runs end to end from a plain-language brief.

✓ Real sklearn clustering — not GPT guessing segments

✓ Conditional graph with cycles and retry loops

✓ LangGraph interrupt() for stateful human approval

✓ Chi-square significance tests before any optimization
