AI Coworker Development Principles

Status: Foundational specification
Created: 2026-03-31
Authors: Mark Bodman, Claude (Software Engineer)
References: Diversity of Thought Framework, IT4IT v3.0.1, EP-BUILD-HANDOFF spec


Purpose

This document defines the architectural principles for developing AI Coworker agents within the Digital Product Factory. It is the governing specification for all future agent design, tool assignment, memory strategy, and multi-agent orchestration.

These principles are derived from production testing, industry framework research (Anthropic Agent SDK, OpenAI Agents SDK, LangGraph, CrewAI, AutoGen), and the platform’s Diversity of Thought framework.


Principle 1: Specialization Over Generalization

A specialist with 5 focused tools outperforms a generalist with 40.

Rule

Each AI Coworker agent should have access to no more than 10 tools relevant to its current task. When tool count exceeds 15, tool selection accuracy degrades significantly, regardless of model capability.

Implementation

Tools are tagged with the phases and contexts in which they are relevant. The platform filters the tool list before each agent invocation, presenting only the tools the agent needs for its current role.

type ToolDefinition = {
  name: string;
  buildPhases?: BuildPhaseTag[];  // Only available during these phases
  // ... other fields
};
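The filtering step described above might look like the following minimal sketch. The members of BuildPhaseTag, the MAX_TOOLS_PER_AGENT constant, the toolsForPhase helper, the search_docs tool, and the phase tags assigned to each tool are assumptions made for illustration; the source defines only the ToolDefinition shape and the 10-tool guideline.

```typescript
// Hypothetical phase tags; the source does not list the members of BuildPhaseTag.
type BuildPhaseTag = "ideate" | "plan" | "build" | "review" | "ship";

type ToolDefinition = {
  name: string;
  buildPhases?: BuildPhaseTag[]; // Only available during these phases
};

// Assumed cap, taken from the "no more than 10 tools" rule.
const MAX_TOOLS_PER_AGENT = 10;

// Return the tools visible in the given phase. Untagged tools are treated
// as phase-independent (an assumption; the source does not say).
function toolsForPhase(all: ToolDefinition[], phase: BuildPhaseTag): ToolDefinition[] {
  const scoped = all.filter(
    (tool) => tool.buildPhases === undefined || tool.buildPhases.includes(phase),
  );
  if (scoped.length > MAX_TOOLS_PER_AGENT) {
    throw new Error(
      `Phase "${phase}" exposes ${scoped.length} tools; the limit is ${MAX_TOOLS_PER_AGENT}`,
    );
  }
  return scoped;
}

// Illustrative registry; the phase assignments are made up for this example.
const registry: ToolDefinition[] = [
  { name: "write_sandbox_file", buildPhases: ["build"] },
  { name: "run_sandbox_command", buildPhases: ["build", "review"] },
  { name: "search_docs" }, // hypothetical untagged tool
];

const reviewTools = toolsForPhase(registry, "review").map((t) => t.name);
console.log(reviewTools.join(", ")); // run_sandbox_command, search_docs
```

Throwing when the cap is exceeded is one possible design choice: it surfaces an over-broad tool assignment at configuration time instead of letting selection accuracy degrade silently.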

Evidence


Principle 2: Orchestrator-Worker Pattern

A coordinator routes work to specialists. Specialists do not route to each other.

Rule

Multi-step workflows use a hierarchical orchestrator-worker pattern. The build pipeline acts as the orchestrator, dispatching each phase to the appropriate specialist agent. Agents do not hand off directly to each other — the orchestrator mediates all transitions.

Implementation

Each build phase maps to a specialist agent:

| Phase | Agent Role | IT4IT Alignment | Model Tier |
|---|---|---|---|
| Ideate | Product Designer | §5.2.1 Conceptualize | Standard (Haiku) |
| Plan | Architect | §5.2.4 Define Architecture | Standard (Haiku) |
| Build | Software Engineer | §5.3.3 Design & Develop | Frontier (Sonnet) |
| Review | QA / Scrum Master | §5.3.5 Accept & Publish | Standard (Haiku) |
| Ship | Operations Engineer | §5.4 Deploy + §5.5 Release | Standard (Haiku) |
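As a rough illustration of the orchestrator-worker rule, the sketch below has a single pipeline function call each specialist in turn. The Specialist signature, the PHASE_ORDER constant, the runPipeline function, and the stub agents are hypothetical; real agents would be asynchronous model invocations, and approval gates are omitted here for brevity.

```typescript
type BuildPhase = "ideate" | "plan" | "build" | "review" | "ship";

// A specialist takes the context for its phase and returns its output.
// Synchronous string-to-string is a simplification for this sketch.
type Specialist = (input: string) => string;

const PHASE_ORDER: BuildPhase[] = ["ideate", "plan", "build", "review", "ship"];

// Run every phase in order. Only the orchestrator calls specialists;
// a specialist never receives a reference to another specialist.
function runPipeline(agents: Record<BuildPhase, Specialist>, request: string): string[] {
  const log: string[] = [];
  let context = request;
  for (const phase of PHASE_ORDER) {
    context = agents[phase](context);
    log.push(`${phase}: ${context}`);
  }
  return log;
}

// Stub agents that only wrap their input, to make the routing visible.
const stubAgents: Record<BuildPhase, Specialist> = {
  ideate: (s) => `brief(${s})`,
  plan: (s) => `plan(${s})`,
  build: (s) => `code(${s})`,
  review: (s) => `report(${s})`,
  ship: (s) => `release(${s})`,
};

const pipelineLog = runPipeline(stubAgents, "todo app");
console.log(pipelineLog[0]); // ideate: brief(todo app)
```

Because each specialist sees only a string, it has no way to invoke a peer; every transition passes back through runPipeline.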

Rationale


Principle 3: Structured Handoffs, Not Conversation History

Pass decisions and context, not transcripts.

Rule

When work transitions between agents (or between phases), the outgoing agent produces a structured handoff document. The incoming agent reads only this document — never the raw conversation history from the previous phase.

Implementation

interface PhaseHandoff {
  fromPhase: BuildPhase;
  toPhase: BuildPhase;
  summary: string;              // 2-3 sentences, plain language
  evidence: Record<string, unknown>;  // Phase-specific artifacts
  openIssues: string[];         // What the next agent should know
  userPreferences: string[];    // Decisions the user made
}
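A handoff built against this interface could be checked before the next phase starts, as in the sketch below. The BuildPhase members, the validateHandoff function, its three rules, and the sample values (including the file path) are assumptions for illustration; the source specifies only the interface fields.

```typescript
// Hypothetical phase names; the source does not define BuildPhase.
type BuildPhase = "ideate" | "plan" | "build" | "review" | "ship";

interface PhaseHandoff {
  fromPhase: BuildPhase;
  toPhase: BuildPhase;
  summary: string;
  evidence: Record<string, unknown>;
  openIssues: string[];
  userPreferences: string[];
}

// Return a list of problems; an empty list means the handoff is acceptable.
function validateHandoff(h: PhaseHandoff): string[] {
  const problems: string[] = [];
  if (h.fromPhase === h.toPhase) problems.push("fromPhase and toPhase must differ");
  if (h.summary.trim().length === 0) problems.push("summary is empty");
  // Rough sentence count; the "2-3 sentences" guidance is enforced as a maximum of 3.
  const sentences = h.summary.split(/[.!?]+\s*/).filter((s) => s.trim().length > 0);
  if (sentences.length > 3) problems.push("summary is longer than 3 sentences");
  return problems;
}

const handoff: PhaseHandoff = {
  fromPhase: "plan",
  toPhase: "build",
  summary: "The complaints tracker will use client-side state. No schema changes are needed.",
  evidence: { plannedFiles: ["app/complaints/page.tsx"] }, // illustrative path
  openIssues: ["Empty-state design is not decided"],
  userPreferences: ["Prefers Tailwind over CSS modules"],
};

console.log(validateHandoff(handoff).length); // 0
```

The incoming agent would then receive only the validated handoff object, never the previous phase's message list.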

Rationale


Principle 4: Diversity of Thought in Agent Design

Different agents should think differently, not just have different tools.

Rule

Each agent’s system prompt defines three cognitive components from the Diversity of Thought framework:

| Component | What it defines | Example |
|---|---|---|
| Perspective | How the agent frames the problem | Software Engineer sees “code structure”; Ops Engineer sees “deployment safety” |
| Heuristics | Strategies for finding solutions | Engineer uses test-driven development; Ops uses rollback-first deployment |
| Interpretive Model | What “good” means | Engineer optimizes for correctness; Ops optimizes for availability |

Implementation

Agent system prompts explicitly declare their perspective, heuristics, and success criteria. This is not decorative — it determines which solutions the agent considers and which it misses.

When a complex problem requires multiple perspectives (a rugged landscape in Diversity of Thought terms), the orchestrator consults multiple specialists before deciding. The combined output exceeds what any single agent would produce.
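One way to make the three components explicit is to render them into the system prompt from a structured profile. The sketch below assumes a hypothetical CognitiveProfile type and buildSystemPrompt function, and the prompt layout is illustrative; the example values are taken from the table in the Rule above.

```typescript
// Hypothetical structure holding the three Diversity of Thought components.
interface CognitiveProfile {
  role: string;
  perspective: string;       // how the agent frames the problem
  heuristics: string[];      // strategies for finding solutions
  interpretiveModel: string; // what "good" means
}

// Render the profile as the opening of a system prompt.
function buildSystemPrompt(p: CognitiveProfile): string {
  return [
    `You are the ${p.role}.`,
    `PERSPECTIVE: ${p.perspective}`,
    "HEURISTICS:",
    ...p.heuristics.map((h) => `- ${h}`),
    `SUCCESS CRITERIA: ${p.interpretiveModel}`,
  ].join("\n");
}

const opsProfile: CognitiveProfile = {
  role: "Operations Engineer",
  perspective: "Frame every change in terms of deployment safety",
  heuristics: ["Rollback-first deployment"],
  interpretiveModel: "Optimize for availability",
};

const opsPrompt = buildSystemPrompt(opsProfile);
console.log(opsPrompt.split("\n")[0]); // You are the Operations Engineer.
```

Keeping the profile as data also lets the orchestrator compare profiles when it decides which specialists to consult on a multi-perspective problem.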

Rationale


Principle 5: Selective Memory, Not Total Recall

Remember decisions and rationale. Re-derive details from source.

Rule

The vector database (Qdrant) stores salient context — decisions, user preferences, design rationale, and cross-conversation insights. It does not store raw conversation transcripts, code content, or data that can be derived from the codebase or git history.

What to Store

| Store | Example | Why |
|---|---|---|
| User decisions | “User chose in-memory state over database for this demo” | Informs future suggestions |
| Design rationale | “Complaints tracker uses client-side state because it’s a demo feature” | Prevents re-litigating decisions |
| Cross-conversation context | “The promoter image is JIT-built from the portal container” | Connects knowledge across sessions |
| Discovered constraints | “Anthropic subscription only gives Haiku access” | Prevents repeated failures |
| Quality patterns | “This user prefers Tailwind over CSS modules” | Personalizes agent behavior |

What NOT to Store

| Skip | Example | Why |
|---|---|---|
| Raw conversation | “User said: build it now…” | Ephemeral, bulky, low signal |
| Code content | “The complaints page contains…” | Read from sandbox or git |
| Build artifacts | Test output, diffs, logs | Stored in FeatureBuild record |
| Transient state | “Build is in plan phase” | Query the database |

Implementation

Each agent stores memories at natural decision points — not after every exchange. The memory is tagged with the agent role, build phase, and topic so retrieval is contextual.

Semantic recall uses the query context (current conversation + build phase) to retrieve the 5-8 most relevant memories. This is sufficient because memories are distilled to decisions and rationale, not raw detail.
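The tagging and top-k retrieval described above can be sketched without a vector database. In the example below an in-memory array stands in for Qdrant and a word-overlap score stands in for embedding similarity; both are deliberate simplifications. The Memory shape, the overlapScore and recall functions, and the strict filter on build phase are assumptions, not the platform's actual retrieval logic.

```typescript
// Hypothetical memory record carrying the tags named in the text.
interface Memory {
  text: string;
  agentRole: string;
  buildPhase: string;
  topic: string;
}

interface ScoredMemory extends Memory {
  score: number;
}

// Stand-in for semantic similarity: fraction of query words found in the memory text.
function overlapScore(query: string, text: string): number {
  const words = query.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  if (words.length === 0) return 0;
  const haystack = new Set(text.toLowerCase().split(/\W+/));
  return words.filter((w) => haystack.has(w)).length / words.length;
}

// Return at most `limit` memories for the current phase, best match first.
function recall(store: Memory[], query: string, phase: string, limit = 8): ScoredMemory[] {
  return store
    .filter((m) => m.buildPhase === phase) // assumed: phase acts as a hard filter
    .map((m) => ({ ...m, score: overlapScore(query, m.text) }))
    .filter((m) => m.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}

const memoryStore: Memory[] = [
  { text: "User chose in-memory state over database for this demo", agentRole: "Architect", buildPhase: "plan", topic: "state" },
  { text: "This user prefers Tailwind over CSS modules", agentRole: "Software Engineer", buildPhase: "build", topic: "styling" },
  { text: "Anthropic subscription only gives Haiku access", agentRole: "Operations Engineer", buildPhase: "build", topic: "models" },
];

const hits = recall(memoryStore, "which CSS approach does the user prefer", "build");
console.log(hits.map((h) => h.topic).join(", ")); // styling
```

In a real deployment the phase and role tags would more likely be stored as payload fields and applied as a filter on the vector search, with the limit set somewhere in the 5-8 range given above.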

Rationale


Principle 6: Tools Must Be Self-Documenting

If the model can’t understand a tool from its schema, the schema is wrong.

Rule

Every tool definition includes:

The build phase system prompt includes a tool usage guide that maps common tasks to specific tools with parameter examples.

Implementation

TOOL GUIDE:
- To create a new file: write_sandbox_file(path, content) — content is the FULL file
- To modify existing file: read first, then edit_sandbox_file(path, old_text, new_text)
- To run commands: run_sandbox_command(command)
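A guide like the one above could be generated from structured entries so it stays in sync with the tool list. This is only a sketch: the GuideEntry shape and the renderToolGuide function are hypothetical, and the source does not say how the guide is actually produced.

```typescript
// Hypothetical entry mapping a common task to a tool call.
interface GuideEntry {
  task: string;  // what the agent wants to do
  call: string;  // tool call with parameter names
  note?: string; // optional usage caveat
}

// Render entries in the "TOOL GUIDE" layout used in the system prompt.
function renderToolGuide(entries: GuideEntry[]): string {
  const lines = entries.map(
    (e) => `- To ${e.task}: ${e.call}` + (e.note ? ` (${e.note})` : ""),
  );
  return ["TOOL GUIDE:", ...lines].join("\n");
}

const guide = renderToolGuide([
  { task: "create a new file", call: "write_sandbox_file(path, content)", note: "content is the FULL file" },
  { task: "modify an existing file", call: "edit_sandbox_file(path, old_text, new_text)", note: "read the file first" },
  { task: "run commands", call: "run_sandbox_command(command)" },
]);

console.log(guide);
```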

Rationale


Principle 7: Human-in-the-Loop at Phase Boundaries

The human approves transitions, not individual tool calls.

Rule

Human approval gates exist at phase boundaries (ideate → plan, plan → build, review → ship), not at individual tool calls within a phase. Within a phase, the agent operates autonomously using its scoped tools.

Exception: tools marked executionMode: "proposal" present an approval card before executing side effects that affect production (deploying to production, registering products, modifying user data).
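The gating rule can be expressed as two small predicates, sketched below. The GATED_TRANSITIONS set mirrors the three boundaries listed in the rule; the "immediate" execution mode name and both function names are hypothetical. Because build → review is not listed as a gated boundary, the sketch leaves it ungated.

```typescript
type BuildPhase = "ideate" | "plan" | "build" | "review" | "ship";

// Only "proposal" appears in the source; "immediate" is an assumed name
// for tools that run without an approval card.
type ExecutionMode = "immediate" | "proposal";

// Phase boundaries that require human approval, as listed in the rule.
const GATED_TRANSITIONS = new Set<string>(["ideate->plan", "plan->build", "review->ship"]);

function transitionNeedsApproval(from: BuildPhase, to: BuildPhase): boolean {
  return GATED_TRANSITIONS.has(`${from}->${to}`);
}

// Within a phase, only proposal-mode tools pause for approval.
function toolCallNeedsApproval(mode: ExecutionMode): boolean {
  return mode === "proposal";
}

console.log(transitionNeedsApproval("plan", "build")); // true
console.log(toolCallNeedsApproval("immediate")); // false
```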

Implementation

Rationale


Principle 8: Fail Fast, Explain Clearly

Stop on the first error. Don’t retry blindly. Tell the user what happened.

Rule

When a tool call fails, the agent should:

  1. Report the error in plain language
  2. Explain what it was trying to do
  3. Suggest what the user can do (if applicable)
  4. Stop — do not retry the same call with the same arguments

The agentic loop enforces a tool repetition limit (3-5 calls of the same tool). This is a safety net, not a feature — agents should not need it if they handle errors correctly.
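A minimal sketch of the two mechanisms above: a failure report that follows the four steps, and a per-tool repetition guard. The describeFailure function and RepetitionGuard class are hypothetical names, and counting calls per tool name across one loop run (as opposed to consecutive calls only) is an assumption; the default limit of 3 is the low end of the stated range.

```typescript
// Build a plain-language failure report following the four steps in the rule.
function describeFailure(intent: string, error: string, suggestion?: string): string {
  const parts = [`The step failed: ${error}.`, `I was trying to ${intent}.`];
  if (suggestion) parts.push(`You can ${suggestion}.`);
  parts.push("I have stopped and will not retry the same call.");
  return parts.join(" ");
}

// Safety net: counts calls per tool name within one agentic loop run.
class RepetitionGuard {
  private readonly counts = new Map<string, number>();
  private readonly limit: number;

  constructor(limit = 3) {
    this.limit = limit;
  }

  // Record one call; returns false once the tool has exceeded the limit.
  allow(toolName: string): boolean {
    const count = (this.counts.get(toolName) ?? 0) + 1;
    this.counts.set(toolName, count);
    return count <= this.limit;
  }
}

const guard = new RepetitionGuard(3);
const decisions = [1, 2, 3, 4].map(() => guard.allow("run_sandbox_command"));
console.log(decisions.join(", ")); // true, true, true, false

console.log(describeFailure("run the test suite", "the sandbox is not running", "restart the sandbox and ask me to continue"));
```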

Rationale


Principle 9: Responsible Capacity Utilization

Use paid AI capacity for governed value, not empty activity.

Rule

AI coworkers should treat available paid capacity as an operating asset. When authorized work is available, idle capacity is waste. When no useful, safe, evidence-producing work is available, the coworker should record or surface the blocker rather than spend tokens to appear busy.

Useful capacity work includes:

Implementation

Capacity use should be driven by Standing Orders, calendar/availability state, safe work queues, and existing authority controls. Coworkers may continue low-risk governed work when humans are unavailable, but must stop at approval boundaries for consequential actions.
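The decision a coworker faces when it has spare capacity could be sketched as below. The WorkItem fields, the CapacityDecision union, and the nextAction function are hypothetical; Standing Orders, calendar state, and authority controls are collapsed into two boolean flags for the purpose of the example.

```typescript
// Hypothetical queue entry; both flags would come from existing authority controls.
interface WorkItem {
  id: string;
  authorized: boolean;    // covered by a Standing Order or explicit assignment
  consequential: boolean; // needs human approval before it can proceed
}

type CapacityDecision =
  | { action: "work"; itemId: string }
  | { action: "await_approval"; itemId: string }
  | { action: "report_blocker"; reason: string };

// Prefer safe authorized work, stop at approval boundaries, never invent work.
function nextAction(queue: WorkItem[]): CapacityDecision {
  const safe = queue.find((w) => w.authorized && !w.consequential);
  if (safe) return { action: "work", itemId: safe.id };
  const gated = queue.find((w) => w.authorized && w.consequential);
  if (gated) return { action: "await_approval", itemId: gated.id };
  return { action: "report_blocker", reason: "no authorized work in queue" };
}

const decision = nextAction([
  { id: "deploy-portal", authorized: true, consequential: true },
  { id: "triage-backlog", authorized: true, consequential: false },
]);
console.log(decision.action); // work
```

The final branch returns a blocker instead of picking unauthorized work, which reflects the rule that a coworker should surface the blocker rather than spend tokens to appear busy.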

Rationale

A salaried employee who does nothing while valuable work exists wastes organizational capacity. Fixed-price or subscription AI capacity has the same economic shape. The goal is not to burn tokens. The goal is to convert available capacity into reviewed work, evidence, learning, and platform improvement.


Application

These principles apply to:

When these principles conflict with expediency, the principles win. A well-structured agent that works reliably is worth more than a quick hack that fails unpredictably.