📋 Executive Summary

18 problems identified. 3 meta-categories. 1 brutal truth.

Key Insight: "Saint is the only quality control mechanism." Every problem was caught by Saint, not by Billy. There is no automated monitoring, no self-testing, no alerting, no verification pipeline.

🔴 The 3 Meta-Problems

  • Memory Fragility — Context is ephemeral, disk writes require discipline, two AI systems don't share state. (Problems 1, 5, 11, 18)
  • Zero Monitoring — Nothing tested, nothing monitored. Skills break silently. Config changes unvalidated. No CI, no alerting. (Problems 2, 3, 4, 6, 10, 13, 14)
  • LLM Behavioral Tendencies — Deferring action, presenting guesses as facts, building instead of reusing, batching instead of streaming. (Problems 3, 7, 8, 9, 10, 12, 17)

📊 Current Status

  • ✅ 5 Fixed: Crash loops, JSON validation, streamMode, Supadata, env vars
  • ⚠️ 6 Behavioral: Rules exist but enforcement is behavioral only
  • 🔧 3 Partially Fixed: Compaction, voice context, email monitoring
  • ❌ 4 Unfixed: Post-call summary, skill validation, self-audit enforcement, voice recall

Memory Council convened Feb 17. QMD backend deployed. softThresholdTokens raised from 8K → 80K.

🗂️ Problem Catalog

All 18 identified problems, grouped by severity (Critical, High, Medium, Low).

#1 Compaction destroyed Cash Cary meeting notes
Billy updated a slide deck but never wrote notes to memory/. 6 compactions later, notes gone. No write-first discipline.
Critical | ✅ Mitigated

#2 6/7 skills failed validation, broken for weeks
Skills were broken and nobody noticed. No automated validation, no periodic health checks.
Critical | ❌ Unfixed

#6 1,073-restart crash loop from bad config
Config had ${OPENAI_API_KEY} reference but env var wasn't in systemd service file. No pre-flight validation.
Critical | ✅ Fixed

#3 Billy not self-auditing when asked multiple times
Each session starts fresh. Standing orders only exist as text that may not be read. No persistent task queue.
High | ⚠️ Behavioral

#4 Post-call Telegram summary broken since OC 2.15
PostCall dispatch fires but no summary appears. Standalone voice bridge is separate from OpenClaw plugin.
High | ❌ Unfixed

#5 Voice bridge stale context
Voice-Billy told Saint Telegram was "intentionally disabled", which was wrong. Static context-briefing.md file is always stale.
High | ⚠️ Partial

#7 Emailing Trey without permission
Billy interpreted "message Trey" as authorization to email directly. No hard gate on outbound comms.
High | ⚠️ Behavioral

#8 Not surfacing inbound emails
Trey replied to an email. Billy didn't tell Saint. No automated email monitoring pipeline.
High | ⚠️ Behavioral

#10 Presenting unverified data as verified
LLM confidence without verification. No verification step in workflow. "Specification-first" principle not enforced.
High | ⚠️ Behavioral

#11 Stale info served after compaction
"Cash is coming to town" when meeting already happened. MEMORY.md not updated after events complete.
High | ❌ Unfixed

#12 Building new instead of checking existing
Multiple times, Billy started building new solutions when existing tools already handled the task. Recurring despite rules.
High | ⚠️ Behavioral

#9 Deferring work instead of doing immediately
"I'll spin up agents tonight", said in the middle of the day. LLM tendency toward planning over action.
Medium | ⚠️ Behavioral

#13 JSON config with line breaks crashing system
Literal newlines in JSON string values. Config parse failed, gateway crashed. No validation before restart.
Medium | ✅ Fixed

#14 ENV vars missing from systemd service file
openclaw.json referenced ${OPENAI_API_KEY} but systemd service file didn't have it.
Medium | ✅ Fixed

#15 Telegram streamMode vanishing messages
With streamMode: "partial", messages disappeared due to Telegram edit API rate limits.
Medium | ✅ Fixed

#16 Voice bridge in-conversation recall failure
Asked "what was the first word I said?" and got it wrong twice. 20-message rolling window, no robust in-session recall.
Medium | ❌ Unfixed

#17 Not pushing status updates proactively
Waiting for all 4 agents before synthesizing. Batch mentality instead of streaming mentality.
Medium | ⚠️ Behavioral

#18 Supadata API key not found (wrong paths)
Key at ~/.openclaw/credentials/.supadata-key but Billy looked in env vars and other paths. No credential registry.
Low | ✅ Fixed
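
The recall failure in #16 is what a fixed-size rolling window would be expected to produce. A minimal Python sketch, assuming the voice bridge simply keeps the last 20 messages; the `deque` buffer and the message strings are illustrative stand-ins, not the bridge's actual code:

```python
from collections import deque

WINDOW = 20  # rolling window size reported for the voice bridge

# Hypothetical stand-in for the bridge's in-call message buffer.
history = deque(maxlen=WINDOW)
for i in range(1, 26):  # 25 messages arrive during the call
    history.append(f"message {i}")

oldest_kept = history[0]                    # everything before this was evicted
first_still_available = "message 1" in history
```

Once the 21st message arrives, the first one is no longer visible to the model, so a question about "the first word I said" cannot be answered from the window alone. Pinning the opening turns outside the window, or writing them to disk, is one possible remedy.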

🔍 Root Cause Analysis

Nearly every problem maps to one of three meta-categories (#15 and #16 are not assigned to a category below).

A. Memory Fragility

Problems: #1, #5, #11, #18

The agent's memory is structurally fragile. Context is ephemeral — the 200K context window creates an illusion of infinite memory. Billy works for hours, accumulates 150K+ tokens, feels like it "remembers" everything, and never writes to disk. Then compaction hits and everything evaporates.

  • Two AI systems (main Billy + voice Billy) don't share state
  • No single source of truth that's both durable AND current
  • MEMORY.md not updated after events complete → stale info persists forever
  • Credentials scattered across files, env vars, config — no registry
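
The write-first discipline described above could be made mechanical with a very small helper. This is a sketch under assumptions: the `memory/YYYY-MM-DD.md` layout is taken from the flush prompt used elsewhere in this report, while the function name `write_first` and the bullet format are hypothetical.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def write_first(note: str, memory_dir: str = "memory",
                now: Optional[datetime] = None) -> Path:
    """Append a timestamped note to memory/YYYY-MM-DD.md and return the file path."""
    now = now or datetime.now()
    path = Path(memory_dir) / f"{now:%Y-%m-%d}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode: earlier notes from the same day are never overwritten.
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f"- [{now:%H:%M}] {note}\n")
    return path
```

Calling something like this right after a meeting or a major edit means the note survives compaction regardless of what the context window still holds.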

B. Zero Monitoring / Verification

Problems: #2, #3, #4, #6, #10, #13, #14

Nothing is monitored. Nothing is tested. Skills break silently. Config changes aren't validated. Data isn't verified. There are no automated checks, no CI, no alerting. Everything relies on Saint catching problems manually.

  • 6/7 skills broken for weeks — nobody noticed
  • No pre-flight config validation before restarts
  • No skill health checks, no cron-based validation
  • Post-call summary broken since OC 2.15 — still broken

C. LLM Behavioral Tendencies

Problems: #3, #7, #8, #9, #10, #12, #17

The LLM has predictable failure modes that documentation alone won't fix. These are inherent to how LLMs work — they require mechanical enforcement: crons, checklists, validation gates.

  • Deferring action — "I'll do this later" feels safe, but Saint wants execution NOW
  • Presenting guesses as facts — LLM confidence without verification
  • Building instead of reusing — doesn't check TOOLS.md, reference/, skills first
  • Batching instead of streaming — completionism vs progressive updates
  • Ignoring standing orders — each session starts fresh, instructions don't carry forward

The Honest Assessment: The problems aren't primarily technical. The real problem is that Billy operates without guardrails. Saint shouldn't have to be Billy's QA department. The path forward: automate monitoring, create mechanical gates for behavioral issues, and accept that some LLM tendencies will persist, so build systems that catch them.
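
One way to read "mechanical gate" for the outbound comms problem (#7) is a check that code must pass before anything leaves the system. The sketch below is illustrative only: the channel names, the `INTERNAL_CHANNELS` set, and the `may_send` function are all hypothetical, and the real send path is not described in this report.

```python
from typing import Optional

# Hypothetical: channels that only reach Saint and therefore need no approval.
INTERNAL_CHANNELS = {"telegram-saint"}
APPROVER = "Saint"


def may_send(channel: str, approved_by: Optional[str] = None) -> bool:
    """Return True only if a message on this channel is allowed to go out."""
    if channel in INTERNAL_CHANNELS:
        return True
    # Anything external (email, SMS, ...) requires explicit approval.
    return approved_by == APPROVER
```

The point of the design is that "message Trey" can no longer be interpreted as authorization: without an explicit `approved_by` value the send is refused.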

⚖️ Memory Council Verdicts

Three models analyzed the evidence package independently. Unanimous on key fixes.

Claude Opus: Architectural Focus

Verdict: "The architecture creates a trap. The 200K context window is too large — it gives the illusion of infinite memory."

  • Root cause: Both architectural AND behavioral, but architecture makes behavioral compliance nearly impossible
  • Recommended softThresholdTokens: 100,000 → flush at ~80K tokens (40% capacity)
  • Enable QMD with session indexing as passive backup
  • "The compaction summary is a table of contents, not a book"
  • Manual /compact after major work blocks as habit
  • Compaction summaries are fundamentally lossy — bridge via proactive disk writes + QMD search

Grok: Root Cause + Specific Fixes

Verdict: "Hybrid: 60% Behavioral, 40% Architectural. Fixable with config + enforcement."

  • Recommended softThresholdTokens: 50,000 → flush at ~135K tokens
  • Enable QMD immediately — session transcript indexing fixes recall failures
  • Lower session sync deltas: 50KB/25msg → 10KB/10msg
  • Build memory-guard skill: subagent that auto-flushes on long sessions
  • Cron every 30min to check token count and trigger flush if >50K
  • Projected improvement: 95% persistence with all fixes applied

Gemini: Architecture-First Analysis

Verdict: "Both, but architectural is primary. The architecture must FORCE the behavior."

  • Recommended softThresholdTokens: 50,000 → flush at ~120K tokens
  • Behavioral protocols ask the model to act AGAINST its training (prioritize future-self over current-task)
  • Mandatory memory checkpoints: every 10 user messages or 30min of work
  • Add memory health check to heartbeat: check freshness, test memory_search, checkpoint if >100K tokens
  • Higher text weight in hybrid search: 0.3 → 0.4 for better keyword matching
  • "The memory system is misconfigured, not broken"
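
The checkpoint rule in the Gemini verdict (every 10 user messages or 30 minutes of work) is simple enough to express as a predicate. A minimal sketch; the function name and the way the counters are tracked are assumptions, not part of any existing skill:

```python
from datetime import datetime, timedelta

MAX_MESSAGES = 10                     # user messages between checkpoints
MAX_INTERVAL = timedelta(minutes=30)  # wall-clock time between checkpoints


def checkpoint_due(user_messages_since: int,
                   last_checkpoint: datetime,
                   now: datetime) -> bool:
    """True when either the message budget or the time budget is used up."""
    return (user_messages_since >= MAX_MESSAGES
            or now - last_checkpoint >= MAX_INTERVAL)
```

A heartbeat or cron job could evaluate this and trigger a memory write whenever it returns True, which moves the rule from "behavioral" to "mechanical".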

🤝 Council Consensus

  • ✅ Unanimous: Enable QMD backend with session indexing
  • ✅ Unanimous: Raise softThresholdTokens significantly (50K-100K range)
  • ✅ Unanimous: Lower session sync delta thresholds
  • ✅ Unanimous: Compaction summaries are fundamentally lossy, not a memory strategy
  • ✅ Unanimous: Behavioral + architectural fixes needed; neither alone is sufficient

⚙️ Config Recommendations

All proposed config changes with JSON. Validate with python3 -m json.tool before restarting.
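
Beyond `python3 -m json.tool`, a pre-flight check could also catch the unresolved `${OPENAI_API_KEY}` reference behind problems #6 and #14. The following is a sketch, not the proposed validate-config.sh: it assumes the config is strict JSON (as the json.tool check does) and that env references use the `${NAME}` form seen in this report; `preflight` is a hypothetical name.

```python
import json
import os
import re

ENV_REF = re.compile(r"\$\{([A-Z0-9_]+)\}")  # matches ${OPENAI_API_KEY}-style references


def preflight(config_text: str, env=None) -> list:
    """Return a list of problems; an empty list means the config looks safe to load."""
    env = os.environ if env is None else env
    try:
        # Strict parsing rejects literal newlines inside string values (problem #13).
        json.loads(config_text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno})"]
    missing = sorted({name for name in ENV_REF.findall(config_text) if name not in env})
    return [f"missing env var: {name}" for name in missing]
```

For the check to be meaningful it has to run with the same environment the service gets, for example from the systemd unit itself, otherwise a variable set in an interactive shell would mask the gap.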

Compaction — Earlier Flush + More Reserve

Raise softThresholdTokens from 8K → 80K (implemented). Council recommended 50K-100K range.

openclaw.json — agents.defaults.compaction
{
  "mode": "safeguard",
  "reserveTokensFloor": 30000,
  "memoryFlush": {
    "enabled": true,
    "softThresholdTokens": 80000,
    "prompt": "CRITICAL: Write ALL important context to memory/YYYY-MM-DD.md NOW.",
    "systemPrompt": "You are about to lose context. Write EVERYTHING important to disk."
  }
}
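
The council's "flush at" figures appear to follow a simple relationship: context window minus reserve minus softThresholdTokens. That formula is inferred from the numbers in the verdicts, not confirmed against OpenClaw's documentation, so treat this arithmetic as an assumption. The 200K window is the figure used throughout this report.

```python
CONTEXT_WINDOW = 200_000  # context window size cited in this report


def flush_point(reserve_floor: int, soft_threshold: int,
                context_window: int = CONTEXT_WINDOW) -> int:
    """Token count at which the memory flush would fire, under the assumed formula."""
    return context_window - reserve_floor - soft_threshold


implemented = flush_point(30_000, 80_000)   # reserveTokensFloor and softThresholdTokens above
opus_case = flush_point(20_000, 100_000)    # Opus figure, if the reserve were 20K
```

Under this assumption the implemented config would flush at roughly 90K tokens. The Opus figure (100,000 → ~80K) fits a 20K reserve, while the Grok (~135K) and Gemini (~120K) figures fit reserves of 15K and 30K respectively, which suggests the three models assumed different reserve values.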

QMD Backend — Session Indexing + Extra Paths

Swaps search engine under memory-core. BM25 + vectors + reranking. Fallback to SQLite if QMD fails.

openclaw.json — memory
{
  "backend": "qmd",
  "citations": "auto",
  "qmd": {
    "includeDefaultMemory": true,
    "sessions": { "enabled": true, "retentionDays": 30 },
    "update": { "interval": "5m", "debounceMs": 10000, "onBoot": true },
    "limits": { "maxResults": 10, "timeoutMs": 5000 },
    "paths": [
      { "name": "projects", "path": "projects", "pattern": "**/*.md" },
      { "name": "research", "path": "research", "pattern": "**/*.md" },
      { "name": "ideas", "path": "ideas", "pattern": "**/*.md" }
    ]
  }
}

Context Pruning — TTL-Based

Trim old tool results before LLM calls. Reduces cache-write costs on Anthropic.

openclaw.json — agents.defaults.contextPruning
{
  "mode": "cache-ttl",
  "ttl": "5m",
  "keepLastAssistants": 3
}

Session Sync — Lower Thresholds

Index sessions more frequently. Previous: 50KB/25msg. Now: 10KB/10msg.

openclaw.json — agents.defaults.memorySearch.sync
{
  "watch": true,
  "sessions": { "deltaBytes": 10000, "deltaMessages": 10 }
}
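
To make the effect of the lower thresholds concrete, here is a sketch of the trigger logic. It assumes either threshold alone is enough to trigger re-indexing, which this report does not state; `should_reindex` is a hypothetical name.

```python
def should_reindex(new_bytes: int, new_messages: int,
                   delta_bytes: int = 10_000, delta_messages: int = 10) -> bool:
    """True when enough new session data has accumulated to re-index."""
    return new_bytes >= delta_bytes or new_messages >= delta_messages
```

With the previous 50KB/25msg settings, a session that had grown by 12 KB over 3 messages would not yet be indexed; with 10KB/10msg it would be.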

Heartbeat Active Hours

agents.defaults.heartbeat
{
  "every": "30m",
  "target": "last",
  "activeHours": {
    "start": "08:00",
    "end": "23:00"
  }
}
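
The active-hours window can be read as a simple time-of-day check. A minimal sketch, assuming the end time is exclusive and that a window may wrap past midnight; neither detail is specified in this report.

```python
from datetime import time


def in_active_hours(now: time,
                    start: time = time(8, 0),
                    end: time = time(23, 0)) -> bool:
    """True if `now` falls inside the heartbeat's active window."""
    if start <= end:
        return start <= now < end
    # Window wraps past midnight, e.g. 22:00 to 06:00.
    return now >= start or now < end
```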

Voice-Call Plugin — Disable

Standalone bridge is the actual system. Plugin has stale config (references "Sunzi.io").

plugins.entries.voice-call
{ "enabled": false }

Model Aliases — Clean Up Duplicates

Remove 4.5 entries to avoid ambiguity. Both opus-4.5 and opus-4.6 have alias "opus".

Remove these duplicate entries
// REMOVE — superseded by 4.6 versions:
"anthropic/claude-opus-4-5": { "alias": "opus" }
"anthropic/claude-sonnet-4-5": { "alias": "sonnet" }
"openrouter/anthropic/claude-sonnet-4-5": { "alias": "or-sonnet" }
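
Alias collisions like the "opus" one are easy to detect mechanically. The sketch below assumes the model entries form a mapping of model ID to an object with an "alias" field, as the snippet above suggests; the 4.6 model ID in the example is an assumed name for illustration.

```python
from collections import defaultdict


def duplicate_aliases(models: dict) -> dict:
    """Map each alias that is claimed by more than one model to those model IDs."""
    by_alias = defaultdict(list)
    for model_id, entry in models.items():
        alias = entry.get("alias")
        if alias:
            by_alias[alias].append(model_id)
    return {alias: ids for alias, ids in by_alias.items() if len(ids) > 1}


models = {
    "anthropic/claude-opus-4-5": {"alias": "opus"},
    "anthropic/claude-opus-4-6": {"alias": "opus"},  # assumed ID for the 4.6 entry
    "anthropic/claude-sonnet-4-5": {"alias": "sonnet"},
}
collisions = duplicate_aliases(models)
```

A check like this could sit alongside the config validation step so that a new model entry never silently shadows an existing alias.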

🧰 Skills & Tools Audit

Current state: 3 of 50+ available skills enabled. 4 bundled hooks ready but not explicitly enabled.

Currently Enabled (3)

  • openai-image-gen: Active
  • openai-whisper-api: Active
  • sag: Active

🟢 Must Enable (High Value, Low Effort)

| Skill | Why | Prerequisite |
|---|---|---|
| session-logs | Search conversation history. Essential for continuity. | Install ripgrep |
| github | gh already installed. Manage repos, PRs, issues. | None |
| weather | Free, no API key, curl-based. | None |
| healthcheck | Security hardening guidance for VPS. | None |
| skill-creator | Meta-skill for building better skills. | None |
| tmux | Already installed. Background processes. | None |

🟡 Should Enable (Medium Value, Some Setup)

| Skill | Why | Prerequisite |
|---|---|---|
| himalaya | CLI email client with native inbox read/send. | pip/cargo install + IMAP config |
| nano-pdf | Edit PDFs with natural language. | pip install (needs uv) |
| summarize | Summarize URLs, YouTube, podcasts. | Manual Linux install |
| clawhub | Search/install community skills. | npm i -g clawhub |

🔧 Hooks to Enable

| Hook | Purpose | Status |
|---|---|---|
| session-memory | Auto-saves context on /new, preventing data loss | Ready, not enabled |
| command-logger | Audit trail for all commands | Ready, not enabled |
| boot-md | Runs BOOT.md on gateway start | Ready, not enabled |

🔴 Custom Skills: Issues Found

| Skill | Issue | Fix |
|---|---|---|
| build-methodology | Very generic TDD guide. Model already knows this. | Trim by 60% |
| creative-team | Well-structured but HEAVY. No quick mode. | Add cost estimates + quick mode |
| project-tracking + task-tracking | Overlapping concerns | Merge into one skill |
| image-gen | Uses Gemini Flash; quality is meh | Add fallback to openai-image-gen |
| x-scraper | Puppeteer-based, fragile, needs cookies | Browser tool may work better now |

📦 Missing Binaries

rg (ripgrep), himalaya, summarize, nano-pdf, clawhub CLI

⚡ Deep Dive: Quick Wins

Prioritized action items from the self-improvement deep dive.

🔴 TODAY (30 min total)

  • Update OpenClaw to 2026.2.17 (adds 1M context beta, Sonnet 4.6, inline buttons)
  • Enable session-memory + command-logger hooks
  • Add heartbeat active hours (08:00-23:00 CST)
  • Add session-logs, github, weather, healthcheck, tmux, skill-creator to skills
  • Install ripgrep: sudo apt-get install -y ripgrep
  • Clean up memory-lancedb warning from config
  • Create scripts/validate-config.sh: prevents crash loops forever
  • Create STANDING-ORDERS.md: centralizes open tasks

🔵 THIS WEEK

  • Fix post-call Telegram summary (direct Telegram API from voice bridge)
  • Build dynamic context for voice bridge (fetch from gateway API at call start)
  • Set up Himalaya email as a native CLI inbox
  • Enable webhook ingress + n8n integration
  • Create BOOT.md startup checklist
  • Merge project-tracking + task-tracking skills
  • Trim build-methodology skill by 60%
  • Enable 1M context beta (after OC update)

⚪ NEXT WEEK

  • n8n → webhook integration for Live Energy
  • Multi-agent routing for Live Energy agent (separate workspace/persona)
  • Community skill audit via clawhub CLI

📖 @ksimback Memory Optimization Guide

External expert guide mapped to our setup. Source: x.com/ksimback

Three Failure Modes

| Failure Mode | Description | Our Status |
|---|---|---|
| Memory not saved | LLM decides what's worth saving. Important context slips through. | ⚠️ Mitigated (WRITE-FIRST RULE + flush tuning) |
| Saved but never retrieved | Agent answers from context instead of searching disk. | ⚠️ QMD enabled, needs verification |
| Compaction destroys knowledge | Info only in conversation gets summarized away. | ⚠️ softThresholdTokens raised to 80K |

4 Basic Config Fixes → Our Mapping

| @ksimback Fix | Our Implementation | Status |
|---|---|---|
| Customize compaction flush prompt | Custom prompt + systemPrompt in memoryFlush config | ✅ Done |
| Context pruning via TTL | contextPruning: cache-ttl, 5m, keepLastAssistants: 3 | 📋 Proposed |
| Hybrid memory search (vector + BM25) | hybrid.enabled: true, vectorWeight: 0.7, textWeight: 0.3 | ✅ Done |
| Session transcript indexing | QMD sessions.enabled: true, retentionDays: 30 | ✅ Done |
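
To illustrate what the vectorWeight/textWeight settings do, here is a sketch that assumes the hybrid score is a plain weighted sum of a vector-similarity score and a keyword (BM25-style) score, both normalized to 0..1. The actual scoring inside the search backend may differ, and the assumption that raising textWeight to 0.4 (the Gemini recommendation) lowers vectorWeight to 0.6 is mine.

```python
def hybrid_score(vector_score: float, text_score: float,
                 vector_weight: float = 0.7, text_weight: float = 0.3) -> float:
    """Weighted sum of semantic and keyword relevance (assumed scoring model)."""
    return vector_weight * vector_score + text_weight * text_score


# Illustrative scores: one exact-keyword hit, one loose semantic match.
keyword_hit = (0.40, 1.00)     # (vector_score, text_score)
semantic_match = (0.85, 0.10)

current = (hybrid_score(*keyword_hit), hybrid_score(*semantic_match))
proposed = (hybrid_score(*keyword_hit, 0.6, 0.4),
            hybrid_score(*semantic_match, 0.6, 0.4))
```

With these made-up scores, the semantic match ranks first under 0.7/0.3 and the exact keyword hit ranks first under 0.6/0.4, which is the kind of shift the Gemini recommendation is aiming for.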

Advanced Tools

QMD (Tobi/Shopify)

Local sidecar, BM25 + vector + reranking. Can index external docs. Status: ✅ Deployed

Mem0 (YC-backed)

Auto-Capture + Auto-Recall outside context window. Status: Not evaluated

Cognee

Knowledge graph from data. Docker-based, non-trivial. Status: Not evaluated

Obsidian

External brain. Git-backed vault or QMD indexing. Status: Not evaluated

Multi-Agent Memory Architecture

  • Layer 1: Private memory per agent (MEMORY.md + daily notes). Status: ✅ Active
  • Layer 2: Shared reference files (symlinked _shared/ directory). Status: 📋 Not implemented
  • Layer 3: QMD with shared paths (all agents search same docs). Status: ✅ Partially (QMD indexes projects/, research/)
  • Layer 4: Coordination agent ("Chief of Staff"). Status: 📋 Not implemented

Key Insight: "Stop expecting memory to be automatic — it isn't. You have to configure it."

🛡️ Prevention Framework

4 tiers from mechanical (can't fail) to structural (requires development).

🟢 Tier 1: Mechanical Prevention (Can't fail if implemented)

| Prevention | Prevents | Status |
|---|---|---|
| Config validation script | Crash loops (#6, #13, #14) | 📋 Proposed |
| streamMode: "off" | Vanishing messages (#15) | ✅ Done |
| memory-lancedb disabled | Crash from missing env var | ✅ Done |

🔵 Tier 2: Automated Monitoring (Catches failures automatically)

| Prevention | Prevents | Status |
|---|---|---|
| Weekly skill validation cron | Silent skill breakage (#2) | 📋 Proposed |
| Email check in every heartbeat | Missed inbound emails (#8) | 📋 Proposed |
| Voice transcript watcher | Missed post-call summaries (#4) | 📋 Proposed |
| Memory freshness checker | Stale info served (#11) | 📋 Proposed |

🟡 Tier 3: Behavioral Enforcement (Requires discipline, can fail)

| Prevention | Prevents | Status |
|---|---|---|
| WRITE-FIRST RULE | Lost context (#1) | ✅ Active |
| STANDING-ORDERS.md + morning cron | Ignored self-audits (#3) | 📋 Proposed |
| Pre-flight checklist in AGENTS.md | Building duplicates (#12) | ✅ Active |
| Outbound comms rule | Unauthorized emails (#7) | ✅ Active |
| Immediate execution bias | Deferring work (#9) | ✅ Active |

🟣 Tier 4: Structural Improvements (Require development)

| Prevention | Prevents | Status |
|---|---|---|
| Dynamic voice bridge context | Stale voice context (#5) | 📋 Proposed |
| Direct Telegram notification from voice bridge | Broken post-call summary (#4) | 📋 Proposed |
| Credential registry | Wrong API key paths (#18) | 📋 Proposed |
| MEMORY.md auto-staleness detection | Stale info (#11) | 📋 Proposed |
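
The credential registry proposed for #18 could start as a single lookup table. This is a sketch of the idea only: the Supadata path is the one reported in the problem catalog, while the `CREDENTIALS` mapping and the `credential_path` function are hypothetical names.

```python
from pathlib import Path

# Hypothetical registry: the one place that records where each credential lives.
CREDENTIALS = {
    "supadata": Path("~/.openclaw/credentials/.supadata-key"),
}


def credential_path(name: str) -> Path:
    """Resolve a credential's location from the registry, or fail loudly."""
    try:
        return CREDENTIALS[name].expanduser()
    except KeyError:
        raise KeyError(f"no registry entry for credential {name!r}") from None
```

Failing loudly on an unknown name is the important part: it replaces guessing at env vars and other paths with a single, explicit error.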

📊 Status Tracker

What's DONE vs PROPOSED vs UNFIXED across all changes.

✅ DONE: Implemented Changes

| Date | Change | Impact |
|---|---|---|
| Feb 17 | QMD Backend Migration + Memory Flush Tuning | High |
| Feb 16 | Security Hardening: UFW, fail2ban, Postfix/CUPS disabled, SSH hardened | High |
| Feb 14 | Major Workspace Cleanup: 51.8KB → 14.6KB bootstrap context | High |
| Feb 14 | Projects Directory Reorganization: 35 → 13 active | Medium |
| Feb 14 | Git Submodule & Credentials Cleanup | Medium |
| Feb 14 | OpenClaw Rename Recovery | High |
| Feb 13 | Brain Surgery: AGENTS.md 546 → 104 lines (81% reduction) | High |
| Feb 13 | Voice Bridge: Standalone Cerebras-Powered Phone System | High |
| Feb 11 | Anti-Compaction Rules in AGENTS.md | High |
| Feb 11 | Voice Call Config: timeout 20s → 45s, Chris voice confirmed | Medium |

📋 PROPOSED: Awaiting Implementation

| Change | Category | Impact |
|---|---|---|
| Skill-ify the System Prompt (token reduction) | Skills / Prompt | High |
| Add Negative Routing to Skill Descriptions | Skills | Medium |
| Tune Compaction Threshold (120K → 80K) | Compaction | Medium |
| Credential Isolation Pattern (Sunzi future) | Security | High |
| BOOT.md Startup Checklist | Architecture | Medium |
| Crash Recovery: active-tasks.md | Architecture | Medium |
| Tune Concurrency Settings | Config | Medium |
| Context pruning via TTL | Config | Medium |
| Enable Telegram streaming | Config | Low |
| Config validation script | Architecture | High |
| Weekly skill validation cron | Monitoring | High |
| Dynamic voice bridge context | Architecture | High |

āŒ UNFIXED — Known Broken

ProblemSeverityBlocker
Post-call Telegram summaryHighStandalone bridge doesn't notify Telegram directly
Skill validation — no automated testingCriticalNo cron or CI built yet
Voice bridge in-conversation recallMediumRolling window limit in Cerebras LLM
Stale info in MEMORY.mdHighNo auto-staleness detection
memory-lancedb autoCaptureHigh@lancedb/lancedb npm dependency missing