posts
AI Isn't Something to Trust — It's Something to Design (Series Final)
Series Final. The four mechanisms covered across this series (knowledge graph, Auto Review, Self-Healing, Recurrence Prevention), plus the non-engineer-PR application that sits on top of them, all hang off a single conviction: AI isn't something to trust; it's something to design. The underlying stance, 'I don't trust AI to fill in the blanks for me', isn't doubt about generation quality. It's the clear-eyed acceptance that AI cannot know context it was never handed, and that 'ideal behavior with no spec given' is a fantasy. The starting point goes back to 2025, when I was trying to figure out how to make AI actually understand a large codebase and ran into walls on both context window scaling (lost in the middle, attention dilution) and learning-based approaches (machine unlearning, destructive interference). GraphRAG + MCP became the way out: hand AI only the facts it needs, when it needs them, so it doesn't have to infer. The path ran from code-graph (which I burned two months on and threw away) to the current product-graph (cpg). This piece is the philosophy and the trial-and-error behind the whole series: harnesses confine where hallucinations are allowed to happen, design means translating principles into your own use cases, and treating 90% coverage as a target on its own breaks the implementation.
The Author Doesn't Have to Be an Engineer: How the Harness Holds Quality (Series Part 5)
Series Part 5. With the harness handling quality at the gate, the people closest to the requirements -- business-side managers, PMOs -- now open PRs to production directly, no engineer in between. Covers two recent examples (a deep root-cause fix and a +1,742-line feature build), the boundary of what they can and can't take on (anything on top of an existing stack vs. standing up new infrastructure), why it holds (the four mechanisms from Parts 1-4), and how the pattern carries over to consumer-facing services.
Fixed Before Anyone Notices, Stronger After Every Fix: Self-Healing + Recurrence Prevention (Series Part 4)
Series Part 4. Production alerts trigger AI investigation, fix PR, auto-review, auto-merge, auto-redeploy. The same fix PR is required to add a new Guide -- a lint rule, CI guard, type constraint, or guideline entry -- so the same anti-pattern gets auto-rejected from then on. 115 Self-Healing PRs merged in the past 30 days, and the quality gates compound over time.
Human-on-the-Loop: AI Reviewing AI PRs at cortex -- 769 PRs/month while raising the quality bar (Series Part 3)
Series Part 3. The common critiques of AI-assisted development -- 'review becomes the new bottleneck' and 'AI code drops the quality bar' -- largely don't apply when AI also does the reviewing. Full walkthrough of our pipeline: webhook -> cpg context -> AI review with [Graph]/[Doc]/[Impact] tags -> auto-fix by a separate AI -> re-review -> auto-merge -> parallel deploy. 769 PRs merged in 30 days, with human review involvement per PR near zero.
The Heart of the AI Harness: A Knowledge Graph of the AI, by the AI, for the AI (Series Part 2)
Series Part 2: how we built cortex-product-graph (cpg) — a unified knowledge graph of code, docs, DB schemas, and infrastructure for the cortex AI platform. Build pipeline with JSDoc/Pulumi/docs as SSoT, plus the Runbook tool-design pattern that guides AI through the graph.
Building a Real AI Harness: Auto-Reviewed PRs, Self-Healing Ops, and Non-Engineer Contributors (Series Intro)
Series intro to cortex, airCloset's internal AI platform that auto-reviews PRs, self-heals ops, and lets non-engineers ship apps. Why harness engineering matters now.
Graph RAG Isn't a One-Shot Anymore — The Case for Agentic Graph RAG MCPs
Vector RAG and one-shot Graph RAG both flatten the search step. Agentic Graph RAG hands the graph to an LLM as an MCP and lets it traverse relationships iteratively.
Cutting Self-Built MCP Server Token Usage by 90% — The Parking Pattern
MCP responses fill the context window fast. The parking pattern stores heavy payloads externally and returns only a key — about 90% token savings in production.
Bridging 'I Want to Build' and 'I Want to Publish Safely' for Non-Engineers — Sandbox MCP
Non-engineers can build AI apps, but publishing safely is still gated by engineers. Sandbox MCP gives them a one-command path to deploy Web/API/DB/Cron with guardrails.
Still Measuring Initiative Impact Manually? How We Used Graph RAG + MCP to Make It Explorable
Measuring 'did that initiative actually work?' usually means manual SQL spelunking. We modeled initiatives × KPIs as a graph and let an LLM traverse it via MCP.
How We Built an Automated Meeting Intelligence System with Google Meet, Slack, and RAG
AI summaries aren't enough — context dies when a meeting ends. We pipe Google Meet recordings to Slack, transcribe everything, and make history queryable in natural language.
We Built 17 MCP Servers to Let AI Run Our Internal Operations
Overview of 17 MCP servers we built in three months at airCloset, covering DBs, infra, docs, project management, observability, CI/CD, and even non-engineer code edits.
Democratizing Internal Data — Building an MCP Server That Lets You Search 991 Tables in Natural Language
Internal data lives across 15 schemas, 991 tables, 11 SQL DBs and 6 MongoDBs. DB Graph MCP lets Claude search and query the whole thing in natural language.