Notes on building with AI
Short essays on the choices behind systems that stay grounded, auditable, and under your control.
We write about the engineering decisions that actually decide whether an AI product ships or stalls: grounding, agents, human-in-the-loop, and the architecture beneath all of it. These aren't trend pieces. They're the notes a senior engineer would hand a founder before a project starts -- what to worry about, what's hype, and where the hard parts really are. The goal is to be useful before you ever email us.
Grounding and retrieval
Most of what people call an AI problem is a data problem wearing a costume. Grounding -- giving a model the right facts at the right moment -- is where RAG systems live or die, and it's usually under-engineered. We write about retrieval boundaries, chunking that respects meaning instead of character counts, evaluation that catches silent regressions, and the difference between a model that sounds confident and one that's actually correct. These pieces are practical because the failure modes are specific: stale context, retrieval that returns plausible-but-wrong passages, and no way to tell when the system has degraded. Naming those failure modes early is cheaper than discovering them in production.
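To make "chunking that respects meaning instead of character counts" concrete, here is a minimal sketch that packs whole sentences into chunks rather than cutting at a fixed offset. The function name `chunk_by_sentence`, the `max_chars` parameter, and the naive regex sentence splitter are all assumptions for illustration; a real pipeline would likely use a proper sentence or section segmenter.

```python
import re


def chunk_by_sentence(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks, never splitting mid-sentence.

    Hypothetical helper: a sentence longer than max_chars is kept whole
    in its own chunk rather than being cut.
    """
    # Naive splitter (an assumption): break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the chunk at the boundary.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

The design choice is that the size limit is a soft target and the sentence boundary is the hard one, which is the reverse of character-count chunking.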
Agents and human-in-the-loop
Agents are powerful and easy to deploy badly. We write about where autonomy earns its keep and where it quietly creates risk you can't see until it's expensive. A lot of our writing is about the human-in-the-loop seam: which decisions a system should make alone, which need a person, and how to design the handoff so the human has enough context to be useful rather than a rubber stamp. We're skeptical of claims of full autonomy and specific about the guardrails -- approval gates, reversibility, audit trails -- that make agentic systems safe to ship. The interesting design question is never 'can it act?' It's 'what happens when it's wrong?'
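The three guardrails named above (approval gates, reversibility, audit trails) can be sketched together in a few lines. This is an illustration under assumptions, not a prescribed design: `Action`, `Gate`, and the `approve` callback are hypothetical names, and the policy that only irreversible actions need a person is one possible rule among many.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    reversible: bool          # can the effect be undone after the fact?
    run: Callable[[], str]    # the side effect the agent wants to perform


@dataclass
class Gate:
    # Hypothetical hook: in practice this would surface context to a reviewer.
    approve: Callable[[Action], bool]
    audit_log: list[dict] = field(default_factory=list)

    def execute(self, action: Action) -> Optional[str]:
        # Assumed policy: reversible actions run alone, irreversible ones need a person.
        needs_human = not action.reversible
        approved = self.approve(action) if needs_human else True
        result = action.run() if approved else None
        # Record every decision, including refusals, so the trail is complete.
        self.audit_log.append(
            {"action": action.name, "needs_human": needs_human, "approved": approved}
        )
        return result
```

Logging the refusals as well as the approvals is what lets you answer "what happens when it's wrong" after the fact.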
Architecture as the through-line
Underneath grounding and agents sits architecture, and that's the thread tying our writing together. We come back to the same conviction: durable AI products are won on data models, contracts, and observability, not on prompt cleverness. So we write about boring, load-bearing things -- schema design, idempotency, how to make a system observable enough to debug, how to keep an LLM feature testable. These topics don't trend, but they're what separates a product that holds up from a demo that doesn't. If you read our insights and come away with sharper questions for your own team, the writing did its job.
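As one example of the boring, load-bearing kind, idempotency can be sketched as a handler that remembers results by key so a retried request does not repeat its side effect. The class name `IdempotentProcessor` and the in-memory dictionary are assumptions for illustration; a production system would typically back this with a durable store.

```python
from typing import Any, Callable


class IdempotentProcessor:
    """Run a handler at most once per idempotency key (in-memory sketch)."""

    def __init__(self, handler: Callable[[Any], Any]) -> None:
        self._handler = handler
        # Hypothetical storage: a real system would persist this.
        self._results: dict[str, Any] = {}

    def process(self, key: str, payload: Any) -> Any:
        if key in self._results:
            # Retry of a request we already handled: return the stored result.
            return self._results[key]
        result = self._handler(payload)
        self._results[key] = result
        return result
```

It also makes the feature easier to test: replaying the same key is a cheap, deterministic check that nothing ran twice.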