AI API Providers Need Error Budgets
Anthropic has had 326+ outages since January 2025, roughly one every 1.3 days. Google resolved the tension between shipping fast and staying reliable decades ago with error budgets. It's time AI infrastructure providers adopted the same discipline.
Notes on engineering organizations, AI-native development, and building things that last.
I used to joke that LLMs are a runtime for executing human-language instructions. Then I built a skill that analyzes changed files, groups them by feature, and commits each group separately — in zero lines of code. The joke became reality.
Dario Amodei says Anthropic engineers have already stopped writing code. They edit AI output. His prediction: 6-12 months to full end-to-end. Here's why that's both true and deeply incomplete.
We confuse the goal with the route. We hold onto the plan harder than the purpose behind it. A story from the Exuma Sound, Bahamas — and what it taught me about planning projects, careers, and startups.
When a system becomes too tangled, it almost always means one thing: there was no clarity at the start. Complexity doesn't make a system mature — simplicity makes it resilient.
Everyone talks about the OODA loop as 'be faster.' But the hardest and most overlooked phase — Orient — is where most organizations fail. Two teams looking at the same data will reach different conclusions. Understanding why is the real competitive advantage.
Frameworks come and go. Systems thinking stays. Why the ability to see feedback loops, bottlenecks, and second-order effects is more critical now than ever — and how Amazon used it to generate $40-50B in annual revenue from a single investment.
A skydiving plane engine failure that wasn't supposed to happen, a crowd panic on July 4th, and a Shopify outage that coincided with our API integration. Three stories about why practicing for rare events is the most practical investment you can make.
Let's figure out what IT projects and skydiving jumps have in common. Lessons on environment awareness, planning, safety, priorities, and teamwork.
There is a huge gap between an application that is ready for first deployment and an application that is ready for operation in production mode. Here's what most teams miss.