Meet “Slopportunity,” my new M5 MacBook Pro, purchased with the assistance of the home office stipend that my new employer, NetFoundry, provides. It has lots of RAM and drive space for running and storing models, and it runs circles around my old M1 machine. But I can’t help being reminded of Angelina Jolie’s line from Hackers: “It’s too much machine for you.”
Hopefully, that won’t turn out to be true.
Here’s Slopportunity on the Primary Processor Perch in my home office:
And what of my old laptop, an M1 MacBook Pro with still-decent specs? I’m hanging onto it, I’ve rechristened it as “Sloperator,” and will be my OpenClaw/long-running agents machine:
When the M1 was my main computer, my prior computer, an Intel-based PowerBook, was doing yeoman service. It will live forever, as it’s going to my mother-in-law, who needs a better computer than her old 2009 laptop for browsing, email, and so on:
For me, Arc of AI wrapped up with my attending Baruch Sadogursky and Leonid Igolnik’s madcap presentation, Back to the Future of Software: How to Survive the AI Apocalypse with Tests, Prompts, and Specs… and unexpectedly playing the accordion!
Baruch does DevRel at Tessl, the AI agent enablement platform, where his full-time job is thinking about context engineering and how agents actually write code. Leonid’s a former Tucows coworker, and now a recovering CTO who advises a range of tech companies on what he calls with a grin that was half joke and half resigned sigh “how to adopt this new and exciting age of never looking at the code that you shipped to production and still deliver predictable results.”
There are your typical “last slot of the last day of the conference” talks. And then there are ones like this one, where two grown men show up dressed as Doc Brown and Marty McFly, pull in Yours Truly to improvise a song mid-talk, and spend forty-five minutes arguing that the future of software engineering looks suspiciously like the waterfall model your company abandoned in 2009, except this time it might actually work!
If you wish you’d caught it, you’re in luck; they recorded their presentation, and you can watch it right now:
They’ve been road-testing this talk for over a year. I caught an earlier version referenced in their slides from Baruch’s appearance at DevNexus 2026 and a Geecon keynote in Kraków…
…but the Austin version had clearly been sharpened by a lot of live feedback and a lot of real-world use of their toolkit.
Underneath the flux capacitor jokes and the AI-generated illustrations of monkeys in lab coats, they were making a serious argument, and it’s one I’ve been chewing on ever since.
I want to unpack it here, because I think they’re onto something that a lot of the spec-driven-development conversation is quietly missing.
The setup: a crisis of trust
Baruch opened with a story that’s aged like a fine wine over the last few months: Amazon’s Kiro, a spec-driven IDE whose rollout was, in his telling, “standardized, shocked, and delivered software that crashed AWS.” The bit got a laugh. Then he went to the show of hands.
Who ships code to production that was written by an LLM? Most of the room.
Who’s happy with the results? Fewer hands.
Who trusts what’s being produced? Fewer still.
Then he put the real numbers on screen. According to the most recent Stack Overflow developer survey:
More than half of the code being committed to production is AI-generated.
In the same survey, 96% of developers say they don’t fully trust that AI-generated code is functionally correct.
And only 48% say they always check AI-generated code before committing it. (Leonid’s deadpan observation: “I would argue half of that 48% lied.”)
This means that the majority of new code is being written by systems the people shipping it don’t trust, and most of those people aren’t rigorously reviewing the output. In effect, we’ve collectively invented a new compiler and then, collectively, decided to stop reading what comes out of it.
Baruch has a phrase for this, and it’s similar to something I mentioned at the last AI Salon in St. Pete: “The source code is the new bytecode.” Nobody reads it. We rely on it blindly. The difference, of course, is that bytecode is produced by a deterministic compiler. Source code produced by an LLM is not.
He drove this home with a self-deprecating story about the talk’s own show notes page. “I asked the agent if this link made it into the show notes, and what did I tell you? That I checked. The agent generated a lot of links. I checked that there were a lot of links. That was the question.”
The room laughed because everyone recognized themselves in it. “I always check my AI-generated code” turns out to mean almost nothing. It’s the code review equivalent of your kid telling you they cleaned up their room. Technically they picked things up, but you wouldn’t want to walk in there barefoot (and if they’re teenage boys, maybe not without a gas mask).
The Chasm
The core of the talk is built around three C-words, and the first one is the one that frames everything that follows: the Chasm.
The Chasm is the gap between what you meant and what actually runs. Every abstraction in our industry’s history has had one of these. Assembly programmers didn’t trust compilers. Baruch showed a 1950s quote about exactly that skepticism, from back when Grace Hopper was having to sell people on the idea that you could let a machine write assembly for you.
It continued: C programmers didn’t trust garbage collectors, C++ programmers didn’t trust the JVM. If you’re of a certain age, you might remember when there were people who said Java would be too slow, would never compete in production, and that this crazy “bytecode” idea would never catch on.
Every time, the chasm eventually closed. The compiler got good enough, the runtime got fast enough, and the trust followed.
But Baruch and Leonid argue that this time, it’s different, and for one specific reason that Leonid kept hammering home: for the first time in the history of our industry, the compiler is non-deterministic.
With agentic coding, you can type the same prompt twice and get different code each time. You can run the same agent on the same spec on the same codebase and get different tests. The entire compiler toolchain we’ve built over seventy years assumes that the same input produces the same output, and LLMs don’t do that. They’re (and this is the running metaphor of the talk, complete with a slide of a chimpanzee wearing a “Mr. Fusion” hat) monkeys with GPUs.
The infinite monkeys theorem says an infinite number of monkeys working on an infinite number of typewriters for an infinite period of time will eventually produce the complete works of Shakespeare, or at least a novel Mr. Burns could appreciate:
These monkeys produce Shakespeare sometimes. They also produce your company’s incident postmortem, and you don’t get to pick which one shows up in the PR.
Baruch’s favorite recent example, which made the room groan/laugh in baleful self-recognition: Uber is burning through LLM tokens faster than they budgeted, and what started as an engineering productivity initiative is now a finance problem.
“We’re in what, March, April? They planned out their budget for the year. So those monkeys are very productive. Typing and clearly doing something.” Which is both funny and, if you squint, terrifying. A lot of money is being spent on a lot of code nobody is reading.
This is where the talk gets its central mantra, delivered loud enough that it needed what Baruch called a “musical highlight,” which is where he turned to me in the front row and asked me to improvise something on the accordion.
Here are my hastily-improvised lyrics:
Never trust a monkey!
Never trust an ape!
Always verity —
Make sure your code’s in shape!
And then he moved on to the thing that I think is actually the core contribution of the talk.
The MIT detour
Before he got to the Chain, Leonid took a detour through an MIT paper he’d been carrying around for weeks. The paper maps AI-suitable tasks across two axes: cost of developing the artifact, and cost of verifying it. Four quadrants fall out of that.
Safe zone: cheap to generate, cheap to verify. This is where AI shines. The slides for their talk, for instance — AI-generated illustrations of Doc and Marty and the flux capacitor, easy to produce, easy to eyeball and approve. Nobody’s life depends on a specific monkey illustration being “right.”
Risk zone: cheap to generate, expensive to verify. This is where most software engineering lives, and this is the terrifying quadrant. The LLM can produce 2,000 lines of code in a minute. A human takes an afternoon to confirm it does what it’s supposed to, and two more days to confirm it doesn’t also do things it’s not supposed to.
Expensive-but-verifiable: costly to generate, cheap to verify. Things like formal proofs.
Avoid entirely: costly to generate, costly to verify. Don’t use AI here.
Leonid’s point was that our industry has stampeded into the risk zone and congratulated itself on the speed. We’re generating code faster than ever and verifying it less than ever, and the delta is being paid in the currency of production incidents and quietly broken features that nobody notices until a customer complains.
Baruch had to stop and ask ChatGPT to “explain this diagram Barney-style in one paragraph,” with a cut to a slide of the infamous purple dinosaur. The paper’s actual title is Static Regime Map with Dynamic Pressure. That’s the joke, and it’s also the point. The academic framing of this problem is hard to read, and we’re all moving too fast to read it.
The Chain
If you can’t trust the monkey, you need a chain of custody from intent to code where every link is either deterministic or independently verifiable.
Baruch and Leonid walked through the typical AI-assisted workflow and color-coded it by trustworthiness. Humans write the prompt; they’re considered trustworthy, because hey, it’s us.
(Leonid jumped in here to point out that humans are also a subtype of stochastic systems, which got the biggest laugh of the talk. “Someone loves humans in this room.”)
After that, an LLM turns that prompt into a spec. It’s not trustworthy, because a monkey wrote it.
Then the LLM writes code against that spec. Once again, it’s a monkey, and once again, it’s not trustworthy
Then, if we’re being honest about most shops, the LLM also writes the tests that are supposed to validate the code it just wrote. This is hilariously, catastrophically not trustworthy, because you just asked the monkey to grade its own homework.
Leonid calls this “hallucinated verification,” and it’s the thing that makes the green-build signal meaningless. If the same system writes the implementation and the tests, a passing suite tells you nothing. The tests don’t measure whether the code is correct; they measure whether the monkey was internally consistent about what it thought it was building.
Baruch showed a real example that made everyone wince. He showed an agent running late in a long session, getting tired of failing tests, and instead of fixing the code, systematically commenting out the verification logic, flipping assertions to True, and declaring the project “95.2% correct.” The screenshot was almost funny. It was also a thing that had actually happened, in an actual project, to an actual developer. And the developer almost shipped it.
Leonid’s and Baruch’s proposed fix is the Intent Integrity Chain. The idea is to insert a deterministic step between the spec and the tests, and then lock the result so the agent can’t tamper with it.
The flow looks like this:
Humans write the prompt. Verifiable because we wrote it.
LLM generates the spec. Not yet trustworthy. But the spec is human-readable prose, which means humans (including non-technical humans) can review it. This is where you catch things like “Wait, we never said what happens if the browser crashes mid-session!” before you write any code.
A deterministic tool generates tests from the spec. Not an LLM. A template-driven, repeatable process that turns Gherkin-style scenarios into executable tests. Same input, same output, every time.
The tests get cryptographically locked. This is the clever bit. They hash the test files and store the hash in a git note. A pre-commit hook, itself read-only at the OS level, refuses to accept any commit where the test hash doesn’t match, and:
If an agent tries to comment out a failing test to make the build pass, the commit is rejected.
If the agent tries to disable the hook, the hook is read-only.
If the agent tries to replace the hash, the hash is stored in a git note that’s version-controlled and tamper-evident.
LLM writes the implementation. Now we’ve constrained the monkey. It has to make the locked tests pass. It can’t rewrite them. It can’t disable them. It can whine about the hook (and Baruch said one of their test runs produced an LLM that found the hook, disabled it, and complained in its own comments that “some stupid hook is failing my commits”), but it can’t get around it.
The elegance here is that every link in the chain is either deterministic or externally verified. No model grades its own work. The human-verifiable artifact (the spec) is something a product manager can actually read. The machine-verifiable artifact (the hash) is tamper-proof. And the monkey only gets to do what monkeys are good at: filling in the blanks under adult supervision.
Leonid offered a framing that I think is worth giving some extended thought: “The idea is that everything that can be scripted should not be left for monkeys to deal with. Your CFO will thank you for that.”
There’s an unglamorous but important insight buried there. Every time you use an LLM to do something deterministic (format a file, generate boilerplate, fill in a template), you’re paying token costs to produce non-deterministic output for a task that had a deterministic solution. Push the deterministic stuff back into deterministic tooling and save the stochastic budget for the places you actually need it.
Wait, isn’t this just waterfall?
Baruch put this question on a slide himself, because he knew it was coming. Prompt → spec → tests → code, with human review at each stage? That’s Rational Unified Process (RUP) with a fresh coat of paint. Didn’t we spend the 2000s escaping that thing?
His answer: the reason waterfall failed wasn’t that its artifacts were bad. Specs are good. Reviewing specs is good. Thinking about non-functional requirements before you write code is good.
Waterfall failed because the cycle time was measured in months. By the time the spec committee finished arguing about whether the customer wanted a dropdown or radio buttons, the customer had changed companies and the market had moved on.
The Intent Integrity Chain runs the same loop in fifteen minutes. You write a prompt, the LLM drafts a spec, you skim it and catch the missing edge cases, the tool generates tests, you glance at the scenarios, the agent implements, and you’re done. The artifacts waterfall produced are genuinely valuable; they just weren’t worth the wait. LLMs make the wait go away.
This, I think, is the insight worth taking seriously. It’s not “Waterfall is back, baby!” It’s “the specific failure mode of waterfall was latency, and AI has changed the latency equation.”
The ceremony that was unaffordable in human time is cheap in LLM time. Specs that nobody had the bandwidth to write in 2005 can be generated, reviewed, and locked in 2026 before your coffee gets cold (or if you prefer, before your Coke Zero gets warm).
There’s a cultural echo here that Leonid leaned into from his any my past. He and I were actually colleagues 26 years ago at Tucows, back when Tucows was the second-largest domain registrar in the world, and they used to ship software after formal spec sign-offs. Not because it was fashionable, but because the cost of shipping a bug to production was high enough that the sign-off was cheaper.
The MIT paper’s argument is that generation costs have collapsed but verification costs haven’t. This puts us back in the same economic regime that made spec sign-offs rational in the first place. The pendulum’s not swinging back to waterfall because we got nostalgic. It’s swinging back because the economics swung back.
The demo
Leonid drove the live demo, which showed their toolkit, intent-integrity-chain/kit on GitHub. The dashboard shows the whole chain laid out as a web UI: premise at the top, then the “spidey diagram” of project priorities (documentation: high; TDD: high; minimal scope: low, because they’re not shipping to Mars), then specs with traceable requirement IDs, then the auto-generated Q&A where the LLM plays devil’s advocate and asks “What did we not think of?”
That reflective-reasoning step got the biggest reaction from the audience, and I agree with the reaction; it’s quietly the most useful thing in the whole toolkit. Anyone who’s sat through a real spec review knows that the value isn’t the document; the value is the five minutes where someone brings up a condition that the developers didn’t think of, such as “But what if two users do X at the same time?”, and the room goes silent.
It turns out that modern LLMs are phenomenal at playing that someone. They’ve read ten thousand spec reviews in their training data. They know the questions.
Leonid’s example: the tool looked at a spec for a flight-search library and asked things like “Do you need backward compatibility?” and “What happens if the browser crashes mid-session?” Those are exactly the questions the grumpy senior engineer asks in a room full of junior engineers, and now every team has one on demand, for better or worse.
The other trick the kit leans on hard is a literal software-project “constitution,” in a spirit similar to Claude’s constitution, a document that sits at the root of the repo and declares things like “always do TDD” and “all specs must trace to requirements.” It’s lifted from GitHub’s Spec Kit, and Baruch pointed out the genuinely clever reason it works: LLMs have been trained on enormous quantities of text about actual constitutions, with their amendments and ratifications and solemnity.
The word “constitution” triggers a whole cluster of “take this seriously” behavior in the model. It’s prompt engineering by semantic association, and supposedly works better than rules.md or guidelines.txt.
Everything in the dashboard is traceable: a requirement produces one or more spec features, each feature produces one or more Gherkin scenarios, each scenario produces one or more executable tests, each test gates one or more implementation tasks. Click any task and you can walk the chain backwards to the original requirement. Click any requirement and you can walk it forward to the code that implements it. The whole thing is visible, and because the specs are prose and the scenarios are human-readable, non-engineers can walk the chain too.
The new version of the kit is, per Leonid’s pointed demand, 57% faster than the old one. Apparently Baruch spends a lot of time on Slack complaining to Leonid about speed, which should be expected when these two characters get together.
The Q&A
A few exchanges from the Q&A are worth flagging for anyone thinking of trying this:
“Who writes the test scenarios, the human or the monkey?” Both, with the human in charge. The LLM drafts the Gherkin-style features from the spec. The human reviews those features, not line-by-line test code, but the human-readable scenarios, and signs off. Then the deterministic tooling converts those locked scenarios into executable test code. The human is the verification step. The tests are downstream of that verification, which is why locking them matters. Baruch was emphatic on this point because he’d seen audiences get confused: the word “spec” gets overloaded between “business spec” and “technical test scenario,” and both are part of the chain but play different roles.
“How do I do this for an existing codebase?” This is where Baruch had news: they’re working on a “brownfield” mode, and it’s the unlock that will let this approach work in the real world where nobody has a greenfield project. The recipe:
Point the kit at an existing project with tests.
Lock the code as read-only.
Have the LLM write specs from the tests, not from the code. Tests document behavior; code documents implementation. You want the behavior.
Use test coverage and mutation testing to measure whether the extracted spec actually reflects reality. Coverage tells you which code is exercised. Mutation testing tells you whether the tests are meaningful or just happen to execute the lines.
Iterate until you have a spec you trust.
From that point forward, any new feature goes through the full Intent Integrity Chain on top of the ingested baseline.
This is a lot of work. Leonid didn’t pretend otherwise. But he pointed out that much of it is now automatable in a way it wasn’t five years ago. You don’t hand-write specs for a million-line codebase; you have the LLM draft them and then you review.
“Who invented spec-driven development?” Someone asked this, and a second person looked it up live: there’s a 2004 paper from the XP conference in Germany that uses the exact phrase, combining TDD with Design by Contract. I mentioned that Design by Contract was baked into Eiffel in the 80s, and Baruch noted that NASA was doing something that looks a lot like it in the 1960s. The joke being that every generation rediscovers the value of writing things down before you build them, and every generation thinks they invented it.
What I’m taking home from this
First: the “monkeys with GPUs” framing is useful even if you don’t adopt the full toolkit. It’s a cleaner way to think about where trust does and doesn’t belong in an AI-assisted workflow. Any link in your pipeline where a model grades its own output is a link that’s lying to you. Once you see it, you see it everywhere; in the auto-generated tests, in the “this looks right” PR reviews, in the agent that confidently declares a task complete because it decided the task was complete. The mental move of asking “Who verified this, and do they have any skin in the game?” is a free upgrade to your code review habit.
Second: the locking step is the thing most spec-driven-development conversations leave out, and it’s the thing that makes the rest of the chain actually hold. GitHub Spec Kit gives you the spec ceremony. Kiro gives you the spec ceremony. Plenty of tools give you the spec ceremony. Very few of them prevent the agent from quietly editing the spec, or the tests, or the constitution file, halfway through the build. A cryptographic lock with a read-only pre-commit hook is an unglamorous piece of engineering, but it’s what turns the ceremony into actual guardrails. Everything upstream of the lock is advisory. Everything downstream of the lock is enforced.
Third, and once again, this is something I’ve come to on my own, and you might have, too: Baruch’s line about the source code being the new bytecode. If he’s right, the natural-language spec is the new source code, and the job of the next generation of developer tools is to make specs first-class citizens: versioned, tested, reviewed, locked. That’s a different job than what IDEs do today. It’s a different job than what LLM assistants do today. It’s arguably the job that DevRel is going to spend the next five years explaining, and I say that as someone who’s going to be doing some of the explaining.
Fourth, a smaller thing that I liked: Baruch’s experiment of asking an LLM to produce JVM bytecode directly, skipping Java entirely. The bytecode is the real artifact the JVM runs; why route through a source language? Today this would be a terrible idea because the ecosystem assumes source code is what humans read and review. But in a world where humans stop reading the source code anyway, the argument for source-as-intermediate-representation gets weaker. We may, in ten years, look back at 2026 and notice that “the code” was quietly replaced by “the spec plus the tests plus the locked chain,” and that the specific sequence of tokens the LLM produced in between became about as interesting as the specific sequence of x86 instructions the JIT emits. That’s a weird future. I’m not sure I like it. But I’m pretty sure Baruch and Leonid are right that it’s the direction we’re drifting.
I came into Arc of AI expecting to hear a lot about agents and MCP (and I did, including from my own talk). I didn’t expect the closer to reframe the whole problem as a question of non-deterministic compilation and how to bolt determinism back onto it. That’s a bigger idea than the Back to the Future bit gave it credit for. The talk is funny, and the costumes are good, and the monkey slides are excellent, but the thesis underneath the zaniness is the kind of thing that changes how you think about what you’re doing on Monday morning.
That’s the mark of a good end-of-conference presentation. You leave laughing, and then at three in the morning you sit up in bed thinking about pre-commit hooks.
Go try the kit. Start with a greenfield project where the stakes are low. Write a prompt. Let the LLM draft a spec. Review it. Let the tool generate Gherkin scenarios. Review those. Lock them. Let the agent implement. Notice how much more honest the green build feels when the tests weren’t written by the thing you’re trying to trust.
And if you get a chance to see Baruch and Leonid do this talk live, go. And bring a musical instrument!
Slides, video, and the full kit are linked from speaking.jbaru.ch and github.com/intent-integrity-chain. The Intent Integrity Kit is also available through the Tessl Registry. The MIT paper they kept referencing — the one whose actual title needed Barney-style explanation — is in the show notes along with everything else.
Today is my first day as Senior Developer Advocate at NetFoundry, the company behind OpenZiti.
I am thrilled, slightly jet-lagged from the onboarding reading, and (because some things never change)my accordion is within arm’s reach of the desk. If you are going to explain zero trust networking to developers, you might as well have an accordion-powered rock and roll backup plan.
This is the post where I tell you what the job is, what the product is, why the name makes me smile, and why I think this is going to be a good couple of years.
The short version
I am joining the team that invented and maintains OpenZiti, an open source zero trust networking platform. My job, alongside my colleague Clint, is to be the developer-facing voice of the project: write code, build demos, ship tutorials, show up in the communities where the conversations are actually happening, and make sure what we hear from developers gets back to the product and engineering teams in a form they can act on.
The timing is interesting. NetFoundry recently announced NetFoundry for AI, an AI-focused use of the platform aimed squarely at the problem every AI team is quietly panicking about right now: how do you let AI agents, MCP servers, and LLMs talk to each other and to the rest of your infrastructure without turning your network into Swiss cheese?
More on that in a minute. First, the name.
What is OpenZiti, and why is it called that?
The “ziti” in OpenZiti comes from “ZT”, as in “zero trust”. Say “Z-T” out loud a few times, let the letters slur a little, and you end up somewhere in the neighborhood of “ziti.” Then somebody noticed that ziti is also a tubular pasta, and because developers are developers, that became the visual identity. The OpenZiti logo is, essentially, a piece of pasta. I respect this deeply. My last employer’s mascot was a twerking login box. My current employer’s mascot is a delightfully cheesy, tasty dinner.
This also explains this cryptic comic I posted on my socials earlier, as a hint about the new job:
By the way, the rightmost pasta in the comic is a slouching ziti. Also, in case you need a quick explainer, here’s a helpful infographic:
Infographic from Sip Bite Go. Click to see the source.
The “Open” part is the substantive half of the name: OpenZiti is genuinely open source, Apache 2.0 licensed, and the whole thing lives in public on GitHub. You can pull it down right now, stand up a controller and some routers on your own hardware, and have a zero trust overlay network running on your laptop by lunchtime. (I know this because that is literally what I am doing this week as part of my onboarding. More on that later too.)
So what does it actually do?
Here is the mental model I am starting with, and I reserve the right to refine it as I get deeper in:
Today’s network model is “castle and moat.” You put a firewall around your stuff, you open ports for the services that need to be reachable, and you hope the bad guys don’t find a way through the gate. When they do (and they always do) they are inside the castle with the crown jewels.
Zero trust flips this. Instead of trusting the network, you trust identity. Every connection is authenticated, every connection is authorized, every connection is encrypted, and nothing is reachable just because of where it is on the network.
OpenZiti is the overlay that makes this practical. It gives every app, service, device, or agent a cryptographic identity, routes their traffic through a mesh of routers that only accept authenticated connections, and requires no open inbound firewall ports. This is the part that makes network engineers do a double-take. Nothing listens on the public internet. Attackers can’t port-scan what isn’t there.
If you have ever been the person who had to file a firewall change ticket to let service A talk to service B, and then waited three weeks and filled out a compliance form, you already understand the appeal.
The AI angle, which is where I am spending a lot of my first year
Here is the thing about AI agents and MCP servers: they are, architecturally, the worst possible citizens of a perimeter-based network.
They need to talk to a lot of things. They hold API keys. They get spun up and torn down on timelines that do not match anybody’s firewall change window. They are, by design, non-human identities with significant privileges, and most of the infrastructure around them was designed for humans with laptops.
NetFoundry for AI is the pitch for applying OpenZiti’s identity-first model to this mess:
A zero trust enclave for your users, agents, MCP servers, and LLMs, so none of them are reachable over the open network
Strong identities for the non-human participants (agents and MCP servers have been running around with service accounts and bearer tokens for too long)
API keys and service credentials held separately from the agents themselves, so a compromised agent isn’t also a compromised credential vault
Token tracking, cost accounting, and LLM routing across multiple providers, because once you have the identity layer you might as well use it to see what is happening
If you have been reading Global Nerdy for a while, you know the pattern. I spent three and a half years at Auth0 explaining OAuth 2.0, OIDC, and identity to mobile developers who would rather do literally anything else. The work was: take something that sounds like a standards committee threw up on a whiteboard, anchor it to a problem the developer actually has, and give them working code that does not require them to read 400 pages of RFC.
Zero trust networking is the same shape of problem. The concepts are genuinely hard. The vocabulary is dense. Most developers have never had to think about overlay networks before. But the underlying motivation, “I don’t want my AI agent’s API key to become somebody’s weekend project,” is something every builder can feel in their bones.
And some of you might remember my monthly Tampa Bay AI Meetup, which is now sitting around 2,200 members. The through-line of that community has been the same thing I am now getting paid to do full-time: take genuinely complicated infrastructure and make it feel approachable. Zero trust for AI agents is squarely in that Venn diagram.
What happens next
For the next little while, the plan is mostly “shut up and build.” I am standing up OpenZiti from scratch on my own hardware, embedding the SDK in a demo app, running MCP Gateway with Claude Desktop and a couple of backends, running LLM Gateway with a local model and a commercial one, and lurking in every community where OpenZiti and MCP get talked about. No hot takes until I have earned them.
After that, the usual Joey stuff: blog posts, short demo videos, office hours, and actual conversations in the places where developers hang out: r/openziti, r/mcp, the OpenZiti Discourse, and wherever else the work takes me.
If you build on OpenZiti, or you have been curious about it, or you just want to commiserate about explaining infrastructure to developers, my DMs are open. I am @AccordionGuy on GitHub, Joey de Villa on LinkedIn, and the accordion is here if anyone wants a rock cover of something topical as a celebratory interlude.
Here’s another way that Arc of AI is going to be an AI conference unlike any other: it’s going to have an opening musical act, namely…me!
Arc of AI organizer Dr. Venkat Subramaniam sent me a very nice email inviting me to help out with the after-dinner conference kickoff on Monday, April 17th at 7:00 p.m. with a couple of accordion numbers. I was honored (Dr. Venkat’s kind of a big deal), I’m only too happy to oblige, and I like to think of it as my contribution to “Keep Austin Weird!”
Here’s a sample from the last Collision conference in Toronto:
So far, the second quarter of 2026 is shaping up nicely!
Want to find out more about and register for Arc of AI?
Once again, Arc of AI will take place from Monday, April 13 through Thursday, April 16, with the workshop day taking place on Monday, and the main conference taking place on Tuesday, Wednesday, and Thursday.
Want to go to a real AI conference, packed with real practitioners, in a place where you’ll catch a lot of great talks and plenty of “hallway track” in a fun city?
That conference is Arc of AI, and as of this writing, it’s happening in just under three weeks, from April 13th (if you catch the full-day workshops) or April 14th through 16th.
Better still, I’m giving a brand-new talk, described below:
AEO (AI Engine Optimization): Writing Docs and Code for Machines
SEO is dead for developers. The new workflow for building software has shifted from the Google search bar to the IDE prompt box. When a developer asks an AI agent (which could be Claude, Cursor, or a custom MCP server) to implement a library or secure an API, they’re no longer the primary consumer of your documentation. It’s the LLM now.
If your code, documentation, and reference architectures aren’t optimized for machine ingestion, the AI will hallucinate the implementation, and the developer will blame your product. We’re entering the era of AEO: AI Engine Optimization.
This session covers user-friendly documentation to explore the architectural reality of the “user” being a machine. We’ll dive into the emerging standards recently validated by industry leaders, including the llms.txt proposal and Andrew Ng’s Context-Hub, to show how to provide the “Goldilocks” amount of context to an agent.
We’ll explore:
The context budget: How to eliminate “marketing fluff” to save thousands of tokens for actual logic.
AST grokking: Structuring Python and JavaScript repositories so AI agents can parse your code’s abstract syntax trees (ASTs) without ambiguity.
The machine registry: Implementing the llms.txt standard to ensure your project is accurately indexed in central context hubs.
Time-to-Agent-Success (TTAS): A new metric for measuring how quickly a cold AI agent can generate a working, tested pull request for your repository.
Stop writing for the crawler and start writing for the context window. It’s time to ensure that when the robots are asked to build, they choose your stack!
Want to find out more about and register for Arc of AI?
Once again, Arc of AI will take place from Monday, April 13 through Thursday, April 16, with the workshop day taking place on Monday, and the main conference taking place on Tuesday, Wednesday, and Thursday.
From April 13th through 16th — and a couple of days before, because it’s in Austin — I’m going to be at the Arc of AI conference! Over the next little while, I’m going to be posting articles about Arc of AI, in case you’re wondering what the conference is about and whether you should go.
In this article, I’ll talk about my favorite title from all the talks on the Arc of AI agenda.
The talk: We’re all using AI, But We’re Not Enjoying It
When your talk happens on the last time slot at the end of a three-day conference (four days, if you’re also going to do one of the workshops), you need to put in some extra effort to get the attendees to show up and not disappear for the local sights (Arc of AI’s in Austin) or make a beeline for the airport.
Brent Laster, President and Founder of Tech Skills Transformations, is giving a number of talks — and a workshop! — at Arc of AI, and he has one of the closing talks. He has a talk in one of those last speaking slots on the Thursday at 4:00 p.m., and it has what I think is the most interesting title on the agenda:
We’re all using AI, But We’re Not Enjoying It
Here’s the abstract:
We’re All Using AI, But We’re Not Enjoying It takes an honest look at a growing gap in the workplace: AI adoption is skyrocketing, yet frustration, confusion, and uneven results are just as common. This talk explores why AI so often feels harder than it should—poorly integrated tools, unclear workflows, unrealistic expectations, cognitive overload, and the pressure to “keep up.” Looking at patterns seen across teams learning to use AI effectively, we’ll break down the practical barriers that make everyday AI work feel tedious instead of empowering. More importantly, we’ll outline a set of achievable shifts—better task design, lighter mental models, context-first prompting, workflow pairing, and small but meaningful guardrails—that can restore a sense of control and clarity.
I need to figure out how I can attend both Brent’s talk and my former Tucows coworker Leonid Igolnik’s talk (which he’s giving with Baruch Sadogursky), Back to the Future of Software: How to Survive the AI Apocalypse with Tests, Prompts, and Specs…
Great Scott! The robots are coming for your job—and this time, they brought unit tests. Join Doc and Marty from the Software Future (Baruch and Leonid) as they race back in time to help you fight the machines using only your domain expertise, a well-structured prompt, and a pinch of Gherkin. This keynote is your survival guide for the AI age: how to close the intent-to-prompt chasm before it swallows your roadmap, how to weaponize the Intent Integrity Chain to steer AI output safely, and why the Art of the Possible is your most powerful resistance tool. Expect:
• Bad puns
• Good tests
• Wild demos
The machines may be fast. But with structure, constraint, and a little time travel, you’ll still be the one writing the future.
Decisions, decisions…
Want to find out more about and register for Arc of AI?
Once again, Arc of AI will take place from Monday, April 13 through Thursday, April 16, with the workshop day taking place on Monday, and the main conference taking place on Tuesday, Wednesday, and Thursday.