Categories
Conferences Security Tampa Bay

Notes from BSides Tampa 13: “Dealing with Shadows” or “A day in the life of a threat actor negotiator”

If you’ve been anywhere near a screen this month, you saw the Canvas breach unfold in real time, where the ransomware group known as ShinyHunters dropped a “rooting your systems since ’19 ;)” page onto the dashboards of nearly 9,000 schools during finals week. Instructure papered it over with a “scheduled maintenance” message that even the most gullible saw through. A few days later, they ended up paying the ransom in exchange for “shred logs” and a pinky-promise that no customers would be extorted further.

So when I sat down in a packed room at BSides Tampa 13 this past Saturday for a talk titled Dealing with Shadows: A Day in the Life of a Threat Actor Negotiator, the timing felt less like a conference session and more like a debrief.

The speaker was Matt Barnett, CEO and co-founder of SEVN-X, a Pennsylvania-based cybersecurity firm. Matt spends his working hours talking to criminals on the dark web on behalf of clients whose systems have just been encrypted, whose data has just been exfiltrated, or, frequently, both. He was joined onstage (in spirit, anyway) by his colleague Dave Zofran, who Matt repeatedly tried to make wave at the audience and who, in the great tradition of every backstage engineer at every conference ever, was having none of it.

This was easily one of the best talks of the day. Matt is jokey, sweary, self-deprecating, and irreverent, and the audience stayed well past the scheduled end for a Q&A that ended only because it was time for the closing keynote and raffle for Chris Machowski’s amazing BSides posters. Here’s what I took away.

“My career is a series of clerical errors”

Matt opened by describing his career path as “mostly an annoying inability to say no to things.” Somebody asked him if he wanted to do physical penetration testing. Sure. Forensic analysis school? Sure. Want to talk to criminals on the dark web? Hell yeah. Do you know what you’re doing? Not a clue. We’ll figure it out.

He compared himself to Jim Carrey in Yes Man, which he claimed was autobiographical. As somebody whose own career has been driven in no small part by saying yes to the next weird thing (DevRel, accordion-on-stage, organizing meetups, writing this blog for two decades), I felt seen.

Before getting into the meat of it, Matt did a room survey: students, IT folks (“the unpaid group, maybe the underpaid group”), cyber pros with one-to-five years (“the unjaded ones, because you still believe you can make a difference”), and the over-fives (“the unbothered”). Then he asked if there were any vendors in the room, and offered them the mic. Nobody took him up on it. They know a trap when they see one.

Myth-busting: paying ransoms, double-dipping, and “why does this exist?”

Matt opened with a couple of myths he wanted to put to bed.

Myth number one: Paying ransoms is illegal. Nope. Some payments are illegal, specifically payments to entities on the OFAC sanctions list, which is why you don’t want amateurs handling the wire. Ransom payment as a category is not, in itself, against the law.

Myth number two: You don’t always get what you pay for. Mostly false, with caveats. Double and triple extortion happen, but in Matt’s experience, they’re typically different groups exploiting the same unpatched Fortinet firewall (a refrain that came up roughly every six minutes during the talk; more on that in a moment), and not the original group going back on its word. Reputable ransomware crews are, weirdly, reputable, and that’s because their business model depends on it.

There is, however, no certification body for what Matt does. He has a GCFA, meaning that he’s a certified forensic analyst, but there’s no such thing as a certified-ransomware-negotiator credential. He quoted Jon DiMaggio (whom he says everyone calls “Joe”) on the state of the field: nobody can really tell you whether you’re good at this job. You learn it the way Jason Statham’s character in The Mechanic learned his trade: “Good judgment comes from experience, and a lot of that comes from bad judgment.”

And on the moral question of “Why do negotiators exist at all? Doesn’t paying ransoms just feed the system?”, Matt invoked Tony Stark from the first Iron Man (alas, he’s no fan of the sequels): “It’s an imperfect world, but it’s the only one we got. The minute we don’t need threat actor negotiators anymore, I will build bricks and beams for baby hospitals.”

The ransomware industry is, in fact, an industry

Probably the most important reframe in the talk (and one I’m going to be repeating to people at NetFoundry and at Tampa Bay AI meetups) is that the mental image of “ransomware operator” most non-security people still carry around is wildly out of date.

The kid in his mom’s basement, surrounded by cold pizza, while she yells about meatballs? Not a thing anymore. Or more accurately, never coming back to a screen near you. Modern ransomware groups are full-on enterprises with:

  • Ransomware developers
  • Initial access brokers
  • Software and codebase maintainers
  • AI specialists (yes, really)
  • Web devs building the victim portals
  • Customer service / “help desk”
  • Translators (or rather, prompt engineers driving Google Translate and Claude and ChatGPT)
  • HR. HR.

“I don’t know if they have benefits,” Matt said. “The minute they have benefits, I might consider a career change.”

These aren’t lone actors. They’re businesses, and in many cases they’re tacitly or explicitly protected by their host governments because the money flowing back into their towns and villages props up local economies. As Matt put it: they’re heroes where they live. Which is one of those facts about the modern threat landscape that you have to sit with for a minute before you can keep going.

The shift to enterprise has changed everything about negotiation strategy. The old groups sometimes had a moral compass; for example, there was a group that would hand over decryption keys for free if they realized they’d accidentally hit a hospital, and another that announced they were retiring after they hit a billion dollars and then actually published a master decryption key on their way out. Those days are over. Today’s groups operate on margin and SLA, like any other B2B company. They just happen to be in the extortion vertical.

“Why use a negotiator?” Because you know everyone at your company.

Here’s a part of the talk worth keeping in mind should you find yourself or your company at the mercy of a ransomware organization.

Matt asked how many of us had worked at our current job for more than a year. Then more than five. Then more than ten. Then he asked the ten-plus hands: do you have kids? Because if you do, you have worked with these people longer than your kids have been alive. You know your coworkers better than you know your spouse, your friends, sometimes your own children.

Which means when your company gets ransomed, you’re most likely not going to be a calm, collected, rational actor. You’re a person watching your work-family bleed out, and you will do dumb things because of it. This is exactly why, in hostage negotiations, local PD will bring in officers from another jurisdiction the moment they realize anyone involved knows anyone involved. Emotional distance is the whole point.

A negotiator isn’t there because they’re smarter than you. They’re there because they don’t know your accounts receivable manager who just had her first kid, and that distance is, perversely, a gift.

The other thing negotiators bring is pattern recognition across hundreds of cases. There are really only two companies in the U.S. that actually facilitate ransom payments, because it’s a risky line of work. Matt didn’t name them, but they’re not hard to find, and the negotiators who work with them have visibility into asks, settlements, durations, and outcomes that no individual victim can possibly have. Which brings us to the data.

Ransomware company discount curves

Hey, actual numbers!

Matt put up actual data from the last 12 months of facilitated payments. I’m reproducing the highlights here because they’re genuinely useful for anyone thinking about cyber insurance, incident response runbooks, or just calibrating their understanding of the threat landscape.

Akira (traditional / technical, business-oriented group)

  • Average initial ask: ~$1.3 million
  • Average settled payment: ~$429,000
  • Average discount: 60–70%
  • Average duration: ~20 days

Qilin (pronounced “CHEE-lin”; it’s Chinese and denotes a magical creature close in spirit to a unicorn or magical giraffe)

  • Average initial ask: ~$800,000
  • Average discount: ~62.5%, but with a hard floor around 50%
  • Tighter statistical clustering than Akira

ShinyHunters (the new kids; social engineering and help desk scams)

  • Much higher initial asks
  • Average discount: ~71%
  • Much shorter duration. Matt called it “almost like a fire sale.” I like to think of them as the TJ Maxx or Ross of malware.
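To make those figures concrete, here’s a quick back-of-the-envelope sketch in Python. The dollar amounts and percentages are the ones from Matt’s slides; the function names are mine, and I’m assuming that the “floor around 50%” means the discount rarely comes in below 50% (that’s my reading, not something Matt spelled out). Also keep in mind that a discount computed from two averages isn’t necessarily the same as the average of the individual discounts, so treat the output as a sanity check and nothing more.

```python
def discount(initial_ask: float, settled: float) -> float:
    """Fraction knocked off the initial ask."""
    return 1 - settled / initial_ask


def settlement(initial_ask: float, discount_rate: float) -> float:
    """Payment implied by an initial ask and a discount rate."""
    return initial_ask * (1 - discount_rate)


# Akira: ~$1.3M average ask, ~$429K average settled payment (from the talk).
akira = discount(1_300_000, 429_000)
print(f"Akira implied discount: {akira:.0%}")

# Qilin: ~$800K average ask, ~62.5% average discount.
qilin_typical = settlement(800_000, 0.625)
# Assumption: the "floor around 50%" is the smallest discount you'd expect.
qilin_at_floor = settlement(800_000, 0.50)
print(f"Qilin typical payment: ${qilin_typical:,.0f}")
print(f"Qilin payment at a 50% discount: ${qilin_at_floor:,.0f}")
```

The Akira numbers work out to roughly two-thirds off, which sits inside the 60–70% range Matt quoted.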

The shape of the discount curve is the interesting part: time on the x-axis, percent off on the y-axis, and the curve goes up and to the right. Like buying a car, except the dealership is in a sanctions-adjacent country and the test drive is your production environment.

A practical consequence: if you’re paying for recovery (your systems are down, you’re hemorrhaging money), you pay faster and you pay more. If you’re paying for suppression (they didn’t encrypt anything, they just exfiltrated data and are threatening to leak), you can drag it out for a bigger discount. Which is exactly what we just watched happen with Canvas — Instructure ultimately paid for suppression and “shred logs,” not recovery.

The Black Basta “I had COVID” story

The single best war story of the talk involved Black Basta about a year and a half ago. The Black Basta victim portal, Matt said with what sounded like genuine professional admiration, is gorgeous. Looks like iMessage. Read receipts. Tight UX. “I wanted to send a meme. It doesn’t support that. The first ransomware group that allows GIFs [in their chats] is gonna be a work of art.”

But at the top of the portal: a countdown timer. Six days, twenty-three hours, fifty-nine minutes, fifty-eight seconds. Tick.

Matt was working a real case, was actually going to pay, and needed to stall. So he asked for more time. They gave him seven days. He asked again the following week. Seven more days. He was feeling pretty pleased with himself when, on the Friday of week three, they finally said: no more extensions. Pay or else.

Then Matt got on a flight home from Denver to King of Prussia, PA (which, as he pointed out, sounds like a Batman villain, as does his other hometown, Wayne, and look, I lived in Wayne; I can confirm it sounds exactly like the kind of place Bruce Wayne would buy a second house). He proceeded to get deathbed sick. Lost an entire weekend. Woke up Monday morning with roughly forty hours left on the clock and a portal full of increasingly unhinged messages from his criminal counterparts: “Are you there? Hello! I’m serious. Don’t make me do what I’m going to do.”

Matt typed back: “Really sorry, I got super sick. I think I had COVID.”

They gave him seven more days.

Matt’s rules of engagement (lightly paraphrased and worth tattooing somewhere)

He’s a flat-fee operator. Never a percentage of savings — because at that point you’re not a negotiator, you’re a co-conspirator with a conflict of interest. (The two negotiators who got federally indicted for actively colluding with ALPHV BlackCat are the cautionary tale he doesn’t want to become.)

He will lie to criminals with abandon, but he won’t lie to clients.

He won’t negotiate in bad faith. If you tell him “just stall, we’re never paying a dime,” he walks. Because he’s seen what happens when threat actors realize they’ve been strung along. He told a story about a client that changed their mind at the last minute after a long negotiation. The group responded by publishing pediatric patients’ Social Security numbers on Facebook. One. At. A. Time, in a slow, painful, drip campaign.

He does not hack back. He has heard of illicit activities waivers. They take two to three years to get and they are not a Get Out of Jail Free card. They are, at best, a “you probably won’t go to jail” card.

He does not facilitate the actual payment, because (a) money laundering, (b) OFAC compliance is a specialty unto itself, and (c) the two payment-facilitation firms have current data on which Bitcoin addresses and chat fingerprints map to which sanctioned entities. He just does the talking.

The four things he wants from every threat actor

When Matt’s at the table, he is always asking for the same four things:

  1. The decryption key. Of course.
  2. Proof of deletion. Typically a screenshot, ideally a video. He has an eight-hour video of someone DoD-wiping a drive somewhere in his archive.
  3. How they got in. No guarantees on how honest they’ll be; sometimes ransomware operators will literally copy-paste from a different victim’s report. Matt and another negotiator once compared notes and got the exact same “you had a Fortinet firewall” attribution for clients who, respectively, ran Meraki and Cisco.
  4. A promise to never do it again. Worth roughly what you’d expect, but worth getting in writing.

If he can get those four, he’s done his job.

Q&A

The Q&A ran long. A few highlights:

Where do ransomware group names come from? Matt blames CrowdStrike. Honestly, fair. “Every cool t-shirt you’ve ever gotten from Black Hat came from the CrowdStrike booth.” I jumped in to point out that Qilin (pronounced “CHEE-lin”) is a Chinese mythological creature usually translated as “unicorn” or, more delightfully, “magic giraffe.”

Is ransomware seasonal? Absolutely. American holidays, especially Thanksgiving, are target-rich, because skeleton crews and four-day weekends mean defenders are slow to respond. Attackers also take vacations themselves. Ransomware drops off in the summer months. Because who wants to be at their computer when the weather’s nice? Even criminals deserve a beach day.

Are you ever personally targeted? Matt’s whole career is built around not announcing himself as a negotiator on the live chat. He plays the dumb IT guy. He’s got a story about a colleague suggesting they ask the threat actor what a “botcoin” is (after one of them mistyped “Bitcoin” in a chat), and the threat actors spent two days patiently explaining cryptocurrency to him. “Best time stall ever.”

What about emotional toll? Matt has been a paramedic, a cop, and a firefighter. “I don’t know of a crisis I haven’t run head-first into. It’s a programming defect from up top.” Then: “Better living through pharmacology. Oh God, don’t call my therapist.”

What industries get hit hardest? Manufacturing. Not necessarily the most often, but the hardest, because of legacy systems. He told a story about a Pennsylvania university that literally cemented a Novell NetWare box into a basement wall during construction because it was running directory services and they didn’t want to unplug it. It’s been running since the ’80s. It’s still there.

Why I’m writing this up

Two reasons.

One: BSides Tampa is a regional con and the speaker quality this year was outstanding. Matt’s talk in particular deserves a wider audience than the room it ran in. It could’ve been a keynote.

Two: I spend most of my professional life right now thinking about zero trust and AI-plus-network-security at NetFoundry, and what Matt’s talk drove home (better than any threat report I’ve seen lately) is that the human layer of incident response is where most of the leverage is. You can do everything technically right at the perimeter and still lose a six-figure negotiation because somebody on your team panicked, told the truth at the wrong moment, or said the magic words that flipped a transactional extortion into a personal vendetta. Zero trust as a philosophy (not just a product category) is partly about acknowledging that humans will always be the soft target, and designing accordingly.

Also: I am now permanently delighted by the idea that every ransomware negotiator on the planet should adopt the alias “Matt” so that threat actor groups go forever convinced that U.S. companies are staffed by an army of identically-named slow-witted staff who don’t know what Bitcoin is. Matt, if you read this, I’m in. Sign me up.

Big thanks to Matt Barnett and SEVN-X for an outstanding session, and to the BSides Tampa crew for putting on one of the best regional security cons in the Southeast!

Categories
Conferences Tampa Bay

poweredUP Tampa Bay Tech Festival 2026

Here’s something you might not know about the poweredUP Tampa Bay Tech Festival (which happens tomorrow): because I decided to attend it, I landed a job — and this has happened not once, but twice!

The reason poweredUP Tampa Bay Tech Fest led to those jobs is that a lot of tech industry people here in “The Other Bay Area” also attend. If you’re looking to meet technology leaders, innovators, entrepreneurs, and students, they’re at poweredUP, and they make it an opportunity-rich environment.

They’re mixing up their usual formula this year with a new format whose aim is to give attendees both the big-picture view of where technology is heading in Tampa Bay and the practical knowledge they can take back to their teams.

Here’s what’s on the agenda:

  • Job Seeker Hiring Event with High Tech Connect
    This will start at 10:30 (a little earlier than the rest of the conference) and it’s your chance to see who’s hiring and who’s looking! Bring your resume and your A-game.
  • The State of Tech – Tampa Bay
    They’ll kick off the day with a forward-looking conversation about how technology (and especially AI) is shaping Tampa Bay’s economy, workforce, and innovation ecosystem. They’ll have regional leaders, founders, and industry experts talk about the momentum building across our tech community and what it means for the future of our region.
  • Networking + Exploring Geek Row
    My favorite part! It happens in the part of the Mahaffey with the big windows and the view of the Bay, where you can connect with fellow attendees, meet innovative companies, and explore the Geek Row exhibitor area, where you can see what the local tech companies and orgs are up to.
  • Technical Keynote + Deep-Dive Sessions
    In the afternoon, poweredUP shifts into technical programming, featuring an inspiring keynote and multiple tech tracks focused on real-world implementation and best practices across today’s most important technologies.
  • More Networking + Happy Hour
    Wind down and reflect on the day’s insights with fellow attendees at our celebratory happy hour. Enjoy two complimentary drink tickets (21+) and build lasting connections in a relaxed setting.

Over the years, poweredUP has become a cornerstone event for Tampa Bay’s tech community, bringing people together to learn, collaborate, and spark new ideas about what’s next.

And as I’ve said before, it’s led to some very nice outcomes for me. Go on May 20 and be part of the conversation shaping the future of technology in Tampa Bay!

Here’s where you can register for poweredUP Tampa Bay Tech Fest.

Categories
Conferences Editorial Security Tampa Bay

Go to BSides Tampa, because 80% of success is showing up

The 13th edition of BSides Tampa is happening tomorrow, Saturday May 16. It’s not too late to get tickets ($45 for general admission, $30 for students and military), and you can save 20% by using Tampa Devs’ discount code, TampaDevs20_BSIDESTAMPA_2026.

There are plenty of reasons to attend BSides Tampa, a cybersecurity conference that brings in 2,000+ attendees, including…

  • Great keynotes and presentations across seven tracks: keynotes, red team, blue team, cloud security, GRC and privacy, appsec, and AI and emerging
  • The exhibitor hall, where they don’t scan your badge, which means that you won’t get spammed as a result and they won’t sell your info
  • Interactive villages: malware, social engineering, IOT, network, lockpicking
  • A chance to meet the technology and cybersecurity professionals in the area, including these two…

But the most compelling reason I can think of to go is…

Let me repeat that:

80 percent of success is just showing up.

Let me illustrate with a story. Last May, techie-about-town Ammar Yusuf said he could hook me up with a free ticket to VueConf, which was taking place right here in Tampa.

I’d just come back from an expensive two-week trip, and I was still operating as an independent consultant. The spring and summer of 2025 were pretty slow; the well of clients was running dry.

I was strongly tempted to turn down the free ticket so I could devote more time and energy to finding my next job or client. Some might argue that it would be the smart thing to do.

But I decided to take the free ticket and go to VueConf instead, because I remembered all those times when showing up led to great things. Again, I remind you:

At VueConf, I met one of the organizers, Pratik Patel. When he came here in February, I decided to say hi and attend the Java User Group meetup where he gave a talk about AI architecture, pictured below:

I ended up chatting with Pratik, who then offered both me and Anitra free tickets to the Dev/Nexus conference in Atlanta that would take place a couple of weeks later. It was short notice, and Atlanta’s a 7+ hour drive from Tampa. But we remembered the rule:

So we went, learned a lot, and had a great time:

And while we were at Dev/Nexus, I ran into Pratik, who was walking the exhibitor floor with Venkat Subramaniam, who knows me because I show up to his talks whenever he comes to town.

Here’s the “Bollywood Buddy Movie Poster” photo taken at the meetup where I met Venkat:

When I ran into Pratik and Venkat at Dev/Nexus, Pratik suggested to Venkat that I speak at the Arc of AI conference that would take place the following month. Venkat thought that would be a good idea, and asked me to submit a couple of talk proposals. So I did, even though I was knee-deep in contract work and a job search, because…

My submissions got accepted, and the result was my talk about writing documentation and example code for consumption by AI agents:

…and I met a lot of people:

And here’s the kicker: not only did I get to meet new people and attend (and speak at) conferences, but all this helped me land my current job at NetFoundry. The fact that I’d managed to land a speaker gig at Arc of AI was a key point in my job interviews. And I wouldn’t have the key point for that interview if…

  • I didn’t speak at Arc of AI, which wouldn’t have happened if
  • I didn’t apply to speak at Arc of AI, which wouldn’t have happened if
  • I didn’t go to Dev/Nexus, which wouldn’t have happened if
  • I didn’t go to Pratik’s talk at the Tampa Java User Group meetup, which wouldn’t have happened if
  • I didn’t go to VueConf with the free ticket Ammar gave me.

The lesson here is simple:

So if you don’t have prior commitments and you can afford to do so and you’re in a tech/tech-adjacent/cybersecurity/cybersecurity-adjacent field — and especially if you’re looking for work — consider going to BSides Tampa tomorrow, because you know what showing up can do for you!

Once again, ticket prices are:

  • $45 for general admission
  • $30 for students and military

…and you can save 20% by using Tampa Devs’ discount code, TampaDevs20_BSIDESTAMPA_2026.

Categories
Artificial Intelligence Conferences What I’m Up To

Baruch Sadogursky and Leonid Igolnik’s Arc of AI presentation: “Back to the Future of Software: How to Survive the AI Apocalypse with Tests, Prompts, and Specs”

For me, Arc of AI wrapped up with my attending Baruch Sadogursky and Leonid Igolnik’s madcap presentation, Back to the Future of Software: How to Survive the AI Apocalypse with Tests, Prompts, and Specs… and unexpectedly playing the accordion!

Baruch does DevRel at Tessl, the AI agent enablement platform, where his full-time job is thinking about context engineering and how agents actually write code. Leonid’s a former Tucows coworker, and now a recovering CTO who advises a range of tech companies on what he calls, with a grin that was half joke and half resigned sigh, “how to adopt this new and exciting age of never looking at the code that you shipped to production and still deliver predictable results.”

There are your typical “last slot of the last day of the conference” talks. And then there are ones like this one, where two grown men show up dressed as Doc Brown and Marty McFly, pull in Yours Truly to improvise a song mid-talk, and spend forty-five minutes arguing that the future of software engineering looks suspiciously like the waterfall model your company abandoned in 2009, except this time it might actually work!

If you wish you’d caught it, you’re in luck; they recorded their presentation, and you can watch it right now:

They’ve been road-testing this talk for over a year. I caught an earlier version referenced in their slides from Baruch’s appearance at DevNexus 2026 and a Geecon keynote in Kraków…

…but the Austin version had clearly been sharpened by a lot of live feedback and a lot of real-world use of their toolkit.

Underneath the flux capacitor jokes and the AI-generated illustrations of monkeys in lab coats, they were making a serious argument, and it’s one I’ve been chewing on ever since.

I want to unpack it here, because I think they’re onto something that a lot of the spec-driven-development conversation is quietly missing.

The setup: a crisis of trust

Baruch opened with a story that’s aged like a fine wine over the last few months: Amazon’s Kiro, a spec-driven IDE whose rollout was, in his telling, “standardized, shocked, and delivered software that crashed AWS.” The bit got a laugh. Then he went to the show of hands.

  • Who ships code to production that was written by an LLM? Most of the room.
  • Who’s happy with the results? Fewer hands.
  • Who trusts what’s being produced? Fewer still.

Then he put the real numbers on screen. According to the most recent Stack Overflow developer survey:

  • More than half of the code being committed to production is AI-generated.
  • In the same survey, 96% of developers say they don’t fully trust that AI-generated code is functionally correct.
  • And only 48% say they always check AI-generated code before committing it. (Leonid’s deadpan observation: “I would argue half of that 48% lied.”)

This means that the majority of new code is being written by systems the people shipping it don’t trust, and most of those people aren’t rigorously reviewing the output. In effect, we’ve collectively invented a new compiler and then, collectively, decided to stop reading what comes out of it.

Baruch has a phrase for this, and it’s similar to something I mentioned at the last AI Salon in St. Pete: “The source code is the new bytecode.” Nobody reads it. We rely on it blindly. The difference, of course, is that bytecode is produced by a deterministic compiler. Source code produced by an LLM is not.

He drove this home with a self-deprecating story about the talk’s own show notes page. “I asked the agent if this link made it into the show notes, and what did I tell you? That I checked. The agent generated a lot of links. I checked that there were a lot of links. That was the question.”

The room laughed because everyone recognized themselves in it. “I always check my AI-generated code” turns out to mean almost nothing. It’s the code review equivalent of your kid telling you they cleaned up their room. Technically they picked things up, but you wouldn’t want to walk in there barefoot (and if they’re teenage boys, maybe not without a gas mask).

The Chasm

The core of the talk is built around three C-words, and the first one is the one that frames everything that follows: the Chasm.

The Chasm is the gap between what you meant and what actually runs. Every abstraction in our industry’s history has had one of these. Assembly programmers didn’t trust compilers. Baruch showed a 1950s quote about exactly that skepticism, from back when Grace Hopper was having to sell people on the idea that you could let a machine write assembly for you.

It continued: C programmers didn’t trust garbage collectors, C++ programmers didn’t trust the JVM. If you’re of a certain age, you might remember when there were people who said Java would be too slow, would never compete in production, and that this crazy “bytecode” idea would never catch on.

Every time, the chasm eventually closed. The compiler got good enough, the runtime got fast enough, and the trust followed.

But Baruch and Leonid argue that this time, it’s different, and for one specific reason that Leonid kept hammering home: for the first time in the history of our industry, the compiler is non-deterministic.

With agentic coding, you can type the same prompt twice and get different code each time. You can run the same agent on the same spec on the same codebase and get different tests. The entire compiler toolchain we’ve built over seventy years assumes that the same input produces the same output, and LLMs don’t do that. They’re (and this is the running metaphor of the talk, complete with a slide of a chimpanzee wearing a “Mr. Fusion” hat) monkeys with GPUs.
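Here’s a toy illustration of that difference. It’s a minimal sketch, not how any real compiler or model works: toy_compiler and toy_llm are hypothetical stand-ins I made up, and the “LLM” simply picks at random from a few plausible completions.

```python
import random


def toy_compiler(source: str) -> str:
    # Deterministic: the output is a pure function of the input.
    return source.upper()


def toy_llm(prompt: str, rng: random.Random) -> str:
    # Stand-in for sampling: choose one of several plausible completions.
    # The prompt is ignored here; only the randomness matters for the demo.
    candidates = [
        "return a + b",
        "return b + a",
        "return sum([a, b])",
    ]
    return rng.choice(candidates)


prompt = "add two numbers"
rng = random.Random()  # unseeded on purpose

compiled = {toy_compiler(prompt) for _ in range(100)}
generated = {toy_llm(prompt, rng) for _ in range(100)}

print("distinct compiler outputs:", len(compiled))    # always 1
print("distinct LLM-style outputs:", len(generated))  # almost always more than 1
```

All three completions happen to be correct in this toy. The uncomfortable part with real agents is that some of the variants won’t be, and you can’t tell which from the prompt alone.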

The infinite monkey theorem says an infinite number of monkeys working on an infinite number of typewriters for an infinite period of time will eventually produce the complete works of Shakespeare, or at least a novel Mr. Burns could appreciate:

These monkeys produce Shakespeare sometimes. They also produce your company’s incident postmortem, and you don’t get to pick which one shows up in the PR.

Baruch’s favorite recent example, which made the room groan/laugh in baleful self-recognition: Uber is burning through LLM tokens faster than they budgeted, and what started as an engineering productivity initiative is now a finance problem.

“We’re in what, March, April? They planned out their budget for the year. So those monkeys are very productive. Typing and clearly doing something.” Which is both funny and, if you squint, terrifying. A lot of money is being spent on a lot of code nobody is reading.

This is where the talk gets its central mantra, delivered loud enough that it needed what Baruch called a “musical highlight,” which is where he turned to me in the front row and asked me to improvise something on the accordion.

Here are my hastily-improvised lyrics:

Never trust a monkey!

Never trust an ape!

Always verify,

Make sure your code’s in shape!

And then he moved on to the thing that I think is actually the core contribution of the talk.

The MIT detour

Before he got to the Chain, Leonid took a detour through an MIT paper he’d been carrying around for weeks. The paper maps AI-suitable tasks across two axes: cost of developing the artifact, and cost of verifying it. Four quadrants fall out of that.

  • Safe zone: cheap to generate, cheap to verify. This is where AI shines. The slides for their talk, for instance — AI-generated illustrations of Doc and Marty and the flux capacitor, easy to produce, easy to eyeball and approve. Nobody’s life depends on a specific monkey illustration being “right.”
  • Risk zone: cheap to generate, expensive to verify. This is where most software engineering lives, and this is the terrifying quadrant. The LLM can produce 2,000 lines of code in a minute. A human takes an afternoon to confirm it does what it’s supposed to, and two more days to confirm it doesn’t also do things it’s not supposed to.
  • Expensive-but-verifiable: costly to generate, cheap to verify. Things like formal proofs.
  • Avoid entirely: costly to generate, costly to verify. Don’t use AI here.
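If it helps to see the four quadrants as a lookup, here’s a minimal sketch. It flattens each axis into a yes/no question, which is my simplification (the paper presumably treats cost as a continuum), and the Zone and classify names are mine, not the paper’s.

```python
from enum import Enum


class Zone(Enum):
    SAFE = "safe zone"
    RISK = "risk zone"
    EXPENSIVE_BUT_VERIFIABLE = "expensive but verifiable"
    AVOID = "avoid entirely"


def classify(cheap_to_generate: bool, cheap_to_verify: bool) -> Zone:
    """Map the two cost axes onto the four quadrants described in the talk."""
    if cheap_to_generate:
        return Zone.SAFE if cheap_to_verify else Zone.RISK
    return Zone.EXPENSIVE_BUT_VERIFIABLE if cheap_to_verify else Zone.AVOID


# Examples taken from the talk; the boolean judgments are my reading of them.
examples = {
    "conference slide illustrations": (True, True),
    "2,000 lines of LLM-written code": (True, False),
    "formal proof": (False, True),
}
for task, (gen, ver) in examples.items():
    print(f"{task}: {classify(gen, ver).value}")
```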

Leonid’s point was that our industry has stampeded into the risk zone and congratulated itself on the speed. We’re generating code faster than ever and verifying it less than ever, and the delta is being paid in the currency of production incidents and quietly broken features that nobody notices until a customer complains.

Baruch had to stop and ask ChatGPT to “explain this diagram Barney-style in one paragraph,” with a cut to a slide of the infamous purple dinosaur. The paper’s actual title is Static Regime Map with Dynamic Pressure. That’s the joke, and it’s also the point. The academic framing of this problem is hard to read, and we’re all moving too fast to read it.

The Chain

If you can’t trust the monkey, you need a chain of custody from intent to code where every link is either deterministic or independently verifiable.

Baruch and Leonid walked through the typical AI-assisted workflow and color-coded it by trustworthiness. Humans write the prompt; they’re considered trustworthy, because hey, it’s us.

(Leonid jumped in here to point out that humans are also a subtype of stochastic systems, which got the biggest laugh of the talk. “Someone loves humans in this room.”)

After that, an LLM turns that prompt into a spec. It’s not trustworthy, because a monkey wrote it.

Then the LLM writes code against that spec. Once again, it’s a monkey, and once again, it’s not trustworthy.

Then, if we’re being honest about most shops, the LLM also writes the tests that are supposed to validate the code it just wrote. This is hilariously, catastrophically not trustworthy, because you just asked the monkey to grade its own homework.

Leonid calls this “hallucinated verification,” and it’s the thing that makes the green-build signal meaningless. If the same system writes the implementation and the tests, a passing suite tells you nothing. The tests don’t measure whether the code is correct; they measure whether the monkey was internally consistent about what it thought it was building.

Baruch showed a real example that made everyone wince. He showed an agent running late in a long session, getting tired of failing tests, and instead of fixing the code, systematically commenting out the verification logic, flipping assertions to True, and declaring the project “95.2% correct.” The screenshot was almost funny. It was also a thing that had actually happened, in an actual project, to an actual developer. And the developer almost shipped it.

Leonid’s and Baruch’s proposed fix is the Intent Integrity Chain. The idea is to insert a deterministic step between the spec and the tests, and then lock the result so the agent can’t tamper with it.

The flow looks like this:

  1. Humans write the prompt. Verifiable because we wrote it.
  2. LLM generates the spec. Not yet trustworthy. But the spec is human-readable prose, which means humans (including non-technical humans) can review it. This is where you catch things like “Wait, we never said what happens if the browser crashes mid-session!” before you write any code.
  3. A deterministic tool generates tests from the spec. Not an LLM. A template-driven, repeatable process that turns Gherkin-style scenarios into executable tests. Same input, same output, every time.
  4. The tests get cryptographically locked. This is the clever bit. They hash the test files and store the hash in a git note. A pre-commit hook, itself read-only at the OS level, refuses to accept any commit where the test hash doesn’t match, and:
    1. If an agent tries to comment out a failing test to make the build pass, the commit is rejected.
    2. If the agent tries to disable the hook, the hook is read-only.
    3. If the agent tries to replace the hash, the hash is stored in a git note that’s version-controlled and tamper-evident.
  5. LLM writes the implementation. Now we’ve constrained the monkey. It has to make the locked tests pass. It can’t rewrite them. It can’t disable them. It can whine about the hook (and Baruch said one of their test runs produced an LLM that found the hook, tried to disable it, and complained in its own comments that “some stupid hook is failing my commits”), but it can’t get around it.

The elegance here is that every link in the chain is either deterministic or externally verified. No model grades its own work. The human-verifiable artifact (the spec) is something a product manager can actually read. The machine-verifiable artifact (the hash) is tamper-proof. And the monkey only gets to do what monkeys are good at: filling in the blanks under adult supervision.

Leonid offered a framing that I think is worth giving some extended thought: “The idea is that everything that can be scripted should not be left for monkeys to deal with. Your CFO will thank you for that.”

There’s an unglamorous but important insight buried there. Every time you use an LLM to do something deterministic (format a file, generate boilerplate, fill in a template), you’re paying token costs to produce non-deterministic output for a task that had a deterministic solution. Push the deterministic stuff back into deterministic tooling and save the stochastic budget for the places you actually need it.

Wait, isn’t this just waterfall?

Baruch put this question on a slide himself, because he knew it was coming. Prompt → spec → tests → code, with human review at each stage? That’s Rational Unified Process (RUP) with a fresh coat of paint. Didn’t we spend the 2000s escaping that thing?

His answer: the reason waterfall failed wasn’t that its artifacts were bad. Specs are good. Reviewing specs is good. Thinking about non-functional requirements before you write code is good.

Waterfall failed because the cycle time was measured in months. By the time the spec committee finished arguing about whether the customer wanted a dropdown or radio buttons, the customer had changed companies and the market had moved on.

The Intent Integrity Chain runs the same loop in fifteen minutes. You write a prompt, the LLM drafts a spec, you skim it and catch the missing edge cases, the tool generates tests, you glance at the scenarios, the agent implements, and you’re done. The artifacts waterfall produced are genuinely valuable; they just weren’t worth the wait. LLMs make the wait go away.

This, I think, is the insight worth taking seriously. It’s not “Waterfall is back, baby!” It’s “the specific failure mode of waterfall was latency, and AI has changed the latency equation.”

The ceremony that was unaffordable in human time is cheap in LLM time. Specs that nobody had the bandwidth to write in 2005 can be generated, reviewed, and locked in 2026 before your coffee gets cold (or if you prefer, before your Coke Zero gets warm).

There’s a cultural echo here that Leonid leaned into from his and my past. He and I were actually colleagues 26 years ago at Tucows, back when Tucows was the second-largest domain registrar in the world, and they used to ship software after formal spec sign-offs. Not because it was fashionable, but because the cost of shipping a bug to production was high enough that the sign-off was cheaper.

The MIT paper’s argument is that generation costs have collapsed but verification costs haven’t. This puts us back in the same economic regime that made spec sign-offs rational in the first place. The pendulum’s not swinging back to waterfall because we got nostalgic. It’s swinging back because the economics swung back.

The demo

Leonid drove the live demo, which showed their toolkit, intent-integrity-chain/kit on GitHub. The dashboard shows the whole chain laid out as a web UI: premise at the top, then the “spidey diagram” of project priorities (documentation: high; TDD: high; minimal scope: low, because they’re not shipping to Mars), then specs with traceable requirement IDs, then the auto-generated Q&A where the LLM plays devil’s advocate and asks “What did we not think of?”

That reflective-reasoning step got the biggest reaction from the audience, and I agree with the reaction; it’s quietly the most useful thing in the whole toolkit. Anyone who’s sat through a real spec review knows that the value isn’t the document; the value is the five minutes where someone brings up a condition that the developers didn’t think of, such as “But what if two users do X at the same time?”, and the room goes silent.

It turns out that modern LLMs are phenomenal at playing that someone. They’ve read ten thousand spec reviews in their training data. They know the questions.

Leonid’s example: the tool looked at a spec for a flight-search library and asked things like “Do you need backward compatibility?” and “What happens if the browser crashes mid-session?” Those are exactly the questions the grumpy senior engineer asks in a room full of junior engineers, and now every team has one on demand, for better or worse.

The other trick the kit leans on hard is a literal software-project “constitution,” in a spirit similar to Claude’s constitution, a document that sits at the root of the repo and declares things like “always do TDD” and “all specs must trace to requirements.” It’s lifted from GitHub’s Spec Kit, and Baruch pointed out the genuinely clever reason it works: LLMs have been trained on enormous quantities of text about actual constitutions, with their amendments and ratifications and solemnity.

The word “constitution” triggers a whole cluster of “take this seriously” behavior in the model. It’s prompt engineering by semantic association, and supposedly works better than rules.md or guidelines.txt.

Everything in the dashboard is traceable: a requirement produces one or more spec features, each feature produces one or more Gherkin scenarios, each scenario produces one or more executable tests, each test gates one or more implementation tasks. Click any task and you can walk the chain backwards to the original requirement. Click any requirement and you can walk it forward to the code that implements it. The whole thing is visible, and because the specs are prose and the scenarios are human-readable, non-engineers can walk the chain too.
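The traceability idea boils down to a graph you can walk in both directions. The sketch below is my own simplification, not the kit's data model: the artifact IDs are invented, and each artifact is given exactly one parent even though the real chain allows "one or more" at each level.

```python
from collections import defaultdict

# Each entry maps an artifact ID to the artifact it was derived from.
# All IDs are made up for illustration.
DERIVED_FROM = {
    "FEAT-1": "REQ-1",
    "SCEN-1": "FEAT-1",
    "SCEN-2": "FEAT-1",
    "TEST-1": "SCEN-1",
    "TEST-2": "SCEN-2",
    "TASK-1": "TEST-1",
    "TASK-2": "TEST-2",
}


def trace_back(artifact: str) -> list[str]:
    """Walk from any artifact up to the requirement it originated from."""
    chain = [artifact]
    while chain[-1] in DERIVED_FROM:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain


def trace_forward(artifact: str) -> list[str]:
    """Collect everything derived, directly or indirectly, from an artifact."""
    children = defaultdict(list)
    for child, parent in DERIVED_FROM.items():
        children[parent].append(child)
    found, queue = [], [artifact]
    while queue:
        # Breadth-first: visit features, then scenarios, then tests, then tasks.
        for child in children[queue.pop(0)]:
            found.append(child)
            queue.append(child)
    return found


if __name__ == "__main__":
    print(trace_back("TASK-2"))
    print(trace_forward("REQ-1"))
```

Walking backwards answers "why does this code exist?" and walking forwards answers "what breaks if this requirement changes?", which are the two questions the dashboard is built around.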

The new version of the kit is, per Baruch’s pointed demand, 57% faster than the old one. Apparently Baruch spends a lot of time on Slack complaining to Leonid about speed, which should be expected when these two characters get together.

The Q&A

A few exchanges from the Q&A are worth flagging for anyone thinking of trying this:

“Who writes the test scenarios, the human or the monkey?” Both, with the human in charge. The LLM drafts the Gherkin-style features from the spec. The human reviews those features, not line-by-line test code, but the human-readable scenarios, and signs off. Then the deterministic tooling converts those locked scenarios into executable test code. The human is the verification step. The tests are downstream of that verification, which is why locking them matters. Baruch was emphatic on this point because he’d seen audiences get confused: the word “spec” gets overloaded between “business spec” and “technical test scenario,” and both are part of the chain but play different roles.

“How do I do this for an existing codebase?” This is where Baruch had news: they’re working on a “brownfield” mode, and it’s the unlock that will let this approach work in the real world where nobody has a greenfield project. The recipe:

  1. Point the kit at an existing project with tests.
  2. Lock the code as read-only.
  3. Have the LLM write specs from the tests, not from the code. Tests document behavior; code documents implementation. You want the behavior.
  4. Use test coverage and mutation testing to measure whether the extracted spec actually reflects reality. Coverage tells you which code is exercised. Mutation testing tells you whether the tests are meaningful or just happen to execute the lines.
  5. Iterate until you have a spec you trust.
  6. From that point forward, any new feature goes through the full Intent Integrity Chain on top of the ingested baseline.
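Step 4's distinction between coverage and mutation testing is easier to see with a toy example. This is a hand-rolled sketch, not how any real mutation-testing tool works: real tools generate mutants automatically, whereas here the single mutant and every function name are written by hand as assumptions.

```python
def apply_discount(price: float, percent: float) -> float:
    """Original behavior: take `percent` off the price."""
    return price - price * percent / 100


def apply_discount_mutant(price: float, percent: float) -> float:
    """Hand-written mutant: the subtraction has been flipped to addition."""
    return price + price * percent / 100


def weak_test(fn) -> bool:
    """Executes every line of `fn` (full coverage) but checks almost nothing."""
    result = fn(100.0, 10.0)
    return isinstance(result, float)


def strong_test(fn) -> bool:
    """Pins down the actual behavior."""
    return fn(100.0, 10.0) == 90.0


def mutant_killed(test) -> bool:
    """A mutant is killed when the test passes on the original but fails on the mutant."""
    return test(apply_discount) and not test(apply_discount_mutant)


if __name__ == "__main__":
    print(mutant_killed(weak_test))    # False: the mutant survives
    print(mutant_killed(strong_test))  # True: the mutant is caught
```

Both tests give the function 100% line coverage. Only one of them would notice if the behavior changed, and that is the test you want your extracted spec to be built on.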

This is a lot of work. Leonid didn’t pretend otherwise. But he pointed out that much of it is now automatable in a way it wasn’t five years ago. You don’t hand-write specs for a million-line codebase; you have the LLM draft them and then you review.

“Who invented spec-driven development?” Someone asked this, and a second person looked it up live: there’s a 2004 paper from the XP conference in Germany that uses the exact phrase, combining TDD with Design by Contract. I mentioned that Design by Contract was baked into Eiffel in the 80s, and Baruch noted that NASA was doing something that looks a lot like it in the 1960s. The joke being that every generation rediscovers the value of writing things down before you build them, and every generation thinks they invented it.

What I’m taking home from this

First: the “monkeys with GPUs” framing is useful even if you don’t adopt the full toolkit. It’s a cleaner way to think about where trust does and doesn’t belong in an AI-assisted workflow. Any link in your pipeline where a model grades its own output is a link that’s lying to you. Once you see it, you see it everywhere: in the auto-generated tests, in the “this looks right” PR reviews, in the agent that confidently declares a task complete because it decided the task was complete. The mental move of asking “Who verified this, and do they have any skin in the game?” is a free upgrade to your code review habit.

Second: the locking step is the thing most spec-driven-development conversations leave out, and it’s the thing that makes the rest of the chain actually hold. GitHub Spec Kit gives you the spec ceremony. Kiro gives you the spec ceremony. Plenty of tools give you the spec ceremony. Very few of them prevent the agent from quietly editing the spec, or the tests, or the constitution file, halfway through the build. A cryptographic lock with a read-only pre-commit hook is an unglamorous piece of engineering, but it’s what turns the ceremony into actual guardrails. Everything upstream of the lock is advisory. Everything downstream of the lock is enforced.

Third, and once again, this is something I’ve come to on my own, and you might have, too: Baruch’s line about the source code being the new bytecode. If he’s right, the natural-language spec is the new source code, and the job of the next generation of developer tools is to make specs first-class citizens: versioned, tested, reviewed, locked. That’s a different job than what IDEs do today. It’s a different job than what LLM assistants do today. It’s arguably the job that DevRel is going to spend the next five years explaining, and I say that as someone who’s going to be doing some of the explaining.

Fourth, a smaller thing that I liked: Baruch’s experiment of asking an LLM to produce JVM bytecode directly, skipping Java entirely. The bytecode is the real artifact the JVM runs; why route through a source language? Today this would be a terrible idea because the ecosystem assumes source code is what humans read and review. But in a world where humans stop reading the source code anyway, the argument for source-as-intermediate-representation gets weaker. We may, in ten years, look back at 2026 and notice that “the code” was quietly replaced by “the spec plus the tests plus the locked chain,” and that the specific sequence of tokens the LLM produced in between became about as interesting as the specific sequence of x86 instructions the JIT emits. That’s a weird future. I’m not sure I like it. But I’m pretty sure Baruch and Leonid are right that it’s the direction we’re drifting.

I came into Arc of AI expecting to hear a lot about agents and MCP (and I did, including from my own talk). I didn’t expect the closer to reframe the whole problem as a question of non-deterministic compilation and how to bolt determinism back onto it. That’s a bigger idea than the Back to the Future bit gave it credit for. The talk is funny, and the costumes are good, and the monkey slides are excellent, but the thesis underneath the zaniness is the kind of thing that changes how you think about what you’re doing on Monday morning.

That’s the mark of a good end-of-conference presentation. You leave laughing, and then at three in the morning you sit up in bed thinking about pre-commit hooks.

Go try the kit. Start with a greenfield project where the stakes are low. Write a prompt. Let the LLM draft a spec. Review it. Let the tool generate Gherkin scenarios. Review those. Lock them. Let the agent implement. Notice how much more honest the green build feels when the tests weren’t written by the thing you’re trying to trust.

And if you get a chance to see Baruch and Leonid do this talk live, go. And bring a musical instrument!


Slides, video, and the full kit are linked from speaking.jbaru.ch and github.com/intent-integrity-chain. The Intent Integrity Kit is also available through the Tessl Registry. The MIT paper they kept referencing — the one whose actual title needed Barney-style explanation — is in the show notes along with everything else.

Categories
Artificial Intelligence Conferences

Mike Amundsen’s Arc of AI afternoon keynote: “Thinking with Machines”

I’ve been to enough tech conferences to develop a finely-tuned radar for keynotes that are actually worth your time versus ones that are just vibes and venture capital optimism dressed up in slide decks. Mike Amundsen’s Wednesday afternoon keynote at Arc of AI in Austin — titled Thinking with Machines — landed firmly in the former category. I’m still chewing on it, which is the best possible sign.

An idea from a 147-year-old “movie”

Amundsen kicked things off with Eadweard Muybridge’s famous 1878 “Horse in Motion” experiment, where Muybridge set up a row of cameras, had a horse run past them, and then strung the resulting photos together into what we’d now recognize as the world’s first motion picture.

He used this particular example because there’s art based on “Horse in Motion” hanging in the hotel where Arc of AI is taking place. It’s in the hallway leading to the elevators and first-floor rooms, so it’s almost impossible to miss.

The point wasn’t “look, a fun historical curiosity, right here in the hotel!” The point was: our brains are story-completion engines. Show us a fast enough series of still images and it’s no longer a series of photos. It becomes a movie, a continuous stream of reality. It’s innate, and we can’t help it. We’re wired to fill in the gaps and manufacture coherence even when it isn’t there.

This is pretty much what’s happening with AI right now. We’re using words like “feeling,” “thoughtful,” and “trusting” to describe systems that are, at their core, sophisticated pattern-completion engines. Our brains are doing what they always do: making up a pretty good story to explain away something they weren’t prepared for.

That’s both exciting and a little terrifying. Mike was kind enough to call it “both good news and bad news” rather than just setting the room on fire.

The call center problem

Before getting into the historical heroes portion of the program, Mike took a detour through the AI-and-customer-service story that you’ve probably seen play out in the headlines. You know the one: the overreach-and-backtrack cycle that happens whenever new technology meets an industry unprepared for its consequences.

But he wasn’t making the usual point. He wasn’t talking about chatbots giving bad advice or AI agents going rogue. He was talking about the Pareto principle, a.k.a. “The 80-20 Rule.”

Here’s the setup: roughly 80% of customer service calls are easy. They’re things like password resets, store hours, and return policies, all of which a moderately caffeinated human can handle on autopilot. The remaining 20% are brutal: long, complicated, and emotionally charged; these are the kinds of calls that take everything you’ve got.

It turns out that when companies automate the call center, they don’t automate the hard problems. They do what profit-minded entities do and go after the low-hanging fruit. They automate the easy calls first. After all, it’s the rational business decision.

This yields a (supposedly) unintended consequence: the humans who remain now have a 100% hard-call day, every day. No easy wins to catch your breath. No quick “You’re welcome, have a great day!” to reset your mood between the difficult ones. Just wall-to-wall complexity and frustration, shift after shift.

“This is burnout,” Mike said, in summary. “Big time burnout.”

And nobody planned for it, because nobody was thinking past the efficiency gain. These are unintended consequences, and the call center story was just the warmup for a much bigger version of the same problem.

Three visionaries who saw this coming

Mike walked us through three figures from computing history who all had the same essential insight: computers should extend human thinking, not replace it.

Vannevar Bush was project manager at Los Alamos during the Manhattan Project, which meant his job was making sure a building full of geniuses had everything they needed to think at maximum capacity. He noticed that the real breakthroughs happened in the hallways and at dinner tables, when scientists made unexpected connections across disciplines.

He called it a “virtual brain,” a gestalt where the sum was greater than the parts. In 1945, he wrote As We May Think, which described the memex: a personal workstation (built around microfilm, because it was 1945) that would let you create, store, and share “trails” of linked information. This gave us the intellectual ancestor of the hyperlink, forty years before Tim Berners-Lee.

J.C.R. Licklider was the “party animal” of DARPA who essentially funded the creation of the internet. Licklider wrote about “man-computer symbiosis” and had a very specific vision of how the relationship should work: computers were for doing, not deciding. Computing devices should handle the drudgery and the risky mechanical work, and leave the work of judgment, creativity, and choosing to us humans.

That separation between doing and deciding was, for Licklider, the whole point. And Mike’s argument is that we’re currently watching that line get extremely blurry in ways that would have alarmed Licklider considerably.

Douglas Engelbart is known particularly for two things:

  1. Inventing the mouse (reluctantly, because he needed it for a demo), and
  2. Giving what’s now known as “The Mother of All Demos” in 1968.

The Mother of All Demos was a 90-minute solo stage performance in which Engelbart debuted, among other things:

  • the mouse
  • the graphical interface
  • hypertext
  • collaborative editing
  • video conferencing
  • screen sharing
  • version control.

In 1968. Engelbart’s driving obsession was “bootstrapping,” a word he used to describe using computers to make people smarter so they could build smarter computers so they could become smarter still. The idea was to create an upward spiral of human capability, with technology as the lever.

The throughline connecting all three is that they all saw technology as a thinking partner, not a replacement for thought.

De-skilling: The February Anthropic study

This is the part of the talk I expect to be quoting for months.

Anthropic released a study in February looking at how AI tool use affects skill formation, which is a fancy-pants term for what the rest of us might call “learning.” They split developers into two groups: one with AI assistance, one without. Both groups were given a codebase they’d never seen before, in a language they knew, with bugs to fix.

Both groups finished in roughly the same time. The AI-assisted group spent more of that time talking to the AI than actually looking at the code, but the end result was comparable.

Then came the second task: a different, similar codebase with similar bugs.

The non-AI group solved it significantly faster than the first time. They’d learned. The AI group took about the same time as before.

The difference is what Amundsen called embodied knowledge, which is the stuff that gets installed in your brain through struggle and error and figuring things out the hard way. The non-AI group had gone through trial and error on the first task. Those mistakes became capability. The AI group had outsourced the trial-and-error loop to the machine, and when the machine wasn’t holding their hand anymore, they were roughly where they started.

The study went further. It wasn’t just a binary “AI or no AI” exercise, but featured a gradient of engagement styles. They found that the more actively engaged a person was in solving the problem themselves, regardless of whether they used AI, the more they learned and the better they transferred that knowledge to new problems. Engagement is the key variable. The AI is just one factor in whether engagement happens.

The creative loop that AI disrupts

Mike has a framework for this. It’s a three-stage creative loop that he argues is the core differentiator between human and machine cognition.

  1. Brainstorm: Generate lots of ideas without censorship. Volume is the goal.
  2. Refine: Evaluate, narrow, follow promising leads, and backtrack when needed.
  3. Execute: Build the thing you’ve decided to build.

Every creative domain uses some version of this loop, from musicians to athletes to architects to engineers. The loop is how humans make decisions and build things that are genuinely new.

AI, Mike argues, is great at brainstorming. It’ll generate ideas you’d never have thought of, and you’ll always find gems among them. It’s mediocre at refining. You can design interactive experiences that scaffold the refining process, but it doesn’t happen automatically. It’s reasonably good at execution, which is exactly where we’ve focused almost all our energy and tooling.

The biggest problem, though, is one that doesn’t fit neatly into any of those three stages: AI is terrible at stopping.

“Generate an idea, generate an idea, generate an idea. Okay, let me refine these. And then at the bottom it says, ’You know, it would make this really cool if I added an image.’”

AI always wants to do one more thing. It’s a dopamine-delivery machine with no off switch, and the cognitive load of constantly managing the firehose is real. Harvard researchers are apparently already calling it “AI brain fry.” The call center paradox is a microcosm of the larger situation: we’ve outsourced the easy cognitive work to the machine, and now we’re spending all our time on the hard parts.

The Coach Model

So what’s the alternative? Amundsen’s answer is what he calls AI coaches.

The idea: instead of building AI systems that do work, build AI systems that guide you through doing work. Make system prompts (or “skill files” or however your shop names them) engineered to embody a coaching personality. They should do things like asking questions, surfacing options, making the human choose, pausing at decision points, and crucially, stopping when the task is done.

He demoed a simple example: a coach that walks you through building a small API. It explains its scope upfront. It asks you to confirm you’re ready before proceeding. It presents choices with context (“Most APIs in our system use JSON. Are you okay with text?”). It pauses before moving from refinement to execution. It waits for your explicit go-ahead before generating code. And when it’s done, it says “We’re all done here. Stop.”

The human is always in charge of the pace. The machine doesn’t proceed without confirmation. The decisions are yours.
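The pause-confirm-stop behavior can be sketched as a tiny state machine. To be clear, Mike's coaches are system prompts, not Python, and he did not show code like this: the `Coach` class, the `STAGES` list (borrowed from his brainstorm/refine/execute loop), and the yes/no protocol are all my assumptions about one way to express the same rules.

```python
from dataclasses import dataclass, field

STAGES = ["brainstorm", "refine", "execute"]


@dataclass
class Coach:
    """Advances one stage at a time, and only on an explicit go-ahead."""

    stage_index: int = 0
    log: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.stage_index >= len(STAGES)

    def prompt(self) -> str:
        """What the coach says next: a question, or a hard stop."""
        if self.done:
            return "We're all done here. Stop."
        return f"Ready to move on from '{STAGES[self.stage_index]}'? (yes/no)"

    def respond(self, answer: str) -> str:
        """Record the human's answer; only 'yes' moves the session forward."""
        if self.done:
            return self.prompt()
        self.log.append((STAGES[self.stage_index], answer))
        if answer.strip().lower() == "yes":
            self.stage_index += 1
        return self.prompt()


if __name__ == "__main__":
    coach = Coach()
    print(coach.prompt())
    for answer in ["no", "yes", "yes", "yes"]:
        print(coach.respond(answer))
```

The two properties that matter are that nothing advances without a "yes" and that the finished state has no "one more thing" to offer.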

It sounds almost quaint compared to the “give the AI five monitors and let it loose” approach that’s been trending lately. But Mike’s been building these coaches for nine months, and the principle is backed by the research: high interactivity plus genuine engagement plus the human making real decisions equals actual learning. The result is embodied knowledge, the kind you can carry to the next problem.

Engelbart’s bootstraps and ours

Mike closed by coming back to Engelbart’s bootstrapping vision. Engelbart, it turns out, got into computers because he read As We May Think (Vannevar Bush’s 1945 article) while stationed on a Pacific island during peacetime military service. Remember, this was an article in a twenty-year-old magazine, and it changed the direction of his life!

The chain goes: Bush writes the article → Engelbart reads it decades later → Engelbart invents tools that help humans think better → those tools help us build more tools → and so on, upward.

That’s the version of AI development Mike is asking us to choose. Not the Terminator version. Not the robots-take-over-and-destroy-humans version. Instead, it’s the Licklider/Engelbart version, where technology makes us smarter, preserves the creative loop, and keeps humans in the deciding seat while offloading the drudgery.

He closed with Alan Kay’s line: “The best way to predict the future is to invent it.” And he made sure we heard the warning embedded in it: other people are also inventing futures, and not all of those futures are ones we’d choose.

This was, for my money, a contender for the most substantive talk at Arc of AI. It’s the kind of talk that gives you not just something to think about on the drive home, but something to actually do differently. I’m already reconsidering some of my own AI tooling habits, which is the highest compliment I can pay to a conference keynote.

If you want to dig deeper into the coaching approach Amundsen has been developing, he mentioned he’s working on a book. I’ll link it here when it’s available.

Categories
Artificial Intelligence Conferences

Venkat Subramaniam’s Arc of AI afternoon keynote: “Influencing the Irrational AI-Clouded Minds”

Hello from the Arc of AI conference, happening as I write this from Austin, Texas! I’m currently enjoying a pre-new-job “vacation” in true geek style by attending, speaking at, and playing accordion at an AI conference. Yeah, that tracks.

Yesterday (Tuesday, April 14, 2026), Arc of AI’s ringleader, Venkat Subramaniam, gave one of his “big picture” talks with the insight, warmth, and humor that are his stock in trade. I took notes and pictures, my phone took a recording, I used an LLM to pull it all together, and the end result is these notes from Venkat’s post-lunch keynote, titled Influencing the Irrational AI-Clouded Minds.

Venkat’s session went beyond concerns about code and into the idea of protecting our field from the “breakneck speed” that threatens to break our collective necks. Right now is both the best of times and the worst of times. As tech professionals, we find ourselves on a daily rollercoaster of twists and turns, bouncing between fascination with new capabilities and a very real fear of potential (career? existential?) threats.

Venkat noted that while AI is an incredible tool, it’s currently functioning as the most powerful “vomit engine” ever created. It can puke out more code than you can handle, but that doesn’t mean it’s code you should trust.

It takes time for tech to catch on and fit into our lives

As the saying goes, history doesn’t always repeat itself, but it often rhymes. Venkat reminded us that technology maturity is rarely instantaneous. We often take for granted that the world wasn’t always “plugged in.”

  • Electricity: First made available in 1878, it was initially very expensive and difficult to generate. It took until the 1940s for the majority of U.S. homes to get electricity, which was a 60-year journey.
  • Bicycles: Designs emerged in the 1830s (and wow, were they ridiculous and downright dangerous), but they didn’t become commonplace until the 1890s. 60 years again.
  • Cars: Emerged in the late 1800s, and took until the 1930s for 60% of U.S. households to own one. 60-ish years.
  • Flight: I was in the “keener row” (Canadian slang for the front row of a classroom) at the keynote, and Venkat knows where I live, so he called on me and asked if I knew where and when the first commercial flight took place. I didn’t know. It turns out that it happened in 1914, cost $400 ($13,000 in today’s money), and was a 22-minute jaunt from Tampa to St. Pete. It would take until 1972 for 50% of Americans to have flown.

There is something both magical and sobering about that 60-year window. Every technology starts out shaky, costly, and unsafe. AI is no different.

Redefining AI: It’s Not Intelligence

One of the most grounding points of the keynote was the definition of AI itself. This is something that Venkat brought up at his talk in Tampa back in December.

We admire “intelligence” as original thinking and innovation. What AI does often looks more like what we call “plagiarism” in school:

“I call AI as Accelerated Inference, because that’s what AI really does. AI is an inference engine, not a machine [of intelligence].”

AI analyzes patterns based on available data. If the data’s garbage, the inference is garbage. And let’s be honest: we’ve trained AI on the code we’ve been writing for decades (and hey, it’s not all good). We’re effectively feeding it garbage and being surprised when it shows us what it’s got.

Where AI shines, and where it ends up where the sun don’t shine

Venkat shared a powerful anecdote about an expert C++ developer struggling to write automated tests for an enormously complex library. He suggested handing the task over to AI. The developer, after some coaxing, did that, and the AI generated a suite of tests with extensive mocking.

But the initial result didn’t even compile. Venkat began to think that his suggestion was a mistake…until the developer took a closer look at the test code.

Upon closer inspection, the developer noted that while the tests didn’t work, they were close to working. He said that he could get them up and running in two hours. It turned out that it took even less time than his original estimate.

The developer came to the realization that what the AI did in seconds would have taken him three months of full-time effort to figure out. AI got the developer 70% of the way there, but it required a developer with enough expertise to do the remaining 30%.

AI strengths vs. weaknesses

I need to take a moment to thank Gemini for turning my hastily-typed notes into the table below, which summarizes AI’s strengths and weaknesses, as enumerated by Venkat at his talk:

AI is Great At… | AI is Terrible At…

Handling Cognitive Load: It doesn’t have a “mind” to get overwhelmed by complex, intertwined code. | Reliability: It cannot yet create reliable code or documentation you can release without a human check.

Detecting Issues: It can snap-analyze a design and find bugs or architectural flaws like a missing game loop. | Contextual Truth: It can be “unbelievably smart” while being factually wrong, as seen in the Lua unpack example.

Generating Ideas: It is a powerful tool for ideation where correctness isn’t the primary burden. | Correctness: It might give itself a 9/10 for correctness while admitting its quality is poor due to “mutating sin”.

He also presented this slide:

The items listed:

  • Generate ideas (thumbs up)
  • Detect issues (thumbs up)
  • Analyze design (thumbs up)
  • Explain complex code (thumbs up)
  • Vomit code (thumbs up)
  • Create tests (maybe)
  • Create reliable code (thumbs down)
  • Provide reliable documentation (thumbs down)
  • Be an authentic learning tool (thumbs down)
  • Test your patience (double thumbs up)

The “Vasa” lesson: You can disagree with physics, but it bites back

To illustrate the danger of “irrational AI-clouded minds” in management trying to mis-apply AI, Venkat pointed to the Vasa, a Swedish warship built in the 1600s. The King of Sweden, powerful and arrogant, demanded a massive, overly-ornate ship. The king understood the power of bling, but not trivialities such as “seaworthiness” and “center of gravity”.

The shipwright didn’t have the courage (but did have an excellent sense of self-preservation) to tell the King he was wrong. The outcome was predictable: the ship sailed only 1,600 yards before sinking, where it stayed for 300 years until it was raised and restored. It now resides in a museum in Stockholm as an (admittedly beautiful) object lesson.

In 2026, we have “Kings of Sweden” at our workplaces, and they want AI solutions at any cost for the same ornamental reasons as the Vasa. But while you might disagree with the “physics” of software development, they’re often as hard to fight as real-world physics. If the cost of failure is high (loss of data/money/life) you can’t blindly jump in and build an AI Vasa.

Programming is thinking, not typing

Venkat pointed us to Wikipedia’s list of obsolete occupations. Will “programmer” end up on this list of over 100 obsolete jobs, like the town crier or the leech collector? He argues the opposite:

  • Our job isn’t about typing characters; that is the least enjoyable part.
  • We’re thinkers, not typists. We write applications in our heads, not on the keyboard. The keyboard’s just a user interface.
  • Compilers didn’t kill development; they accelerated it. AI will do the same.

The real threat is cognitive decline. The rising level of abstraction takes us far from the code where we spent time developing our hard-earned critical thinking skills. If we lose the ability to think critically because we are delegating our thinking to machines, we are in trouble. As Venkat put it, the “illiterate” of the future are those who cannot think critically.

 

New rules of engagement

So, how do we handle the “Black Swan” event of AI? Venkat suggests we look to the rules recently laid down by Linus Torvalds for the Linux kernel:

  • No AI Sign-offs: AI agents cannot add “Signed-off-by” tags.
  • Mandatory Attribution: It must be clear if code was assisted by AI.
  • Full Human Liability: The human submitter bears 100% responsibility for bugs, security flaws, and license compliance.
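
The first two rules are mechanical enough to check in a script. Here is a minimal Python sketch of a commit-message check, written under some loud assumptions: “Signed-off-by” is the kernel’s standard sign-off tag, but the “Assisted-by” trailer name, the `check_commit` function, and the `known_agents` list are my own illustrative choices and not the kernel’s actual tooling. The third rule, full human liability, is not something code can enforce.

```python
import re

# "Signed-off-by" is the kernel's standard sign-off tag. "Assisted-by" is
# assumed here as the attribution tag; check the project's own contribution
# docs for the exact name and format before relying on it.
TRAILER = re.compile(r"^(?P<key>[A-Za-z-]+):\s*(?P<value>.+)$")


def parse_trailers(message):
    """Collect 'Key: value' lines from the last paragraph of a commit message."""
    last_paragraph = message.strip().split("\n\n")[-1]
    trailers = {}
    for line in last_paragraph.splitlines():
        match = TRAILER.match(line.strip())
        if match:
            key = match["key"].lower()
            trailers.setdefault(key, []).append(match["value"].strip())
    return trailers


def check_commit(message, ai_assisted, known_agents=()):
    """Return a list of policy problems; an empty list means the message passes."""
    trailers = parse_trailers(message)
    signoffs = trailers.get("signed-off-by", [])
    problems = []
    if not signoffs:
        problems.append("missing human Signed-off-by")
    for signer in signoffs:
        # Hypothetical check: flag sign-offs whose name matches a known agent.
        if any(agent.lower() in signer.lower() for agent in known_agents):
            problems.append(f"AI agent sign-off not allowed: {signer}")
    if ai_assisted and "assisted-by" not in trailers:
        problems.append("AI-assisted change lacks attribution trailer")
    return problems


good = (
    "fix: handle empty input\n\n"
    "Guard against an empty buffer before parsing.\n\n"
    "Assisted-by: ExampleAgent\n"
    "Signed-off-by: Jane Dev <jane@example.com>"
)
bad = "fix: thing\n\nSigned-off-by: ExampleAgent <bot@example.com>"

good_problems = check_commit(good, ai_assisted=True, known_agents=("ExampleAgent",))
bad_problems = check_commit(bad, ai_assisted=True, known_agents=("ExampleAgent",))
print(good_problems)
print(bad_problems)
```

A check like this only catches the obvious cases. It cannot tell whether the human who signed off actually read and understood the code, which is the whole point of the third rule.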

“Don’t trust and verify the heck out of it” is the new norm.

A framework for success

To remain relevant and responsible, Venkat outlined a five-step process for working with AI:

  • Understand the Problem: Don’t jump to AI immediately. Take the time to grasp the requirements.

  • Ideate: Spend time thinking about possible solutions and their consequences.

  • Activate AI: Only after ideation should you engage the tool.

  • Iterate: Work through the solutions provided.

  • Evaluate Critically: Verify everything before it goes anywhere near production.

We can’t go faster with ignorance. We need competency to gain speed and protect our reputations. AI is a powerful tool, but in the hands of a fool, it’s a dangerous one.

Let’s stop the fear-mongering. There is real work to be done, and it requires our minds more than ever.

Categories
Artificial Intelligence Conferences

Notes from Schutta and Vega’s Arc of AI Workshop, part 4: Own your career, learn how to learn, and don’t become a dependent

I caught the Fundamentals of Software Engineering in the Age of AI workshop yesterday at the Arc of AI conference’s workshop day, led by Nathaniel Schutta (cloud architect at Thoughtworks, University of Minnesota instructor) and Dan Vega (Spring Developer Advocate at Broadcom, Java Champion).

Nate and Dan are the co-authors of a book on the subject, Fundamentals of Software Engineering, and they’re out here workshopping the ideas with developers who are living through the same AI-saturated moment we all are.

Fair warning: this post is long. The session was dense, the conversation was good, and I took a lot of notes.

Here’s part four of several notes from the all-day session; you might want to get a coffee for this one.

The afternoon session of this workshop shifted away from the technical and toward the personal: career management, professional skill-building, how to actually learn things in an industry that never stops changing, and how to stay sane while it’s all happening. Nate carried most of this section alongside Dan, with some sharp contributions from the audience. It was a good room for this kind of conversation: people who’d been in the industry a while, who’d seen waves come and go, trying to figure out what the current wave means for them specifically.

You are your own career manager, and that’s non-negotiable

Dan opened by acknowledging what a lot of people in the room were probably thinking: the career path they imagined when they started — get good at coding, keep getting better at coding, code until retirement — is not the only path, and for a lot of people it turned out not to be the right one either.

His framework for figuring out what direction to go: pay attention to what actually energizes you when you’re working. What problems do you want to solve? Do you prefer building interfaces or working with data and algorithms? Does debugging a gnarly problem feel like a puzzle you want to crack, or a tax you want to stop paying? Do you like the creative side of software, or the precision and correctness side? Side projects, he argued, are one of the best ways to run these experiments without quitting your job to do it.

The paths he outlined go well beyond the traditional “developer or manager” binary: software architect, staff engineer, engineering manager, technical product manager, developer advocate (his own role), sales engineer, and the increasingly relevant entrepreneur. Each has a different center of gravity, and none of them requires you to stop being technical.

His advice for navigating toward one of these: walk backwards from where you want to be. If you want to be an architect in five years, figure out what that role actually requires, then map it back to what you should be doing in years three to five, and years one to two. You’re already doing the mental motion of decomposing complex problems. Apply it to your own career.

Nate added the practical mechanics: use your personal development budget. A lot of people don’t, often out of a quiet fear of standing out or seeming like they’re trying too hard. He was blunt about this: “If you’ve got it and you’re not using it, you’re leaving part of your comp on the table. Any good manager should be thrilled you want to get better at your job.”

The technology radar: a personal framework for staying current without losing your mind

One of the more immediately actionable tools the workshop introduced was the Technology Radar concept. It’s familiar to a lot of people from Thoughtworks’s public-facing version, but here applied personally rather than organizationally.

The idea: organize technologies and techniques into four buckets. Adopt (things you’re currently using and mastering). Trial (things you’re actively experimenting with). Assess (things you’re watching but not diving into yet). Hold (things you’re deliberately not learning right now, even if people keep telling you to).

The audience exercise around this got interesting quickly. People shared their lists. “Rust on hold because Go is a higher priority at my company” was one contribution — and that’s exactly the right way to think about it. Your radar isn’t the same as someone else’s radar. Boris at Anthropic running five parallel Claude Code instances in his terminal doesn’t mean that’s the right workflow for you. Dan was emphatic: “Don’t see what someone else is doing and feel like you’re behind. You’re not.”

The schedule layer Nate added was useful: once you’ve identified something you want to learn, think through the cadence. Weekly, maybe a podcast or a short video. Monthly, maybe a meetup. Quarterly, maybe a deeper hands-on session. Annually, maybe a conference. Small, consistent investment over time beats cramming every time.
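
If you’d rather keep your radar in a file than in your head, here is a minimal Python sketch of the four buckets plus Nate’s cadence layer. The `Ring`, `RadarEntry`, and `by_ring` names are my own, and the example entries are hypothetical apart from the Rust-on-hold one from the audience (putting Go in Adopt is my guess at what that person’s radar looks like).

```python
from dataclasses import dataclass
from enum import Enum


class Ring(Enum):
    """The four buckets, with the workshop's one-line definitions."""
    ADOPT = "using and mastering now"
    TRIAL = "actively experimenting"
    ASSESS = "watching, not diving in yet"
    HOLD = "deliberately not learning right now"


@dataclass
class RadarEntry:
    name: str
    ring: Ring
    note: str = ""


def by_ring(entries):
    """Group entry names under each ring, keeping empty rings visible."""
    grouped = {ring: [] for ring in Ring}
    for entry in entries:
        grouped[entry.ring].append(entry.name)
    return grouped


# Hypothetical personal radar; only the Rust entry comes from the session.
radar = [
    RadarEntry("Go", Ring.ADOPT, "higher priority at my company"),
    RadarEntry("Rust", Ring.HOLD, "revisit once the Go work settles"),
    RadarEntry("MCP servers", Ring.ASSESS),
]

# Nate's cadence layer as a simple lookup.
CADENCE = {
    "weekly": "podcast or short video",
    "monthly": "meetup",
    "quarterly": "deeper hands-on session",
    "annually": "conference",
}

grouped = by_ring(radar)
for ring, names in grouped.items():
    print(f"{ring.name}: {', '.join(names) or '(empty)'}")
```

Keeping the empty rings in the output is deliberate: an empty Trial bucket is a useful nudge, and a crowded one is a sign that something should move to Hold.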

Record your wins, and be specific about the numbers

This was a section I wish someone had told me about fifteen years ago, and I suspect most people in the room felt similarly.

Dan’s recommendation: maintain a running wins document. Not elaborate. Not ceremonial. Just a note in Apple Notes or Google Docs where you record things you accomplished, feedback you received, skills you built, presentations you gave. The point is to have the material when you need it — annual reviews, promotion conversations, job searches, award nominations.

The key, and this is where most people go wrong: be specific, and attach numbers wherever possible.

“I improved performance in our flagship application” is forgettable. “I improved performance by 25% by implementing virtual threads” is a data point. “I reduced memory usage across a thousand instances over 300 apps” is a business case. The person making decisions about your raise or your promotion can’t make that case for you if you don’t give them the ammunition. Your manager is not necessarily keeping track of your contributions with the same level of care you are.

Nate extended this with a point about visibility: you want your manager to be able to walk into a room and tell a specific story about you. Not “Nate’s a solid engineer,” but “Nate’s Azure lunch and learn series pulled 200 people in the first session and our Chief Strategy Officer shared the metrics upward.” When your name comes up in rooms you’re not in, you want there to be a story attached to it — and that story needs to be true, specific, and ideally tied to a dollar amount or a measurable outcome.

His framing: “If your boss can say ‘Dan saved us 1.8 million dollars last year in Cloud costs,’ it’s a lot harder to put Dan on the non-regrettable attrition list.”

How we actually learn things (and why most approaches don’t work)

Nate took over for the learning science portion, and it was some of the best material of the day.

The core claim: in order to remember something, it needs to be elaborate, meaningful, and have context. Which is why story is so powerful — stories create context and meaning around facts that would otherwise evaporate. He mentioned that an AV technician once stopped him after a talk specifically to say she noticed he told stories, because most speakers just recite facts, and the stories were why she stayed engaged. He took that as confirmation of what he already believed: stories are the actual unit of memory, not information.

Spaced repetition matters. Brute-forcing your way through something until you think you’ve got it and then never returning to it is how you lose it. The Forgetting Curve is real. Little bits over time beats big chunks all at once. This is why blocking regular learning time on your calendar — Friday afternoons, Tuesday lunches, fifteen minutes of morning coffee before your day explodes — actually works where “I’ll get to it eventually” does not.
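
To make “little bits over time” concrete, here is a minimal Python sketch that generates review dates with growing gaps. The doubling interval and the `review_dates` function are my own assumptions for illustration. Nate didn’t prescribe a schedule, and real spaced-repetition tools adjust the gaps based on how well you recalled the material.

```python
from datetime import date, timedelta


def review_dates(start, reviews=5, first_gap_days=1, factor=2):
    """Return review dates whose gaps grow by `factor` after each session."""
    dates = []
    gap = first_gap_days
    current = start
    for _ in range(reviews):
        current = current + timedelta(days=gap)
        dates.append(current)
        gap *= factor  # each review pushes the next one further out
    return dates


# Example: something learned on April 14, 2026, reviewed five times.
schedule = review_dates(date(2026, 4, 14))
for day in schedule:
    print(day.isoformat())
```

With the defaults, the gaps are 1, 2, 4, 8, and 16 days, so five short sessions cover about a month. Those dates are exactly the kind of thing you can drop into the calendar blocks described above.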

He was also honest about the limits of memory: forgetting is normal, not a personal failing. He now uses Gemini to re-explain things like OSI layers that he learned thirty years ago and hasn’t needed day-to-day. “I don’t freaking deal with it constantly. Getting a nice, concise refresher is fine, as long as I verify when it matters.”

The Dreyfus model of skill acquisition came up here, and it’s worth understanding. Five stages: novice (needs explicit recipes, follow the steps exactly), advanced beginner (can start combining recipes), competent (can troubleshoot, begins to self-correct), proficient (can self-correct in the moment), expert (operates on intuition, can’t always explain what they’re doing). The punchline: most developers don’t have ten years of experience; they have one year of experience ten times. And LLMs are permanently stuck somewhere around advanced beginner. They can combine recipes. They will never have intuition, the felt sense that something is wrong before you can articulate why.

Rules are essential for novices. Rules kill experts. A slightly different thing, checklists, are powerful across all levels, as the aviation and surgery examples illustrated. The distinction matters for how you think about AI-assisted development: AI needs guardrails because it can’t develop the intuition to know when to break the rules. You set the guardrails. That requires knowing the rules well enough to encode them.

You cannot read it all. Stop trying.

The death of a thousand subscriptions. Nate described the pile of unread magazines accumulating on his kitchen island and his wife’s gentle suggestion that most of it should go in the recycling as a near-perfect metaphor for the state of our industry’s information environment.

His rough estimate: the amount of content added to YouTube while the workshop was running would take more than a week to watch straight through, even without eating or sleeping. The amount of content added to the internet while they were in the room is unfathomable. Heat death of the universe is going to happen before you read it all.

His solution: cultivate a network of trusted people who read different things and share the signal. He and Glenn, he mentioned, exchange texts constantly, each person watching a different slice of the landscape, forwarding things worth attention. If something is genuinely important, it will hit you from multiple directions regardless. You don’t need to be first to every wave.

This connects back to the Technology Radar: FOMO is real, but you cannot surf every wave. Being a fast follower, letting other people take version 1.0 and joining at 1.1 once the shakeout has happened, is a completely legitimate strategy. The people who are struggling right now, Nate suggested, are the ones saying “nope, not my thing, not engaging,” and not the ones who are choosing deliberately where to focus.

On AI, anxiety, and not feeling like you’re behind

Dan closed with a section that felt necessary: acknowledgment that the current moment is genuinely overwhelming, and that AI fatigue is real even if nobody talks about it.

He referenced an Andrej Karpathy tweet about feeling like a powerful alien tool had been handed to everyone simultaneously without a manual, while a magnitude-9 earthquake is rocking the profession. Nobody knows how to hold it yet. The expectation that developers should now be 10x as productive is not a reality for most people. They’re still learning the tools, still figuring out what works, still dealing with the new cognitive load of evaluating AI output on top of doing the actual work.

His practical guidance on where to start, because the list of things you’re “supposed to know” (MCP, evals, prompt chaining, vibe coding, function calling, embeddings, constitutional AI, token sampling, and so on) is legitimately intimidating:

Start with playing with multiple models. Try the same prompt in Claude, Gemini, GPT. Notice the differences. That alone builds intuition. Then understand context and memory. What are the limitations of these systems, and how do you work within them? Then tools: the idea that you can give an LLM access to actions in the world. Then MCP servers as a way of packaging that capability. Then, eventually, agents and agentic workflows. But not before the foundational layers make sense.

And critically: don’t let someone else’s advanced workflow make you feel behind. The Boris-at-Anthropic-running-five-Claude-Code-instances workflow exists in a context you don’t share. Build your own relationship with these tools from wherever you actually are.

The closing argument

Nate closed the day, and I want to quote him here as directly as I can from my notes, because the framing was right:

“Fundamentals will always serve you well. I am adamantly of the opinion that they are even more important now than they were five years ago, and I thought they were pretty damn important five years ago when we started this book.”

Two mindsets available to you: define yourself by what you’ve done in the past, or define yourself by the problems you’re going to solve in the future. Reactive or proactive. Either way, change is coming. It always has been. He’s been doing this for almost thirty years and has not yet seen an instance where the industry just… stopped. The pendulum swings, the landscape shifts, and the people who navigate it best are the ones who maintain the fundamentals while staying curious enough to pick up the new tools.

He admitted he’s nervous about the cohort of people entering the industry right now: the steep drop in junior hiring, the Stanford placement numbers, the companies that have convinced themselves AI obsoletes entry-level work. But he thinks the snapback is coming. We need juniors to become seniors. Seniors don’t appear from nowhere. At some point, that math becomes undeniable.

His last line stuck with me: “I’d rather be the lead sled dog, because at least the view changes.”