Last week, Anitra and I attended both the Dev/Nexus conference and its companion conference, Advantage, an AI conference for CTOs, CIOs, VPs of Engineering, and other technical lead-types, which took place the day before Dev/Nexus. My thanks to Pratik Patel for the guest passes to both conferences!
I took copious notes and photos of all the Advantage talks and will be posting summaries here. This set of notes is from the third talk, Language Stacks and Gen AI, presented by Rod Johnson.
Rod is a developer, author, investor, and entrepreneur. He has authored several best-selling books on Java EE. He is the creator of the Spring Framework and was cofounder and CEO of SpringSource. He has served on the boards of Elastic, Neo Technology, Apollo, Lightbend, and several other successful companies. He is presently developing a structured RAG system using Spring and Kotlin.
And here’s the abstract of his talk:
Python is the language of data science and dominant in AI research. However, it is not the language of enterprise apps, and there are good reasons for this. In this session, Rod will discuss when to use what language and stack for AI success in enterprise. He’ll discuss the key adjacencies for success: LLMs, existing data and business logic, and how to choose what language, stack and framework for a particular problem.
My notes from Rod’s talk are below.
Your existing enterprise systems are an asset, not a liability
Rod opened with something that probably felt like a relief to many in the room: a clear-eyed argument that the overwhelming pace of AI change is not a reason to abandon what your organization has already built. Enterprise systems represent years of accumulated business logic, domain knowledge, and battle-tested reliability. These things change slowly, and in this case the cliché is true: that's not a bug, it's a feature. The pressure to throw out existing applications and start fresh with AI-native rewrites is, in Rod's view, not just misguided but reckless.
He was equally direct about the organizational risk of letting AI enthusiasm displace experienced people. Every major technology wave produces a class of self-declared experts who rush in and crowd out the engineers who actually understand the business. Domain expertise doesn’t get replaced by a new framework. Instead, it gets more valuable, because it’s the thing that makes AI systems accurate and useful rather than fluently wrong.
The message to leaders was clear: protect your people, and make sure your AI strategy grows out of your existing institutional knowledge rather than treating it as an obstacle.
Personal assistants and business processes are fundamentally different. Stop conflating them.
One of the sharpest distinctions in Rod’s talk was between AI as a personal productivity tool and AI embedded in enterprise business processes. The personal assistant category (for example: chatbots, coding agents, tools like Cursor) operates under forgiving conditions. If a coding agent produces bad output, a developer catches it before it reaches production. The feedback loop is tight, human oversight is immediate, and the cost of failure is manageable. This is why maximizing agent autonomy makes sense in that context.
Business processes are an entirely different environment. Rod pointed to the Air Canada chatbot case, where the airline told a customer it would honor a discounted bereavement fare and then tried to disclaim responsibility when the customer held them to it. Unlike a coding error that gets caught in review, a business process error engages real customers, real employees, and real legal and financial consequences. You can’t roll back a workflow the way you can roll back a pull request. The asymmetry between these two domains is enormous, yet most of the noise driving enterprise AI strategy comes from the personal assistant space, where demos are impressive and the failure modes are invisible.
Rod is clearly frustrated with this conflation, and you should be too. The loudest voices in the generative AI conversation are the ones driving media coverage and executive attention, and they're overwhelmingly people with no background in enterprise software and no particular interest in it. Leaders who let those voices set their enterprise AI agenda are optimizing for demo impressiveness rather than production reliability, and that's a recipe for expensive disappointment.
Structure (almost always) beats natural language
Perhaps the most technically counterintuitive point in Rod’s talk was his argument that interacting with LLMs in natural language is often the wrong approach. Yes, LLMs are trained on vast amounts of natural language text, but the underlying Transformer architecture is fundamentally about predicting tokens. It’s not inherently about language at all. The seductive thing about natural language interfaces is that you can demo them impressively in minutes. The unsettling thing is that natural language is ambiguous, extremely difficult to test, and essentially opaque when something goes wrong.
Rod’s alternative is to structure your interactions with LLMs as much as possible: structured inputs, structured outputs, and as little free-form natural language in the critical paths as you can manage. His thought experiment about what a bank knows about its customers illustrated the point neatly. The vast majority of the high-value data a bank holds — transactions, account balances, product relationships — is already highly structured. The fringe cases that exist in text (notes from a branch visit, a customer service transcript) are real but marginal. Adding generative AI to that environment should leverage the structure that’s already there, not dissolve it into a sea of markdown and free text.
The practical consequences of over-relying on natural language are significant. Systems built around unstructured text accumulate context rapidly, which drives token counts (and therefore costs) through the roof. They become increasingly unpredictable as that context grows, and when they produce wrong outputs, there’s no clean way to explain or audit what went wrong. Rod’s point, reinforced by his analysis of OpenAI’s Operator product, is that even sophisticated AI systems hit a hard ceiling when they’re built on a foundation of loose text rather than structured data and deterministic logic.
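To make the contrast concrete, here's a minimal, framework-agnostic sketch of the difference between acting on free text and acting on a validated, structured response. It's in Python purely for brevity (Rod's own work is built on Spring and Kotlin), and everything in it is hypothetical: the refund-decision schema, the `parse_decision` contract, and the hard-coded strings standing in for actual LLM replies.

```python
import json
from dataclasses import dataclass

# Hypothetical structured contract for a refund decision. In a real system
# this schema would come from your existing domain model, not be invented
# ad hoc at the LLM boundary.
ALLOWED_ACTIONS = {"approve", "deny", "escalate"}

@dataclass
class RefundDecision:
    action: str
    amount: float

def parse_decision(model_reply: str) -> RefundDecision:
    """Accept only well-formed JSON that satisfies the contract.
    Anything else fails closed: escalate to a human for review."""
    try:
        data = json.loads(model_reply)
        action = data["action"]
        amount = float(data["amount"])
        if action not in ALLOWED_ACTIONS or amount < 0:
            raise ValueError("reply is outside the contract")
        return RefundDecision(action, amount)
    except (ValueError, KeyError, TypeError):
        return RefundDecision("escalate", 0.0)

# A structured reply is testable, auditable, and cheap to validate...
ok = parse_decision('{"action": "approve", "amount": 40.0}')

# ...while a fluent natural-language reply is none of those things,
# and here it safely falls through to human review instead of acting.
bad = parse_decision("Sure! I think we should probably refund them.")

print(ok.action, bad.action)  # approve escalate
```

The point of the sketch is the asymmetry of the failure modes: the structured path can be unit tested against every branch of the contract, while the free-text path can only be eyeballed, which is exactly the property Rod argues you cannot afford in a business process.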
Your language stack probably shouldn’t change, but your thinking about AI layers should
Rod was characteristically direct on the language debate that consumes a lot of oxygen in AI developer communities: Python is not magical for building enterprise AI agents, and the fact that most academic AI research is published in Python is not a reason for a Java or C# shop to rewrite everything. There are reasons your enterprise applications were written in the languages they were written in: stability, ecosystem maturity, existing tooling, and team expertise. Those reasons haven't changed. What sits in the generative AI layer is substantially shallower than your core application logic, and the risk-reward calculation for rewriting those core systems in a trendier language is deeply unfavorable.
That said, Rod drew a reasonable distinction: Python genuinely does have advantages for certain tasks like document processing, model fine-tuning, and data ingestion pipelines, where the research community’s tooling is simply more mature. The error isn’t using Python for those things. It’s letting data science people with a Python background architect the entire enterprise AI strategy, because data science and enterprise AI application development require genuinely different skills. Conflating them leads to frameworks that are academically interesting but operationally fragile when exposed to real enterprise requirements around security, observability, testability, and integration.
The practical implication for enterprise leaders is that you need a good agent framework. Rod's example was Embabel, a framework his company developed. A good agent framework should feel like a natural extension of your existing stack: it should play nicely with Spring, respect your existing domain model, integrate with your existing observability tooling, and support unit testing at every level. You shouldn't have to introduce an entirely new operational paradigm just to add generative AI capabilities. Adjacency to your existing systems is where the value gets unlocked, and any framework that treats your existing applications as irrelevant legacy to be worked around is solving the wrong problem.
Tampa Bay AI Meetup is a community partner of Arc of AI, and we can help you save $50 off the ticket price! Just use the discount code 