Categories
Artificial Intelligence Hardware What I’m Up To

Talking about HP’s ZGX Nano on the “Intelligent Machines” podcast

On Wednesday, HP’s Andrew Hawthorn (Product Manager and Planner for HP’s Z AI hardware) and I appeared on the Intelligent Machines podcast to talk about the computer that I’m doing developer relations consulting for: HP’s ZGX Nano.

You can watch the episode here. We appear at the start, and we’re on for the first 35 minutes:

A few details about the ZGX Nano:

  • It’s built around the NVIDIA GB10 Grace Blackwell “superchip,” which combines a 20-core Grace CPU and a GPU based on NVIDIA’s Blackwell architecture.

  • Also built into the GB10 chip is a lot of RAM: 128 GB of LPDDR5X coherent memory shared between CPU and GPU, which helps avoid the kind of memory bottlenecks that arise when the CPU and GPU each have their own memory (and usually, the GPU has considerably less memory than the CPU).
NVIDIA GB10 SoC (system on a chip).
  • It can perform up to about 1000 TOPS (trillions of operations per second), or 10^15 operations per second, and can handle model sizes of up to 200 billion parameters.

  • Want to work on bigger models? By connecting two ZGX Nanos together using the 200 gigabit per second ConnectX-7 interface, you can scale up to work on models with 400 billion parameters.

  • ZGX Nano’s operating system is NVIDIA’s DGX OS, which is a version of Ubuntu Linux with additional tweaking to take advantage of the underlying GB10 hardware.

Some topics we discussed:

  • Model sizes and AI workloads are getting bigger, and developers are getting more and more constrained by factors such as:
    • Increasing or unpredictable cloud costs
    • Latency
    • Data movement
  • There’s an opportunity to “bring serious AI compute to the desk” so that teams can prototype their AI applications and iterate locally
  • The ZGX Nano isn’t meant to replace large datacenter clusters for full training of massive models. It’s aimed at “the earlier parts of the pipeline,” where developers do prototyping, fine-tuning, smaller deployments, inference, and model evaluation
  • The Nano’s 128 gigabytes of unified memory gets around the bottlenecks that come with separate CPU memory and GPU memory, allowing bigger models to be loaded in a local box without “paging to cloud” or being forced into distributed setups early
  • While the cloud remains dominant, there are real benefits to local compute:
    • Shorter iteration loops
    • Immediate control, data-privacy
    • Less dependence on remote queueing
  • We expect that many AI development workflows will hybridize: a mix of local box and cloud/back-end
  • The target users include:
    • AI/ML researchers
    • Developers building generative AI tools
    • Internal data-science teams fine-tuning models for enterprise use-cases (e.g., inside a retail, insurance or e-commerce firm).
    • Maker/developer-communities
  • The ZGX Nano is part of the “local-to-cloud” continuum
  • The Nano won’t cover all AI development…
    • For training truly massive models, beyond the low hundreds of billions of parameters, the datacenter/cloud will still dominate
    • ZGX Nano’s use case is “serious but not massive” local workloads
    • Is it for you? Look at model size, number of iterations per week, data sensitivity, latency needs, and cloud cost profile

One thing I brought up that seemed to capture the imagination of hosts Leo Laporte, Paris Martineau, and Mike Elgan was the MCP server that I demonstrated a couple of months ago at the Tampa Bay Artificial Intelligence Meetup: Too Many Cats.

Too Many Cats is an MCP server that an LLM can call upon to determine if a household has too many cats, given the number of humans and cats.

Here’s the code for a Too Many Cats MCP server that runs on your computer and works with a local Claude client:

from typing import TypedDict
from mcp.server.fastmcp import FastMCP

mcp = FastMCP(name="Too Many Cats?")

class CatAnalysis(TypedDict):
    too_many_cats: bool
    human_cat_ratio: float  

@mcp.tool(
    annotations={
        "title": "Find Out If You Have Too Many Cats",
        "readOnlyHint": True,
        "openWorldHint": False
    }
)
def determine_if_too_many_cats(cat_count: int, human_count: int) -> CatAnalysis:
    """Determines if you have too many cats based on the number of cats and a human-cat ratio."""
    human_cat_ratio = cat_count / human_count if human_count > 0 else 0
    too_many_cats = human_cat_ratio >= 3.0
    return CatAnalysis(
        too_many_cats=too_many_cats,
        human_cat_ratio=human_cat_ratio
    )

if __name__ == "__main__":
    # Initialize and run the server
    mcp.run(transport='stdio')
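If you’d like to poke at the decision rule without starting an MCP server, here’s a minimal sketch that repeats the same threshold logic as a plain function. The name analyze_household is my own placeholder and isn’t part of the server above; the 3.0 threshold and the zero-humans guard are copied from it.

```python
from typing import TypedDict


class CatAnalysis(TypedDict):
    too_many_cats: bool
    human_cat_ratio: float


def analyze_household(cat_count: int, human_count: int) -> CatAnalysis:
    """Same rule as the MCP tool, minus the server plumbing (hypothetical helper)."""
    # Guard against division by zero: with no humans, the ratio is reported as 0.
    ratio = cat_count / human_count if human_count > 0 else 0
    # Three or more cats per human counts as "too many".
    return CatAnalysis(too_many_cats=ratio >= 3.0, human_cat_ratio=ratio)


if __name__ == "__main__":
    for cats, humans in [(2, 1), (6, 2), (7, 2), (4, 0)]:
        print(cats, humans, analyze_household(cats, humans))
```

One quirk worth noticing: because of the zero guard, a household with cats but no humans reports a ratio of 0, so it never counts as having too many cats.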

I’ll cover writing MCP servers in more detail on the Global Nerdy YouTube channel — watch this space!

Categories
Artificial Intelligence Meetups Tampa Bay What I’m Up To

Scenes from last night’s “Architecture Patterns for AI-Powered Applications” meetup with Michael Carducci

Last night, we had a “standing room only” crowd at Michael Carducci’s presentation, Architecture Patterns for AI-Powered Applications, which was held jointly by Tampa Java User Group, Tampa Devs, and Tampa Bay Artificial Intelligence Meetup (which Anitra and I co-organize).

This article is a summary of the talk, complete with all the photos I took from the front row and afterparty.

The event was held at Kforce HQ, home of Tampa Bay’s meetup venue with the cushiest seats (full disclosure: I’m a Kforce consultant employee), and the food was provided by the cushiest NoSQL database platform, Couchbase!

Michael Carducci is many things: engaging speaker, funny guy, professional magician, and (of course) a software architect.

While he has extensive experience building systems for Very Big Organizations, the system-building journey he shared was a little more personal — it was about his SaaS CRM platform for a demographic he knows well: professional entertainers. He’s been maintaining it over the past 20 years, and it served as the primary example throughout his talk.

Michael’s central theme for his presentation was the gap between proof-of-concept AI implementations and production-ready systems, and it’s a bigger gap than you might initially think.

He emphasized that while adding basic AI functionality might take only 15 minutes to code, it’s a completely different thing to create a robust, secure, and cost-effective production system. That requires additional careful architectural consideration.

Here’s a quote to remember:

“architecture [is the] essence of the software; everything it can do beyond providing the defined features and functions.”

— “Mastering Software Architecture” by Michael Carducci

A good chunk of the talk was about “ilities” — non-functional requirements that become architecturally significant when integrating AI.

These “ilities” are…

  • Cost – AI API costs can escalate quickly, especially as models chain together
  • Accuracy – Dealing with hallucinations and non-deterministic outputs

  • Security – Preventing prompt injection and model jailbreaking
  • Privacy – Managing data leakage and training data concerns

  • Latency & Throughput – Performance impacts of multiple model calls
  • Observability – Monitoring what’s happening in AI interactions

  • Simplicity / Complexity – Managing the increasing technical stack

And then he walked us through some patterns he encountered while building his application, starting with the “send an email” functionality:

The “send an email” function has a “make AI write the message for me” button, which necessitates an AI “guardrails” pattern:

And adding more AI features, such as having the AI-generated emails “sound” more like the user by having it review the user’s previous emails, called for using different architectural patterns.

And with more architectural patterns come different tradeoffs.

In the end, there was a progression of implementations from simple to increasingly complex. (It’s no wonder “on time, under budget” is considered a miracle these days)…

Stage 1: Basic Integration

  • Simple pass-through to OpenAI API
  • Minimal code (15 minutes to implement)
  • Poor security, no observability, privacy risks

Stage 2: Adding Guardrails

  • Input and output guardrails using additional LLMs
  • Prompt templates to provide context
  • Triple the API costs and latency
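To make the “triple the API costs” point concrete, here’s a rough sketch of the guardrails flow as I understood it: one model call to check the input, one to write the draft from a prompt template, and one to check the output. This is not Michael’s code. The LLM type, the template text, and the stub models are all assumptions standing in for real API calls.

```python
from typing import Callable

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]

# Assumed template wording, for illustration only.
PROMPT_TEMPLATE = (
    "You write short, polite emails for professional entertainers.\n"
    "Write an email based on this request:\n{request}"
)


def guarded_email(request: str, input_guard: LLM, writer: LLM, output_guard: LLM) -> str:
    """Run input guardrail -> templated generation -> output guardrail."""
    # Call 1: a guard model decides whether the request is safe to pass along.
    if input_guard(request).strip().upper() != "OK":
        raise ValueError("request rejected by input guardrail")
    # Call 2: the writer model generates the draft from the prompt template.
    draft = writer(PROMPT_TEMPLATE.format(request=request))
    # Call 3: a guard model decides whether the draft is safe to show the user.
    if output_guard(draft).strip().upper() != "OK":
        raise ValueError("draft rejected by output guardrail")
    return draft


def stub_guard(text: str) -> str:
    """Toy guard: blocks one well-known injection phrase, allows everything else."""
    return "BLOCK" if "ignore previous instructions" in text.lower() else "OK"


def stub_writer(prompt: str) -> str:
    """Toy writer: always returns the same canned draft."""
    return "Hi! Confirming the show on Friday at 8 p.m."


if __name__ == "__main__":
    print(guarded_email("Confirm Friday's show", stub_guard, stub_writer, stub_guard))
```

Every successful request makes three model calls instead of one, which is where the extra cost and latency come from.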

Stage 3: Personalization

  • Adding user writing style examples
  • Building data pipelines to extract relevant context
  • Dealing with token optimization challenges

Stage 4: Advanced Approaches

  • Fine-tuning models per customer
  • Context caching strategies
  • Hosting internal LLM services
  • Full MLOps implementation

This led to Michael talking about doing architecture in the broader enterprise context:

  • Organizations have fragmented information ecosystems
  • Organizational data is spread across multiple systems after mergers and acquisitions
  • Sophisticated information retrieval has to be implemented before AI can be effective
  • “Garbage in, garbage out” still applies — in fact, even more so with AI

He detailed his experience building an 85-microservice pipeline for document processing:

  • Choreographed approach: Microservices respond to events independently
  • Benefits: Flexibility, easy to add new capabilities
  • Challenges: No definitive end state, potential for infinite loops, ordering issues
  • Alternative: Orchestrated approach with a mediator (more control but less flexibility)
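Here’s a toy sketch of the difference between the two approaches, under my own assumptions. The event names, handlers, and the max_events budget are all made up for illustration and have nothing to do with the actual 85-microservice pipeline. In the choreographed version, handlers react to events and emit new ones. In the orchestrated version, a mediator runs the steps in a fixed order.

```python
from collections import defaultdict, deque
from typing import Callable

Event = tuple[str, dict]  # (event name, payload)
Handler = Callable[[dict], list[Event]]


class EventBus:
    """Choreography: handlers subscribe to events and emit follow-up events."""

    def __init__(self, max_events: int = 100) -> None:
        self.handlers: dict[str, list[Handler]] = defaultdict(list)
        self.max_events = max_events  # crude guard against infinite loops

    def subscribe(self, name: str, handler: Handler) -> None:
        self.handlers[name].append(handler)

    def run(self, first: Event) -> list[str]:
        queue = deque([first])
        seen: list[str] = []
        while queue:
            if len(seen) >= self.max_events:
                raise RuntimeError("event budget exceeded; possible loop")
            name, payload = queue.popleft()
            seen.append(name)
            for handler in self.handlers[name]:
                queue.extend(handler(payload))
        return seen  # there is no built-in "done" signal, only an empty queue


def ocr(payload: dict) -> list[Event]:
    return [("text.extracted", {**payload, "text": "hello"})]


def summarize(payload: dict) -> list[Event]:
    return [("summary.ready", payload)]


def tag(payload: dict) -> list[Event]:
    return [("tags.ready", payload)]


bus = EventBus()
bus.subscribe("document.uploaded", ocr)
bus.subscribe("text.extracted", summarize)
bus.subscribe("text.extracted", tag)  # a new capability is just one more subscriber


def orchestrate(payload: dict, steps: list[Callable[[dict], dict]]) -> dict:
    """Orchestration: a mediator runs each step in a fixed, known order."""
    for step in steps:
        payload = step(payload)
    return payload


if __name__ == "__main__":
    print(bus.run(("document.uploaded", {"id": 1})))
    print(orchestrate({"id": 1}, [lambda p: {**p, "text": "hello"},
                                  lambda p: {**p, "summary": p["text"][:3]}]))
```

The choreographed bus makes it easy to add the tagger without touching anything else, but the only things stopping a handler from re-emitting its own trigger forever are conventions and the event budget. The orchestrator always ends after its last step, at the cost of having to edit the mediator for every new capability.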

He could’ve gone on for longer, but we were “at time,” so he wrapped up with some concepts worth our exploring afterwards:

  • JSON-LD: JSON with Linked Data, providing context to structured data
  • Schema.org: Standardized vocabulary for semantic meaning
  • Graph RAG: Connecting LLMs directly to knowledge graphs
  • Hypermedia APIs: Self-describing APIs that adapt without redeployment

He also talked about how models trained on JSON-LD can automatically understand and connect data using standardized vocabularies, enabling more sophisticated AI integrations.
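For anyone who hasn’t seen JSON-LD before, here’s a small sketch of what it looks like in practice: ordinary JSON plus an "@context" that points at a shared vocabulary (Schema.org here) and "@type" values drawn from it. The record itself is invented for illustration and was not shown in the talk.

```python
import json

# A made-up record; only the vocabulary terms come from Schema.org.
performer = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Example Performer",  # placeholder name
    "jobTitle": "Magician",
    "performerIn": {
        "@type": "Event",
        "name": "Example Corporate Show",  # placeholder event
        "startDate": "2025-01-01T20:00",
        "location": {"@type": "Place", "name": "Example Venue"},
    },
}

if __name__ == "__main__":
    print(json.dumps(performer, indent=2))
```

Because "Person," "Event," and "performerIn" have published definitions, any consumer that knows the vocabulary can tell what the fields mean without a custom schema document.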

What’s a summary of a talk without some takeaways? Here are mine:

  • Architecture is fundamentally about trade-offs! Every decision that strengthens one quality attribute weakens others; you need to decide which ones are important for the problems you’re trying to solve.
  • Effective architects need breadth over depth. Instead of being “T-shaped,” which many people call the ideal “skill geometry” for individual developers, the architect needs to be more of a “broken comb.”
  • AI integration is more than just functionality. It’s architecturally significant and requires careful planning.
  • Standards from the past are relevant again! Like Jason Voorhees, they keep coming back. Technologies like RDF and JSON-LD, once considered ahead of their time, are now crucial for AI.
  • The chat interface is just the beginning! Yes, it’s the one everyone understands because it’s how the current wave of AI became popular, but serious AI integration requires thoughtful architectural patterns.

Here’s the summary of patterns Michael talked about:

  • Prompt Template Pattern
  • Guardrails Pattern
  • Context-enrichment & Caching
  • Composite Patterns
  • Model Tuning
  • Pipeline Pattern
  • Encoder-decoder pattern
  • Choreographed and Orchestrated Event-driven Patterns
  • RAG
  • Self-RAG
  • Corrective-RAG
  • Agentic RAG
  • Agent-Ready APIs

And once the presentation was done, a number of us reconvened at Colony Grill, the nearby pizza and beer place, where we continued with conversations and card tricks.

My thanks to Michael Carducci for coming to Tampa, Tampa JUG and Ammar Yusuf for organizing, Hallie Stone and Couchbase for the food, Kforce for the space (and hey, for the job), and to everyone who attended for making the event so great!

Categories
Artificial Intelligence Hardware What I’m Up To

Specs for NVIDIA’s GB10 chip, which powers HP’s ZGX Nano G1n AI workstation

I’m currently working with Kforce as a developer relations consultant for HP’s new tiny desktop AI powerhouse, the ZGX Nano (also known as the ZGX Nano G1n). If you’ve wondered about the chip powering this machine, this article’s for you!

The chip powering the ZGX Nano is NVIDIA’s GB10, a combination CPU and GPU where “GB” stands for “Grace Blackwell.” The chip’s two names stand for each of its parts…

Grace: The CPU

The part named “Grace” is an ARM CPU with 20 cores, arranged in ARM’s big.LITTLE (DynamIQ) architecture, which is a mix of different kinds of cores for a balance of performance and efficiency:

    • 10 Cortex-X925 cores. These are the “performance” cores, which are also sometimes called the “big cores.” They’re designed for maximum single-thread speed, higher clock frequencies, and aggressive out-of-order execution. Their job is to handle bursty, compute-intensive workloads such as gaming and rendering, and on the ZGX Nano, they’ll be used for AI inference.
    • 10 Cortex-A725 cores. These are the “efficiency” cores, which are sometimes called the “little cores.” They’re designed for sustained performance per watt, running at lower power and lower clock frequencies. Their job is to handle background tasks, low-intensity threads, or workloads where power efficiency and temperature control matter more than peak speed.

Blackwell: The GPU

The part named “Blackwell” is NVIDIA’s GPU, which has the following components:

    • 6144 neural shading units, which act as SIMD (single-instruction, multiple data) processors that act as “generalists,” switching between standard graphics math and AI-style operations. They’re useful for AI models where the workloads aren’t uniform, or with irregular matrix operations that don’t map neatly into 16-by-16 blocks.
    • 384 tensor cores, which are specialized matrix-multiply-accumulate (MMA) units. They perform the most common operation in deep learning, C = A × B + C, across thousands of small matrix tiles in parallel. They do so using mixed-precision arithmetic, where there are different precisions for inputs, products, and accumulations.
    • 384 texture mapping units (TMUs). These can quickly sample data from memory and do quick processing on that data. In graphics, these capabilities are used to resize, rotate, and transform bitmap images, and then paint them onto 3D objects. When used for AI, these capabilities are used to perform bilinear interpolation (used by convolutional neural network layers and transformers) and sample AI data.
    • 48 render output units (ROPs). In a GPU, the ROPs are the final stage in the graphics pipeline — they convert computed fragments into final pixels stored in VRAM. When used for AI, ROPs provide a way to quickly write the processing results to memory and perform fast calculations of weighted sums (which is an operation that happens with all sorts of machine learning).
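To show what the tensor cores’ C = A × B + C operation with mixed precision means, here’s a tiny pure-Python sketch. It is only an illustration of the arithmetic, not how the hardware or any NVIDIA library does it: the inputs are rounded to half precision (FP16) using the standard library’s struct module, and the running sum is kept in a regular Python float, which stands in for a higher-precision accumulator.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def mma(a: list[list[float]], b: list[list[float]], c: list[list[float]]) -> list[list[float]]:
    """Return A x B + C with FP16-rounded inputs and higher-precision accumulation."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [row[:] for row in c]
    for i in range(rows):
        for j in range(cols):
            acc = c[i][j]  # the accumulator starts from C and stays in full precision
            for k in range(inner):
                # Products use the low-precision inputs; the sum does not get rounded.
                acc += to_fp16(a[i][k]) * to_fp16(b[k][j])
            out[i][j] = acc
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [[1.0, 0.0], [0.0, 1.0]]
    print(mma(a, b, c))   # [[20.0, 22.0], [43.0, 51.0]]
    print(to_fp16(0.1))   # not exactly 0.1: FP16 keeps far fewer digits
```

A tensor core does this on small fixed-size tiles in hardware, many tiles at a time, rather than one element at a time in a loop.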

128 GB of unified RAM

There’s 128GB of LPDDR5X-9400 RAM built into the chip, a mobile-class DRAM type designed for high bandwidth and energy efficiency:

  • The “9400” in the name refers to its per-pin data rate of 9.4 Gb/s (9,400 megatransfers per second), which sets how quickly the CPU and GPU can move data between memory and the on-chip compute units. Across a 256-bit bus, this works out to roughly 300 GB/s of peak bandwidth.

  • LPDDR5X is more power-efficient than HBM but slower; it’s ideal for compact AI systems or edge devices (like the ZGX Nano!) rather than full datacenter GPUs.
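The peak-bandwidth figure in the first bullet is straightforward arithmetic: per-pin data rate times bus width, divided by 8 bits per byte. Here’s a quick sketch using the 9.4 Gb/s and 256-bit figures quoted above. The function name is my own, and this is a theoretical peak, not a measured number.

```python
def peak_bandwidth_gb_per_s(gbit_per_pin: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s: (Gb/s per pin) x (pins) / (8 bits per byte)."""
    return gbit_per_pin * bus_width_bits / 8


if __name__ == "__main__":
    # 9.4 Gb/s per pin across a 256-bit bus comes to roughly 300 GB/s.
    print(peak_bandwidth_gb_per_s(9.4, 256))
```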

As unified memory, the RAM is shared by both the Grace (CPU) and Blackwell (GPU) portions of the chip. That’s enough memory for:

  • Running large-language-model inference up to 200 billion parameters with 4-bit weights

  • Medium-scale training or fine-tuning tasks

  • Data-intensive edge analytics, vision, or robotics AI
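Where does the “200 billion parameters with 4-bit weights” figure come from? Here’s a back-of-the-envelope sketch. It counts only the model weights and ignores everything else that needs memory (KV cache, activations, the OS), so treat it as a lower bound rather than a sizing tool. The function name is my own.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 10^9 bytes)."""
    # (params_billion * 1e9 weights) * (bits / 8 bytes each) / 1e9 bytes per GB
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    unified_memory_gb = 128
    for bits in (16, 8, 4):
        size = weights_gb(200, bits)
        verdict = "fits" if size <= unified_memory_gb else "does not fit"
        print(f"200B parameters at {bits}-bit: {size:.0f} GB ({verdict} in {unified_memory_gb} GB)")
```

At 4 bits per weight, a 200-billion-parameter model needs 100 GB for its weights, which leaves some headroom in 128 GB. At 8 or 16 bits, the same model would need 200 GB or 400 GB and would not fit.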

Because the memory is unified, the CPU and GPU share a single physical pool of RAM, which eliminates explicit data copies between them.

The RAM is linked to the CPU and GPU sections using NVIDIA’s C2C (chip-to-chip) NVLink, its low-power interconnect that lets CPU/GPU memory traffic move at up to 600 GB/s aggregate. That’s faster than PCIe 5! This improves latency and bandwidth for workloads that constantly exchange data between CPU preprocessing and GPU inference/training kernels.

Double the power with ConnectX

If the power of a single ZGX Nano isn’t enough, there’s NVIDIA’s ConnectX technology, which is based on a NIC that provides a pair of 200 GbE ports, enabling the chaining/scaling out of workloads across two GB10-based units. This doubles the processing power, allowing you to run models with up to 400 billion parameters!

The GB10-powered ZGX Nano is a pretty impressive beast, and I look forward to getting my hands on it!

 

Categories
Artificial Intelligence Current Events Tampa Bay What I’m Up To

“Back to the Future of Work” covered in Tampa Bay Business & Wealth!

Last week’s panel event, Back to the Future of Work, was featured in Tampa Bay Business and Wealth!

Taking place at the Reliaquest Auditorium in Tampa startup space Embarc Collective, the event featured a discussion about different ways to think about how we measure the value of work in the new world of AI, remote work, ubiquitous internet, and economic uncertainty.

On the panel were:

Check out the article, From hours to outcomes: Tampa panel explores the future of work in an AI world!

 

Categories
Artificial Intelligence Meetups Tampa Bay What I’m Up To

I’ll be speaking at the Indo-US Chamber of Commerce’s “Practical AI” panel — Tuesday, September 16 at 6:30 p.m.!

Promotional poster for “Practical AI: Turning Technology into Business Value” featuring moderator Sam Kasimalla and panelists Joey de Villa, Sudeep Sarkar, and Priya Balasundaram.
Click the image to reserve your ticket!

Want to learn how AI can be used in your business or career, meet key people from Tampa Bay’s dynamic South Asian community, and enjoy some Indian food? Then you’ll want to attend the Indo-U.S. Chamber of Commerce’s panel, Practical AI: Turning Technology into Business Value, taking place next Tuesday, September 16th at 6:30 p.m. at Embarc Collective!

The tl;dr

Why Attend?

  • Learn how AI can be applied beyond theory to solve real business challenges.

  • Hear from leaders in academia, entrepreneurship, and applied technology.

  • Network with Tampa Bay’s growing AI and tech community.

  • Enjoy complimentary Indian cuisine while connecting with innovators and peers.

Speakers & Moderator

Register for this event!

Once again, this is a free event, and there’ll be a complimentary Indian dinner. Register now!

Categories
Artificial Intelligence Tampa Bay What I’m Up To

Scenes from Fractio’s “Back to the Future of Work” event (September 4, 2025)

Last night, Fractio hosted Back to the Future of Work, an event built around a panel discussion about changing the way we assign value to work in the age of AI.

I arrived early to set up my computer to run the pre-panel video…

…and check that the panel seats and mics were set up properly…

…then had some quick breakfast-for-dinner (which was symbolic of how our understanding of paying for time was about to be turned upside-down):

The event took place at Embarc Collective, who’d set up the room in a way that would let people comfortably eat “brinner” while watching the panel…

…and when the doors opened, a room-packing crowd came in.

After a little time to let people get their food and breakfast cocktails and mingle, they were seated…

…and the panel got under way!

Fatin Kwasny, organizer of the panel and Fractio CEO, moderated…

…and the panel got started.

From left to right, the panelists were:

I enjoyed participating on the panel, and it appears that my fellow panelists did as well! I also heard from many audience members who found the event informative and entertaining.

Thanks to Florida CFO Group for sponsoring breakfast-for-dinner and breakfast cocktails…

…and to Byron Reese for providing us with copies of his book, We are Agora, to give to attendees!

 

Categories
Hardware What I’m Up To

The room where it happens

Joey de Villa’s home office. It has a shiny hardwood floor, two desks in an L-shaped configuration, monitors, keyboards, synthesizers, and other gear. A large octopus art piece looms over the back wall.

For the curious, here’s a recent pic of my home office, a.k.a. “The Fortress of Amplitude.” The gear configuration changes every now and then, but it generally looks like this. It’s where the magic happens!