HP ZBook Ultra G1a: Hands-on with a Powerful 14-inch AI Workstation
This mobile workstation can tackle massive LLMs thanks to 128GB of unified memory.

Mobile computing often gets sidelined in AI conversations for one simple reason: a lack of memory. For developers running high-end LLMs, the bottleneck has rarely been compute cycles -- it's been RAM (or VRAM).
The HP ZBook Ultra G1a is a mobile workstation that addresses this issue by leaning on AMD's Ryzen AI Max+ PRO 395 processor, a chip with a unified memory architecture, and pairing it with a massive amount of RAM.
This is not a general-purpose laptop. It's a $2,999 specialized workstation designed to run large language models (LLMs) locally, untethered from cloud APIs or server racks, thanks to its memory configuration. The system I tested was equipped with 128GB of LPDDR5x-8533 RAM, and importantly, its unified memory architecture allows the system to dynamically allocate up to 96GB of its total system RAM as VRAM for the GPU. (Before this, even a high-end 14-inch mobile workstation was often limited to 16GB of VRAM.)
For local AI workflows, this is a huge shift. Running a 70-billion parameter model (such as Llama 3 70B) typically requires over 40GB of VRAM -- an impossible figure for standard mobile GPUs. On the ZBook Ultra G1a, however, I was able to load that model, and many other large ones, directly into memory with room to spare for the OS and other system tasks.
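As a rough back-of-the-envelope check (my own approximation, not an official sizing formula), a model's memory footprint is roughly its parameter count times bytes per weight, plus some extra for the KV cache and activations. A minimal sketch, assuming a typical 4-bit quantization and ~20% overhead:

```python
def model_memory_gb(params_billion, bits_per_weight=4.0, overhead=1.2):
    """Rough estimate of the RAM/VRAM a quantized LLM needs, in GiB.

    params_billion:   model size in billions of parameters
    bits_per_weight:  4.0 approximates common 4-bit quantization (assumed)
    overhead:         ~20% extra for KV cache and activations (assumed)
    """
    total_bytes = params_billion * 1e9 * (bits_per_weight / 8) * overhead
    return total_bytes / 2**30  # convert bytes to GiB

# A 4-bit 70B model lands around 39 GiB -- consistent with the
# "over 40GB" figure above, and comfortably inside the 96GB this
# system can allocate to the GPU.
print(round(model_memory_gb(70), 1))
```

The exact number varies with the quantization format and context length, but the estimate explains at a glance why 70B-class models are out of reach for a 16GB GPU.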
If you're looking for the best models to download and run with only 16GB, 32GB, or 64GB of memory, we've got a detailed guide to the best LLMs for every common memory capacity right here.
Strictly Strix
Besides all that memory, the real heart of this workstation is the AMD Ryzen AI Max+ PRO 395, also known by the codename Strix Halo. This is AMD's star APU (Accelerated Processing Unit) that combines 16 high-performance Zen 5 CPU cores with an integrated GPU, the Radeon 8060S. Combining a CPU and GPU into a single chip eliminates some of the thermal and power overhead needed when these are discrete parts.
And while the CPU and GPU handle heavy training and high-bandwidth inference tasks, the processor also includes an NPU (Neural Processing Unit), rated at 50 TOPS (Tera Operations Per Second). In real-world terms, the NPU takes on persistent, low-power AI tasks, such as real-time audio processing or background effects, freeing up the main GPU cores for heavy lifting.
Portable power
The ZBook Ultra G1a, despite its power, has an understated design that looks and feels like a typical 14-inch professional laptop. The gray all-aluminum enclosure weighs about 3.5 pounds and offers a standard USB-C port, two Thunderbolt 4/USB-C ports, one USB-A port, and HDMI 2.1, along with a pleasingly spacious keyboard and large touchpad.

The display is surprisingly great for a non-consumer/non-gaming system -- it's a 14-inch 2.8K (2,880 x 1,800) OLED panel with a 120Hz refresh rate. It's also a touchscreen, which was another surprising extra.
The battery is a 74.5 Wh model, which isn't the biggest, but it's paired with a 140W power brick, which makes sense given the power needed to run the Strix Halo hardware. Try plugging the system into a typical, lower-rated laptop USB-C power brick and you may get a warning that it isn't supplying enough juice for maximum performance.
Hands-on with AI
To test-drive a system like this, beyond our standard benchmarks, I ran several local LLM models on the ZBook Ultra G1a, primarily through LM Studio, a GUI frontend for downloading and running models locally.
For example, I was able to easily download, install, and run OpenAI's gpt-oss-120b model, which has 117 billion total parameters and takes up nearly 60GB of RAM. On one of my sample math/reasoning questions, it generated 12.21 tokens per second, with a TTFT (time to first token) of 2.18 seconds.

But that's a model that would be difficult to run on almost any consumer-level (or even gaming) PC. Asking the same test question of OpenAI's gpt-oss-20b, a much smaller, easier-to-run model, yielded 16.41 tokens per second but a slower 3.32-second TTFT.
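For readers who want to reproduce these numbers, both metrics fall out of a streamed response: TTFT is the delay before the first token arrives, and decode throughput is tokens generated per second after that. A minimal sketch (the timestamps below are illustrative, not captured from these test runs):

```python
def throughput_stats(token_times):
    """token_times: arrival time (in seconds) of each streamed output
    token, measured from the moment the prompt was submitted.

    Returns (time_to_first_token, decode_tokens_per_second)."""
    ttft = token_times[0]
    decode_tps = (len(token_times) - 1) / (token_times[-1] - token_times[0])
    return ttft, decode_tps

# Hypothetical stream: first token after 2.0s, then one every 0.1s.
times = [2.0 + 0.1 * i for i in range(11)]
ttft, tps = throughput_stats(times)
print(ttft, round(tps, 1))  # 2.0 and 10.0 tokens/second
```

Note that TTFT mostly reflects prompt processing, while decode speed reflects memory bandwidth -- which is why a bigger model can have a faster TTFT but a slower generation rate, as seen above.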
In my original hands-on video with the ZBook Ultra G1a, shot some months ago before some of the most popular current LLM versions were available, I asked similar questions of both large and small models (Gemma 3 12B and Llama 3.3 70B), with similar results -- running a larger LLM typically gives more detailed, more accurate results, at the cost of increased memory usage.
A quick note on gaming. No, this is not a gaming laptop. Yes, you can do some mainstream gaming on it. Using the AMD Radeon 8060S GPU, I was able to run Cyberpunk 2077 at native resolution and medium settings, with AMD FSR 3 frame gen turned on, and got around 85 frames per second.
Test System Specifications
| Component | Specification |
|---|---|
| Processor | AMD Ryzen AI MAX+ PRO 395 (3.0GHz) |
| Memory | 128GB LPDDR5x-8533 RAM |
| Graphics | AMD Radeon 8060S Integrated |
| Storage | 2TB SSD |
Benchmark Results
| Benchmark Test | Metric | Score |
|---|---|---|
| Geekbench 6 | Single-Core | 2,785 |
| Geekbench 6 | Multi-Core | 17,482 |
| Geekbench AI (CPU) | Single Precision | 3,873 |
| Geekbench AI (CPU) | Half Precision | 1,741 |
| Geekbench AI (CPU) | Quantized | 7,511 |
A rare mobile workstation for local AI
As LLMs get bigger (and hopefully better), this memory bottleneck will continue to be an issue for on-device AI. I'd call 128GB of unified memory the sweet spot right now, and the ZBook Ultra G1a remains one of the only ways to get it in a laptop.
One final note: besides your LLM interface of choice, you can also try the bundled HP AI Companion software, which lets you switch between local and cloud AI (local requests are handled by Phi 3.5) and assign specific document libraries to specific conversations -- letting you keep data local and siloed, which is one of the main use cases for on-device AI.
More from MC News
- Run AI Locally: The Best LLMs for 8GB, 16GB, 32GB Memory and Beyond
- Quantization Explained: Why the Same LLM Gives Better Results on High-End Hardware
- Why VRAM and Memory Bandwidth are Key for Powering Local AI
- Hands-on with NVIDIA DGX Spark: Everything You Need to Know
- How to Build a PC with a Hardline Water-Cooling Loop
- 3D Print a Mac Mini Monitor Mount
- The End Has Come for Windows 10: Four Tips to Make the Most of Windows 11
- Everything You Need to Know About WiFi 7
- Keyboard 101: Intro to Computer Keyboards
- Fix It Yourself: Talking to iFixit on Why Repairable Tech Matters
Dan Ackerman is the Editor-in-Chief of Micro Center News. A veteran technology journalist with nearly 20 years of hands-on experience testing and reviewing the latest consumer tech, he previously served as Editor-in-Chief of Gizmodo and Editorial Director at CNET. He is also the author of The Tetris Effect, the critically acclaimed Cold War history of the world's most influential video game. Contact Dan at dackerman at microcenter.com.
