AI Factories

Compute: The Engine Room of Your AI Factory

Overview

From desktop NVIDIA DGX™ Spark systems to NVIDIA DGX SuperPOD™-scale deployments, we help build the engine room of your AI factory. As an NVIDIA Elite Partner for Compute, we deliver hands-on, open infrastructure expertise: helping you choose, configure, and deploy the right system for your workloads, from the first GPU to a full production cluster.

When building any AI factory, the use case determines everything.

The type and volume of compute you need for each build depend on a few key questions, including:

  • What applications are you running?
  • What’s the model size or required token rate?
  • Is the app sensitive to memory latency?
  • Can it run on Arm-based architectures?
  • Are you training, fine-tuning, or running inference? Do you plan to support agentic AI systems?
  • Do you need to support multiple users or departments with different access levels? What controls are needed to ensure security and sovereignty?
  • What are your constraints on power, cooling, or rack depth?

Answering these questions helps determine whether you need a standalone single-GPU system, a 6-node BasePOD, or something in between. For example, at the edge, smaller deployments often start with short-depth MGX servers with 1-2 GPUs. Teams working on diagnostic imaging with larger user groups might prioritise GPU-to-GPU communication through NVLink or Superchip architectures. Meanwhile, a cloud service provider might prioritise a larger-scale SuperPOD or NVL72 system.
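To make that concrete, here is a rough sizing sketch in Python that estimates how many GPUs an inference workload might need based on answers like those above. The model dimensions, per-GPU memory figure, and overhead factor are illustrative assumptions, not vendor guidance, and any real deployment should be validated against benchmarks.

```python
# Rough GPU-count estimate for LLM inference. All figures are illustrative.
from math import ceil

def estimate_gpus(params_b: float,        # model size, billions of parameters
                  layers: int,            # transformer layers
                  kv_heads: int,          # key/value heads (GQA)
                  head_dim: int,          # dimension per head
                  context: int,           # tokens of context per sequence
                  concurrent: int,        # sequences served at once
                  bytes_per_param: float = 2.0,   # FP16/BF16 weights
                  gpu_mem_gb: float = 80.0,       # assumed usable HBM per GPU
                  overhead: float = 1.2) -> int:  # activations, runtime, fragmentation
    weights_gb = params_b * bytes_per_param        # 1e9 params * bytes / 1e9 = GB
    # KV cache: 2 (K and V) * layers * kv_heads * head_dim * 2 bytes, per token
    kv_gb = 2 * layers * kv_heads * head_dim * 2 * context * concurrent / 1e9
    total_gb = (weights_gb + kv_gb) * overhead
    return ceil(total_gb / gpu_mem_gb)

# Example: a 70B-parameter model at 8k context with 32 concurrent sequences
print(estimate_gpus(params_b=70, layers=80, kv_heads=8, head_dim=128,
                    context=8192, concurrent=32))
```

Once an estimate like this spills past one or two GPUs, questions such as NVLink connectivity and node count start to matter more than per-GPU specs, which is exactly where the architectural families described later diverge.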

Practical Considerations

We can help you design all of this before a single system is deployed by balancing workloads, physical constraints, and total cost of ownership.

Liquid Cooling

With server rack densities on the rise, and a single HGX system peaking at over 10kW, cooling requirements are also increasing.

Modern GPUs are pushing beyond what fans and air cooling alone can handle efficiently. That’s why more teams are exploring direct liquid cooling (DLC), direct-to-chip (D2C), and, for the right architecture, immersion.

DLC and immersion cooling will shape how you plan your racks, your pipework, and your overall footprint for the next 5 to 10 years, so it makes sense to factor them in early. We’re already working with OEMs like Supermicro, Gigabyte, and Lenovo who are shipping liquid cooling-ready systems for AI factories today.
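As a back-of-the-envelope illustration of why cooling is planned early, the sketch below works out how many roughly 10kW HGX-class systems fit within an assumed per-rack power budget, and how much heat the cooling loop then has to reject. The per-system draw, rack budget, and overhead figures are assumptions for illustration only.

```python
# Back-of-the-envelope rack budget: how many ~10 kW systems fit, and how much
# heat must be removed. All figures are illustrative assumptions.

def rack_fit(system_kw: float = 10.5,        # assumed peak draw of one HGX-class node
             rack_budget_kw: float = 40.0,   # assumed power available to the rack
             overhead_frac: float = 0.05):   # switches, PDUs, fans, etc.
    usable_kw = rack_budget_kw * (1 - overhead_frac)
    systems = int(usable_kw // system_kw)    # whole systems that fit the budget
    heat_kw = systems * system_kw            # essentially all power becomes heat
    return systems, heat_kw

systems, heat_kw = rack_fit()
print(f"{systems} systems per rack, ~{heat_kw:.0f} kW of heat to reject")
```

At these densities a single rack can already exceed what room-level air handling was designed for, which is why DLC, D2C, and immersion decisions are best made at the design stage rather than retrofitted.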

Compute-Intensive Use Cases

Here are some of the most common applications driving AI adoption today

Resources

Practical guidance to help you accelerate performance, simplify operations, scale efficiently, and optimise the total cost of ownership of your compute:

Deep Dives: Architectural Families

NVIDIA offers a set of reference architectures designed to support different types of AI infrastructure. This section breaks down the key families of compute and how we see them being used in practice.

The NVIDIA DGX™ platform offers you the fastest time to value. As the only system offered by NVIDIA directly, it’s the quickest route to a fully integrated AI factory. Bundling together software, infrastructure, and support into a single solution, these systems suit teams that want to deploy quickly and standardise around a known architecture. That’s especially valuable for companies where procurement cycles, internal compliance, and multi-team access need to be solved up front.

Because DGX comes as a complete package, it also sets a reliable baseline for scaling. It’s the closest to NVIDIA you can get. The product line runs all the way from the Wi-Fi router-sized DGX Spark and desktop NVIDIA® DGX Station™ to the multinode BasePOD and SuperPOD designs. That makes it easier to expand from a single-node or unit test environment to a full production deployment without redesigning the stack.

The fixed bill of materials (BOM) removes complexity from procurement and integration, offsetting concerns about flexibility by keeping the platform consistent from lab to production. If you’re building an AI factory with clear performance goals and limited room for risk, DGX is still the reference point.

The NVIDIA DGX BasePOD™ and SuperPOD reference designs provide tested blueprints for building large-scale AI factories. BasePOD offers a repeatable cluster design from two nodes to dozens with known cooling, power, and cabling requirements. This is most useful if you’re rolling out across regions or building a shared internal platform.

The NVIDIA GB200 NVL72 is NVIDIA’s densest rack-scale platform for the most demanding AI workloads. It combines 72 GPUs with shared power and liquid cooling, using OCP racks, rear I/O, and blind-mate trays to support extreme density.

SuperPOD is a full data centre build involving hundreds of GPUs, typically seen in sovereign AI projects or model training at national scale. While it is best suited to larger customers, organisations with an eye on long-term growth will find that aligning with the architecture early can avoid protracted rework later.

The NVIDIA HGX™ platform delivers NVIDIA’s GPU architecture through OEM and ODM partners, offering flexibility in how systems are designed and deployed. Unlike DGX, it supports a broader choice of CPUs – such as those from Intel or AMD – memory configurations, and networking. This makes HGX an easy choice for teams focused on aligning with their existing data centre standards.

This flexibility is especially valuable at scale, where fine-tuning memory, storage, or networking can drive efficiencies. For advanced users, HGX also avoids the overhead of NVIDIA software licensing. That makes it well-suited to those who have already trialled a turnkey DGX platform and have now made customisation a priority.

HGX systems are typically found in larger, tailored deployments where customers want more control over design and procurement. Most major OEMs now offer HGX platforms, helping enterprises standardise procurement across workloads and regions without committing to NVIDIA’s fixed DGX bill of materials.

NVIDIA MGX™ is a modular reference architecture that takes a building-block approach to AI factories. It gives you the flexibility to tailor your infrastructure while providing some clear guardrails. NVIDIA supplies the reference architecture while OEMs build systems around it: speeding up custom deployments without sacrificing compatibility.

Because MGX systems follow a validated spec, you’re not dealing with bespoke cooling or power designs for every node. It’s well suited to distributed environments and multi-tenant platforms where customers want smaller, self-contained clusters of compute rather than one large monolithic build. It’s often used by advanced users who value flexibility in specification over bundled software or support.

Because these systems follow a consistent design, they simplify scaling, procurement, and hardware management over time. They also tend to reach market faster than fully bespoke designs, which helps customers adopt new GPUs or architectures without delay.

Superchips are built for AI factories that demand massive memory bandwidth – either as standalone systems or in very large clusters. By directly coupling CPU and GPU through NVIDIA’s NVLink-C2C, systems like the NVIDIA GH200 Grace Hopper™ Superchip or Grace Blackwell avoid PCIe bottlenecks and enable cache-coherent access to shared memory. That’s critical for workloads like LLM inference, where GPU throughput depends as much on data movement as on raw FLOPS.
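To illustrate why that coherent memory pool matters, the sketch below checks whether a working set (weights plus KV cache) fits in GPU HBM alone or only in the combined CPU-plus-GPU memory. The capacity figures are illustrative assumptions loosely modelled on a GH200-class node, not exact specifications.

```python
# Where does the working set live? A minimal sketch comparing GPU HBM alone
# with a coherent CPU+GPU memory pool (capacities are illustrative assumptions).

def placement(working_set_gb: float,
              hbm_gb: float = 96.0,     # assumed GPU HBM capacity
              cpu_gb: float = 480.0):   # assumed coherent CPU (LPDDR) capacity
    if working_set_gb <= hbm_gb:
        return "fits in GPU HBM"
    if working_set_gb <= hbm_gb + cpu_gb:
        return "spills into coherent CPU memory over NVLink-C2C"
    return "needs sharding across additional nodes"

for gb in (60, 250, 700):
    print(f"{gb} GB working set: {placement(gb)}")
```

On a PCIe-attached design, the middle case would mean staging data across a much narrower link; with NVLink-C2C the GPU can address the spilled portion coherently.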

Because they integrate CPU, GPU, and HBM3/4 memory into a single package, Superchips are well suited for large-scale inference, real-time analytics, and production environments where latency and scheduling overhead add up fast. They’re also at the heart of NVIDIA’s highest-density systems, including Blackwell-based NVL72 builds and future roadmap landmarks like the Vera Rubin NVL144.

For AI factories focused on maximising energy efficiency, the Grace CPU Superchip offers a dual-CPU option with 2x the performance per watt and 2x the packaging density of traditional x86 systems. It also provides 900GB/s of NVLink-C2C bandwidth between the two CPUs, which is ideal for simulation or preprocessing-heavy workloads.

OCP (Open Compute Project)

OCP defines a set of standards for rack-scale systems, including 21” racks, shared power shelves, rear I/O, and blind-mate liquid cooling. These standards help simplify high-density deployments and are already common in data centres built for large-scale AI.

AI factory systems like the GB200 NVL72 are already designed around this format, as few facilities can support the weight, power, and cooling without it. As an official OCP Solution Provider, we’re familiar with the requirements needed for data centres to accommodate these systems, so we know what to look out for, especially when planning long-term growth.

Every company will have their own preferred OEMs and partners, and we can accommodate a wide range of choices.

We regularly deploy NVIDIA-Certified Systems – from DGX servers to HGX-based configurations – from a global network of trusted partners including:

  • Supermicro
  • Lenovo
  • Gigabyte
  • HPE
  • Dell
  • WiWynn
  • QCT
  • ASRock Rack


We’re not tied to a single vendor, which means we can design around your constraints, whether it’s rack depth, power availability, or procurement policy. We also maintain direct relationships with engineering and product teams across our ecosystem, so we can escalate quickly and advocate for your needs.

Partners

Test with Us: The Vespertec Test Drive Programme

Our on-site AI Lab was the first in Europe to combine NVIDIA’s Grace Hopper Superchip with the SN5600 800Gbps Ethernet and QM9700 400Gbps InfiniBand switches.

Through Vespertec’s Test Drive programme, you can remotely access the platform to benchmark applications, understand how your code behaves on Arm-based architectures, and support validation pre- and post-sale.

This hands-on test environment allows you to carry out early evaluation and post-deployment troubleshooting, backed by our in-house engineers.
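If you want a quick first look before (or during) a Test Drive session, a minimal probe like the Python sketch below can run unchanged on your existing x86 hosts and on the Arm-based Grace system: it reports the platform it is running on and times a small BLAS-backed matrix multiply (assuming NumPy is available). It is only an illustrative starting point, not our benchmarking methodology.

```python
# Minimal portability probe: run the same script on x86 and on an aarch64 host
# and compare the results. Not a substitute for real application benchmarks.
import platform
import time
import numpy as np

# Report where we are running: 'x86_64' on Intel/AMD, 'aarch64' on Grace/Arm
print(f"machine={platform.machine()}  python={platform.python_version()}")

n = 4096
a = np.random.rand(n, n).astype(np.float32)
b = np.random.rand(n, n).astype(np.float32)

start = time.perf_counter()
c = a @ b                                   # BLAS-backed matmul on the host CPU
elapsed = time.perf_counter() - start

gflops = 2 * n ** 3 / elapsed / 1e9
print(f"{n}x{n} float32 matmul: {elapsed:.2f} s (~{gflops:.0f} GFLOP/s)")
```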

Our ideology:
Reduce cost — Improve performance — Maximise interoperability — Scale freely