Once you have your compute, network, and storage, how do you forge it into a single AI factory? NVIDIA offers a growing software portfolio to help you maximise your physical infrastructure, supporting everything from workload orchestration and resource management to the creation and deployment of AI agents. Here, we focus on four key platforms: NVIDIA Base Command, NVIDIA Mission Control, NVIDIA Run:ai, and NVIDIA AI Enterprise.
Each software platform helps you manage a different layer of your AI factory. For example, at the infrastructure level, Base Command helps you run, schedule, and manage AI model training on your DGX systems – while Mission Control automates the entire operation of your AI factory.
At the pinnacle, AI Enterprise is designed as a full, cloud-native suite of products, including NIM Agent Blueprints, a catalogue of pretrained, customisable AI workflows, to help companies of any size deploy agentic AI systems at scale.
The right software stack helps you get more out of your AI infrastructure. That means better scheduling, optimised utilisation, and less time spent on orchestration. We focus on the platforms that support real deployments — the ones that help customers move from experiment to production.
NVIDIA Base Command is the operating system layer for NVIDIA GPU systems and comes as standard on DGX platforms. It gives you a way to run, schedule, and manage AI training jobs without having to build and maintain the software stack yourself. Everything from container orchestration to monitoring and system tuning is pre-integrated and optimised.
This means less time worrying about infrastructure and more time training models: you can queue up jobs and allocate compute through a unified interface. If you’re working inside a DGX environment, Base Command offers you the fastest time to value. However, even if you’re using a non-DGX NVIDIA system, Base Command Manager Essentials comes as standard with AI Enterprise.
NVIDIA Mission Control is the AI operations layer that sits above Base Command. It automates and simplifies the day-to-day running of an AI factory, integrating with your Building Management System to improve control across power, cooling, leak detection, and system coordination.
Where Base Command helps you run workloads across your compute cluster, Mission Control helps you run the entire factory. It’s especially useful in larger environments with multiple users or production-scale workloads, where uptime, efficiency, and visibility all matter. Mission Control is available as a licensed upgrade to Base Command.
Run:ai, acquired by NVIDIA in 2024, is a GPU orchestration platform built to maximise your use of shared infrastructure. It makes sure workloads aren’t left waiting on idle compute by abstracting your underlying hardware and dynamically assigning workloads based on demand, priority, and available capacity.
It’s well suited to teams running mixed workloads or supporting multiple users, especially in shared or hybrid environments. Because it runs on Kubernetes, you can deploy it on almost any infrastructure and integrate it into your existing stack. If you need to turn a static cluster into a multi-tenant platform, Run:ai is ideally suited to doing the heavy lifting on compute – much like Netris does for networking.
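Because Run:ai schedules through Kubernetes, GPU workloads are declared in the standard Kubernetes way. As a rough sketch (the pod name, image tag, and entrypoint below are illustrative placeholders, not Run:ai-specific syntax), a training pod requests a GPU through the `nvidia.com/gpu` resource, and a scheduler like Run:ai layers quotas, priorities, and sharing on top of that request:

```yaml
# Minimal sketch: a Kubernetes pod requesting one NVIDIA GPU.
# Names, image, and command are placeholders for illustration only.
apiVersion: v1
kind: Pod
metadata:
  name: training-job            # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: nvcr.io/nvidia/pytorch:24.01-py3   # example NGC image tag
      command: ["python", "train.py"]           # placeholder entrypoint
      resources:
        limits:
          nvidia.com/gpu: 1     # surfaced by the NVIDIA device plugin
```

Because this is plain Kubernetes, the same request works whether the cluster sits on-premises or in the cloud – which is exactly what makes Run:ai portable across shared and hybrid environments.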
AI Enterprise is the ‘mothership’ of NVIDIA’s software stack: a single platform that brings order to every layer of the AI workflow. It wraps together microservices, tools, and frameworks like NVIDIA NeMo™, NVIDIA NIM™, and NVIDIA CUDA-X to support everything from training foundation models to deploying agentic AI systems for tasks like customer service or drug discovery.
Think of it as the wrapper that holds the stack together. Underneath the AI agents sit the models; under the models, the tools; and under those, the infrastructure. AI Enterprise is included with every DGX system and some PCIe GPU cards, but can also be licensed on a per-GPU basis for other infrastructure – so you can run it anywhere. If you want the most powerful way to build, deploy, and scale agentic AI, this is it.
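To make that concrete: NIM microservices expose an OpenAI-compatible HTTP API, so a deployed model is called with a standard chat-completion request. A minimal sketch in Python, assuming a NIM container is serving locally on port 8000 (the URL and model name here are placeholders for your own deployment):

```python
import json
import urllib.request

# Assumed endpoint of a locally deployed NIM container (placeholder).
NIM_URL = "http://localhost:8000/v1/chat/completions"

def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload, the format NIM accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

def ask_nim(model: str, prompt: str) -> str:
    """POST the request to the NIM endpoint and return the reply text."""
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        NIM_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Requires a running NIM container; the model name is a placeholder.
    print(ask_nim("meta/llama-3.1-8b-instruct", "Summarise our returns policy."))
```

Because the interface is OpenAI-compatible, applications written against hosted LLM APIs can often be pointed at your own infrastructure with little more than a change of URL.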
Blueprints, as part of AI Enterprise, sit at the top of the AI stack. They bridge the gap between infrastructure and real-world AI applications. Each one shows how to build and run a specific type of generative AI system, using NVIDIA tools alongside your own data and business logic.
Blueprints are especially useful when you want to move fast without reinventing the wheel. They map real-world tasks like RAG, virtual agents, and digital humans to infrastructure. That gives your team a reference implementation they can adapt and deploy, backed by patterns NVIDIA has already validated in production environments.
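To ground the RAG pattern those blueprints package, here is a toy version of the retrieval step in pure Python. It uses bag-of-words cosine similarity purely for illustration; a production blueprint would use an embedding model and a vector database, then pass the augmented prompt to a served LLM:

```python
import math
from collections import Counter

def vectorise(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = vectorise(query)
    ranked = sorted(documents, key=lambda d: cosine(q, vectorise(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, documents: list[str]) -> str:
    """Augment the query with retrieved context before sending it to an LLM."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are processed within 14 days of return.",
    "Our office is open Monday to Friday.",
]
print(build_prompt("How long do refunds take?", docs))
```

The value of a Blueprint is that this retrieve-then-augment pattern arrives pre-wired to NVIDIA’s serving stack, so your team adapts a validated reference rather than assembling the pipeline from scratch.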
Need help sizing a software stack or rolling out an internal AI platform?
We work with customers at every stage — from early scoping to full-scale deployment — and provide practical, vendor-neutral advice based on real-world projects.
Get in touch to discuss your requirements.