

Q&A with Netris CEO, Alex Saroyan: Network Automation for the AI Era

Release date: 16 June 2025

In part two of our ongoing series, Vespertec Partner Insights, we spoke to Alex Saroyan, CEO of Netris, a leading cloud-like network automation company. Alex spoke to us about the growing complexity of AI networks, the challenges of delivering secure multi-tenancy at scale and why the tools and automation methods pioneered by hyperscalers are now becoming essential for everyone else.

Alex’s whistle-stop tour touched on everything from switch fabric abstraction to the critical differences between training and inference clusters. Crucially, he walked us through why network automation has become foundational for GPU infrastructure, whether you’re building one cluster or a hundred.

 

Can you talk about your background? What was the origin story of Netris?

Alex Saroyan: Before Netris, I worked as a network engineer across ISPs and data centres. One of my early large-scale roles was at Lycos, which some might remember as a major search engine before Google. What stood out from those years was the massive difference in how traditional data centres and cloud providers approached networking. Cloud providers were designing for user self-service at scale. Traditional network engineering couldn’t keep up.

That’s why we built Netris: to bring cloud-style network automation and abstraction to everyone else. We had tried building this kind of system in-house, but it became clear that only a company dedicated to solving this problem could cut through the complexity and do a very good job for customers.

 

What specifically makes AI infrastructure different from traditional enterprise networking?

Alex Saroyan: AI clusters are far denser. A traditional enterprise rack might use one or two leaf switches for 40 or 50 servers. An AI cluster with 32 GPU servers—what we call a single scalable unit—needs eight data switches and a management switch. Each GPU server has eight production links. That’s 256 cables just for 32 servers. Multiply that out for larger deployments, and it quickly gets out of hand.
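To put that density in perspective, here is a quick back-of-the-envelope calculation, a minimal Python sketch using only the numbers Alex quotes; the function name and defaults are our own illustration:

```python
# Back-of-the-envelope cable count for the GPU compute fabric alone,
# ignoring management, storage and inter-switch links.

def compute_fabric_cables(gpu_servers: int, links_per_server: int = 8) -> int:
    """Each GPU server has `links_per_server` production links into the fabric."""
    return gpu_servers * links_per_server

# One scalable unit: 32 GPU servers with 8 production links each.
print(compute_fabric_cables(32))   # 256 cables
# A larger training cluster of 128 servers (1,024 GPUs at 8 per server).
print(compute_fabric_cables(128))  # 1,024 cables
```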

But if you’re networking an AI cluster, you’ve got more to worry about than just the number of switches. These links are connected in a certain way to optimise GPU-to-GPU communication and support RDMA or RoCE traffic. That creates a lot of extra work for network engineers. Manually engineering a high-performance AI network, then changing it on the fly, isn’t realistic.

 

So, it’s about the operational model as well as physical density.

Alex Saroyan: Exactly. Because GPUs are expensive, the only way to maximise ROI is to share them internally, with other departments, or externally, with customers. That means multi-tenancy. And that’s where things get really complex.

Now, AI networking is complicated enough. Add the need for dynamic, safe, continuous configuration changes to fuel multi-tenant use, and it’s on another level. You’re essentially building a cloud. And, if we look at the big cloud providers, they’re not using off-the-shelf network tools. Hyperscalers solved this problem years ago with their own internal abstraction software. But until recently, that kind of tooling wasn’t available to others.

 

You mention multi-tenancy as a key method for firms looking for ROI on their GPU clusters. Are there any other methods or nuances people should be aware of?

Alex Saroyan: Multi-tenancy is a great way to pay back your GPU investments. But there’s more than one type, and it’s important to know the difference between ‘hard’ and ‘soft’ multi-tenancy.

Hard multi-tenancy means enforcing isolation at the physical network level, right inside the switch. The ASIC, the actual chip inside the switch, is what enforces those boundaries. There’s essentially no way around it. That kind of strict separation is usually a security requirement, especially for training workloads. Those typically involve proprietary or highly sensitive data, and they run on bare metal with no virtualisation layer. So the only way to safely share infrastructure across tenants is by physically segmenting everything. Every server, every port, every link.

And keep in mind, a typical training cluster might use thousands of GPUs. That’s hundreds of servers—say, 128 servers, each with eight GPUs—and every one of those has eight connections into one network fabric, maybe two more into another. That’s a lot of links. And that hard isolation has to be enforced across every single one.

Manually maintaining that is nearly impossible, especially when the environment’s constantly changing. One day it’s 1,000 servers, next it’s 1,200, then maybe it drops to 800. It’s fluid. You’re operating a cloud, whether you realise it or not.

Soft multi-tenancy is more common in inference workloads (i.e. running live data through a trained AI model). Those clusters can be isolated from training clusters using hard separation, but inside the inference cluster, workloads are typically virtualised. So you’d use containers, Kubernetes namespaces, and other software-level controls to isolate tenants. That’s still multi-tenancy. It’s just happening further up the stack.
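As a rough illustration of what “further up the stack” can look like, the sketch below uses the Kubernetes Python client to carve out a namespace per tenant and cap its GPU usage with a ResourceQuota. The tenant name and the eight-GPU limit are assumptions for the example, not anything prescribed by Netris:

```python
# Minimal soft multi-tenancy sketch: one Kubernetes namespace per tenant,
# with a ResourceQuota capping how many GPUs that tenant can request.
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig for the inference cluster
core = client.CoreV1Api()

tenant = "tenant-a"  # illustrative tenant name

# The namespace acts as the software-level isolation boundary.
core.create_namespace(client.V1Namespace(metadata=client.V1ObjectMeta(name=tenant)))

# Cap the tenant at 8 GPUs' worth of requests inside that namespace.
quota = client.V1ResourceQuota(
    metadata=client.V1ObjectMeta(name=f"{tenant}-gpu-quota", namespace=tenant),
    spec=client.V1ResourceQuotaSpec(hard={"requests.nvidia.com/gpu": "8"}),
)
core.create_namespaced_resource_quota(namespace=tenant, body=quota)
```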

The truth is, if you want to really maximise ROI on your infrastructure, you need both. Hard for training. Soft for inference. There’s no single approach that covers everything. And the more serious the deployment, the more obvious that becomes.

 

Right now, lots of firms are shifting from training-focused workloads to inference. How is that influencing network demands?

Alex Saroyan: For training, you build very large and centralised clusters. The more GPUs you can bring together into a single cluster, the more powerful and complex the models you can train. That’s how modern AI works. These models require enormous compute, so scale matters. You want as many GPUs as possible in one tightly integrated setup to handle that workload efficiently.

Inference is completely different. You don’t need nearly as many GPUs in a single cluster to serve inference workloads. But you have far, far more users. Think about something like ChatGPT. Sometimes it responds instantly, and sometimes there’s a bit of lag. That’s because of high utilisation. If a lot of people are hitting the system at once, it needs more GPU resources to keep up.

So when it comes to inference infrastructure, you need efficient distribution just as much as raw compute. You might be serving users across different regions, and latency really matters. That’s why you often see distributed inference clusters, smaller in size but spread across multiple geographic locations. You want to bring the compute closer to the users. It’s the opposite of training in that sense: fewer GPUs per site, but more sites overall.

Going from large, centralised training setups to distributed, user-centric inference clusters has had big implications for how we design, manage and automate networks.

 

When customers succeed at AI networking, are they usually the ones who plan extensively, or the ones that iterate fast? And what are the risks of both?

Alex Saroyan: AI is still new, and we’re working with customers who come from very different backgrounds and with different levels of experience. It’s interesting. There’s a very vibrant set of perspectives out there. Some AI infrastructure operators come from traditional data centre backgrounds. They know classical network engineering, but AI networking is new to them. They’re open to learning, and when we explain how it differs, they get it quickly.

Others are totally new to infrastructure altogether. Some of them are coming from facilities roles like power or cooling, that sort of thing. Maybe they’ve been around data centres, but now they’re having to manage active components and networks directly. That’s a different kind of learning curve. In some cases, it means training from scratch.

Now, in terms of trial and error, what we’ve seen is that the early generation of GPU cloud operators often deployed their AI clusters without network automation. And yes, technically that’s possible. You can do it—but then you end up with a static, single-tenant configuration. What they’d do is build individual, physically separated clusters, with a certain amount of capacity per customer. And that was fine at first.

But then customer needs would shift. One might need 32 servers one month and 64 the next. The problem is, in AI networking, physical cabling is extremely dense and structured. Every switch port is already used. So if your clusters are physically separate and you want to merge or reallocate resources, you’re not just replugging one or two nodes. You’re replugging every cable in the network. And that’s an insane amount of manual work.

The way to real ROI is to move away from those physical separations. You build one large, shared cluster, and then use hard multi-tenancy to enforce boundaries. With Netris, you define a tenant just like you would in the cloud. You use a VPC. If you say, “This is VPC One, and these are the servers it should control,” the automation engine applies those policies in the background, and your hard multi-tenancy is implemented instantly. Not after weeks of hard, manual work from your network engineers.
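To make that workflow concrete, here is a deliberately hypothetical sketch of the declarative pattern Alex is describing: the operator names a VPC and lists the servers it should own, and an automation engine turns that intent into switch-level isolation. The class and function names below are our own illustration, not the Netris API:

```python
# Hypothetical sketch of intent-based hard multi-tenancy.
# None of these names are the real Netris API; the automation engine is a stub.
from dataclasses import dataclass, field

@dataclass
class VPC:
    name: str
    servers: list[str] = field(default_factory=list)

def apply_hard_multitenancy(vpc: VPC) -> None:
    """Stand-in for the automation engine: a real system would map each server
    name to its switch ports and push isolation policy down to the ASICs."""
    for server in vpc.servers:
        print(f"[{vpc.name}] isolating all fabric links for {server}")

# "This is VPC One, and these are the servers it should control."
tenant = VPC(name="vpc-one", servers=[f"gpu-node-{i:03d}" for i in range(1, 33)])
apply_hard_multitenancy(tenant)
```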

Another common point of trial and error is around service exposure. Once you’ve built these isolated environments for your customers or teams, they still need connectivity. Maybe to download training data or, for inference workloads, to expose endpoints to the internet. That’s where you need things like NAT gateways, elastic IPs, or load balancers.

And here’s where traditional thinking gets in the way. People assume that means bringing in firewalls or appliances like F5 load balancers. But those tools aren’t designed for cloud-like, multi-tenant environments. They don’t understand your physical switch fabric, and there’s no integration. You’d have to write a ton of glue code just to stitch tenant-level services to your fabric-level topology. It’s fragile and inefficient.

 

So without native integration between services and the fabric, how do you actually make multi-tenant infrastructure work at scale?

Alex Saroyan: That’s why we built SoftGate. It’s a software-based gateway that runs on Linux servers: commodity CPU boxes. You can dedicate four, six, eight nodes, whatever you need. It’s horizontally scalable and becomes a native part of your physical network. You define services like NAT, internet gateways, elastic IPs and load balancers right within the VPC, and we implement them programmatically. It all runs through the data plane of SoftGate.
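Continuing the hypothetical sketch from earlier, tenant-level services would be declared inside the VPC in the same intent-based way and realised by the gateway nodes. Again, the names here are illustrative only, not the SoftGate API:

```python
# Hypothetical sketch of VPC-level gateway services (NAT, elastic IP,
# load balancer). Illustrative names only; not the SoftGate API.
from dataclasses import dataclass

@dataclass
class GatewayService:
    kind: str   # "nat", "elastic_ip" or "load_balancer"
    spec: dict

def attach_service(vpc_name: str, service: GatewayService) -> None:
    """Stand-in for the gateway data plane: a real implementation would program
    the Linux gateway nodes and wire them into the switch fabric."""
    print(f"[{vpc_name}] provisioning {service.kind}: {service.spec}")

attach_service("vpc-one", GatewayService("nat", {"egress_subnet": "10.0.0.0/24"}))
attach_service("vpc-one", GatewayService("elastic_ip", {"address": "198.51.100.10"}))
attach_service("vpc-one", GatewayService("load_balancer", {
    "frontend": "198.51.100.10:443",
    "backends": ["gpu-node-001", "gpu-node-002"],
}))
```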

These are cloud-native concepts: VPCs, elastic IPs. We didn’t invent them. Hyperscalers did. But what we’ve done is create a way for everyone else to deliver the same functions, without having to build their own internal stack like AWS or Azure. Because those functions don’t come out of the box. The hyperscalers have internal systems that do what Netris and SoftGate do, but for most operators, that tooling just doesn’t exist. Until now.

This is one of the biggest hurdles for people coming from traditional networking backgrounds. They understand routing, firewalls, load balancing, but not in this integrated, multi-tenant, programmable way. And unless your gateways are tightly tied to your fabric and your automation stack, you’re going to struggle to provide those services inside a VPC environment. It all has to be connected. Otherwise, it breaks down.

 

Some customers might think they’re not big enough for this level of tooling. Is this just for hyperscalers?

Alex Saroyan: That was a common view pre-AI. But AI clusters introduce five to six times more network connections than traditional enterprise setups. Even a two-unit cluster, with 64 servers and 512 GPUs, is complex enough to benefit massively from automation.

And these GPUs aren’t cheap. Compared to that investment, the cost of network automation is minor. Our licensing is flexible for small and large operators. The ROI is there.

 

Are enterprises adopting multi-tenancy internally too, not just for customers?

Alex Saroyan: Definitely. Many start with internal-only clusters. But over time, they want to share those across teams or departments. If they’ve hardwired everything, they find they can’t adapt.

Some try to build their own automation using Ansible or Python. But the scale and reliability you need for multi-tenant AI workloads is way beyond that. It’s the difference between building a small treehouse and building a bridge that’ll support thousands of cars.

 

What surprises customers most when they start using Netris?

Alex Saroyan: It’s the abstraction. First, we hide the switch ports. Users assign servers to tenants based on names, not physical ports. That’s huge when you’ve got hundreds of links.

Second, we bring in cloud constructs—like VPCs, NAT gateways, elastic IPs—on your own hardware. And because SoftGate and the fabric automation are integrated, you don’t need to stitch anything together manually.

You just define the VPC, assign the servers, and apply the policy. That’s it. That’s what people get excited about.

 

If you’re building AI clusters and need to make multi-tenant networking work properly, get in touch. Our data centre infrastructure experts can walk you through how platforms like Netris help abstract away the usual complexity.
