Q&A with Jason Hammons at VAST Data: Building 10x More AI Products with Your Existing GPU Clusters
Release date: 07 July 2025
—
In part three of our Vespertec Partner Insights series, we spoke with Jason Hammons, VP of International Systems Engineering at VAST Data, about how enterprises can turn existing GPU clusters into new AI products – without the complexity.
Jason talked us through the architecture choices, security roadblocks, and data infrastructure mistakes that stall enterprise AI – and why the answer isn’t simply adding more tools.
Enterprises have made huge investments in GPU clusters, often with a particular set of use cases in mind, yet many struggle to progress beyond the pilot stage. Why do you think that is?
Jason Hammons: Most enterprises can get through the initial proving, whether that’s on-prem, in the cloud, or a mix of both. But scaling beyond the pilot is where it usually breaks down. That’s where we make the biggest difference: solving problems of scale.
By nature, we’re a scale-out system that’s multifaceted and multifunctional in how you access and manipulate data. Whether you’re pipelining, training, or inferencing, each of those requires a different access method – and all of them need fast, massively parallel access across every format. That’s actually quite hard to do with most traditional solutions.
We like to compare ourselves to shared-nothing architectures. Those are typically limited in how much parallel concurrency you can drive to a given dataset. We think we’ve solved that with our disaggregated, shared-everything approach. It eliminates a lot of performance trade-offs and allows partners like NVIDIA, AMD, and Cerebras to achieve new levels of parallelism at scale.
Whether you’re training models, inferencing, or doing large-scale analysis like financials often do, the datasets are growing, and the access those GPUs need grows linearly. Once you start scaling past the 10 to 20 petabyte range, keeping an architecture performing well becomes genuinely difficult. We think we’ve done a great job of solving that.
GPU acceleration is a given now, but many traditional cloud platforms weren’t built for it. How are those limitations showing up as customers try to scale?
Jason Hammons: A lot of public cloud infrastructure is built on shared-nothing architectures. That works fine for general-purpose compute, but it introduces real bottlenecks once you try to scale up AI workloads. You end up hitting architectural trade-offs that limit throughput, parallelism, and, ultimately, cost efficiency.
We’ve taken a different path. Rather than trying to work around those limits with caching or layers of abstraction, we built an architecture that makes full use of storage-class memory. That lets us maintain a high-speed, metadata-rich system that keeps up with GPU demand, whether you’re building 200 Gbps or 400 Gbps pipelines, across all kinds of workloads.
Our job is to make sure you can grow without hitting the usual scaling walls. So instead of dealing with bottlenecks or chasing performance through complicated tuning, customers get something that just works across petabyte-scale deployments. It gives them the consistency and efficiency they need to build real products.
Security is non-negotiable, especially in regulated industries. What role does fragmented security play in slowing down AI deployment, and how does VAST address that?
Jason Hammons: Security is a big topic, and we don’t claim to specialise in it across the board, but what makes VAST different is that we’ve built a unified access and security model from the ground up. That lets us make secure, real-time changes to access and policy as data is being ingested, manipulated, or retrieved.
Today, whether you’re on-prem or in the cloud, chances are your data is being handled in a lot of different ways. So, when you want to apply a security change, you either need a very sophisticated overlay – or you need a large operational team just to manage credentials and policies across all the endpoints.
We’ve seen analytics environments, especially in larger banks, where there are 80 to 100 different silos, and each one manages its own access controls. It becomes incredibly complex. What we’ve done instead is centralise it. Everything runs through what we call the Element Store. Whether it’s S3, SMB, NFS, block, databases, or Kafka, all of those access methods are unified under the same credential management system.
If a user’s LDAP or Active Directory details change, that change gets applied across every protocol instantly. That simplifies everything. And because of AI – and especially agentic AI – that simplicity matters now more than ever.
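To make that unified model concrete, here’s a minimal sketch of the idea: every protocol front-end consults the same policy store on each request, so one credential or policy change takes effect everywhere at once. The class and method names are purely illustrative – this is not VAST’s actual API.

```python
# Conceptual sketch only: a single credential/policy store consulted by
# every protocol front-end, so one identity change applies everywhere.
# All names here are illustrative, not VAST's actual API.

class PolicyStore:
    """One source of truth for identities and permissions."""

    def __init__(self):
        self._permissions = {}  # (user, path) -> set of allowed actions

    def grant(self, user: str, path: str, actions: set[str]) -> None:
        self._permissions.setdefault((user, path), set()).update(actions)

    def revoke_user(self, user: str) -> None:
        # One revocation covers S3, NFS, SMB, etc., because every
        # front-end checks the same store on each request.
        self._permissions = {
            key: acts for key, acts in self._permissions.items()
            if key[0] != user
        }

    def is_allowed(self, user: str, path: str, action: str) -> bool:
        return action in self._permissions.get((user, path), set())


class S3Frontend:
    def __init__(self, store: PolicyStore):
        self.store = store

    def get_object(self, user: str, key: str) -> str:
        if not self.store.is_allowed(user, key, "read"):
            raise PermissionError(f"{user} may not read {key}")
        return f"object bytes for {key}"


class NFSFrontend:
    def __init__(self, store: PolicyStore):
        self.store = store

    def read_file(self, user: str, path: str) -> str:
        if not self.store.is_allowed(user, path, "read"):
            raise PermissionError(f"{user} may not read {path}")
        return f"file bytes for {path}"


store = PolicyStore()
store.grant("alice", "/datasets/train", {"read"})
s3, nfs = S3Frontend(store), NFSFrontend(store)

print(s3.get_object("alice", "/datasets/train"))  # allowed via S3
print(nfs.read_file("alice", "/datasets/train"))  # allowed via NFS
store.revoke_user("alice")                        # one change...
# ...and both protocols reflect it immediately:
# s3.get_object("alice", "/datasets/train") now raises PermissionError
```

The point of the shape is that there’s no per-protocol copy of the policy to fall out of sync: revoking a user once is enough.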
In older environments – batch-based Hadoop systems or early analytics stacks – you could tolerate 12 to 24 hours of ingest latency. You’d collect the data, write it down later, and train a model in stages. But agentic systems don’t work that way. They rely on fresh data to make decisions in real time.
We’re already seeing enterprises plan and deploy 10 or 20 different agentic products that talk to each other, and all of them need real-time capabilities. When new data is ingested, the system needs to respond instantly, with a single source of truth that’s secure, scalable, and fast enough to keep latency under six or seven milliseconds. That’s the baseline for making real-time AI work.
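To illustrate the shape of that shift, here’s a toy event-driven ingest loop in which a record becomes consumable the moment it lands, with no staging or ETL step in between. It’s a conceptual sketch only – the in-process queue and ‘agent’ stand in for whatever streaming and agent frameworks a real deployment would use.

```python
# Conceptual sketch: event-driven ingest versus batch. All names are
# hypothetical; the queue stands in for a real streaming system.
import queue
import threading

events: "queue.Queue[dict | None]" = queue.Queue()

def ingest(record: dict) -> None:
    # Write path: the record is visible to consumers the moment it
    # lands, rather than waiting for a nightly batch job to republish it.
    events.put(record)

def agent_loop() -> None:
    # A toy 'agent' that reacts to fresh data as it arrives.
    while True:
        record = events.get()
        if record is None:  # shutdown sentinel
            break
        print(f"agent acting on fresh record: {record}")

worker = threading.Thread(target=agent_loop)
worker.start()
ingest({"ticker": "ACME", "price": 101.5})  # consumed immediately
events.put(None)
worker.join()
```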
What about people who are worried about data sovereignty and compliance?
Jason Hammons: It’s a big challenge, because in a lot of cases the regulations just aren’t fully formed. That’s not the regulators’ fault; they’re trying to respond to what the market needs as things break, and that’s always been the way with technology.
What we can do is give customers a universal system that makes it easier to react when those changes come. Full auditing, end-to-end, is built in, whether that’s internal to the system, exported to a third party, or to a regulator. And we do that at the lowest possible cost, natively.
We’ve also built large-scale multi-tenancy with secure isolation. For example, say you’re buying data from a provider and licensing it for one tenant. You can’t allow other users to see that data, and in a lot of cases, the regulation or the licensing agreement doesn’t just discourage it, it forbids it. You have to be able to prove it hasn’t been accessed by anyone else.
We can do that. We use encryption and key integration with whatever tools the customer prefers, and we can show exactly what’s happened to that data from the moment it entered the system.
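As a hedged sketch of what per-tenant isolation plus auditability can look like in principle: each tenant’s data is encrypted under its own key, every access attempt – allowed or denied – lands in an append-only log, and a read with the wrong tenant’s key fails outright. This uses the Python cryptography package’s Fernet primitive in place of a real KMS, and none of the names reflect VAST’s implementation.

```python
# Illustrative sketch of tenant key isolation with an append-only audit
# trail. Fernet stands in for a real KMS; names are hypothetical.
import json
import time
from cryptography.fernet import Fernet, InvalidToken

class TenantStore:
    def __init__(self):
        self._keys: dict[str, Fernet] = {}    # tenant -> data key
        self._objects: dict[str, bytes] = {}  # path -> ciphertext
        self.audit: list[dict] = []           # append-only access log

    def _log(self, tenant: str, action: str, path: str, ok: bool) -> None:
        self.audit.append({"ts": time.time(), "tenant": tenant,
                           "action": action, "path": path, "allowed": ok})

    def register_tenant(self, tenant: str) -> None:
        self._keys[tenant] = Fernet(Fernet.generate_key())

    def put(self, tenant: str, path: str, data: bytes) -> None:
        self._objects[path] = self._keys[tenant].encrypt(data)
        self._log(tenant, "put", path, True)

    def get(self, tenant: str, path: str) -> bytes:
        try:
            data = self._keys[tenant].decrypt(self._objects[path])
            self._log(tenant, "get", path, True)
            return data
        except InvalidToken:
            # Wrong tenant key: the ciphertext is unreadable, and the
            # attempt itself is recorded, which is what lets you prove
            # the data was never exposed to another tenant.
            self._log(tenant, "get", path, False)
            raise PermissionError(f"{tenant} cannot read {path}")

store = TenantStore()
store.register_tenant("tenant-a")
store.register_tenant("tenant-b")
store.put("tenant-a", "/licensed/feed", b"vendor data")
print(store.get("tenant-a", "/licensed/feed"))  # allowed
try:
    store.get("tenant-b", "/licensed/feed")     # denied and logged
except PermissionError:
    pass
print(json.dumps(store.audit, indent=2))        # exportable audit trail
```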
As things evolve, regulators might start asking for more detail: who’s touched what, what it’s been joined to, what models have trained on it. That’s where our metadata structure becomes a real differentiator. It’s built for GPUs, and it’s designed to react fast, without bolting on extra products, gateways, or servers.
That’s why we talk about being a partner, not just a product vendor. As AI regulation develops, things are going to change. And we believe we’re best positioned to adapt quickly because we’ve engineered for this from day one.
And we’ve already proven it. Over the last 18 to 24 months, we’ve delivered this product into more than 50 AI neo-clouds, including CoreWeave, probably the best known. So, when customers ask about sovereign AI, we can say: most of what you’re asking for is already in use, at scale, right now.
Do you find that regulatory uncertainty holds some enterprises back from moving beyond pilot programmes?
Jason Hammons: I think it can, yes. A lot of teams tend to do what they’ve always done: copy data around the infrastructure, get the necessary approvals, and move forward. Then, when a regulator, auditor, or compliance team comes calling, they’ve got their own systems and controls in place to show what they did.
But that whole model – manual prep, copying, ETL – it’s high-latency. And it becomes a real barrier when you’re trying to deliver real-time systems. That’s one of the big pain points we help solve.
Some of the fastest builders we’ve worked with have succeeded because they avoided that manual overhead. We’ve seen it work, and we’ve got a strong reference base that shows how regulated companies can do this at speed.
So when a team asks, “How do we scale this while staying compliant?”, we can point to real examples. Here’s how others did it. Here’s how we can adjust it for you.
From a product perspective, most enterprises already have the ideas. They know what they want to build, and they understand their compliance requirements. What they’re missing is simplicity, something that lets them scale faster, at lower cost, without reinventing their stack.
That’s really our core value. If you want to know what makes VAST different, it’s that we can make the difference between delivering three AI products in a year and delivering 30. The agility and scalability we offer, while still hitting all the compliance, governance, and security requirements, is what really moves the needle.
Since most customers already know how to build a pilot, it sounds like it’s all about making the process faster, simpler, and easier to integrate. How do you go about doing that?
Jason Hammons: It starts with engineering. We’ve built what we believe – and what our customers tell us – is the best engineering team in the industry for solving large-scale analytics problems that include AI. We’ve got a strong track record, and our approach is to really understand what the customer is trying to build.
In many cases, we already have 90% of what they need. Maybe we’ve done something similar before. But if they need that extra 10%, we have the engineering investment to deliver it – usually in days or weeks, rather than months. Other vendors might need you to buy three different products just to close that last gap.
That agility matters as we move into a world of large-scale, agentic AI systems: systems that are highly secure, governed, and production-ready. That’s why we’ve invested so heavily in our partnership with NVIDIA. We believe it’s the best in the industry. You can see that in Jensen’s keynote from Computex: the first demo he gave was a VAST Data demo. It shows what’s possible when the integration is that tight.
We don’t run a consultancy model. We’re not sending in a team of contractors to glue together 10 different tools. Our approach is: you use two products – VAST and NVIDIA – and we help you get there. If you have someone on your team who can write code, we’ll show them how. You don’t need to bring in external consultants or burn months stitching together multi-vendor workflows.
And once customers see what that makes possible – prompt engineering, RAG, agentic pipelines – they start moving quickly. The ones who need to disrupt their market don’t waste time. They’ll either go with an AI neo-cloud like CoreWeave, or they’ll build on-prem, where GPUs are cheaper and the ROI can land inside 12 months.
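For readers who haven’t seen the pattern, here’s roughly what a retrieval-augmented generation (RAG) loop reduces to at its simplest. The embedding below is a deliberately crude bag-of-words hash so the example runs stand-alone; a real pipeline would call an embedding model and a language model at the marked points, and none of this reflects any particular vendor’s stack.

```python
# Toy RAG loop: embed documents, retrieve the closest match to a
# question, and assemble a prompt. Everything here is illustrative.
import math
from collections import Counter

def embed(text: str, dim: int = 64) -> list[float]:
    # Crude stand-in for an embedding model: hash each token into a
    # fixed-size count vector, then L2-normalise it.
    vec = [0.0] * dim
    for token, count in Counter(text.lower().split()).items():
        vec[hash(token) % dim] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already normalised, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

documents = [
    "Quarterly revenue grew 12 percent year over year.",
    "The cluster runs 400 Gbps links between GPU nodes.",
    "Audit logs are retained for seven years.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, k: int = 1) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda d: cosine(q, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "How fast is the GPU network?"
context = retrieve(question)
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
print(prompt)  # a real system would send this prompt to an LLM
```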
Our job is to make sure they get there. We support them directly, and we think we do that well. That’s reflected in our NPS, in our customer references, and in the number of OEMs building with VAST today. It’s a strong signal of the engineering quality we deliver.
When you add all that up – NVIDIA certification at the highest level, support for GPU-native blueprints, and the ability to get real, working demos into customers’ hands fast – we make it easy for them to get building in a matter of days.
Let’s look ahead. What should enterprises be watching out for in the next phase of VAST’s work?
Jason Hammons: That’s a really good question. Because, at this point, so many people are working on similar problems. The real question becomes: how do you stay ahead? And for us, it comes down to speed, agility, and safety. Safety is a huge focus.
We’re building a lot of sovereign cloud projects, each with its own regulatory requirements and constraints. And those rules will keep changing. To stay competitive, we need to adapt as quickly as possible: not just in how we engineer, but in how the architecture itself is designed.
That’s why we continue to keep everything native to the platform. What we see from a lot of other products is that they bolt on extra components – servers, gateways, third-party tools. Eventually you’ve got a system with so many external dependencies that it becomes unmanageable. And, frankly, insecure. We’ve seen customers get stuck waiting months to apply a patch because of all the interconnected components their security teams have to check first.
With security threats that can emerge overnight, and critical bugs spreading globally in hours, that just doesn’t work. You need to react fast. That’s why we believe running native services is our biggest differentiator.
Not everyone fully appreciates how important that is – yet. But problems of scale are starting to reveal themselves. Whether you’re a large model builder or a GPU cloud provider, the cracks start to show. We’re seeing those lessons first, and applying them quickly to how we continue to build.
So we’re going to keep everything native. Keep security unified and controllable. And we’ll continue building tools that help customers create advanced AI pipelines in a way that’s simple and fast. It may sound basic, but it’s the right foundation for what’s coming next.
So simplicity really is the answer to scale?
Jason Hammons: Exactly. While the architecture gets more complex, our job is to keep things simple and real-time on the customer side. That’s how we think about engineering. There’s a quote – I think it was Einstein – about the highest level of intelligence being the ability to simplify. That’s what we aim for. We’ve got a modern, agile codebase, and we’re using it to deliver something native, fast, and built for what’s coming next.