The AI Infrastructure Stack · Part 2 of 3
Featuring Russ Artzt · February 2026
About Russ Artzt: Russ Artzt is co-founder of CA Technologies and former executive chairman and head of R&D at RingLead, acquired by ZoomInfo. He speaks with DataStorage.com regularly on AI infrastructure, enterprise software strategy, and the evolving data stack. Connect with him on LinkedIn.
In Part 1 of this series, we examined how AI agents are dismantling the per-seat pricing model that powered two decades of SaaS growth. The natural follow-on question is: if the software layer is being rebuilt from scratch around agents, what does that mean for the infrastructure underneath it?
The answer, according to Russ Artzt, is that storage stops being a utility and starts being infrastructure. Not a destination where data sits. A system through which data moves, continuously, across an AI pipeline that never stops running.
Artzt has spent the past several years watching AI reshape the infrastructure stack at every layer. His view on storage is direct: “Storage becomes an anchor in the AI story.” That framing is worth unpacking, because it's more specific than it sounds, and the implications for how teams buy, architect, and pay for storage are significant.
Traditional software had a predictable relationship with data. You stored records in a database, users queried them through an interface, and the data sat largely still between interactions. The storage requirement was mostly about capacity and reliability.
AI agents work differently. They are, as Artzt put it, voracious. “You could feed your whatever information you want into that knowledge base. You can give it audio, you can give it video, you can give it images, you can give it PDFs. You name it.” Modern agents don't query records. They ingest entire bodies of knowledge, process them, generate outputs, and then turn around and read from storage again on the next pass.
The scale of this appetite is becoming measurable. Google is now processing more than 1.3 quadrillion tokens per month, up from 10 trillion just a year earlier. RAG (retrieval-augmented generation) pipelines alone can increase data storage requirements by 10 to 20 times compared to traditional inference workloads, because every query retrieves relevant documents from storage and inserts them into the prompt context. And for advanced reasoning models, KV cache storage needs are projected to reach 2 to 5 terabytes per concurrent user by 2026, a 20 to 50 times increase over traditional inferencing models.
That isn't a rounding error. It's a structural change in how much storage the average enterprise AI workload requires, and how often that storage needs to be accessed.
One of the more underappreciated storage challenges in the AI era is the model itself. AI models aren't static artifacts. They get trained, fine-tuned, checkpointed, versioned, rolled back, and retrained, sometimes on a daily cycle in production environments. Each of those steps generates data that needs to go somewhere.
Artzt framed it through the lens of scale: “Think about the model Tesla had to put together to do self-driving. Incredible. You keep feeding all real-time information into the Tesla model. These AI models are very sophisticated, and they get very large.” A model that's continuously learning from real-world inputs doesn't have a storage event at the end of training. It has storage events constantly, throughout its operational life.
“People want to be able to store them,” he said. “They may want to store them in hot storage or cold storage.” That choice matters a lot. Modern AI deployment architectures use three distinct storage tiers: hot storage on NVMe or high-performance SSD for real-time inference, active model weights, and live embeddings; warm storage for batch inference, RAG pipelines, and intermediate caching; and cold storage on HDDs or lower-cost object storage for archived model versions, historical training datasets, compliance records, and checkpoints that need to be preserved but don't need to be fast.
Getting that tiering wrong is expensive in both directions. Keep too much in hot storage and the cost curve is brutal. Push active model artifacts into cold storage and your inference latency suffers. Tiered storage architectures done well can cut storage costs by 60% while maintaining millisecond response times on active workloads. Done poorly, they create bottlenecks that leave expensive GPUs idle, waiting on data.
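To make the tiering decision concrete, here is a minimal Python sketch of a placement rule. The `Artifact` class, the `choose_tier` function, and both idle-day thresholds are hypothetical names and values chosen for illustration; they are not drawn from any vendor's product or from Artzt's remarks, and a real policy would be tuned against observed access patterns.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; tune them to real access logs.
HOT_MAX_IDLE_DAYS = 1     # read within the last day: keep on NVMe or fast SSD
WARM_MAX_IDLE_DAYS = 30   # read within the last month: batch and RAG tier


@dataclass
class Artifact:
    name: str
    idle_days: int          # days since the artifact was last read
    serving: bool = False   # True if live inference currently depends on it


def choose_tier(artifact: Artifact) -> str:
    """Return "hot", "warm", or "cold" for a single artifact."""
    # Anything live inference depends on stays hot regardless of age.
    if artifact.serving or artifact.idle_days <= HOT_MAX_IDLE_DAYS:
        return "hot"
    if artifact.idle_days <= WARM_MAX_IDLE_DAYS:
        return "warm"
    return "cold"


artifacts = [
    Artifact("model-v12-weights", idle_days=0, serving=True),
    Artifact("rag-index-2026-01", idle_days=7),
    Artifact("model-v3-checkpoint", idle_days=200),
]
placement = {a.name: choose_tier(a) for a in artifacts}
print(placement)
```

The point of the sketch is that both failure modes described above come from the same two numbers: set the thresholds too generously and everything stays hot; set them too tightly and active artifacts drift into slower tiers.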
Here's where the economics of AI infrastructure get quietly brutal, in a place most teams don't scrutinize until they see the bill.
In the agentic model, data doesn't stay put. It moves from storage to GPU compute for training. It moves back to storage when the run completes. It moves out again for inference. It moves back in when outputs are logged. A single AI workflow can shuttle the same dataset across multiple systems, multiple times, in the course of a day. Each one of those movements, in a hyperscaler environment, can trigger an egress charge.
“If you store the data in Amazon and you move it around, you pay what is called an egress charge,” Artzt explained. AWS charges $0.085 to $0.09 per GB for outbound data transfer, with additional charges for cross-region and cross-availability-zone movement. AWS has made some moves to address this, expanding the free tier to 100 GB per month and offering conditional egress waivers for customers migrating off the platform, but the operative word is conditional. For teams running continuous AI workloads with heavy data movement, those waivers don't apply to day-to-day operations, and the meter runs constantly.
The math compounds fast. 37signals, the company behind Basecamp and HEY, expects to save $1.5 million per year on storage alone by moving off AWS S3. That's a single mid-sized software company. For enterprises running petabyte-scale AI pipelines with daily data movement between storage and compute, the egress line item can exceed the storage cost itself.
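A rough back-of-the-envelope calculation shows how quickly egress can overtake storage. The Python sketch below uses the $0.09 per GB outbound rate quoted above. The $0.023 per GB per month storage price is an assumption standing in for a typical hyperscaler list price, the flat rate ignores volume discounts, and the model assumes every movement actually leaves the provider's network and is billed as outbound transfer, which is not true of all traffic.

```python
EGRESS_PER_GB = 0.09           # upper end of the outbound range quoted above
STORAGE_PER_GB_MONTH = 0.023   # assumption: typical hyperscaler list price


def monthly_costs(dataset_gb, billed_moves_per_day, days=30):
    """Return (storage_cost, egress_cost) in dollars for one month.

    billed_moves_per_day counts full-dataset transfers that are billed
    as outbound traffic. Flat rates are used; real bills have tiers.
    """
    storage = dataset_gb * STORAGE_PER_GB_MONTH
    egress = dataset_gb * billed_moves_per_day * days * EGRESS_PER_GB
    return storage, egress


# A 10 TB dataset pushed out to external compute once a day.
storage, egress = monthly_costs(dataset_gb=10_000, billed_moves_per_day=1)
print(f"storage: ${storage:,.0f}/month, egress: ${egress:,.0f}/month")

# Full-dataset transfers per month at which egress equals storage cost.
break_even_moves = STORAGE_PER_GB_MONTH / EGRESS_PER_GB
print(f"break-even: {break_even_moves:.2f} full transfers per month")
```

Under these assumptions the 10 TB dataset costs roughly $230 a month to store and roughly $27,000 a month to move, and egress matches the storage bill once about a quarter of the dataset leaves the platform each month.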
This is the argument for storage providers that don't charge egress in the same way. Backblaze B2, for example, offers free egress up to three times the monthly storage average, with overages at $0.01 per GB, and unlimited free egress for data moving to compute partners including CoreWeave, Vultr, Cloudflare, Fastly, and Equinix Metal. For a team whose workflow is to store data in object storage, push it to a neocloud GPU cluster for training, and pull the results back, that partnership list matters enormously. The data movement that would generate egress charges on AWS happens at no additional cost when storage and compute live in the same partner ecosystem.
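The B2 terms described above reduce to a short formula. This Python sketch encodes the three-times allowance and the $0.01 per GB overage from the text. The function name `b2_egress_charge` is hypothetical, and the treatment of partner traffic (excluded from billing entirely and not counted against the allowance) is an assumption to check against Backblaze's current terms.

```python
FREE_EGRESS_MULTIPLE = 3   # free egress up to 3x average monthly storage
OVERAGE_PER_GB = 0.01      # overage rate quoted above


def b2_egress_charge(avg_stored_gb, egress_gb, partner_egress_gb=0.0):
    """Estimate one month's egress charge in dollars.

    partner_egress_gb is the part of egress_gb sent to free-egress compute
    partners. Assumption: that traffic is neither billed nor counted
    against the free allowance.
    """
    billable = max(egress_gb - partner_egress_gb, 0.0)
    free_allowance = FREE_EGRESS_MULTIPLE * avg_stored_gb
    overage = max(billable - free_allowance, 0.0)
    return overage * OVERAGE_PER_GB


# 10 TB stored, with the full dataset moved out once a day for 30 days.
to_anywhere = b2_egress_charge(avg_stored_gb=10_000, egress_gb=300_000)
to_partner = b2_egress_charge(
    avg_stored_gb=10_000, egress_gb=300_000, partner_egress_gb=300_000
)
print(f"non-partner destination: ${to_anywhere:,.0f}")
print(f"partner destination:     ${to_partner:,.0f}")
```

With these inputs the non-partner case comes to about $2,700 for the month, and the partner case comes to zero, which is the practical meaning of the partnership list.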
Artzt made this point explicitly: “Amazon will charge you every time you move data. Every time you do a data transfer from one point to another, they will charge you. Backblaze doesn't work that way.” With B2 priced at roughly one-fifth the cost of AWS S3 for combined storage and egress, the economic case for purpose-built object storage in multi-cloud AI pipelines isn't subtle.
The practical enabler of all of this is a protocol most enterprise teams are already using without thinking about it: S3.
“There's a standard called S3,” Artzt said. “Amazon created it and all the storage players have some level of compatibility to S3.” That compatibility is what makes it possible to swap storage providers without rewriting applications. Tools built for AWS S3 can route to Backblaze B2, Wasabi, or any other S3-compatible provider with a configuration change rather than a re-architecture. In Artzt's words: “It thinks it is talking to Amazon, but it is talking to Backblaze.”
That portability is what makes the multi-cloud AI stack viable. You don't have to bet your entire data layer on a single hyperscaler to get S3 compatibility. You can store data with a provider whose economics suit your workload, and your existing tooling largely doesn't care.
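In practice, the "configuration change" is usually a single endpoint setting. The Python sketch below shows the idea using the `endpoint_url` parameter of `boto3.client`, which is how the AWS SDK for Python is pointed at a non-AWS S3-compatible service. The endpoint hostnames are examples of the format these providers publish and depend on the region of your account, so treat them as placeholders to confirm in the provider's documentation. The helper names `s3_client_kwargs` and `make_client` are hypothetical.

```python
# Example endpoints only; the region segment depends on your account.
ENDPOINTS = {
    "aws": None,  # None lets the SDK use its default AWS endpoint
    "backblaze": "https://s3.us-west-004.backblazeb2.com",
    "wasabi": "https://s3.wasabisys.com",
}


def s3_client_kwargs(provider: str) -> dict:
    """Build keyword arguments for boto3.client for the chosen provider."""
    if provider not in ENDPOINTS:
        raise ValueError(f"unknown provider: {provider}")
    kwargs = {"service_name": "s3"}
    endpoint = ENDPOINTS[provider]
    if endpoint is not None:
        kwargs["endpoint_url"] = endpoint
    return kwargs


def make_client(provider: str):
    """Create an S3 client; application code downstream stays unchanged."""
    import boto3  # third-party SDK, imported lazily so the sketch loads without it

    # Credentials come from the usual SDK sources (environment variables or
    # config files), holding the keys issued by the chosen provider.
    return boto3.client(**s3_client_kwargs(provider))


print(s3_client_kwargs("aws"))
print(s3_client_kwargs("backblaze"))
```

Everything that calls the returned client, such as `upload_file` or `list_objects_v2`, is written once against the S3 API. Only the endpoint and the credentials differ between providers.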
Artzt was careful to note that compatibility is rarely perfect. “It doesn't support it a hundred percent,” he said. The gaps matter, and they tend to matter most precisely where AI workflows push hardest: high-throughput parallel reads during training, large sequential writes during checkpointing, and the specific API behaviors that orchestration frameworks depend on. B2 Overdrive, Backblaze's high-throughput tier, was specifically engineered to close those gaps for AI and ML workloads, offering up to 1 Tbps sustained throughput at $15 per TB per month, with unlimited free egress in every direction.
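The throughput and price figures are easier to judge as arithmetic. This Python sketch converts a sustained throughput into transfer time and applies the $15 per TB per month rate quoted above. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly sustained line rate with no protocol overhead, so real transfers will take longer. The function names are hypothetical.

```python
def transfer_seconds(dataset_tb: float, throughput_gbps: float) -> float:
    """Seconds to move dataset_tb at a sustained throughput_gbps.

    Assumes decimal units and an ideal, fully sustained line rate.
    """
    bits = dataset_tb * 8e12            # 1 TB = 8e12 bits
    return bits / (throughput_gbps * 1e9)


def monthly_storage_cost(dataset_tb: float, price_per_tb: float = 15.0) -> float:
    """Monthly storage cost in dollars at the per-TB rate quoted above."""
    return dataset_tb * price_per_tb


petabyte_tb = 1_000
fast = transfer_seconds(petabyte_tb, throughput_gbps=1_000)   # 1 Tbps
slow = transfer_seconds(petabyte_tb, throughput_gbps=10)      # 10 Gbps link

print(f"1 PB at 1 Tbps:  {fast / 3600:.1f} hours")
print(f"1 PB at 10 Gbps: {slow / 86400:.1f} days")
print(f"1 PB stored:     ${monthly_storage_cost(petabyte_tb):,.0f}/month")
```

On these idealized numbers, a petabyte moves in a little over two hours at 1 Tbps and takes more than nine days at 10 Gbps. That difference is the gap between feeding a GPU cluster and leaving it waiting.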
The implication for storage buyers is that S3 compatibility is necessary but not sufficient. The question isn't whether a provider supports S3. It's whether they support it well enough for the specific patterns your AI pipeline generates, and whether their partner integrations make data movement frictionless rather than just possible.
The neocloud ecosystem, examined in our previous series on AI infrastructure, is the compute layer that has grown up around the GPU shortage. Providers like CoreWeave, Nebius, and Vultr have built specialist GPU infrastructure to serve AI workloads that hyperscalers can't supply fast enough or price competitively enough. That compute layer is where training runs, where fine-tuning happens, where inference gets served.
But neoclouds are compute specialists, not storage specialists. They're optimized for GPU clusters, networking fabric, and raw throughput. The data that feeds those clusters has to come from somewhere. Artzt sees this as the central integration challenge of the current moment: “I would predict in the future, in this world of AI, much more integration between the neoclouds, like CoreWeave and Nebius, and storage companies like Backblaze and Wasabi and others.”
The practical shape of that integration is already visible. Backblaze has free egress partnerships with CoreWeave, Vultr, and Equinix Metal already in place, meaning data stored in Backblaze B2 can move to those compute providers and back without egress charges. That's not a marketing arrangement. It's the infrastructure expression of Artzt's thesis: storage wins in the AI era not by competing with compute, but by becoming the frictionless data layer that makes compute more efficient.
“You want to egress it, probably a neocloud provider, analyze it, and when he is done processing, he will send it all back,” Artzt said. “So what we need to do is build integration between the two. They need connectors, seamless integration. I hit a button and it goes.”
That “I hit a button and it goes” standard is deceptively high. It requires not just S3 compatibility but deep integration with the orchestration tools (think MLflow, Weights & Biases, Kubernetes-based pipeline frameworks) that teams actually use to manage their AI workflows. Storage that doesn't plug naturally into those environments creates friction at every stage of the pipeline, and friction at scale is expensive.
One detail Artzt flagged that doesn't get enough attention in the neocloud discussion: many GPU-specialist providers have narrow device support. “A lot of these neoclouds don't have support for HDD and flash drives,” he noted. That matters because not all AI data belongs on the same medium.
The storage market is sorting itself out along predictable lines. Dell'Oro Group projects the storage drive market to grow at a CAGR of over 20% over the next five years, with HDDs holding roughly 90% of cold storage while SSDs and NVMe dominate hot and warm tiers. For AI specifically, TrendForce is predicting severe HDD shortages in 2026, with lead times stretching from weeks to more than a year, as hyperscalers and AI builders stockpile nearline capacity for training datasets.
A storage provider with broad device support can tier data appropriately across the full lifecycle: NVMe for active inference, standard SSD for warm RAG pipelines, HDD for the cold archives of training datasets and model checkpoints that need to be preserved but won't be touched for months. A neocloud optimized for GPU compute doesn't offer that breadth. Pure-play object storage providers do, and that's a meaningful differentiator as AI data volumes grow.
The argument Artzt is making isn't that storage is glamorous. It's that storage is unavoidable infrastructure, and the teams that treat it as a commodity line item rather than a strategic decision are going to find themselves paying for that choice in egress bills, pipeline latency, and integration debt.
A few practical implications:
First, model the data movement economics before committing to a storage provider. The cost of storing data is often less significant than the cost of moving it. If your AI pipeline involves frequent data transfers between storage and compute, egress fees are a first-order cost, not a footnote.
Second, S3 compatibility is necessary, but integration depth is what actually matters. Ask specifically whether a provider's S3 implementation handles the write patterns your checkpointing generates, the throughput your training runs require, and the API behaviors your orchestration tooling expects.
Third, evaluate storage and compute together, not separately. Backblaze's free egress partnerships with CoreWeave, Vultr, and Cloudflare are a direct expression of this principle. The storage decision and the compute decision are linked, and a storage provider that has pre-negotiated frictionless data movement with your compute provider of choice is materially more valuable than one that hasn't.
Fourth, plan for the model lifecycle, not just the training run. The data your AI system generates after deployment (embeddings, logs, inference outputs, checkpoints, fine-tuned variants) is often larger and more persistent than the training data itself. Storage architectures that aren't designed for that ongoing lifecycle create problems that compound over time.
The AI infrastructure story has spent two years being told primarily as a compute story: who has the GPUs, who can build the data centers fast enough, who wins the chip wars. That framing isn't wrong, but it's incomplete.
Compute is where AI runs. Storage is where AI lives. Every model that gets trained, every dataset that gets processed, every output that gets generated has to go somewhere. And in an agentic world, where software is continuously ingesting, processing, and producing data as a matter of course, the question of where that data goes, how fast it can be retrieved, and what it costs to move it becomes one of the defining infrastructure questions of the next decade.
“These are huge models, and they change all the time,” Artzt said. “So you need a fast way of getting the data over to a reliable data center where it can be stored and managed properly.”
That's not a description of a utility. That's a description of infrastructure. And infrastructure decisions made cheaply at the beginning tend to be expensive to fix later.
Part 3 of this series goes one layer deeper, into the tooling required to actually build, debug, and operate AI models in production. Coming next: “Debugging the AI Stack: Why Model Building Needs a Smarter Set of Tools.”