DevZero has launched an autonomous infrastructure optimisation platform for Kubernetes workloads that can rightsize resources in real time without restarts.
The Seattle-based company, founded by former Uber engineers Debo Ray and Rob Fletcher, is moving more directly into a market occupied by companies such as Cast.ai and ScaleOps. It argues that its use of checkpoint-restore technology sets it apart by allowing live migration of workloads during shifts in demand or infrastructure disruption.
The platform operates at the cluster, node and workload levels, using software that profiles resource demand and adjusts CPU, memory and GPU allocation as usage changes. The approach is intended to reduce overprovisioning, a common practice among engineering teams that reserve excess capacity to avoid outages.
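To make the idea of rightsizing concrete, the sketch below shows one common way to turn a usage profile into a resource request: take a high percentile of observed usage and add headroom. It is an illustration only, not DevZero's method. The function names, the 95th-percentile choice, the 15% headroom and the sample figures are all assumptions made for this example.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile of a non-empty list, with q in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(usage_samples, q=95, headroom=0.15):
    """Hypothetical rule: q-th percentile of observed usage plus headroom."""
    return percentile(usage_samples, q) * (1 + headroom)


# Made-up CPU usage samples in millicores, including one spike.
cpu_usage = [120, 135, 150, 140, 400, 130, 145, 160, 155, 138]
current_request = 1000  # millicores reserved today (assumed)

recommended = recommend_request(cpu_usage)
reclaimable = current_request - recommended

print(f"recommended request: {recommended:.0f}m")
print(f"reclaimable capacity: {reclaimable:.0f}m")
```

With these assumed numbers the recommendation is about 460 millicores against a 1,000-millicore reservation. A production system would also have to weigh memory, GPU and burst behaviour, which this sketch ignores.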
The launch marks a shift in emphasis for DevZero, which was established in 2022 around a cloud development platform aimed at improving software engineering productivity. While running that service on Kubernetes, the founders identified inefficiencies they said were eating into margins and built tools to address them.
That work led the company to focus on infrastructure optimisation, particularly as artificial intelligence inference workloads add pressure to cloud spending. DevZero says its customers include DataBahn, Dentira, Starburst, OpenObserve and Outerbounds.
Industry data cited by DevZero points to the scale of the problem. A survey by the Cloud Native Computing Foundation found that 66% of organisations hosting generative AI models use Kubernetes to manage some or all of their inference workloads. Datadog research found that 83% of container costs go to idle resources, and that 54% of those costs come from overprovisioned cluster infrastructure.
DevZero says its average client had been overspending on compute by 53% before adopting the platform. It also says users typically reduce compute bills by 30% to 60%, though those figures were provided by the company and were not independently verified.
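The 53% figure can be read in two ways, and the arithmetic differs. The sketch below works through both readings as assumptions; the article does not say which one DevZero intends, and the function names are hypothetical.

```python
def waste_share_if_markup(overspend):
    """Reading A: the bill is (1 + overspend) times what was needed.

    Returns the fraction of the bill that is waste.
    """
    return overspend / (1 + overspend)


def waste_share_if_fraction(overspend):
    """Reading B: overspend is already the wasted fraction of the bill."""
    return overspend


overspend = 0.53
share_a = waste_share_if_markup(overspend)
share_b = waste_share_if_fraction(overspend)

print(f"Reading A: {share_a:.1%} of the bill is waste")
print(f"Reading B: {share_b:.1%} of the bill is waste")
```

Under reading A roughly 35% of the bill is waste, and under reading B it is 53%. Both fall inside the 30% to 60% savings range the company quotes.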
Trust issue
One obstacle for suppliers in this market has been convincing infrastructure teams to hand over control to automated systems. Engineers may accept that manual rightsizing is time-consuming and difficult to scale, but still worry that unsupervised software could cut resources too far and create instability.
DevZero says it designed the platform around that concern, combining continuous monitoring with statistical modelling of resource demand. The system then uses scheduling and autoscaling tools to place workloads and select capacity across platforms including AWS, Azure, GCP, OCI and OpenShift.
The platform analyses more than 3,000 instance types, 69,000 price points, 23 GPU models and more than 80 regions. The goal is to decide where workloads should run and how much infrastructure they require at any given moment.
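At its simplest, a placement decision of this kind is a search over a catalogue for the cheapest offer that satisfies a workload's requirements. The sketch below shows that idea under stated assumptions: the `Offer` type, the `cheapest_fit` function, the instance names, regions and prices are all invented for illustration and do not describe DevZero's catalogue or logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Offer:
    """One hypothetical instance type at one price point."""
    name: str
    region: str
    vcpus: int
    memory_gib: int
    gpus: int
    hourly_usd: float


def cheapest_fit(offers, vcpus, memory_gib, gpus=0) -> Optional[Offer]:
    """Return the lowest-priced offer meeting every requirement, or None."""
    fitting = [
        o for o in offers
        if o.vcpus >= vcpus and o.memory_gib >= memory_gib and o.gpus >= gpus
    ]
    return min(fitting, key=lambda o: o.hourly_usd, default=None)


# Invented catalogue entries; names and prices are placeholders.
CATALOGUE = [
    Offer("small-a", "region-1", 2, 8, 0, 0.08),
    Offer("medium-a", "region-1", 4, 16, 0, 0.17),
    Offer("medium-b", "region-2", 4, 16, 0, 0.15),
    Offer("gpu-a", "region-2", 8, 64, 1, 1.20),
]

cpu_choice = cheapest_fit(CATALOGUE, vcpus=4, memory_gib=16)
gpu_choice = cheapest_fit(CATALOGUE, vcpus=4, memory_gib=16, gpus=1)
no_choice = cheapest_fit(CATALOGUE, vcpus=64, memory_gib=512)

print(cpu_choice)
print(gpu_choice)
print(no_choice)
```

A real optimiser would also account for availability, interruption risk, data transfer costs and commitments, none of which this sketch models.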
Its central technical claim is that rightsizing can happen without restarting workloads. That matters for businesses running production applications or AI services, where interruptions can affect reliability, customer experience and revenue.
DevZero says checkpoint-restore makes it possible to move Kubernetes workloads live when conditions change, including during sudden spikes in demand or an availability zone outage. In practice, that could reduce the need for spare resources sitting idle as a contingency.
Ray set out the company's position on the trade-off between savings and reliability.
"Infrastructure teams are hesitant to let anything manage compute autonomously because that usually comes with tradeoffs. Cutting the cloud bill isn't worth it if the result is downtime," said Debo Ray, Chief Executive Officer and Co-Founder of DevZero. "We built DevZero to eliminate the tradeoffs between performance, reliability and cost. This is a platform for engineers who want autonomous optimization they can trust at 3 am."
Customer example
DataBahn, one of DevZero's customers, described an outage scenario in which the system moved workloads without interruption.
"During a recent availability zone outage, DevZero transparently migrated our workloads live without requiring a single restart or operational intervention from our team," said Mihir Nair, Head of Architecture at DataBahn. "That level of resiliency gave us the confidence to push infrastructure optimization much more aggressively at DataBahn. We're now partnering closely to extend that operational intelligence further with Kubernetes observability for outbound network traffic optimization, reduced data transfer costs, and continuous security posture monitoring. The combination of autonomous resiliency, operational visibility, and continuous optimization is fundamentally changing how modern AI and data platforms should operate at scale."
Backed by Anthos Capital, Foundation Capital and Madrona, DevZero is now using the infrastructure product as its main route into a growing market shaped by the rising cost of cloud compute and AI inference.