Expanso nets $7.5M to pioneer distributed data processing for enterprises

Data is the lifeblood of modern businesses, but mobilizing it is far from easy. Companies have to go through many steps just to make sure they are getting the most out of the information coming in from different sources.

Now, as the volume of this information grows multifold, Seattle-based Expanso is moving to give teams a better way to handle their data assets with distributed processing. The company today announced it has raised $7.5 million in a seed round of funding, led by General Catalyst and Hetz Ventures.

It plans to use the capital to double down on this idea, accelerate the development of its data processing platform ‘Bacalhau’ and take it to even more enterprise users, giving them the ability to process information right where it is. 

“Infrastructure built to meet data where it is, even if distributed around the world, is long overdue. What Expanso is building with Bacalhau is intended to revolutionize the way big data is processed and global compute jobs are executed while unlocking an entirely new class of applications,” David Aronchick, the founder and CEO of the company, said in a statement.

Tackling the problem of distributed data

In the current scheme of things, enterprises extract value from vast amounts of data by moving all of it across networks through complex ETL pipelines and centralizing everything in a cloud data platform. The approach works well, enabling BI and AI applications, but it is slow and expensive.

Aronchick, who was the first non-founding product manager on Kubernetes and lead product manager at Google, was quick to note the challenge of these globally distributed workloads during different stages of his career.

“Customers again and again would bring up solutions that they had to build themselves to solve the problem of globally distributed workloads,” he told VentureBeat. To top it off, enterprise data was growing far faster than network capacity. At Protocol Labs, the last company where Aronchick worked, over 10 exabytes (EB) of data was spread across the entire network. On a standard 10Gbps network, moving this much data to a cloud platform would take on the order of 250 years of uninterrupted transfer.
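
The scale of that transfer can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it assumes decimal units (1 EB = 10^18 bytes), a link that stays fully saturated with no protocol overhead, and it computes both readings of the quoted link speed, 10 gigabits per second and 10 gigabytes per second. The function and variable names are made up for this example.

```python
# Back-of-the-envelope transfer time for 10 EB over a single saturated link.
# Assumptions: decimal units, no protocol overhead, no downtime.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def transfer_years(data_bytes: float, link_bits_per_second: float) -> float:
    """Years needed to push data_bytes through a link of the given bit rate."""
    seconds = data_bytes * 8 / link_bits_per_second
    return seconds / SECONDS_PER_YEAR


TEN_EXABYTES = 10 * 10**18           # bytes
TEN_GIGABITS = 10 * 10**9            # bits per second
TEN_GIGABYTES = 8 * TEN_GIGABITS     # 10 gigabytes per second, in bits

years_at_10_gigabits = transfer_years(TEN_EXABYTES, TEN_GIGABITS)
years_at_10_gigabytes = transfer_years(TEN_EXABYTES, TEN_GIGABYTES)

print(f"10 Gbit/s link:  {years_at_10_gigabits:.0f} years")
print(f"10 GByte/s link: {years_at_10_gigabytes:.0f} years")
```

Under these assumptions the transfer takes roughly 250 years at 10 gigabits per second and roughly 32 years at 10 gigabytes per second. Either way, centralizing the data is impractical, which is the point Aronchick is making.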

To tackle this challenge, he launched a project to let people execute compute jobs locally where data was being stored, which ultimately spun off into Expanso.

“We launched the project in February of 2022, building the system entirely in open-source and public domain. Very quickly thereafter, we had our first Compute over Data summit in April, and we realized even at this early stage that this was going to be much larger than just Filecoin (of Protocol Labs). By November, we released our public alpha and then released version 1.0 in May of 2023. At the same time, we closed our pre-seed funding and spun the project out into the new company,” he said.

Today, Expanso calls this open-source project Bacalhau. It runs on the distributed systems organizations have already deployed (or plan to deploy) and schedules computing jobs against the data right where it resides. To get started, a team runs a command to install a Bacalhau agent on its machines and joins a public or private cloud network. As analytical needs grow, the team can add more capacity by provisioning extra Bacalhau nodes.
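
The idea of scheduling jobs against data where it resides can be illustrated with a toy model. The sketch below is not Bacalhau's API or its actual scheduling algorithm. It is a hypothetical, minimal example in which every name (Node, Job, schedule, the node and dataset labels) is invented: each job is placed on a node that already holds the dataset it needs, and the least-loaded such node is preferred.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A hypothetical compute node and the datasets stored next to it."""
    name: str
    datasets: frozenset


@dataclass(frozen=True)
class Job:
    """A hypothetical compute job that reads exactly one dataset."""
    name: str
    dataset: str


def schedule(jobs, nodes):
    """Place each job on the least-loaded node that already holds its data."""
    load = {node.name: 0 for node in nodes}
    placement = {}
    for job in jobs:
        holders = [node for node in nodes if job.dataset in node.datasets]
        if not holders:
            raise LookupError(f"no node holds dataset {job.dataset!r}")
        chosen = min(holders, key=lambda node: load[node.name])
        load[chosen.name] += 1
        placement[job.name] = chosen.name
    return placement


nodes = [
    Node("seattle-edge", frozenset({"app-logs"})),
    Node("frankfurt-dc", frozenset({"app-logs", "sensor-readings"})),
]
jobs = [
    Job("sanitize-logs", "app-logs"),
    Job("count-errors", "app-logs"),
    Job("aggregate-sensors", "sensor-readings"),
]

placement = schedule(jobs, nodes)
for job_name, node_name in placement.items():
    print(f"{job_name} -> {node_name}")
```

In this toy run no dataset crosses the network: the two log jobs are spread across the two nodes that hold the logs, and the sensor job runs on the only node that stores the sensor readings. Adding capacity amounts to appending another Node to the list.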

“Ideally, teams will have to do almost no code rewriting to use our workflows. We already support Docker and WASM, and any arbitrary binary that they already use…The workflow from a team’s perspective is simpler and more streamlined with Bacalhau and Expanso,” Aronchick explained. 

When this product is in use, teams can analyze local data instantly using lightweight Bacalhau nodes installed alongside their existing infrastructure. It reduces the operational overhead of replicating data centers or managing data movement between clouds and allows organizations to use idle edge computing resources, leading to additional cost savings. Most importantly, processing data in situ increases security and speed while reducing the risk of regulatory fines.

Growth so far

Currently, Bacalhau can handle a range of data tasks: sanitizing and processing application logs at the source, running distributed ML training across remote devices, processing file data spread across storage systems and regions, and managing distributed device fleets.

According to Aronchick, since the launch of its public demo earlier this year, Bacalhau has been used to run over 2 million jobs across use cases. He declined to share exact revenue growth figures but noted the company is working with heavyweights such as the U.S. Navy, Caltech, University of Maryland, Prelinger Labs, WeatherXM, and others.

Moving ahead, the company hopes to build on its work and evolve Bacalhau to support additional enterprise use cases and address major customer needs. It also plans to expand the user base of the platform, which currently sees over 50,000 CLI downloads per month.


