Braintrust Data wants to make enterprise AI better with faster evaluations

California-based Braintrust Data, a startup helping enterprises build and improve AI at speed and scale, today announced it has raised $5.1 million in a seed round of funding, led by Greylock Partners.

Founded just a little over two months ago by Ankur Goyal, who sold his previous AI venture Impira to Figma, Braintrust targets the problem of AI evaluation by giving teams a dedicated tool to see how their AI model performs and improve it well before it reaches the production stage.

Despite being an early-stage venture, the company has drawn dozens of customers and investments from well-known names in the industry, including Elad Gil, Clem Delangue, Greg Brockman, Jack Altman, Howie Liu, Guillermo Rauch, Bryan Helmig, Simon Last and Vipul Ved Prakash.

Now, it plans to expand its team and build on this work, allowing developers to move faster and constantly stay at the forefront of AI.

Taking AI to production can be messy

AI is the backend of modern business applications, but when it comes to keeping those applications up to the mark, things can get quite messy. A small code change aimed at improving the application might end up breaking the entire workflow, leaving backend teams hustling to figure out and fix what went wrong.

This reactive approach can break the customer experience — which is why developer teams give a lot of attention to the practice of evaluation in the dev loop, where they try to measure how well the AI system performs. They first analyze context-specific data and metrics, and then rapidly experiment with various models, prompts, fine-tuning and other techniques to achieve the desired results.

Time and effort, streamlined

This approach works well, but it takes a lot of time and effort and often delays feature launches, which is exactly the problem Goyal faced during his work at Impira and Figma.

After speaking with multiple teams facing the same problem, he decided to build Braintrust Data to test code changes on real-world examples and enable faster evals.

“Our product allows you to easily (in under an hour) instrument your code to define evaluations, capture user feedback, log LLM calls, etc. Every time you make a change, you can re-run evaluations and instantly get a dashboard that tells you how much you improved or regressed things, and debug individual cases (before moving to final deployment). You can also log examples from staging/production and run evaluations against them to find new edge cases users are hitting,” he told VentureBeat.
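To make the workflow Goyal describes more concrete, here is a minimal, generic sketch of an evaluation loop in Python: score a baseline and a candidate version of a task against the same examples, then report what improved or regressed. This is not Braintrust's API. The names `Case`, `exact_match`, `run_eval` and `compare` are hypothetical, and the two "tasks" are simple lookup tables standing in for real model calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Case:
    """One evaluation example: an input and the answer we expect."""
    input: str
    expected: str


def exact_match(output: str, expected: str) -> float:
    """Hypothetical scorer: 1.0 on a case-insensitive exact match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def run_eval(
    task: Callable[[str], str],
    cases: List[Case],
    scorer: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Run the task on every case and return a score per input."""
    return {case.input: scorer(task(case.input), case.expected) for case in cases}


def compare(baseline: Dict[str, float], candidate: Dict[str, float]) -> dict:
    """Summarize how the candidate run differs from the baseline run."""
    n = len(baseline)
    return {
        "baseline_mean": sum(baseline.values()) / n,
        "candidate_mean": sum(candidate.values()) / n,
        "improved": [k for k in baseline if candidate[k] > baseline[k]],
        "regressed": [k for k in baseline if candidate[k] < baseline[k]],
    }


# Stand-ins for two versions of an AI feature (assumed, not real model calls).
BASELINE_ANSWERS = {"2+2": "4", "capital of France": "paris"}
CANDIDATE_ANSWERS = {"2+2": "4", "3*3": "9"}


def baseline_task(prompt: str) -> str:
    return BASELINE_ANSWERS.get(prompt, "unknown")


def candidate_task(prompt: str) -> str:
    return CANDIDATE_ANSWERS.get(prompt, "unknown")


cases = [
    Case("2+2", "4"),
    Case("capital of France", "Paris"),
    Case("3*3", "9"),
]

report = compare(run_eval(baseline_task, cases), run_eval(candidate_task, cases))
print(report)
```

In this toy run the candidate fixes one case and breaks another, so the average score is unchanged even though behavior differs. That is the kind of per-case regression a before-and-after comparison is meant to surface before a change ships.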

Hundreds of customers already

The CEO launched the product in August 2023 and has already roped in “hundreds” of enterprises and startups as customers, including known names such as Airtable, Zapier, Coda and Instacart. According to him, with Braintrust, these players have been able to boost the accuracy of their AI offerings by over 30% in just a matter of weeks, leading to faster ship cycles, increased engagement and better team collaboration.

“Our product can run inside of your own cloud environment, which is critical for enterprise security, especially in AI which is rampant with PII and proprietary info. This has enabled our enterprise customers to use Braintrust for their most mission-critical workloads,” Goyal added.

More importantly, in addition to evaluations, Braintrust has started offering other handy capabilities to help AI teams iterate and ship faster. These include a prompt playground for comparing multiple prompts, benchmarks and their respective input/output pairs between runs; dataset management; and an AI proxy that gives access to popular AI models, including all of OpenAI’s models, Anthropic models, Llama 2 and Mistral.

Growing focus on AI quality

As enterprises are bullish on AI capabilities, an offering to evaluate model performance and fix gaps can come in handy. However, Braintrust is not alone in this space.

Over the last year, since OpenAI kicked off the generative AI boom with the launch of ChatGPT, many players have fielded products to help teams build AI products. Some of them focus on model performance metrics like API error rates, rate limits and response times.

Meanwhile, others target the observability front, providing detailed analytics and insights into the quality of outputs provided by the model.

Braintrust, for its part, claims to differentiate by offering insights before the model reaches the production stage.

“There is no doubt this is an exciting space with other companies trying to add value. Most products out there are focused on observability, which allows you to see what’s happening in production. Unfortunately, if you only have observability, you have to ship things to your users to find out whether they work. We’ve found that engineering teams who implement great evaluations move significantly faster – up to 10 times faster – than those who are just watching what happens in production and trying to fix them ad-hoc,” Goyal pointed out.

With this round from Greylock, which takes the company’s total capital raised to $8.3 million, Goyal plans to hire more talent and push ahead on the product roadmap. The goal is to build out a market-leading solution for evaluations and support more AI tooling, including a prompt playground, production logging, multi-modal model support and an AI proxy.
