From vibes to visibility: Why we built Traceloop

Today is a big day at Traceloop: we’re announcing our $6.1 million seed round led by Sorenson Ventures, with participation from Y Combinator, Samsung Next, IBEX, and notable angels including the CEOs of Datadog, Elastic, and Sentry. It’s a major milestone for us, but it also brings us back to the problem that first pulled us in.
Gal and I started Traceloop because – like so many AI engineers these days – we felt stuck. We were building with early LLMs in 2022, experimenting with what was then GPT-3. It felt exciting, like new territory. But it was also unpredictable: something would work one day and break the next – even when we hadn’t changed anything. We didn’t want to ship broken AI, but there weren’t any tools on the market to tell us what was breaking or why.
Between the two of us, we had spent years building at Google and Fiverr. At Google, I worked on models that drove growth for Photos, Maps, and YouTube. You couldn’t ship a single line of code without making sure it had been validated in testing. At Fiverr, Gal built the machine learning pipelines that powered the core product. Everything was observable and accountable.
But here we were, managing LLM prompts in a spreadsheet. It wasn’t called “vibecoding” back then, but that’s what we were doing. We figured that if this was the future of software, it needed better infrastructure. So we started building what would become Traceloop.
We started with OpenLLMetry – our open-source framework for bringing real observability to AI. Built on OpenTelemetry, it gives developers the visibility and accountability that LLM applications had been missing. It took off, with more than half a million downloads a month and companies like Cisco, Dynatrace, and IBM using it. But it became clear to us that this was only the beginning.
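To give a feel for what that looks like in practice, here is a minimal sketch of instrumenting an LLM call with OpenLLMetry in Python, based on the SDK’s published quick-start. The app name and the joke-telling workflow are purely illustrative, and exact details may vary by SDK version:

```python
# pip install traceloop-sdk openai
from openai import OpenAI
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# One-time initialization: sets up OpenTelemetry tracing and
# auto-instruments supported LLM clients (OpenAI in this sketch).
Traceloop.init(app_name="joke-service")  # hypothetical app name

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

@workflow(name="tell_joke")  # groups the LLM call below into one named trace
def tell_joke(topic: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Tell me a short joke about {topic}."}],
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(tell_joke("observability"))
```

With that in place, each call shows up as a standard OpenTelemetry trace, with the prompt, completion, token usage, and latency attached, and can be exported to Traceloop or to any other OTel-compatible backend.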
Teams didn’t just want to see what their LLMs were doing. They wanted more granularity: whether the model was hallucinating, whether a prompt was drifting, or whether a user-facing chatbot had started returning wrong answers. They needed answers to questions about latency, quality, and performance. So we built the Traceloop platform.
Now we help companies evaluate and compare models, detect hallucinations, version and test prompts as if they were lines of code, and monitor cost and performance. For example, Miro uses Traceloop to monitor real-world performance at scale, flag edge cases early, and safely experiment with new models like GPT-4.1 – all without disrupting the user experience.
This round gives us what we need to keep going. We’ll use the funds to scale the platform, deepen our evaluation stack, and support more enterprise-grade deployments. We’ll also keep investing in the open-source community that helped us get to where we are. Because at the end of the day, we won’t rest until AI developers have the same confidence and discipline that the rest of software engineering has had for years.
Building with LLMs and feeling like you’re flying blind? Get in touch :)