Understanding Traces and Spans in LLM Applications

Nir Gazit
Co-Founder and CEO
October 2025

When an LLM application fails, it can often feel like a black box. A user gets a bad response, but why? Was the prompt bad? Was the database query slow? Did the LLM hallucinate? Without a clear view of the application's internal workflow, debugging becomes a frustrating process of guesswork.

This is the problem that traces and spans are designed to solve. They are the core components of modern observability, providing a detailed, step-by-step record of everything that happens during a request. This guide explains what traces and spans are and how observability platforms use them to provide critical visibility into your LLM's behavior.

Key Takeaways

  • A span represents a single operation, like an API call or a database query.
  • A trace is a collection of all the spans in a single request, showing the full end-to-end journey.
  • Spans carry rich metadata in the form of attributes, which is where LLM-specific data (prompts, responses, and token counts) is stored.
  • Observability tools use traces and spans to visualize your LLM's workflow, making it possible to debug failures, identify performance bottlenecks, and monitor costs.

How Traces and Spans Provide Visibility into LLM Behavior

To understand how observability tools work, it’s essential to grasp the relationship between traces, spans, and attributes. These concepts, standardized by the OpenTelemetry project, are the building blocks of modern observability.

1. Spans: The Individual Steps

Think of a span as a single, named, and timed operation within your application. In an LLM application, a span could represent:

  • An API call to an LLM provider like OpenAI or Anthropic.
  • A query to a vector database like Pinecone or Chroma.
  • A function that processes data or formats a prompt.

Each span captures critical information, including a name, a start time, and a duration. This allows you to see exactly how long each step in your application takes.
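To make this concrete, here is a minimal sketch using the OpenTelemetry Python SDK. The span name, the console exporter, and the simulated work are illustrative choices, not requirements:

```python
import time

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Configure a tracer that prints finished spans to the console.
provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("rag-app")

# The span records its name, start time, and duration automatically.
with tracer.start_as_current_span("vector_db.query"):
    time.sleep(0.05)  # stand-in for real work, such as a vector search
```

When the `with` block exits, the span is ended and exported, duration included.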

2. Traces: The Complete Journey

A trace is the complete story of a single request, represented as a collection of all its spans; this same structure is what lets you trace LLM agents and pinpoint where they fail. For example, when a user asks your RAG application a question, the resulting trace would contain multiple spans:

  1. An initial span for the incoming user request.
  2. A child span for the query to your vector database.
  3. Another child span for the API call to your LLM, including the prompt and the retrieved context.
  4. A final span for the generated response.

The trace ties all these individual steps together, showing you the full, end-to-end workflow and how the different components of your application interact.
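In code, this parent-child structure falls out of simply nesting spans. Here is a sketch in which the function bodies are placeholders for real retrieval and LLM calls:

```python
from opentelemetry import trace

tracer = trace.get_tracer("rag-app")

def handle_request(question: str) -> str:
    # Root span: the incoming user request.
    with tracer.start_as_current_span("handle_user_request"):
        # Child span: the query to the vector database.
        with tracer.start_as_current_span("vector_db.query"):
            docs = ["retrieved document"]  # stand-in for a real search
        # Child span: the LLM call with the question and retrieved context.
        with tracer.start_as_current_span("llm.completion"):
            answer = f"Answer grounded in {docs}"  # stand-in for a real completion
        return answer
```

Every span created inside `handle_request` shares the same trace ID, which is how an observability backend reassembles the individual steps into a single end-to-end view.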

3. Attributes: The Key to LLM Visibility

This is where the magic happens for LLM applications. Spans are not just timers; they can be enriched with detailed key-value pairs called attributes. This is how observability platforms capture the unique data that LLMs produce. For an LLM call, the attributes would include:

  • llm.vendor: openai
  • llm.request.model: gpt-4
  • llm.prompt: The full text of the prompt sent to the model.
  • llm.response: The full text of the response received.
  • llm.usage.prompt_tokens: The number of tokens in the prompt.
  • llm.usage.completion_tokens: The number of tokens in the response.

By capturing these attributes, an observability platform can move beyond simply telling you that an API call happened and tell you exactly what was asked, what was answered, and how much it cost.
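As a sketch, here is how those attributes could be attached by hand, using the attribute names from the list above. The `call_llm` function is a placeholder for your actual client code; in practice, an instrumentation library typically sets these attributes for you:

```python
from opentelemetry import trace

tracer = trace.get_tracer("rag-app")

def call_llm(prompt: str) -> tuple[str, int, int]:
    return ("stub answer", 12, 3)  # stand-in for a real client call

prompt = "Summarize the retrieved documents for the user."
with tracer.start_as_current_span("llm.completion") as span:
    span.set_attribute("llm.vendor", "openai")
    span.set_attribute("llm.request.model", "gpt-4")
    span.set_attribute("llm.prompt", prompt)
    response_text, prompt_tokens, completion_tokens = call_llm(prompt)
    span.set_attribute("llm.response", response_text)
    span.set_attribute("llm.usage.prompt_tokens", prompt_tokens)
    span.set_attribute("llm.usage.completion_tokens", completion_tokens)
```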

Understanding and implementing this system of traces, spans, and attributes is the key to unlocking visibility into your LLM's behavior. It requires instrumenting your code with a standard like OpenTelemetry and then building a system to collect, visualize, and analyze this data.

This is precisely where a platform like Traceloop provides critical value. It is built on OpenTelemetry and comes with pre-configured instrumentation that automatically captures LLM-specific spans and attributes. It provides an out-of-the-box solution for visualizing traces and analyzing model behavior, allowing teams to gain deep observability without the complexity of building their own system from scratch.
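For reference, getting started looks roughly like this, based on the SDK's documented entry points; `retrieve` and `generate` are placeholders for your own code:

```python
from traceloop.sdk import Traceloop
from traceloop.sdk.decorators import workflow

# One init call enables auto-instrumentation for supported LLM and vector DB clients.
Traceloop.init(app_name="rag-app")

def retrieve(question: str) -> list[str]:
    return ["retrieved document"]  # placeholder for a vector DB query

def generate(question: str, docs: list[str]) -> str:
    return "stub answer"  # placeholder for an LLM call

@workflow(name="answer_question")  # groups the nested spans under one named trace
def answer_question(question: str) -> str:
    return generate(question, retrieve(question))
```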

Frequently Asked Questions (FAQ)

1. What is OpenTelemetry? OpenTelemetry is an open-source, industry-standard framework for creating and managing telemetry data (traces, metrics, and logs). It provides a unified way to instrument your code, so you can send your observability data to any compatible tool without being locked into a single vendor. It's also the foundation for building your own DIY LLM observability stack.

2. What is the difference between automatic and manual instrumentation? Automatic instrumentation uses pre-built libraries that automatically create spans for common operations, like calls to a database or an LLM API, without requiring you to change your code. Manual instrumentation involves adding code to create custom spans around specific parts of your application, giving you more granular control over your traces. Most production systems use a combination of both.
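A sketch of the combination, assuming the opentelemetry-instrumentation-openai package published as part of Traceloop's OpenLLMetry project:

```python
from opentelemetry import trace
from opentelemetry.instrumentation.openai import OpenAIInstrumentor

# Automatic: every OpenAI client call now produces a span, with no code changes.
OpenAIInstrumentor().instrument()

# Manual: a custom span around a step no library instruments for you.
tracer = trace.get_tracer("rag-app")

def build_prompt(question: str, docs: list[str]) -> str:
    with tracer.start_as_current_span("format_prompt"):
        context = "\n".join(docs)
        return f"Context:\n{context}\n\nQuestion: {question}"
```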

3. How does this help with debugging RAG applications? In a RAG application, a trace can show you the entire flow: the user's query, the search query sent to the vector database, the exact documents that were retrieved, and the final prompt sent to the LLM. If you get a bad response, you can look at the trace and immediately see whether the problem was with the retrieval step (bad documents) or the generation step (the LLM failed to use the documents correctly). This makes it dramatically easier to detect and reduce hallucinations.

4. Can I use traces and spans to monitor costs? Yes. By capturing token counts as attributes on your LLM spans, an observability platform like Traceloop can aggregate this data to show you exactly how much different parts of your application are costing. You can identify your most expensive prompts or users and optimize accordingly.
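As a back-of-the-envelope illustration, cost per span is just token counts times per-token rates. The rates below are placeholders, not real pricing; substitute your provider's current rates:

```python
# Placeholder rates in dollars per token; not real pricing.
PROMPT_RATE = 0.03 / 1000
COMPLETION_RATE = 0.06 / 1000

def span_cost(attributes: dict) -> float:
    """Estimate the dollar cost of one LLM span from its token attributes."""
    prompt_tokens = attributes.get("llm.usage.prompt_tokens", 0)
    completion_tokens = attributes.get("llm.usage.completion_tokens", 0)
    return prompt_tokens * PROMPT_RATE + completion_tokens * COMPLETION_RATE

cost = span_cost({"llm.usage.prompt_tokens": 1200, "llm.usage.completion_tokens": 300})
print(f"${cost:.3f}")  # $0.054
```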

Conclusion

Traces and spans are the foundation of modern observability, and they are essential for debugging and maintaining reliable LLM applications. By providing a detailed, step-by-step record of every request, they transform your application from an opaque black box into a transparent system. This philosophy is central to the mission we're building at Traceloop. While implementing this level of observability can be complex, a dedicated platform can provide the tools needed to harness the power of traces and spans, giving you the confidence to build, deploy, and scale your LLM applications effectively.

Ready to gain full visibility into your LLM application? Book a demo today.