OpenTelemetry and Observability: A Beginner's Guide

OpenTelemetry is a vendor-neutral collection of tools and specifications for instrumenting applications to emit metrics, logs, and traces. In Rust, OpenTelemetry gives you a unified way to understand application behavior: how many requests you handled, how long they took, where they failed, and what happens when one service calls another.

Observability is the ability to understand the internal state of a system by examining its external outputs. Without it, you are flying blind. You cannot optimize code you cannot measure, cannot debug errors you cannot trace, and cannot scale services you cannot monitor. OpenTelemetry addresses this with a standardized interface that integrates with widely used tools such as Prometheus for metrics, Jaeger for distributed tracing, and Grafana for dashboards.

This article introduces why observability matters, what OpenTelemetry actually is, and the mental model you need to understand the rest of this series.

Why Observability Matters in Production

When you run code locally, you see everything: console output, stack traces, breakpoints in your debugger. The moment that code moves to production across distributed servers, you lose that visibility. You cannot SSH into a container and inspect memory mid-request. You cannot step through production code line-by-line.

Observability fills that gap. It is the practice of instrumenting your code to emit data about what it is doing. There are three pillars:

  • Metrics — numerical measurements aggregated over time (requests per second, memory usage, latency percentiles). They answer "how much?" and "how fast?".
  • Logs — structured records of discrete events (user logged in, payment processed, error thrown). They answer "what happened?".
  • Traces — the journey of a single request or transaction across service boundaries. They answer "what path did it take?" and "where did it slow down?".

Modern applications are distributed: a user request hits your API server, which calls a database, which triggers a cache miss, which invokes a background worker. Without tracing, you only see isolated log lines. With traces, you see the entire causality chain.

According to the Cloud Native Computing Foundation's 2024 observability survey, 76% of engineers report that distributed tracing is critical to production stability. OpenTelemetry adoption in Rust grew 42% year-over-year as of 2026 because it works with existing tools and avoids vendor lock-in.

What Is OpenTelemetry?

OpenTelemetry (often abbreviated OTel) is a specification and a set of APIs for generating, collecting, and exporting observability data. It does three things:

  1. Provides SDKs — language-specific libraries (in Rust, the opentelemetry API crate and the opentelemetry_sdk crate, plus exporter crates such as opentelemetry-prometheus) that let you instrument your code.
  2. Defines protocols — standardized ways to export data (OTLP over gRPC or HTTP, alongside formats such as the Prometheus text format) that many backends accept.
  3. Stays vendor-neutral — unlike proprietary agents, OpenTelemetry works with Prometheus, Jaeger, Datadog, New Relic, Honeycomb, and dozens of other backends without rewriting your instrumentation.

Think of OpenTelemetry as a translation layer. Your Rust code calls OpenTelemetry APIs. OpenTelemetry exports the data to any backend you choose. If you switch from Jaeger to Lightstep next year, you only swap the backend configuration; your instrumentation code stays the same.

The architecture looks like this:

┌──────────────────────────┐
│ Your Rust Code           │ ← emit metrics/traces via OTel APIs
└────────────┬─────────────┘
             │
┌────────────▼─────────────┐
│ OpenTelemetry SDK        │ ← process and batch data
└────────────┬─────────────┘
             │
┌────────────▼─────────────┐
│ Exporters (Prom, OTLP,   │ ← send to backends
│ Jaeger, etc)             │
└────────────┬─────────────┘
             │
┌────────────▼─────────────┐
│ Backends (Prometheus,    │ ← store and visualize
│ Jaeger, Grafana, etc)    │
└──────────────────────────┘

You are not limited to a single exporter; several can run side by side. You might export metrics to both Prometheus (for dashboards) and another time-series database. You might export traces to Jaeger (for visualization) and a logging backend (for long-term archival).
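To make the layering concrete, here is a minimal std-only sketch of the idea: application code records data through one interface, and a pipeline fans each batch out to every configured exporter. All names here (DataPoint, Exporter, TextExporter, CountingExporter, Pipeline) are hypothetical and invented for illustration; this is a toy model, not the API of the opentelemetry crate.

```rust
// Toy model of "instrument once, export anywhere".
// Illustrative types only; NOT the opentelemetry crate's API.

/// A single metric data point produced by instrumentation code.
#[derive(Debug, Clone, PartialEq)]
pub struct DataPoint {
    pub name: String,
    pub value: f64,
}

/// Anything that can hand a batch of data points to a backend.
pub trait Exporter {
    fn export(&mut self, batch: &[DataPoint]);
}

/// Hypothetical exporter that renders simple "name value" text lines.
#[derive(Default)]
pub struct TextExporter {
    pub lines: Vec<String>,
}

impl Exporter for TextExporter {
    fn export(&mut self, batch: &[DataPoint]) {
        for p in batch {
            self.lines.push(format!("{} {}", p.name, p.value));
        }
    }
}

/// Hypothetical exporter that only counts the points it receives.
#[derive(Default)]
pub struct CountingExporter {
    pub exported: usize,
}

impl Exporter for CountingExporter {
    fn export(&mut self, batch: &[DataPoint]) {
        self.exported += batch.len();
    }
}

/// Stand-in for the SDK layer: buffers points, then flushes to all exporters.
pub struct Pipeline<'a> {
    buffer: Vec<DataPoint>,
    exporters: Vec<&'a mut dyn Exporter>,
}

impl<'a> Pipeline<'a> {
    pub fn new(exporters: Vec<&'a mut dyn Exporter>) -> Self {
        Pipeline { buffer: Vec::new(), exporters }
    }

    /// Called by application code, which knows nothing about backends.
    pub fn record(&mut self, name: &str, value: f64) {
        self.buffer.push(DataPoint { name: name.to_string(), value });
    }

    /// Send the buffered batch to every exporter, then clear the buffer.
    pub fn flush(&mut self) {
        for exporter in self.exporters.iter_mut() {
            exporter.export(&self.buffer);
        }
        self.buffer.clear();
    }
}
```

Swapping or adding a backend in this model means changing the list passed to Pipeline::new; the calls to record stay the same, which is the property the architecture above is meant to provide.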

Three Types of Telemetry Data

OpenTelemetry standardizes collection of three data types:

Metrics

Metrics are numerical observations, usually aggregated. Examples:

  • Counter: http_requests_total increments by 1 each request.
  • Gauge: memory_usage_bytes reports the current value rather than a running total.
  • Histogram: request_duration_ms records each latency into buckets, from which a backend estimates the distribution (for example p50 and p99).

Metrics are cheap to emit and are designed for dashboarding and alerting. A histogram stores per-bucket counts plus a running sum and count, so a backend can estimate many percentiles from one instrument without keeping every raw sample.
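The three instrument kinds can be modeled in a few lines of std-only Rust. This is a simplified sketch under stated assumptions: the type names and methods (Counter, Gauge, Histogram, quantile_upper_bound) are hypothetical, there are no attributes or thread safety, and the bucket boundaries are arbitrary. It is meant to show what each instrument remembers, not how the OpenTelemetry SDK implements them.

```rust
// Toy, std-only models of the three instrument kinds.
// Real instruments also carry attributes and are safe to share across threads.

/// Counter: a value that only goes up.
#[derive(Default)]
pub struct Counter {
    total: u64,
}

impl Counter {
    pub fn add(&mut self, n: u64) { self.total += n; }
    pub fn value(&self) -> u64 { self.total }
}

/// Gauge: remembers only the most recent observation.
#[derive(Default)]
pub struct Gauge {
    current: f64,
}

impl Gauge {
    pub fn set(&mut self, v: f64) { self.current = v; }
    pub fn value(&self) -> f64 { self.current }
}

/// Histogram: per-bucket counts plus a running sum and count.
pub struct Histogram {
    bounds: Vec<f64>, // bucket upper bounds, ascending
    counts: Vec<u64>, // one slot per bound, plus one overflow slot
    sum: f64,
    count: u64,
}

impl Histogram {
    pub fn new(bounds: Vec<f64>) -> Self {
        let slots = bounds.len() + 1;
        Histogram { bounds, counts: vec![0; slots], sum: 0.0, count: 0 }
    }

    pub fn record(&mut self, v: f64) {
        // First bucket whose upper bound is >= v; otherwise the overflow slot.
        let idx = self
            .bounds
            .iter()
            .position(|b| v <= *b)
            .unwrap_or(self.bounds.len());
        self.counts[idx] += 1;
        self.sum += v;
        self.count += 1;
    }

    pub fn bucket_counts(&self) -> &[u64] { &self.counts }
    pub fn sum(&self) -> f64 { self.sum }
    pub fn count(&self) -> u64 { self.count }

    /// Coarse quantile estimate: the upper bound of the bucket that contains
    /// the q-th quantile (infinity if it falls in the overflow bucket).
    pub fn quantile_upper_bound(&self, q: f64) -> Option<f64> {
        if self.count == 0 {
            return None;
        }
        let target = (q * self.count as f64).ceil().max(1.0) as u64;
        let mut seen: u64 = 0;
        for (i, c) in self.counts.iter().enumerate() {
            seen += *c;
            if seen >= target {
                return Some(self.bounds.get(i).copied().unwrap_or(f64::INFINITY));
            }
        }
        None
    }
}
```

Note that the histogram never stores individual latencies. Its memory use is fixed by the number of buckets, which is why percentile estimates are only as precise as the bucket boundaries allow.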

Logs

Logs are structured events with timestamps and key-value pairs. OpenTelemetry standardizes log emission so that:

  • Your application logs go to stderr (or a log collector).
  • Each log carries trace_id and span_id so you can correlate logs back to the trace that caused them.
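As a rough illustration of log-to-trace correlation, the sketch below attaches the active span's identifiers to each log record and then filters records by trace. The types and functions (SpanContext, LogRecord, render, logs_for_trace) and the key=value output format are assumptions made for this example; only the ID sizes (a 16-byte trace ID and an 8-byte span ID) follow OpenTelemetry.

```rust
// Std-only sketch of correlating logs with traces.
// Type names and the output format are hypothetical.

/// Identifiers of the span that was active when the log was written.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SpanContext {
    pub trace_id: u128, // 16 bytes, shared by every span in the trace
    pub span_id: u64,   // 8 bytes, unique to one span
}

/// A structured log event; `ctx` is None when no span was active.
pub struct LogRecord {
    pub message: String,
    pub ctx: Option<SpanContext>,
}

/// Render a record as key=value text, with IDs as zero-padded lowercase hex.
pub fn render(record: &LogRecord) -> String {
    match record.ctx {
        Some(ctx) => format!(
            "msg={:?} trace_id={:032x} span_id={:016x}",
            record.message, ctx.trace_id, ctx.span_id
        ),
        None => format!("msg={:?}", record.message),
    }
}

/// Find every log line that belongs to a given trace.
pub fn logs_for_trace(logs: &[LogRecord], trace_id: u128) -> Vec<&LogRecord> {
    logs.iter()
        .filter(|l| l.ctx.map_or(false, |c| c.trace_id == trace_id))
        .collect()
}
```

With the IDs in every line, a search for one trace_id in a log backend returns exactly the events produced while that request was being handled.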

Logs are text-heavy and best for debugging specific incidents. This series focuses on metrics and traces; logging is a separate concern.

Traces

A trace is a complete record of a single user request or business transaction. It is made up of spans, which are individual operations. A span records:

  • When it started and how long it took.
  • What service it ran in.
  • Inputs and outputs (optional attributes).
  • Whether it succeeded or failed.
  • Its relationship to parent and child spans.

A typical web request might generate 10-50 spans across 3-5 services. A trace stitches them all together so you can see the full picture.
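The parent/child structure is easier to see in code. The following std-only sketch models a trace as a flat list of spans linked by parent IDs, then derives the tree and each span's self time. The Span fields, the helper functions, and the sample data in sample_trace are all hypothetical and chosen for illustration; real spans carry more fields and use nanosecond timestamps.

```rust
// Std-only sketch: a trace as a list of spans linked by parent IDs.
// Field names and sample values are hypothetical.

pub struct Span {
    pub span_id: u64,
    pub parent_id: Option<u64>, // None marks the root span
    pub name: &'static str,
    pub service: &'static str,
    pub start_ms: u64,
    pub end_ms: u64,
    pub ok: bool,
}

impl Span {
    pub fn duration_ms(&self) -> u64 {
        self.end_ms - self.start_ms
    }
}

/// The root span is the one without a parent.
pub fn root(spans: &[Span]) -> Option<&Span> {
    spans.iter().find(|s| s.parent_id.is_none())
}

/// Direct children of the span with the given ID.
pub fn children(spans: &[Span], parent: u64) -> Vec<&Span> {
    spans.iter().filter(|s| s.parent_id == Some(parent)).collect()
}

/// Time spent in the span itself, assuming its children do not overlap.
pub fn self_time_ms(spans: &[Span], span: &Span) -> u64 {
    let in_children: u64 = children(spans, span.span_id)
        .iter()
        .map(|c| c.duration_ms())
        .sum();
    span.duration_ms().saturating_sub(in_children)
}

fn walk(spans: &[Span], span: &Span, depth: usize, out: &mut String) {
    let status = if span.ok { "ok" } else { "error" };
    out.push_str(&format!(
        "{}{} [{}] {}ms {}\n",
        "  ".repeat(depth),
        span.name,
        span.service,
        span.duration_ms(),
        status
    ));
    for child in children(spans, span.span_id) {
        walk(spans, child, depth + 1, out);
    }
}

/// Render the trace as an indented tree, one span per line.
pub fn render_tree(spans: &[Span]) -> String {
    let mut out = String::new();
    if let Some(r) = root(spans) {
        walk(spans, r, 0, &mut out);
    }
    out
}

/// A made-up three-span trace across three services.
pub fn sample_trace() -> Vec<Span> {
    vec![
        Span { span_id: 1, parent_id: None, name: "GET /checkout", service: "api", start_ms: 0, end_ms: 120, ok: true },
        Span { span_id: 2, parent_id: Some(1), name: "SELECT orders", service: "db", start_ms: 10, end_ms: 40, ok: true },
        Span { span_id: 3, parent_id: Some(1), name: "charge card", service: "payments", start_ms: 45, end_ms: 115, ok: false },
    ]
}
```

In the sample data the root span lasts 120 ms, but only 20 ms of that is its own work; the rest is spent in the database and payment calls. That breakdown is the kind of question a trace viewer answers visually.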

The OpenTelemetry Ecosystem in Rust

Rust's OpenTelemetry ecosystem includes:

  • opentelemetry (https://github.com/open-telemetry/opentelemetry-rust) — core APIs and SDK.
  • opentelemetry-prometheus — export metrics to Prometheus.
  • opentelemetry-otlp — export traces (and other signals) over OTLP, which Jaeger accepts natively. The older opentelemetry-jaeger crate has been deprecated in favor of OTLP.
  • Bridging from tracing — the tracing-opentelemetry crate connects the popular tracing ecosystem to OTel.

A common starting point is the core opentelemetry crate, plus opentelemetry-prometheus for metrics and opentelemetry-otlp for tracing.

Key Concepts You Will Need

Before diving into code, learn these:

  • Instrument: to add code that emits observability data (e.g., call meter.u64_counter() to create a counter and then add to it).
  • Exporter: code that hands collected data to a backend (for example, an endpoint that Prometheus scrapes, or an OTLP endpoint such as Jaeger's).
  • Meter: an object that creates metric instruments (counters, gauges, histograms).
  • Tracer: an object that creates spans.
  • Span: an individual operation, with a name, timestamps, and attributes.
  • Trace ID: a unique identifier shared across all spans in a single request.
  • Baggage: key-value metadata (like a user ID) propagated alongside the trace context across service boundaries; it is not attached to spans automatically.
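One way to see how a trace ID ends up shared across services is the W3C Trace Context traceparent header, which OpenTelemetry propagators commonly use over HTTP. The sketch below formats and parses the version 00 layout (version, 32-hex trace ID, 16-hex parent span ID, 2-hex flags). It is a simplified, std-only illustration: the TraceParent type and function names are hypothetical, and it skips parts of the specification such as rejecting uppercase hex and handling future versions.

```rust
// Std-only sketch of the W3C `traceparent` header (version 00 only).
// Simplified: accepts uppercase hex and ignores the `tracestate` header.

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TraceParent {
    pub trace_id: u128,      // shared by every span in the trace
    pub parent_span_id: u64, // span that made the outgoing call
    pub sampled: bool,       // lowest bit of the flags byte
}

pub fn format_traceparent(tp: &TraceParent) -> String {
    let flags: u8 = if tp.sampled { 0x01 } else { 0x00 };
    format!("00-{:032x}-{:016x}-{:02x}", tp.trace_id, tp.parent_span_id, flags)
}

fn is_hex(s: &str, len: usize) -> bool {
    s.len() == len && s.bytes().all(|b| b.is_ascii_hexdigit())
}

pub fn parse_traceparent(header: &str) -> Option<TraceParent> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    if !is_hex(parts[1], 32) || !is_hex(parts[2], 16) || !is_hex(parts[3], 2) {
        return None;
    }
    let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
    let parent_span_id = u64::from_str_radix(parts[2], 16).ok()?;
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // All-zero IDs are invalid.
    if trace_id == 0 || parent_span_id == 0 {
        return None;
    }
    Some(TraceParent { trace_id, parent_span_id, sampled: (flags & 0x01) == 1 })
}

/// Header a service would send downstream: same trace ID, its own span ID.
pub fn child_header(incoming: &TraceParent, my_span_id: u64) -> String {
    format_traceparent(&TraceParent { parent_span_id: my_span_id, ..*incoming })
}
```

Each hop keeps the trace ID and substitutes its own span ID as the parent, which is how a backend can later reassemble spans from different services into one trace.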

Key Takeaways

  • Observability via metrics, logs, and traces is essential for production systems; 76% of engineers report tracing is critical.
  • OpenTelemetry is vendor-neutral, standardized, and works with Prometheus, Jaeger, Grafana, and dozens of backends without changes to your instrumentation code.
  • Metrics measure aggregated behavior; traces show individual requests across service boundaries.
  • Rust's OpenTelemetry ecosystem integrates with the popular tracing crate for low-overhead instrumentation.
  • Starting with basic metrics (counters, gauges) is easier than tracing; both are important for different questions.

Frequently Asked Questions

Is OpenTelemetry tied to a specific backend?

No. OpenTelemetry is vendor-neutral and works with any backend that accepts OTLP (over gRPC or HTTP) or scrapes the Prometheus format. You can emit metrics to Prometheus and traces to Jaeger from the same code. If you switch backends later, only configuration changes; instrumentation code stays the same.

Does OpenTelemetry have a performance cost?

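Recording a metric is typically cheap, often little more than an atomic update, though the exact cost depends on the instrument and the number of attributes. Traces cost more because each recorded span must be built and exported, but sampling reduces the impact: for example, keep 1 in 1,000 traces, or use tail-based sampling in a collector to keep only slow or failed ones. In many services the resulting overhead is small, but it depends on the workload, so measure it in your own service.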
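A ratio-based sampling decision can be sketched in a few lines. The version below is loosely modeled on the idea behind trace-ID ratio sampling: derive the decision from the trace ID itself, so every service that sees the same trace reaches the same verdict without coordination. The function name and the exact threshold arithmetic are assumptions for illustration, not the SDK's implementation.

```rust
// Std-only sketch of deterministic, ratio-based head sampling.
// `should_sample` is a hypothetical name; the arithmetic is illustrative.

/// Decide whether to keep a trace, given a sampling ratio in [0.0, 1.0].
/// Uses the lower 64 bits of the trace ID, assumed to be uniformly random.
pub fn should_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true;
    }
    if ratio <= 0.0 {
        return false;
    }
    let low = trace_id as u64; // truncates to the lower 64 bits
    let threshold = (ratio * u64::MAX as f64) as u64;
    low < threshold
}
```

Because the decision is made when the trace starts, this kind of sampler cannot know whether a request will turn out to be slow or fail. Keeping only such traces requires a tail-based approach that decides after the spans have been collected.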

Do I have to use both metrics and tracing?

No. You can use only metrics for dashboarding, only tracing for incident investigation, or both. Most teams start with metrics, then add tracing as they grow to multiple services. This series covers both because production observability requires both.

Can OpenTelemetry instrument third-party libraries?

Often, yes. Many popular Rust crates (for example tokio, tonic, sqlx) can emit spans or events through the tracing crate rather than calling OpenTelemetry directly. The tracing-opentelemetry crate lets you forward those tracing spans into OpenTelemetry traces without modifying the libraries.

How does OpenTelemetry relate to the tracing crate?

tracing is Rust's de facto standard for structured logging and spans. tracing-opentelemetry connects tracing to OpenTelemetry, so you instrument with tracing macros and the resulting spans are exported to OpenTelemetry backends. This is a widely used approach for Rust projects.

Further Reading