
Visualizing Distributed Traces with Jaeger: Setup and UI

Jaeger is an open-source distributed tracing platform that collects, stores, and visualizes OpenTelemetry traces. It answers critical questions: "Why was this request slow?", "Where did the error occur?", and "What was the dependency chain?" Jaeger's UI lets you search traces by service, duration, tags, and errors, then drill into individual spans to see exact timing and attributes.

This article covers deploying Jaeger locally and in production, exploring the UI, and analyzing traces to debug latency issues.

Understanding Jaeger Architecture

Jaeger has four main components:

  1. Agent (UDP port 6831): receives spans from applications and forwards them to the collector in batches.
  2. Collector (HTTP port 14268): validates and processes spans, then writes them to a storage backend.
  3. Storage backend (Elasticsearch, Cassandra, Badger): persistent storage for spans.
  4. Query service and UI (HTTP port 16686): serves the web interface for searching and visualizing traces.

For development, the "all-in-one" image bundles all components. For production, you typically run agent and collector separately and configure a persistent backend.

Starting Jaeger Locally with Docker

The quickest way to run Jaeger locally:

docker run -d --name jaeger \
  -p 6831:6831/udp \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 16686:16686 \
  jaegertracing/all-in-one:latest

This starts:

  • Agent on UDP port 6831 (where your Rust app sends spans).
  • Collector on HTTP port 14268.
  • Admin endpoint on HTTP port 14269 (health check and Prometheus metrics).
  • UI on HTTP port 16686.

Verify it is running:

curl http://localhost:16686  # Should return HTML

Configuring Your Rust App to Export to Jaeger

Use opentelemetry-jaeger (configured in earlier articles) to export spans:

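// Assumes opentelemetry 0.18 with opentelemetry-jaeger 0.17; later releases renamed these builders.
use opentelemetry::sdk::{trace, Resource};
use opentelemetry::KeyValue;

// Extra attributes attached to the process (shown in the UI's Process section).
let resource = Resource::new(vec![KeyValue::new("service.version", "1.0.0")]);

// The agent pipeline sends spans over UDP to the Jaeger agent (localhost:6831 by default).
let jaeger_tracer = opentelemetry_jaeger::new_agent_pipeline()
    .with_service_name("my_rust_service")
    .with_trace_config(trace::config().with_resource(resource))
    .install_simple()
    .expect("Jaeger tracer failed");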

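By default, the agent pipeline sends spans to the Jaeger agent at localhost:6831, and .install_simple() exports each span synchronously as soon as it ends. For production, batch the spans, and optionally send them straight to the collector over HTTP: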

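// Batch export to the agent (needs the rt-tokio-current-thread features of opentelemetry and opentelemetry-jaeger).
let jaeger_tracer = opentelemetry_jaeger::new_agent_pipeline()
    .with_service_name("my_rust_service")
    .install_batch(opentelemetry::runtime::TokioCurrentThread)
    .expect("Jaeger tracer failed");

// Or send spans straight to a collector over HTTP
// (needs the reqwest_collector_client feature of opentelemetry-jaeger):
let jaeger_tracer = opentelemetry_jaeger::new_collector_pipeline()
    .with_service_name("my_rust_service")
    .with_endpoint("http://jaeger-collector.production:14268/api/traces")
    .with_reqwest()
    .install_batch(opentelemetry::runtime::TokioCurrentThread)
    .expect("Jaeger tracer failed");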

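For production, use .install_batch() instead of .install_simple(). Batching buffers finished spans and exports them in the background: by default every 5 seconds, or sooner once a full batch of 512 spans accumulates. This keeps export work off your request path. Call opentelemetry::global::shutdown_tracer_provider() before the process exits so the last buffered spans are flushed.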
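The batching trade-off can be sketched without any tracing crates. The std-only code below is a simplified model, not the OpenTelemetry SDK's BatchSpanProcessor. `Span` and `SpanBatcher` are hypothetical names. The real processor also runs exports on a background task and drops spans when its queue is full. This sketch buffers finished spans and hands back a batch when either the size limit or the time limit is reached.

```rust
use std::time::{Duration, Instant};

/// A finished span, reduced to the fields this sketch needs (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Span {
    pub name: String,
    pub duration_ms: u64,
}

/// Hypothetical batcher: buffers spans and returns a batch to export
/// once the size limit or the time limit is reached.
pub struct SpanBatcher {
    buffer: Vec<Span>,
    max_batch_size: usize,
    max_delay: Duration,
    last_flush: Instant,
}

impl SpanBatcher {
    pub fn new(max_batch_size: usize, max_delay: Duration) -> Self {
        SpanBatcher {
            buffer: Vec::with_capacity(max_batch_size),
            max_batch_size,
            max_delay,
            last_flush: Instant::now(),
        }
    }

    /// Adds a span; returns the buffered batch if a limit was hit.
    /// The delay is only checked when a span arrives; a real processor uses a timer.
    pub fn push(&mut self, span: Span) -> Option<Vec<Span>> {
        self.buffer.push(span);
        let full = self.buffer.len() >= self.max_batch_size;
        let stale = self.last_flush.elapsed() >= self.max_delay;
        if full || stale {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Drains whatever is buffered (also call this on shutdown).
    pub fn flush(&mut self) -> Vec<Span> {
        self.last_flush = Instant::now();
        std::mem::take(&mut self.buffer)
    }
}
```

A larger batch or a longer delay means fewer export calls. The cost is that more buffered spans are lost if the process dies before a flush, which is why flushing on shutdown matters.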

Exploring the Jaeger UI

Open http://localhost:16686. The UI has four main areas.

1. Search Panel

On the left sidebar, you can filter traces by:

  • Service — dropdown listing all services that sent spans.
  • Operation — spans with specific names (http_request, database_query).
  • Tags — key-value filters (http.status_code=500, error=true).
  • Min/Max Duration — filter by latency range.
  • Limit — number of traces to display.

Example: Find all failed requests to service "payment-service":

  1. Select Service: "payment-service".
  2. Select Tags: add error=true.
  3. Click "Find Traces".

You will see a list of traces matching the criteria, sorted by newest first.
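The same search can be scripted. The sketch below builds a query URL for `/api/traces`, the HTTP endpoint the Jaeger UI itself calls. Jaeger treats this API as internal, so the parameter names used here (`service`, `tags` as a JSON object, `minDuration`, `limit`) are assumptions to verify against your Jaeger version. `search_url` is a hypothetical helper, and it assumes tag keys and values contain no quotes or backslashes.

```rust
/// Percent-encodes every byte except RFC 3986 unreserved characters.
fn percent_encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'_' | b'.' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Hypothetical helper: builds a trace-search URL mirroring the UI's filters.
/// Assumes tag keys and values contain no `"` or `\` characters.
fn search_url(
    base: &str,
    service: &str,
    tags: &[(&str, &str)],
    min_duration: Option<&str>,
    limit: u32,
) -> String {
    let mut url = format!("{}/api/traces?service={}", base, percent_encode(service));
    if !tags.is_empty() {
        // Tags go out as one JSON object, e.g. {"error":"true"}.
        let pairs: Vec<String> = tags
            .iter()
            .map(|(k, v)| format!("\"{}\":\"{}\"", k, v))
            .collect();
        let json = format!("{{{}}}", pairs.join(","));
        url.push_str(&format!("&tags={}", percent_encode(&json)));
    }
    if let Some(d) = min_duration {
        url.push_str(&format!("&minDuration={}", percent_encode(d)));
    }
    url.push_str(&format!("&limit={}", limit));
    url
}
```

For the "failed payment-service requests" example, `search_url("http://localhost:16686", "payment-service", &[("error", "true")], None, 20)` yields a URL you can fetch with curl to get the matching traces as JSON.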

2. Trace Timeline

Click a trace to expand it. The timeline view shows:

  • Service rows: each service involved in the trace.
  • Span bars: one per operation. Bar colors identify the service, and spans that recorded an error are flagged with a red error icon.
  • Duration labels: elapsed time for each span.

Hover over a span to see:

  • Span name and ID.
  • Exact timing (start time, duration).
  • Attributes and events.

Example timeline for a two-service call chain:

service-a: handle_request           ███████████████ 150ms
└─ service-a: call_service_b        ██████████████ 140ms
   └─ service-b: fetch_item         █████████████ 135ms
      ├─ service-b: database_query  ██████████ 100ms
      └─ service-b: cache_lookup    █ 5ms

From this, you can see that database_query is the bottleneck. It accounts for 100 ms of fetch_item's 135 ms, far more than anything else on the path.
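The "find the bottleneck" reasoning can be made mechanical. A span's self time is its duration minus the time covered by its children, and the span with the largest self time is where latency is actually spent. The minimal sketch below assumes child spans run one after another (no overlap). It uses a hypothetical `TimelineSpan` type rather than Jaeger's JSON model.

```rust
/// One span from the timeline (hypothetical, simplified type).
struct TimelineSpan {
    id: u32,
    parent: Option<u32>,
    name: &'static str,
    duration_ms: u64,
}

/// Self time = own duration minus the summed durations of direct children.
/// Assumes children do not overlap; saturating_sub guards against clock skew.
fn self_time_ms(spans: &[TimelineSpan], span: &TimelineSpan) -> u64 {
    let children: u64 = spans
        .iter()
        .filter(|s| s.parent == Some(span.id))
        .map(|s| s.duration_ms)
        .sum();
    span.duration_ms.saturating_sub(children)
}

/// Returns the name and self time of the span that spends the most time itself.
fn bottleneck(spans: &[TimelineSpan]) -> Option<(&'static str, u64)> {
    spans
        .iter()
        .map(|s| (s.name, self_time_ms(spans, s)))
        .max_by_key(|&(_, t)| t)
}
```

Fed the five spans from the timeline above, fetch_item's self time comes out as 30 ms (135 minus 100 and 5). database_query's self time is 100 ms, so `bottleneck` picks it.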

3. Span Details

Click a specific span to see its details panel:

  • Tags — all attributes (http.method, user.id, etc.).
  • Logs — events emitted during the span (errors, warnings).
  • Process — service name and version.

Example span details:

Span: database_query (100.5 ms)
Tags:
  db.operation: SELECT
  db.name: users
  db.table: accounts
  db.rows_affected: 42

Logs:
  [2026-06-02 10:23:45.123] Query started
  [2026-06-02 10:23:45.223] Query completed

4. Statistics and Comparisons

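From a trace's view menu, choose Trace Statistics to see aggregates for that trace:

  • Services and operations: each service in the trace, with the operations it ran.
  • Counts and latencies: how many times each operation ran, with its total, average, minimum, and maximum duration, plus its self time (time not spent in child spans).

This helps identify which operations dominate a trace's latency, including cheap operations that run many times. To compare two traces side by side, select them in the search results and click Compare Traces.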
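The statistics view is essentially a per-operation aggregation over a trace's spans. Here is a std-only sketch of that computation. It takes hypothetical `(operation, duration_ms)` pairs as input rather than Jaeger's data model, and `OpStats` is an illustrative type.

```rust
use std::collections::BTreeMap;

/// Aggregate durations for one operation (hypothetical type).
#[derive(Debug, PartialEq)]
struct OpStats {
    count: usize,
    min_ms: u64,
    max_ms: u64,
    total_ms: u64,
}

impl OpStats {
    fn avg_ms(&self) -> f64 {
        self.total_ms as f64 / self.count as f64
    }
}

/// Groups span durations by operation name (BTreeMap keeps names sorted).
fn operation_stats(spans: &[(&str, u64)]) -> BTreeMap<String, OpStats> {
    let mut stats: BTreeMap<String, OpStats> = BTreeMap::new();
    for &(op, ms) in spans {
        let entry = stats.entry(op.to_string()).or_insert(OpStats {
            count: 0,
            min_ms: u64::MAX,
            max_ms: 0,
            total_ms: 0,
        });
        entry.count += 1;
        entry.min_ms = entry.min_ms.min(ms);
        entry.max_ms = entry.max_ms.max(ms);
        entry.total_ms += ms;
    }
    stats
}
```

An operation with modest individual durations can still dominate a trace if it runs often. The count and total expose that, while the max alone would hide it.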

Debugging Common Issues with Jaeger

No traces appearing in Jaeger

Problem: You started Jaeger but your spans are not showing up.

Diagnosis:

  1. Check that your Rust app is running and calling instrumented functions.
  2. Verify the Jaeger agent is listening: netstat -an | grep 6831 (on Unix) or netstat -an | findstr :6831 (on Windows).
  3. Test connectivity: curl http://localhost:16686/api/services — should return JSON list of services.
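Step 3 can also be scripted from Rust. This std-only sketch, built on a hypothetical helper `is_reachable`, only checks that something accepts TCP connections on a port such as the query port 16686. It cannot check the agent's UDP port 6831, because UDP has no handshake to observe.

```rust
use std::net::{SocketAddr, TcpStream, ToSocketAddrs};
use std::time::Duration;

/// Returns true if a TCP connection to `addr` (e.g. "localhost:16686")
/// succeeds within `timeout`. Tries every address the name resolves to.
fn is_reachable(addr: &str, timeout: Duration) -> bool {
    let addrs: Vec<SocketAddr> = match addr.to_socket_addrs() {
        Ok(iter) => iter.collect(),
        Err(_) => return false, // unparsable or unresolvable address
    };
    addrs
        .iter()
        .any(|a| TcpStream::connect_timeout(a, timeout).is_ok())
}
```

If `is_reachable("localhost:16686", Duration::from_secs(2))` returns false, the container is not running or the port is not published.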

Fix:

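  • Ensure your app installs the Jaeger pipeline at startup (install_simple() or install_batch()). If you instrument with #[instrument], also ensure a tracing-opentelemetry layer passes those spans to it. With install_batch(), call opentelemetry::global::shutdown_tracer_provider() before exit, or a short-lived program may quit before its buffered spans are sent.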
  • Ensure the Jaeger container is running: docker ps | grep jaeger.
  • If using a custom collector, verify the URL is correct and reachable.

Traces are incomplete (missing services)

Problem: You see spans from service A but not service B, even though B was called.

Diagnosis:

  • Service B may not have OpenTelemetry instrumentation.
  • Context propagation may be broken (service B did not receive trace context headers).

Fix:

  • Add #[instrument] macros to service B's handler.
  • Verify that you are injecting trace context headers when calling service B from service A.
  • Check that service B is extracting trace context from headers correctly.
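"Injecting" and "extracting" trace context are easiest to see with the W3C Trace Context `traceparent` header, which OpenTelemetry's `TraceContextPropagator` uses. Jaeger's older native header, `uber-trace-id`, has a different format. The code below is a simplified std-only sketch, not the SDK's propagator: it handles only version `00` and ignores `tracestate`. `SpanContext`, `inject`, and `extract` are illustrative names.

```rust
/// Trace context carried between services (simplified, hypothetical type).
#[derive(Debug, PartialEq, Clone, Copy)]
struct SpanContext {
    trace_id: u128,
    span_id: u64,
    sampled: bool,
}

/// Service A: format the traceparent value to send with the outgoing request.
fn inject(ctx: &SpanContext) -> String {
    let flags: u8 = if ctx.sampled { 1 } else { 0 };
    format!("00-{:032x}-{:016x}-{:02x}", ctx.trace_id, ctx.span_id, flags)
}

/// Service B: parse the incoming header; None means "start a new trace".
fn extract(header: &str) -> Option<SpanContext> {
    let parts: Vec<&str> = header.trim().split('-').collect();
    if parts.len() != 4 || parts[0] != "00" {
        return None;
    }
    // Each field must be lowercase hex of an exact length.
    let lower_hex = |s: &str, len: usize| {
        s.len() == len && s.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b))
    };
    if !lower_hex(parts[1], 32) || !lower_hex(parts[2], 16) || !lower_hex(parts[3], 2) {
        return None;
    }
    let trace_id = u128::from_str_radix(parts[1], 16).ok()?;
    let span_id = u64::from_str_radix(parts[2], 16).ok()?;
    let flags = u8::from_str_radix(parts[3], 16).ok()?;
    // All-zero IDs are invalid.
    if trace_id == 0 || span_id == 0 {
        return None;
    }
    Some(SpanContext { trace_id, span_id, sampled: (flags & 0x01) == 0x01 })
}
```

Service A sends `inject(&ctx)` as the `traceparent` request header. Service B calls `extract` on it and, on `Some`, makes its server span a child of that context. If B starts a fresh trace instead, Jaeger shows two separate traces, which is the "missing services" symptom.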

High memory usage in Jaeger

Problem: Jaeger container is using excessive memory.

Diagnosis:

  • Too many unique spans or high cardinality tags.
  • Traces are not being cleaned up (no retention configured).

Fix:

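  • Lower the sampling rate, for example to 10% of traces instead of all of them. As configuration, a 10% probabilistic sampler looks like the following (exact keys depend on the tool reading it; in a Rust app the equivalent is Sampler::TraceIdRatioBased(0.1)):
    sampling:
      type: probabilistic
      param: 0.1  # Sample 10% of traces
  • Remove high-cardinality tags (user IDs, arbitrary strings).
  • Bound in-memory storage. The all-in-one image keeps every trace in memory until restart. To cap it, pass --memory.max-traces after the image name, for example --memory.max-traces=50000.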

Configuring Persistent Storage for Production

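Jaeger's all-in-one uses in-memory storage, so traces are lost on restart. For production, configure a persistent backend:

  1. Elasticsearch:
     docker run \
       -e COLLECTOR_OTLP_ENABLED=true \
       -e SPAN_STORAGE_TYPE=elasticsearch \
       -e ES_SERVER_URLS=http://elasticsearch:9200 \
       jaegertracing/all-in-one
  2. Cassandra (create the Jaeger keyspace first, for example with the jaegertracing/jaeger-cassandra-schema image):
     docker run \
       -e SPAN_STORAGE_TYPE=cassandra \
       -e CASSANDRA_SERVERS=cassandra:9042 \
       jaegertracing/all-in-one

Both commands assume Jaeger shares a Docker network with a container named elasticsearch or cassandra.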

Elasticsearch is usually the easier choice to operate and supports richer trace search. Cassandra is a write-optimized alternative suited to very high span volumes.

Integrating Jaeger with Prometheus Metrics

Jaeger exposes its own metrics in Prometheus format on the admin port, 14269 (publish it with -p 14269:14269 when running in Docker):

curl http://localhost:14269/metrics | grep jaeger

You can add Jaeger itself as a Prometheus scrape target:

scrape_configs:
  - job_name: 'jaeger'
    static_configs:
      - targets: ['localhost:14269']

This lets you alert on Jaeger health (e.g., "Alert if Jaeger drops traces").

Key Takeaways

  • Jaeger collects, stores, and visualizes OpenTelemetry traces.
  • The all-in-one Docker image is perfect for local development.
  • Use Jaeger's UI to search traces by service, tags, and duration.
  • Analyze timeline views to identify bottlenecks (slow operations).
  • Configure persistent backends (Elasticsearch) for production.
  • Monitor Jaeger itself with Prometheus to track trace collection health.

Frequently Asked Questions

What is the difference between Jaeger and Grafana Loki?

Jaeger is designed for distributed tracing (causality between operations). Loki is designed for log aggregation (text search). Jaeger is better for understanding latency; Loki is better for debugging with logs. Many teams use both.

Does Jaeger support sampling?

Yes. Sampling is decided in your application's OpenTelemetry SDK: you can record a fixed fraction of traces (for example, 1 in 10) or apply rules. Configure it in your Rust app:

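// Assumes opentelemetry 0.18 with opentelemetry-jaeger 0.17.
use opentelemetry::sdk::trace::{self, Sampler};

// Keep roughly 10% of traces, chosen deterministically from the trace ID.
let jaeger_tracer = opentelemetry_jaeger::new_agent_pipeline()
    .with_service_name("my_rust_service")
    .with_trace_config(trace::config().with_sampler(Sampler::TraceIdRatioBased(0.1)))
    .install_simple()
    .expect("Jaeger failed");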
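A ratio-based sampler decides from the trace ID itself. As a result, services that apply the same ratio to the same trace make the same keep-or-drop decision. The std-only sketch below illustrates that idea by comparing the low 64 bits of the trace ID against a threshold. It is a simplified illustration, not the SDK's code, and `should_sample` is a hypothetical name.

```rust
/// Deterministic ratio sampling keyed on the trace ID (simplified sketch).
/// The same trace_id and ratio always yield the same decision.
fn should_sample(trace_id: u128, ratio: f64) -> bool {
    if ratio >= 1.0 {
        return true;
    }
    if ratio <= 0.0 {
        return false;
    }
    // Threshold on a 63-bit scale: ratio * 2^63.
    let threshold = (ratio * (1u64 << 63) as f64) as u64;
    // Take the low 64 bits of the trace ID, shifted into the same 63-bit range.
    let low = (trace_id as u64) >> 1;
    low < threshold
}
```

With a ratio of 0.1, IDs spread evenly over the 64-bit range are kept about one time in ten. Because the decision depends only on the ID, a trace is not recorded in one service and dropped in another that uses the same ratio.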

Can I export to multiple backends simultaneously?

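Yes. You can send the same traces to Jaeger and to other backends (an OTLP endpoint, Datadog, and so on) by registering several exporters. In opentelemetry 0.18, build the provider with sdk::trace::TracerProvider::builder() and add one with_batch_exporter(...) or with_simple_exporter(...) call per exporter. Each finished span then goes to all of them.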
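Conceptually, exporting to several backends is a fan-out: each finished span is handed to every exporter. Below is a std-only sketch of that pattern. The `Exporter` trait, `RecordingExporter`, and `fan_out` are hypothetical stand-ins; the SDK's real exporter trait is asynchronous and richer.

```rust
/// Hypothetical exporter interface.
trait Exporter {
    fn export(&mut self, span_name: &str);
}

/// Test double that records the span names it receives.
#[derive(Default)]
struct RecordingExporter {
    received: Vec<String>,
}

impl Exporter for RecordingExporter {
    fn export(&mut self, span_name: &str) {
        self.received.push(span_name.to_string());
    }
}

/// Hands one finished span to every exporter, in registration order.
fn fan_out(exporters: &mut [&mut dyn Exporter], span_name: &str) {
    for exporter in exporters.iter_mut() {
        exporter.export(span_name);
    }
}
```

In the real SDK, giving each exporter its own batch processor keeps a slow backend from holding up the others.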

How long does Jaeger retain traces by default?

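The all-in-one keeps traces in memory until the container restarts, with no cap unless you set --memory.max-traces. With an Elasticsearch backend, Jaeger writes date-based indices and does not delete them itself. Retention is whatever you configure, for example by running the jaeger-es-index-cleaner tool on a schedule or by using Elasticsearch index lifecycle management.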

Can I correlate Jaeger traces with Prometheus metrics?

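Yes, through exemplars. If your metrics library attaches the current trace ID to histogram observations as exemplars, Grafana can display them on a metric panel and link each one to the matching trace in a Jaeger data source. Use the same service name in metrics and traces so the two line up.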
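Exemplars are what carry a trace ID into Prometheus. In the OpenMetrics text format, an exemplar is appended to a sample after ` # `. The std-only sketch below formats one histogram bucket line with an exemplar. The metric name, bucket, and `trace_id` exemplar label are illustrative assumptions. Grafana's Prometheus data source lets you configure which exemplar label holds the trace ID.

```rust
/// Formats an OpenMetrics histogram bucket sample with a trace-ID exemplar.
/// `exemplar_value` is the single observation (in seconds) the exemplar points to.
fn bucket_with_exemplar(
    metric: &str,
    le: &str,
    count: u64,
    trace_id: &str,
    exemplar_value: f64,
) -> String {
    format!(
        "{}_bucket{{le=\"{}\"}} {} # {{trace_id=\"{}\"}} {}",
        metric, le, count, trace_id, exemplar_value
    )
}
```

Prometheus stores exemplars only when started with `--enable-feature=exemplar-storage`. Once Grafana maps the `trace_id` label to a Jaeger data source, clicking an exemplar point opens that trace.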

Further Reading