Most teams I work with have the same problem: they’ve instrumented their services with OpenTelemetry, they have traces flowing into their backend, and yet — when something breaks in production — the traces don’t actually help them find the problem.

This isn’t an OpenTelemetry problem. It’s a design problem.

The Instrumentation Trap

When engineers first discover OpenTelemetry, the reaction is almost always the same: instrument everything. Add spans to every function, every database call, every HTTP request. More data means more visibility, right?

Not quite.

The problem with instrumenting everything is that you end up with a trace that looks like a call stack — hundreds of spans, nested five levels deep, each one named after a function or a class method. It’s technically a trace, but it’s not useful observability data.

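// What most people do: instrument the implementation.
// Span is not AutoCloseable, so each span is ended explicitly in a finally block.
val cacheSpan = tracer.spanBuilder("UserService._loadUserFromCache").startSpan()
try {
    user = cache.get(userId)
} finally {
    cacheSpan.end()
}

val dbSpan = tracer.spanBuilder("UserService._queryDatabase").startSpan()
try {
    user = db.query("SELECT * FROM users WHERE id = ?", userId)
} finally {
    dbSpan.end()
}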

This gives you timing data for your internal functions. That’s rarely what you need when debugging a production incident.

What Spans Should Tell You

A well-designed span should answer a business question, not describe an implementation detail. The key shift is moving from “what is the code doing?” to “what is the system doing?”

Compare these two approaches:

Bad span name              Good span name
UserService._loadFromDb    user.fetch
CacheManager.get           cache.lookup{resource=user}
HttpClient.execute         GET /api/users/{id}
JSONSerializer.serialize   (not worth a span)

The good names describe what happened in terms your team can understand during an incident, not how the code works internally.
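To make the HTTP row concrete, here is a minimal sketch of one way to do it: the span name is built from the route template, and the concrete ID travels as an attribute. The helper names `httpSpanName` and `fetchUserOverHttp`, the `call` parameter, and the tracer name are hypothetical, and the no-op tracer is only there so the snippet runs without an SDK.

```kotlin
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.trace.SpanKind
import io.opentelemetry.api.trace.Tracer

// Hypothetical helper: the name comes from the route template, never the concrete URL.
fun httpSpanName(method: String, routeTemplate: String): String = "$method $routeTemplate"

// Hypothetical wrapper around an outgoing call; `call` stands in for a real HTTP client.
fun fetchUserOverHttp(tracer: Tracer, userId: String, call: (String) -> String): String {
    val span = tracer.spanBuilder(httpSpanName("GET", "/api/users/{id}"))
        .setSpanKind(SpanKind.CLIENT)
        .startSpan()
    try {
        // The concrete ID goes into an attribute, so the span name stays the same for every user.
        span.setAttribute("user.id", userId)
        return call("/api/users/$userId")
    } finally {
        span.end()
    }
}

fun main() {
    // No-op tracer: nothing is exported, which is enough to show the shape of the code.
    val tracer = OpenTelemetry.noop().getTracer("naming-demo")
    println(fetchUserOverHttp(tracer, "42") { path -> "fetched $path" })
}
```

Every request to this endpoint then shows up under one span name, and you can still find a single user’s request by filtering on user.id.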

The Three Questions Every Span Should Answer

Before adding a span, ask yourself:

  1. Would this span help me during an incident? If you can’t imagine a scenario where you’d be grateful this span existed at 3am, don’t add it.
  2. Does this span represent a meaningful unit of work? Spans should map to something a non-engineer could understand — “fetch user”, “send email”, “process payment” — not “call method”.
  3. Does this span have useful attributes? A span without context is almost useless. If you’re adding a user.fetch span, make sure it carries user.id, db.name, and whether it was a cache hit.

Attributes Are More Important Than Spans

Here’s the thing most teams get wrong: the value of a trace isn’t the spans themselves — it’s the attributes attached to those spans.

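val span = tracer.spanBuilder("user.fetch").startSpan()
val user = try {
    span.setAttribute("user.id", userId)
    span.setAttribute("db.name", "users_primary")
    span.setAttribute("cache.hit", false)

    // Enrich the span with what we learn from the result before returning it
    db.fetchUser(userId).also { fetched ->
        span.setAttribute("user.tier", fetched.subscriptionTier)
        span.setAttribute("user.region", fetched.region)
    }
} finally {
    span.end()  // Span is not AutoCloseable, so end it explicitly
}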

Now when you’re debugging a latency spike, you can filter by user.region = "eu-west" or user.tier = "free" and immediately narrow down whether the issue affects all users or a specific segment.
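To show what that filtering buys you, here is a small sketch that groups span durations by an attribute. It does not use any tracing backend: `SpanRecord` and `averageDurationBy` are hypothetical stand-ins for whatever query your backend offers, and the durations are made-up sample values.

```kotlin
// Hypothetical, simplified stand-in for a finished span as a backend might store it.
data class SpanRecord(
    val name: String,
    val durationMs: Long,
    val attributes: Map<String, Any>,
)

// Average duration of the named span, grouped by the value of one attribute.
fun averageDurationBy(
    spans: List<SpanRecord>,
    spanName: String,
    attribute: String,
): Map<Any?, Double> =
    spans.filter { it.name == spanName }
        .groupBy { it.attributes[attribute] }
        .mapValues { (_, group) -> group.map { it.durationMs }.average() }

fun main() {
    // Made-up sample values, for illustration only.
    val spans = listOf(
        SpanRecord("user.fetch", 800, mapOf("user.region" to "eu-west", "user.tier" to "free")),
        SpanRecord("user.fetch", 30, mapOf("user.region" to "us-east", "user.tier" to "free")),
        SpanRecord("user.fetch", 900, mapOf("user.region" to "eu-west", "user.tier" to "paid")),
        SpanRecord("user.fetch", 40, mapOf("user.region" to "us-east", "user.tier" to "paid")),
    )
    // prints {eu-west=850.0, us-east=35.0}
    println(averageDurationBy(spans, "user.fetch", "user.region"))
}
```

Grouping the same spans by user.tier instead gives 415.0 for free and 470.0 for paid, so in this made-up data the region, not the tier, is what separates slow requests from fast ones. Without those attributes on the span, neither question could be asked.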

A Practical Heuristic

When reviewing instrumentation, I use this rule: one span per I/O boundary, plus spans for significant business operations.

  • Every external HTTP call: one span
  • Every database operation: one span per logical operation, not one per query inside a loop
  • Every cache operation (a check and the set that follows it count as one): one span
  • Payment processing, order creation, email sending: one span each
  • Internal method calls, JSON serialization, string manipulation: no spans

This keeps your traces readable and your trace storage costs manageable.

A well-structured Kotlin service might look like this:

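import io.opentelemetry.api.trace.SpanKind
import io.opentelemetry.api.trace.Tracer

// fetchOrder, validateOrder and notifyWarehouse are omitted here for brevity.
class OrderService(
    private val tracer: Tracer,
    private val paymentGateway: PaymentGateway,  // assumed collaborator type
) {

    fun processOrder(orderId: String, userId: String): Order {
        // ONE span for the business operation
        val span = tracer.spanBuilder("order.process").startSpan()
        try {
            // makeCurrent() activates the span so the spans started in
            // chargePayment and notifyWarehouse become its children.
            return span.makeCurrent().use {
                span.setAttribute("order.id", orderId)
                span.setAttribute("user.id", userId)

                // Internal methods: no extra spans needed
                val order = fetchOrder(orderId)
                validateOrder(order)

                // ONE span per external I/O boundary
                val payment = chargePayment(order)  // has its own span inside
                notifyWarehouse(order)               // has its own span inside

                span.setAttribute("order.amount", order.totalAmount)
                span.setAttribute("payment.id", payment.id)
                order
            }
        } finally {
            span.end()  // Span is not AutoCloseable, so end it explicitly
        }
    }

    private fun chargePayment(order: Order): Payment {
        val span = tracer.spanBuilder("payment.charge")
            .setSpanKind(SpanKind.CLIENT)
            .startSpan()
        try {
            span.setAttribute("payment.provider", "stripe")
            span.setAttribute("payment.amount", order.totalAmount)
            return paymentGateway.charge(order)
        } finally {
            span.end()
        }
    }
}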
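Starting, activating, and ending a span by hand is easy to get subtly wrong, because `Span` itself is not `AutoCloseable` (only the `Scope` returned by `makeCurrent()` is). One option is a small helper that does all three in one place. The sketch below is a suggestion, not part of the OpenTelemetry API: the extension name `traced` and the tracer name are my own, and the no-op tracer is only there so the snippet runs without an SDK.

```kotlin
import io.opentelemetry.api.OpenTelemetry
import io.opentelemetry.api.trace.Span
import io.opentelemetry.api.trace.SpanKind
import io.opentelemetry.api.trace.StatusCode
import io.opentelemetry.api.trace.Tracer

// Hypothetical helper: runs `block` inside a span, makes the span current so
// spans started inside the block attach to it, and always ends the span.
inline fun <T> Tracer.traced(
    name: String,
    kind: SpanKind = SpanKind.INTERNAL,
    block: (Span) -> T,
): T {
    val span = spanBuilder(name).setSpanKind(kind).startSpan()
    try {
        // Scope is AutoCloseable, so `use` restores the previous context on exit.
        return span.makeCurrent().use { block(span) }
    } catch (e: Throwable) {
        // Mark the span as failed, then let the caller handle the exception.
        span.recordException(e)
        span.setStatus(StatusCode.ERROR)
        throw e
    } finally {
        span.end()
    }
}

fun main() {
    // No-op tracer: nothing is exported, which is enough to show the call shape.
    val tracer = OpenTelemetry.noop().getTracer("helper-demo")
    val result = tracer.traced("order.process") { span ->
        span.setAttribute("order.id", "o-1")
        // One child span at the external I/O boundary.
        tracer.traced("payment.charge", SpanKind.CLIENT) { child ->
            child.setAttribute("payment.provider", "stripe")
            "charged"
        }
    }
    println(result)
}
```

With a helper like this, each I/O boundary costs one call, and there is one place to decide how failures are recorded on the span.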

Conclusion

Effective observability isn’t about collecting more data — it’s about collecting the right data. Start by defining what questions you need to answer during incidents, then design your instrumentation to answer those questions.

If you can look at a trace during a production incident and immediately understand what the system was doing, which external calls were slow, and what user context was involved — your instrumentation is good. If you’re staring at a wall of nested function names, start over.