
Functorium Tracing Manual

Learn how to use distributed tracing in the Functorium framework to visualize the entire journey of a request and identify performance bottlenecks.

“Why did this request take 3 seconds?” “Which service caused the delay?” “Where exactly did the failed request fail?”

Modern applications operate through the collaboration of multiple services and components. While a single HTTP request is being processed, various tasks such as database queries, external API calls, and cache accesses are executed sequentially or in parallel. In such an environment, answering the question “where did it slow down?” is extremely difficult with logs or metrics alone.

Distributed Tracing is a technique that visualizes these complex request flows as a single “journey.” It traces the entire path of a request through the system and measures the time spent at each step.

Functorium automatically provides distributed tracing capabilities that follow the OpenTelemetry Tracing standard.

This document covers the following topics:

  1. Core concepts of distributed tracing - Relationship between Trace, Span, and Context
  2. Structure of Spans automatically generated by Functorium - Span design per architecture layer
  3. Request flow tracing through Parent-Child relationships - Understanding hierarchical structures
  4. Trace analysis methods using Jaeger and Grafana Tempo - Identifying bottleneck segments

A basic understanding of the following concepts is needed to understand this document:

  • Content from the Functorium Logging Manual (field naming, architecture layers)
  • Basic concepts of asynchronous programming
  • HTTP request/response model

Core principle: Distributed tracing visualizes the entire journey of a request through the system using Parent-Child Span relationships. Functorium automatically generates Spans at the Application Layer and Adapter Layer, recording even Expected errors with Error status while distinguishing their nature through the error.type tag.

# Search for Spans of a specific handler
{span.request.handler.name="CreateOrderCommandHandler" && span.response.status="failure"}
# Search for slow Spans
{span.response.elapsed > 1.0}
# Search for system error Spans
{span.error.type="exceptional"}
  1. Activate Tracing Pipeline with ConfigurePipelines(p => p.UseObservability()) (UseObservability() enables CtxEnricher, Metrics, Tracing, and Logging all at once)
  2. Application Layer: UsecaseTracingPipeline automatically generates Spans (Kind: Internal)
  3. Adapter Layer: Source Generator automatically generates Span code
  4. Visualize request flow and identify bottlenecks via Parent-Child relationships in Jaeger/Tempo

Concept | Description
Trace | The entire journey of a single request through the system (unique Trace ID)
Span | An individual work unit within a Trace (includes start time, duration, and tags)
Parent-Child | Usecase Span is the parent of Adapter Span; represents the hierarchical call structure
Span Name | Application: {layer} {category}.{cqrs} {handler}.{method}; Adapter: {layer} {category} {handler}.{method}
Status | Ok/Error. Even Expected errors have Error status (distinguished by the error.type tag)
response.elapsed | Included as a tag in tracing (no cardinality issues since each Span is stored as an individual document)

To understand distributed tracing, you need to know three core concepts.

A Trace represents the entire journey of a single request through the system. For example, all operations triggered when a user clicks the “Place Order” button are grouped into a single Trace.

Each Trace has a unique Trace ID:

Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736

This ID is a 128-bit random value (32 hexadecimal characters) that is, for practical purposes, globally unique. All Spans with the same Trace ID belong to a single request processing flow.

A Span is an individual work unit within a Trace. A single Trace consists of multiple Spans. Each Span records “when it started and how long it took.”

Example: Order Processing Trace

Trace: Order Processing (Trace ID: 4bf92f...)
|
+-- Span: HTTP POST /api/orders (1.5s)
    |
    +-- Span: CreateOrderCommandHandler.Handle (1.2s)
        |
        +-- Span: OrderRepository.Save (0.3s)
        |
        +-- Span: PaymentGateway.ProcessPayment (0.8s)
        |
        +-- Span: NotificationService.SendEmail (0.1s)

Each Span contains the following information:

Attribute | Description | Example
Name | What operation is it? | "CreateOrderCommandHandler.Handle"
Start Time | When did it start? | 2024-01-15T10:30:45.123Z
Duration | How long did it take? | 1.2 seconds
Tags | Additional metadata | response.status = "success"
Parent Span | Which Span invoked this operation? | HTTP POST /api/orders
Status | Success/failure | Ok / Error

Context is the information that links Spans together into a single Trace. Context includes the Trace ID and the current Span ID.

When a request is passed between services, the Context is also propagated along with it. For HTTP, it is propagated via the traceparent header:

HTTP Header:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             |  |                                |                |
             |  |                                |                +-- Flags
             |  |                                +-- Parent Span ID (16 chars)
             |  +-- Trace ID (32 chars)
             +-- Version

Thanks to this header, Spans created in different services can be linked together into a single Trace.
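
As a rough illustration of the header layout above, the sketch below splits a traceparent value into its four fields. The TraceParent type is a hypothetical helper written for this example; it is not part of Functorium or OpenTelemetry, and it checks field lengths only.

```csharp
using System;

// Hypothetical helper for illustration only; not a Functorium or OpenTelemetry type.
public readonly record struct TraceParent(string Version, string TraceId, string ParentSpanId, string Flags)
{
    // Splits a W3C traceparent value into its four dash-separated fields.
    public static bool TryParse(string? header, out TraceParent result)
    {
        result = default;
        var parts = header?.Split('-');
        if (parts is not { Length: 4 }) return false;

        // Field lengths: version (2), trace ID (32), parent span ID (16), flags (2).
        if (parts[0].Length != 2 || parts[1].Length != 32 ||
            parts[2].Length != 16 || parts[3].Length != 2)
            return false;

        result = new TraceParent(parts[0], parts[1], parts[2], parts[3]);
        return true;
    }
}
```

In application code you would normally not parse the header by hand; .NET also exposes ActivityContext.TryParse for this purpose.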

Spans have a hierarchical structure. Parent Spans contain child Spans. This relationship is similar to a call stack.

Visualization:

Time -->

CreateOrderCommandHandler.Handle (1.2s)
[==============================================]
OrderRepository.Save (0.3s)
[==========]
            PaymentGateway.ProcessPayment (0.8s)
            [==============================]
                                            Email (0.1s)
                                            [==]
0s          0.3s                            1.1s 1.2s

Looking at this structure:

  1. The entire request (Handle) took 1.2 seconds.
  2. PaymentGateway took the longest at 0.8 seconds (0.8 / 1.2, about 67% of the total).
  3. Optimizing PaymentGateway would be the most effective way to improve overall performance.
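
The reading above can be sketched in code. This is a minimal example, assuming the child durations have already been extracted from a Trace; BottleneckFinder is a hypothetical name used only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class BottleneckFinder
{
    // Returns the child Span with the largest duration and its share of the parent's duration.
    public static (string Name, double Share) Slowest(
        double parentSeconds, IReadOnlyDictionary<string, double> childSeconds)
    {
        var slowest = childSeconds.MaxBy(kv => kv.Value);
        return (slowest.Key, slowest.Value / parentSeconds);
    }
}
```

Called with a parent duration of 1.2 and the three child durations from the example (0.3, 0.8, 0.1), it selects PaymentGateway.ProcessPayment with a share of 0.8 / 1.2.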

In .NET, OpenTelemetry’s Span is implemented using the System.Diagnostics.Activity class. When you see the term “Activity” in Functorium code, it is equivalent to OpenTelemetry’s Span.

OpenTelemetry Term | .NET Term
Span | Activity
SpanContext | ActivityContext
Tracer | ActivitySource
Status.OK | ActivityStatusCode.Ok
Status.ERROR | ActivityStatusCode.Error

This document uses OpenTelemetry terminology (Span) by default. Activity is only used in code examples that directly work with the .NET API.
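
To make the Span/Activity correspondence concrete, here is a minimal sketch that uses only System.Diagnostics. The source name "Demo.Tracing" and the ActivityDemo class are assumptions made for this example; in a Functorium application the OpenTelemetry SDK subscribes to the sources, so the manual ActivityListener shown here would not be needed.

```csharp
using System.Diagnostics;

// Illustrative only; "Demo.Tracing" is an arbitrary source name chosen for this sketch.
public static class ActivityDemo
{
    private static readonly ActivitySource Source = new("Demo.Tracing");

    public static (string TraceId, string ParentSpanId, string ChildParentSpanId) Run()
    {
        // Without a listener (or an SDK subscription), StartActivity returns null.
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == "Demo.Tracing",
            Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
                ActivitySamplingResult.AllDataAndRecorded,
        };
        ActivitySource.AddActivityListener(listener);

        using var parent = Source.StartActivity("parent"); // becomes Activity.Current
        using var child = Source.StartActivity("child");   // parented to Activity.Current

        return (parent!.TraceId.ToString(),
                parent.SpanId.ToString(),
                child!.ParentSpanId.ToString());
    }
}
```

The child's ParentSpanId equals the parent's SpanId, and both share one TraceId, which is the Parent-Child relationship described earlier.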

The three observability tools answer different questions:

Tool | Question | Data Type
Logging | What happened? | Individual events
Metrics | How much/how fast? | Aggregated numbers
Tracing | Where was time spent? | Request paths

Real-world scenario:

  1. Alert triggered: Metrics show “P99 response time exceeded 2 seconds”
  2. Root cause investigation: Tracing reveals “1.8 second delay at PaymentGateway”
  3. Detailed verification: Logging confirms “PaymentGateway timeout error message”

Now that we understand the relationship between Trace, Span, and Context from the distributed tracing fundamentals, let’s look at how Functorium automates these concepts per architecture layer.


Functorium automatically generates Spans at two architecture layers.

HTTP Request Arrives
|
v
+------------------------------------------------------------------+
| Application Layer |
| +------------------------------------------------------------+ |
| | Span: "application usecase.command | |
| | CreateOrderCommandHandler.Handle" | |
| | Kind: Internal | |
| | Tags: request.layer, request.category, etc. | |
| | Status: Ok or Error | |
| +------------------------------------------------------------+ |
| | |
| v (Parent-Child Relationship) |
| +------------------------------------------------------------+ |
| | Adapter Layer | |
| | +------------------------------------------------------+ | |
| | | Span: "adapter repository | | |
| | | OrderRepository.Save" | | |
| | | Kind: Internal | | |
| | | Tags: request.layer, request.category, etc. | | |
| | | Status: Ok or Error | | |
| | +------------------------------------------------------+ | |
| +------------------------------------------------------------+ |
+------------------------------------------------------------------+

The Application Layer Span becomes the parent of the Adapter Layer Span. Thanks to this relationship, you can clearly trace what path a request took.

OpenTelemetry defines Kind to indicate the role of a Span:

Kind | Description | Example
Server | Receives and processes external requests | HTTP server endpoint
Client | Calls external services | HTTP client, DB query
Internal | Internal processing | Business logic processing
Producer | Publishes asynchronous messages | Message queue publishing
Consumer | Receives asynchronous messages | Message queue consumption

All auto-generated Spans in Functorium use the Internal Kind. Spans for HTTP request reception or database calls are generated separately by ASP.NET Core and database libraries.

Functorium uses a consistent Span naming pattern:

Application Layer:

{layer} {category}.{cqrs} {handler}.{method}
Examples:
- application usecase.command CreateOrderCommandHandler.Handle
- application usecase.query GetOrderQueryHandler.Handle

Adapter Layer:

{layer} {category} {handler}.{method}
Examples:
- adapter repository OrderRepository.Save
- adapter repository OrderRepository.GetById
- adapter gateway PaymentGateway.ProcessPayment

Benefits of this naming convention:

  1. Consistency: All Spans follow the same pattern
  2. Searchability: Enables quick filtering by Span name
  3. Self-descriptive: You can understand the operation just by looking at the name
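
The two patterns can be expressed as small formatting helpers. This is only a sketch of the naming convention; SpanNames is a hypothetical class, not Functorium's actual implementation.

```csharp
// Hypothetical helper that mirrors the documented naming patterns.
public static class SpanNames
{
    // {layer} {category}.{cqrs} {handler}.{method}
    public static string Application(string cqrs, string handler, string method) =>
        $"application usecase.{cqrs} {handler}.{method}";

    // {layer} {category} {handler}.{method}
    public static string Adapter(string category, string handler, string method) =>
        $"adapter {category} {handler}.{method}";
}
```

For example, SpanNames.Adapter("repository", "OrderRepository", "Save") yields "adapter repository OrderRepository.Save".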

Each Span has the following basic attributes:

Attribute | Description | Example
TraceId | ID of the Trace it belongs to | 4bf92f3577b34da6a3ce929d0e0e4736
SpanId | Unique ID of the Span | 00f067aa0ba902b7
ParentSpanId | ID of the parent Span (root if none) | 5b8a8f6d3e7c9a1b
Name | Span name | application usecase.command…
Kind | Span kind | Internal
StartTime | Start time | 2024-01-15T10:30:45.123Z
EndTime | End time | 2024-01-15T10:30:46.323Z
Duration | Elapsed time | 1.2s
Status | Status code | Ok / Error
Tags | Additional metadata | request.handler = "…"

A Span’s Status indicates the success/failure of the operation:

Status | Description | When to Use
Unset | Status not set | Default
Ok | Success | Normal processing complete
Error | Failure | Error occurred

In Functorium:

  • response.status = "success" → ActivityStatusCode.Ok
  • response.status = "failure" → ActivityStatusCode.Error

Important: Even Expected errors (business errors) have Error status. This is because it means “the request did not achieve the desired result.” The nature of the error (Expected vs Exceptional) is distinguished by the error.type tag.
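
The rule above can be sketched as a small helper on top of the .NET Activity API. SpanOutcome and its parameters are hypothetical and only illustrate the mapping; the tag keys and values are the ones documented here.

```csharp
using System.Diagnostics;

// Hypothetical helper for illustration; not Functorium's actual code.
public static class SpanOutcome
{
    // errorType is null on success, otherwise "expected", "exceptional", or "aggregate".
    public static void Apply(Activity activity, string? errorType, string? errorCode)
    {
        if (errorType is null)
        {
            activity.SetTag("response.status", "success");
            activity.SetStatus(ActivityStatusCode.Ok);
            return;
        }

        // Expected and Exceptional errors both end with Error status;
        // only the error.type tag tells them apart.
        activity.SetTag("response.status", "failure");
        activity.SetTag("error.type", errorType);
        activity.SetTag("error.code", errorCode);
        activity.SetStatus(ActivityStatusCode.Error);
    }
}
```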

Time measurement for a Span works as follows:

StartTime                                      EndTime
|                                                    |
v                                                    v
+----------------------------------------------------+
|                  Duration (1.2s)                   |
|                                                    |
|  +----------+  +--------------------+  +----+      |
|  |   0.3s   |  |        0.8s        |  |0.1s|      |
|  +----------+  +--------------------+  +----+      |
|   OrderRepo       PaymentGateway       Email       |
+----------------------------------------------------+

Duration calculation:

Duration = EndTime - StartTime = 1.2s
Child Span Total = 0.3 + 0.8 + 0.1 = 1.2s

If the sum of child Spans equals the parent Span’s Duration, it means no additional work was done in the parent. If there is a difference, that time was spent on work performed directly by the parent Span (logic execution, data transformation, etc.).
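
The comparison can be written as a one-line calculation. This sketch assumes the child Spans ran one after another; if children overlap (parallel calls), their sum can exceed the parent's Duration and the result is no longer meaningful. SpanMath is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
public static class SpanMath
{
    // Time the parent spent on its own work, assuming sequential (non-overlapping) children.
    public static double SelfTimeSeconds(double parentSeconds, IEnumerable<double> childSeconds) =>
        parentSeconds - childSeconds.Sum();
}
```

For the example above, the self time is approximately zero (subject to floating-point rounding), so the handler did essentially no work outside its three adapter calls.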


Tags provide additional context to Spans. Functorium uses the same tag keys as logging and metrics to maintain 3-Pillar consistency.

Tag structure table:

Tag key | Success | Failure | Description
request.layer | "application" | "application" | Layer identifier
request.category.name | "usecase" | "usecase" | Category identifier
request.category.type | "command"/"query" | "command"/"query" | CQRS type
request.handler.name | handler name | handler name | Handler class name
request.handler.method | "Handle" | "Handle" | Method name
response.elapsed | processing time | processing time | In seconds
response.status | "success" | "failure" | Response status
error.type | - | "expected"/"exceptional"/"aggregate" | Error classification
error.code | - | error code | Domain error code
Total tag count | 7 | 9 |

Example - Command success:

{
  "name": "application usecase.command CreateOrderCommandHandler.Handle",
  "status": "Ok",
  "tags": {
    "request.layer": "application",
    "request.category.name": "usecase",
    "request.category.type": "command",
    "request.handler.name": "CreateOrderCommandHandler",
    "request.handler.method": "Handle",
    "response.elapsed": 0.1234,
    "response.status": "success"
  }
}

Example - Command failure:

{
  "name": "application usecase.command CreateOrderCommandHandler.Handle",
  "status": "Error",
  "tags": {
    "request.layer": "application",
    "request.category.name": "usecase",
    "request.category.type": "command",
    "request.handler.name": "CreateOrderCommandHandler",
    "request.handler.method": "Handle",
    "response.elapsed": 0.0567,
    "response.status": "failure",
    "error.type": "expected",
    "error.code": "Order.InsufficientStock"
  }
}

The Adapter Layer does not have CQRS distinction, so there is no request.category.type tag.

Tag structure table:

Tag key | Success | Failure | Description
request.layer | "adapter" | "adapter" | Layer identifier
request.category.name | category name | category name | Category identifier
request.handler.name | handler name | handler name | Handler class name
request.handler.method | method name | method name | Method name
response.elapsed | processing time | processing time | In seconds
response.status | "success" | "failure" | Response status
error.type | - | "expected"/"exceptional"/"aggregate" | Error classification
error.code | - | error code | Domain error code
Total tag count | 6 | 8 |

Example - Repository success:

{
  "name": "adapter repository OrderRepository.GetById",
  "status": "Ok",
  "tags": {
    "request.layer": "adapter",
    "request.category.name": "repository",
    "request.handler.name": "OrderRepository",
    "request.handler.method": "GetById",
    "response.elapsed": 0.0456,
    "response.status": "success"
  }
}

Why response.elapsed Is Included in Tracing


In metrics, we explained that response.elapsed is recorded as a Histogram rather than a tag. However, in tracing it is included as a tag. Why is that?

Differences:

Aspect | Metrics | Tracing
Purpose | Aggregate analysis | Individual request tracking
Cardinality | Need to limit time series count | Spans are individual events
Storage method | Time series per tag combination | Span document unit

In tracing, each Span is stored as an individual document. Different response.elapsed values do not create separate time series. Therefore, there are no cardinality issues when including it as a tag.

Additionally, being able to check the exact processing time as a tag in individual Spans allows you to quickly assess the performance of a specific request.

ctx.* Span Attribute — User-Defined Business Context


CtxEnricherPipeline runs first in the pipeline. It automatically sets ctx.* fields that carry the CtxPillar.Tracing flag as Span Attributes via Activity.Current?.SetTag. With the default setting (CtxPillar.Default = Logging | Tracing), all ctx.* fields are included in Span Attributes.

Pipeline execution order:
CtxEnricher → Metrics → Tracing → Logging → ... → Handler
CtxEnricherPipeline:
ctx.customer_id = "CUST-001" → Activity.Current.SetTag("ctx.customer_id", "CUST-001")
ctx.region_code = "us-west" → Activity.Current.SetTag("ctx.region_code", "us-west")
ctx.internal_note = "..." → No SetTag ([CtxTarget(CtxPillar.Logging)] → Excluded from Tracing)

Since Spans are stored individually, high-cardinality fields (customer_id, Guid) are safe as Span Attributes. OpenTelemetry recommends rich attributes on Spans for debugging purposes.

To exclude from Tracing:

[CtxTarget(CtxPillar.Logging)] // Logging only — not included in Tracing Span
string InternalNote

Application Layer tracing is automatically performed by UsecaseTracingPipeline.

public class UsecaseTracingPipeline<TRequest, TResponse>
{
    public async ValueTask<TResponse> Handle(TRequest request, ...)
    {
        // 1. Create and start Span, and start timing
        using var activity = _activitySource.StartActivity(spanName);
        var stopwatch = Stopwatch.StartNew();

        // 2. Add request tags
        activity?.SetTag("request.layer", "application");
        activity?.SetTag("request.category.name", "usecase");
        // ... remaining tags

        // 3. Execute handler
        var response = await next(request, cancellationToken);
        stopwatch.Stop();

        // 4. Add response tags
        activity?.SetTag("response.status", response.IsSucc ? "success" : "failure");
        activity?.SetTag("response.elapsed", stopwatch.Elapsed.TotalSeconds);

        // 5. Add additional tags on error
        if (response.IsFail)
        {
            activity?.SetTag("error.type", GetErrorType(response));
            activity?.SetTag("error.code", GetErrorCode(response));
            activity?.SetStatus(ActivityStatusCode.Error);
        }
        else
        {
            activity?.SetStatus(ActivityStatusCode.Ok);
        }

        // 6. End Span (automatic via using)
        return response;
    }
}

Application Layer Span names follow this format:

{layer} {category}.{cqrs} {handler}.{method}

Generation logic:

var cqrsType = GetCqrsType<TRequest>(); // "command" or "query"
var handlerName = typeof(TRequest).Name.Replace("Request", "Handler");
var spanName = $"application usecase.{cqrsType} {handlerName}.Handle";

Examples:

Request Type | Span Name
CreateOrderCommandRequest | application usecase.command CreateOrderCommandHandler.Handle
GetOrderQueryRequest | application usecase.query GetOrderQueryHandler.Handle

Custom Tracing Extension (UsecaseTracingCustomPipelineBase)


In addition to the Spans automatically generated by the default UsecaseTracingPipeline, you can add custom Activities (Spans) per Usecase. Inherit from UsecaseTracingCustomPipelineBase<TRequest> to implement fine-grained tracing that fits your business context.

public abstract class UsecaseTracingCustomPipelineBase<TRequest>
    : UsecasePipelineBase<TRequest>, ICustomUsecasePipeline
{
    protected Activity? StartCustomActivity(string operationName, ActivityKind kind = ActivityKind.Internal);
    protected string GetActivityName(string operationName);
    protected static void SetStandardRequestTags(Activity activity, string method);
}

  • StartCustomActivity(operationName, kind): Creates a custom Activity (Span). If a parent Activity.Current exists, it is created as a child span. Activity name format: {layer} {category}.{cqrs} {handler}.{operationName}
  • GetActivityName(operationName): Retrieves the Activity name.
  • SetStandardRequestTags(activity, method): Automatically sets the 5 standard request tags:
    • request.layer (application)
    • request.category.name (usecase)
    • request.category.type (command/query)
    • request.handler.name (Handler name)
    • request.handler.method (method name)

Implementation Example (PlaceOrderCommand.TracingPipeline)

public sealed class PlaceOrderTracingPipeline
    : UsecaseTracingCustomPipelineBase<PlaceOrderCommand.Request>
    , IPipelineBehavior<PlaceOrderCommand.Request, FinResponse<PlaceOrderCommand.Response>>
{
    public PlaceOrderTracingPipeline(ActivitySource activitySource) : base(activitySource) { }

    public async ValueTask<FinResponse<PlaceOrderCommand.Response>> Handle(
        PlaceOrderCommand.Request request,
        MessageHandlerDelegate<PlaceOrderCommand.Request, FinResponse<PlaceOrderCommand.Response>> next,
        CancellationToken ct)
    {
        using Activity? activity = StartCustomActivity("ValidateOrder");
        if (activity != null)
        {
            SetStandardRequestTags(activity, "ValidateOrder");
            activity.SetTag("order.line_count", request.Lines.Count);
            activity.SetTag("order.customer_id", request.CustomerId);
        }
        return await next(request, ct);
    }
}

UsecaseTracingCustomPipelineBase<TRequest> implements ICustomUsecasePipeline, so it is explicitly registered using AddCustomPipeline<T>(). Individual registration is used instead of assembly scanning to guarantee deterministic pipeline execution order:

.ConfigurePipelines(p => p
    .UseObservability()
    .AddCustomPipeline<PlaceOrderTracingPipeline>())

Reference: Custom Extension


Adapter Layer tracing is performed by code automatically generated by the Source Generator.

The Source Generator automatically generates tracing code for interfaces annotated with the [ObservabilityPipeline] attribute.

Original interface:

[ObservabilityPipeline("repository")]
public interface IOrderRepository
{
FinT<IO, Order> GetById(Guid id);
FinT<IO, Unit> Save(Order order);
}

Generated code (simplified):

public partial class OrderRepositoryPipeline : IOrderRepository
{
    public FinT<IO, Order> GetById(Guid id)
    {
        return FinT<IO, Order>.LiftIO(async () =>
        {
            using var activity = _activitySource.StartActivity(
                "adapter repository OrderRepository.GetById");
            activity?.SetTag("request.layer", "adapter");
            activity?.SetTag("request.category.name", "repository");
            activity?.SetTag("request.handler.name", "OrderRepository");
            activity?.SetTag("request.handler.method", "GetById");

            var stopwatch = Stopwatch.StartNew();
            var result = await _inner.GetById(id).Run().RunAsync();
            stopwatch.Stop();

            activity?.SetTag("response.elapsed", stopwatch.Elapsed.TotalSeconds);
            activity?.SetTag("response.status",
                result.IsFail ? "failure" : "success");
            if (result.IsFail)
            {
                activity?.SetTag("error.type", GetErrorType(result));
                activity?.SetTag("error.code", GetErrorCode(result));
                activity?.SetStatus(ActivityStatusCode.Error);
            }
            else
            {
                activity?.SetStatus(ActivityStatusCode.Ok);
            }
            return result;
        });
    }
}

Adapter Layer Span names follow this format:

{layer} {category} {handler}.{method}

Examples:

Handler | Method | Span Name
OrderRepository | GetById | adapter repository OrderRepository.GetById
OrderRepository | Save | adapter repository OrderRepository.Save
PaymentGateway | ProcessPayment | adapter gateway PaymentGateway.ProcessPayment

DomainEvent tracing records the event publishing and handling process as Spans. It forms Parent-Child relationships of Usecase Span → Publisher Span → Handler Span(s).

application usecase.command CreateProductCommandHandler.Handle [Parent]
├─ adapter repository InMemoryProductRepository.ExistsByName [Child]
├─ adapter repository InMemoryProductRepository.Create [Child]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents [Child - Publisher]
   └─ application usecase.event OnProductCreated.Handle [Grandchild - Handler]

Publisher Spans belong to the Adapter layer, and Handler Spans belong to the Application layer. When a single Publisher calls multiple Handlers, multiple Handler Spans are generated.

Span Name:

Method | Span Name Pattern | Example
Publish | adapter event {EventType}.Publish | adapter event CreatedEvent.Publish
PublishTrackedEvents | adapter event PublishTrackedEvents.PublishTrackedEvents | adapter event PublishTrackedEvents.PublishTrackedEvents

Kind: Internal

Tag structure for single event publishing:

Tag key | Request | Success | Failure
request.layer | "adapter" | "adapter" | "adapter"
request.category.name | "event" | "event" | "event"
request.handler.name | event type name | event type name | event type name
request.handler.method | "Publish" | "Publish" | "Publish"
response.elapsed | - | processing time (sec) | processing time (sec)
response.status | - | "success" | "failure"
error.type | - | - | "expected"/"exceptional"
error.code | - | - | error code
Total tag count | 4 | 6 | 8

Publisher Tag Structure (PublishTrackedEvents)


Tag structure for tracked Aggregate event publishing:

Tag key | Request | Success | Partial Failure | Total Failure
request.layer | "adapter" | "adapter" | "adapter" | "adapter"
request.category.name | "event" | "event" | "event" | "event"
request.handler.name | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents"
request.handler.method | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents"
request.aggregate.count | aggregate count | aggregate count | aggregate count | aggregate count
request.event.count | event count | event count | event count | event count
response.elapsed | - | processing time (sec) | processing time (sec) | processing time (sec)
response.status | - | "success" | "failure" | "failure"
response.event.success_count | - | - | success count | -
response.event.failure_count | - | - | failure count | -
error.type | - | - | - | "expected"/"exceptional"
error.code | - | - | - | error code
Total tag count | 6 | 8 | 10 | 10

Span Name:

application usecase.event {HandlerName}.Handle

Example: application usecase.event OnProductCreated.Handle

Kind: Internal

Tag key | Success | Failure
request.layer | "application" | "application"
request.category.name | "usecase" | "usecase"
request.category.type | "event" | "event"
request.handler.name | handler name | handler name
request.handler.method | "Handle" | "Handle"
request.event.type | event type name | event type name
request.event.id | event id | event id
response.status | "success" | "failure"
error.type | - | "expected"/"exceptional"
error.code | - | error code
Total tag count | 8 | 10

Note: Handler Spans do not record response.elapsed. Since Spans inherently have their own start/end times (duration), a separate elapsed field would be redundant. Logging, on the other hand, does not have an inherent duration concept, so the response.elapsed field is needed.

request.event.type and request.event.id Fields


Handler Spans have unique tags called request.event.type and request.event.id:

  • request.event.type: The event type name. This is a different value from request.handler.name (handler name).

    • Example: request.handler.name = "OnProductCreated", request.event.type = "CreatedEvent"
    • Distinction is needed because multiple handlers can be registered for a single event type.
  • request.event.id: A GUID per event instance. Tracks correlation between multiple handlers processing the same event.

    • Example: request.event.id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"

request.event.type vs request.handler.name relationship:

request.handler.name represents the handler class that processes the event, while request.event.type represents the event type that the handler subscribes to. This distinction is important when multiple handlers exist for a single event:

# When ProductCreatedEvent is subscribed to by two handlers:
Span 1: application usecase.event OnProductCreated.Handle
    request.handler.name = "OnProductCreated" ← handler class
    request.event.type = "ProductCreatedEvent" ← event type
    request.event.id = "a1b2c3d4-..." ← same event instance
Span 2: application usecase.event SyncInventoryOnProductCreated.Handle
    request.handler.name = "SyncInventoryOnProductCreated" ← different handler
    request.event.type = "ProductCreatedEvent" ← same event type
    request.event.id = "a1b2c3d4-..." ← same event instance

Since request.event.id is the same, you can see that both Spans were triggered by the same event instance.
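
As an illustration of this correlation, the sketch below groups exported Handler Spans by request.event.id. The HandlerSpan record and EventCorrelation class are hypothetical; they stand in for whatever shape your trace export or test harness provides.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified view of a Handler Span's tags (illustration only).
public sealed record HandlerSpan(string HandlerName, string EventType, string EventId);

public static class EventCorrelation
{
    // Maps each request.event.id to the handlers that processed that event instance.
    public static IReadOnlyDictionary<string, List<string>> HandlersByEvent(
        IEnumerable<HandlerSpan> spans) =>
        spans.GroupBy(s => s.EventId)
             .ToDictionary(g => g.Key, g => g.Select(s => s.HandlerName).ToList());
}
```

Given the two Spans above, the result contains one entry whose list holds both OnProductCreated and SyncInventoryOnProductCreated.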

Product creation success (POST /api/products):

application usecase.command CreateProductCommand.Handle [Ok]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
├─ adapter repository InMemoryProductRepository.Create [Ok]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents [Ok]
   └─ application usecase.event OnProductCreated.Handle [Ok]
      ├─ request.event.type = "CreatedEvent"
      └─ request.event.id = "515711cd-..."

Handler exception (POST /api/products with [handler-error]):

application usecase.command CreateProductCommand.Handle [Error]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
├─ adapter repository InMemoryProductRepository.Create [Ok]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents [Error]
   └─ application usecase.event OnProductCreated.Handle [Error]
      ├─ request.event.type = "CreatedEvent"
      ├─ request.event.id = "f385a945-..."
      ├─ error.type = "exceptional"
      └─ error.code = "InvalidOperationException"

Note: The Handler’s error.code records the exception type name (InvalidOperationException), while the Publisher’s error.code records the wrapped error code (Application.DomainEventPublisher.PublishFailed).

Adapter exception (POST /api/products with [adapter-error]):

Adapter exceptions occur at the Repository, so they do not reach event publishing:

application usecase.command CreateProductCommand.Handle [Error]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
└─ adapter repository InMemoryProductRepository.Create [Error]
   ├─ error.type = "exceptional"
   └─ error.code = "Exceptional"

Search for DomainEvent Publisher Spans:

{span.request.category.name="event" && span.request.layer="adapter"}

Search for DomainEvent Handler Spans:

{span.request.category.type="event" && span.request.layer="application"}

Handler Spans with errors:

{span.request.category.type="event" && span.error.type="exceptional"}

A Span’s Status and error.type tag convey different information:

Attribute | Meaning | Value
Status | Whether the operation succeeded | Ok / Error
error.type | Nature of the error | expected / exceptional / aggregate

Examples:

Scenario | Status | error.type | Description
Order success | Ok | - | Normal processing
Insufficient stock | Error | expected | Rejection per business rules
DB connection failure | Error | exceptional | System issue

Most Trace UIs (Jaeger, Tempo) display Spans with Status = Error in red. This allows you to quickly identify at which step the problem occurred.

CreateOrderCommandHandler.Handle [Error] (1.2s)
+-- OrderRepository.GetById [Ok] (0.1s)
+-- InventoryService.CheckStock [Error] (0.05s) <-- Failed here
+-- PaymentGateway.Process [Not Started] <-- Not executed

When an error occurs in a child Span, the parent Span also typically becomes Error status. This is because “a child’s failure causes the parent’s failure.”

Application Layer: CreateOrderCommand -> Error (due to child failure)
|
+-- Adapter Layer: InventoryRepository.CheckStock -> Error (root cause)

However, if the parent handles the child’s error (fallback, retry, etc.), the parent can have Ok status.

Now that we understand the relationship between Status and error.type tags in error tracing, let’s learn practical methods for searching and analyzing Traces in Jaeger and Grafana Tempo.


Search Traces by service:

service=orderservice

Search for slow Traces:

service=orderservice minDuration=1s

Search for error Traces:

service=orderservice tags={"response.status":"failure"}

Traces for a specific handler:

service=orderservice tags={"request.handler.name":"CreateOrderCommandHandler"}

TraceQL basic search:

{resource.service.name="orderservice"}

Search for specific Spans:

{span.request.handler.name="CreateOrderCommandHandler" && span.response.status="failure"}

Search for slow Spans:

{span.response.elapsed > 1.0}

Search by error type:

{span.error.type="exceptional"}
  1. Identify the issue: Detect “P99 response time > 2 seconds” from metrics
  2. Retrieve example Traces: Search for slow Traces in that time range
  3. Identify bottleneck segments: Compare Duration across Spans
  4. Determine root cause: Check the longest-running Span
  5. Detailed investigation: Review the Span’s tags and logs

Situation: The “Create Order” API’s P99 response time exceeds 3 seconds.

Step 1: Retrieve slow Trace examples

Search in Jaeger with the following conditions:

service=orderservice
operation=application usecase.command CreateOrderCommandHandler.Handle
minDuration=2s

Step 2: Detailed Trace analysis

Expand the retrieved Trace to check the Duration of each Span:

CreateOrderCommandHandler.Handle (2.8s)
+-- OrderRepository.GetCustomer (0.1s)
+-- InventoryService.CheckStock (0.2s)
+-- PaymentGateway.ProcessPayment (2.3s) <-- Bottleneck!
+-- NotificationService.SendEmail (0.2s)

Step 3: Bottleneck Span analysis

Check the tags of the PaymentGateway.ProcessPayment Span:

{
  "request.handler.name": "PaymentGateway",
  "request.handler.method": "ProcessPayment",
  "response.elapsed": 2.3,
  "response.status": "success"
}

Step 4: Further investigation

Check the external call Span (Client Kind) of PaymentGateway if available:

PaymentGateway.ProcessPayment (2.3s)
+-- HTTP POST payment-provider.com/api/charge (2.2s) <-- External service delay

Conclusion: The root cause is response delay from the external payment service (payment-provider.com).

Remediation options:

  1. Review payment service timeout settings
  2. Consider asynchronous processing (create order without waiting for payment completion)
  3. Inquire with the payment service provider about the delay

Situation: “Create Order” errors spike during specific time periods.

Step 1: Retrieve error Traces

service=orderservice
tags={"response.status":"failure","error.type":"exceptional"}

Step 2: Error pattern analysis

Compare multiple Traces to identify commonalities:

  • All failed at DatabaseRepository.Save
  • error.code = "Database.ConnectionFailed"

Step 3: Time correlation

Compare the error time periods with other events (deployments, traffic spikes, infrastructure changes)

Conclusion: Database connection pool exhaustion is the suspected cause.
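
The pattern analysis in Step 2 amounts to grouping failed Spans by where they failed and by error code. The sketch below assumes the failed Spans have been exported into a list; the trace IDs and the `Payment.Timeout` entry are hypothetical sample data added only to show a non-dominant pattern.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical failed Spans collected from several error Traces: (trace ID, failing Span, error.code).
var failedSpans = new List<(string TraceId, string Span, string ErrorCode)>
{
    ("trace-1", "DatabaseRepository.Save", "Database.ConnectionFailed"),
    ("trace-2", "DatabaseRepository.Save", "Database.ConnectionFailed"),
    ("trace-3", "DatabaseRepository.Save", "Database.ConnectionFailed"),
    ("trace-4", "PaymentGateway.ProcessPayment", "Payment.Timeout"),
};

// Group by (Span, error.code) and sort by frequency so the dominant pattern comes first.
var patterns = failedSpans
    .GroupBy(s => (s.Span, s.ErrorCode))
    .Select(g => (g.Key.Span, g.Key.ErrorCode, Count: g.Count()))
    .OrderByDescending(p => p.Count)
    .ToList();

foreach (var p in patterns)
    Console.WriteLine($"{p.Count}x {p.Span} -> {p.ErrorCode}");
```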


Symptom: A specific Span is not visible in the Trace.

Check the following:

  1. Verify Pipeline registration:

    services.AddMediator(options =>
    {
        options.AddOpenBehavior(typeof(UsecaseTracingPipeline<,>));
    });
  2. Verify ActivitySource registration:

    builder.Services.AddOpenTelemetry()
        .WithTracing(tracing => tracing
            .AddSource("Functorium.*"));
  3. Verify Sampling settings:

    .SetSampler(new AlwaysOnSampler()) // Collect all Traces

When Parent-Child Relationships Are Broken

Symptom: Child Spans are displayed as separate Traces.

Cause: Context was not propagated.

Check the following:

  1. Context propagation in async calls:

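    // Bad example: fire-and-forget work is not awaited, so it can outlive the parent Span,
    // and the context is lost entirely if ExecutionContext flow is suppressed
    Task.Run(() => adapter.DoSomething());
    // Good example: awaiting keeps the call inside the parent Span's context and lifetime
    await adapter.DoSomething();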
  2. Header propagation for external service calls:

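    // Set the header per request (request = the outgoing HttpRequestMessage); a default header would pin every call to one Trace
    if (Activity.Current?.Id is { } id) request.Headers.Add("traceparent", id);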
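
For reference, the `traceparent` header follows the W3C Trace Context layout `version-traceid-parentid-flags`. The sketch below builds and parses such a value using only `System.Diagnostics`; the IDs are fixed example values, and in practice the HttpClient instrumentation normally handles this, so manual handling is only needed for custom transports.

```csharp
using System;
using System.Diagnostics;

// Build a parent context from fixed example IDs (illustrative values, not real traffic).
var traceId = ActivityTraceId.CreateFromString("4bf92f3577b34da6a3ce929d0e0e4736".AsSpan());
var spanId = ActivitySpanId.CreateFromString("00f067aa0ba902b7".AsSpan());
var context = new ActivityContext(traceId, spanId, ActivityTraceFlags.Recorded);

// traceparent layout: version "00", trace-id, parent-id, trace-flags ("01" = sampled).
string flags = context.TraceFlags.HasFlag(ActivityTraceFlags.Recorded) ? "01" : "00";
string traceparent = $"00-{context.TraceId}-{context.SpanId}-{flags}";
Console.WriteLine(traceparent);

// The receiving service parses the header back into a context and uses it as the parent.
bool parsed = ActivityContext.TryParse(traceparent, null, out ActivityContext remoteParent);
Console.WriteLine($"parsed={parsed}, same trace={remoteParent.TraceId == context.TraceId}");
```

If the parsed trace ID matches on the receiving side, the child Spans join the caller's Trace instead of starting a new one.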

Symptom: A Span’s Duration is much larger than the sum of its child Spans.

Possible causes:

  1. Time spent outside any child Span:

    Thread.Sleep(1000); // Counted in the parent's Duration, but happens before the child Span starts
    using var activity = source.StartActivity("...");
    // Actual work
  2. Async waiting:

    using var activity = source.StartActivity("...");
    await Task.Delay(1000); // Waiting within the Span
    // Only waiting, with no child Span to account for the time
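
To quantify the gap, subtract the time covered by child Spans from the parent's Duration. The sketch below is one way to do that, assuming Span start and end times are available as millisecond offsets; `UncoveredMs` and the sample intervals are hypothetical and not part of Functorium.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the part of the parent's Duration (in ms) that no child Span covers.
// Overlapping children (parallel work) are merged so they are not counted twice.
static double UncoveredMs(double parentStart, double parentEnd, IEnumerable<(double Start, double End)> childIntervals)
{
    double covered = 0;
    double cursor = parentStart;
    foreach (var (start, end) in childIntervals.OrderBy(c => c.Start))
    {
        double from = Math.Max(start, cursor);
        double to = Math.Min(end, parentEnd);
        if (to > from)
        {
            covered += to - from;
            cursor = to;
        }
    }
    return (parentEnd - parentStart) - covered;
}

// Hypothetical Trace: the parent runs 0..1500 ms, two children overlap between 1000 and 1400 ms.
var children = new List<(double Start, double End)> { (1000, 1300), (1200, 1400) };
double gap = UncoveredMs(0, 1500, children);
Console.WriteLine($"Uncovered time: {gap} ms");
```

A large uncovered value points at work inside the parent that has no Span of its own, which is the place to add instrumentation or look for blocking waits.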

A: In most production environments, sampling is applied. Tracing all requests incurs significant storage costs and performance overhead.

Common sampling strategies:

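  • Error requests: 100% collection
  • Success requests: 1-10% collection
  • Specific conditions: 100% collection (e.g., specific users, specific APIs)

A head sampler decides before the outcome of a request is known, so keeping 100% of error requests requires tail-based sampling (for example, in the OpenTelemetry Collector). A ratio-based head sampler is configured like this:

.SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.1))) // 10% sampling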
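
The idea behind ratio-based sampling is that the decision is derived from the trace ID, so every service that sees the same Trace makes the same choice. The sketch below illustrates that idea with a hypothetical `ShouldSample` helper; it is a stand-in for explanation and is not the OpenTelemetry SDK's actual algorithm.

```csharp
using System;
using System.Diagnostics;

// Maps the first 8 hex characters of the trace ID to [0, 1) and samples when below the ratio.
// Illustrative only: the SDK's TraceIdRatioBasedSampler may derive its value differently.
static bool ShouldSample(ActivityTraceId traceId, double ratio)
{
    uint prefix = Convert.ToUInt32(traceId.ToHexString().Substring(0, 8), 16);
    return prefix / 4294967296.0 < ratio;
}

// The decision depends only on the trace ID, so repeated checks always agree.
ActivityTraceId example = ActivityTraceId.CreateRandom();
bool first = ShouldSample(example, 0.1);
bool second = ShouldSample(example, 0.1);

// Over many random trace IDs, roughly 10% should be kept.
int kept = 0;
const int total = 10_000;
for (int i = 0; i < total; i++)
{
    if (ShouldSample(ActivityTraceId.CreateRandom(), 0.1)) kept++;
}
double keptRatio = kept / (double)total;
Console.WriteLine($"Consistent: {first == second}, kept: {keptRatio:P1}");
```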

Q: How do you set the Trace retention period?

A: It depends on the storage backend:

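  • Jaeger (Elasticsearch storage): --es.max-span-age sets how far back queries look; old indices are deleted separately, for example with jaeger-es-index-cleaner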
  • Tempo: compactor.compaction.block_retention

Generally, 7-30 days of retention is recommended. Important Traces can be stored separately.

Q: How do you connect logging and tracing?

A: You can connect them by including the Trace ID in logs:

Log.ForContext("TraceId", Activity.Current?.TraceId.ToString())
    .Information("Order created");

If you set up Trace → Log integration in Grafana, you can view related logs with a single click.
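
The same correlation can be shown without a logging library. The sketch below is a minimal illustration using only `System.Diagnostics`; the `WithTraceContext` helper, the `trace_id`/`span_id` field names, and the `CreateOrder` operation name are assumptions made for this example.

```csharp
using System;
using System.Diagnostics;

// Prefixes a message with the IDs of the current Activity, or "-" when no Span is active.
static string WithTraceContext(string message)
{
    Activity? current = Activity.Current;
    string traceId = current?.TraceId.ToString() ?? "-";
    string spanId = current?.SpanId.ToString() ?? "-";
    return $"[trace_id={traceId} span_id={spanId}] {message}";
}

// No Span is active yet, so both IDs fall back to "-".
string outside = WithTraceContext("Service starting");
Console.WriteLine(outside);

// Starting an Activity makes it Activity.Current for the code that follows.
using var activity = new Activity("CreateOrder");
activity.SetIdFormat(ActivityIdFormat.W3C);
activity.Start();

string inside = WithTraceContext("Order created");
Console.WriteLine(inside);
```

Searching the log store for the printed trace ID then returns every log line written while that Trace was active.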

Q: Are external service calls also traced?

A: You need to add instrumentation for HttpClient, database drivers, etc.:

builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation()
        .AddNpgsql());

With this configuration, HTTP calls and DB queries are automatically recorded as Spans.

Q: How much performance overhead is there?

A: OpenTelemetry’s overhead is generally very low:

  • CPU: 1-5% additional
  • Memory: A few MB additional
  • Latency: < 1ms additional

However, exporting all Spans increases network bandwidth costs. Applying sampling can minimize the overhead.

A: This occurs when a Span has not been started or Context was not propagated.

Check the following:

  1. Verify that ActivitySource is registered
  2. Verify that ActivityListener is listening to the relevant source
  3. Verify that the Sampler is not excluding the Activity

// Debugging code
Console.WriteLine($"Current Activity: {Activity.Current?.DisplayName ?? "null"}");
Console.WriteLine($"TraceId: {Activity.Current?.TraceId}");
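
The effect of a missing listener can be reproduced in isolation. The sketch below uses only `System.Diagnostics`; `Demo.Orders` is a hypothetical source name, and the listener stands in for what the OpenTelemetry SDK registers when `AddSource` matches.

```csharp
using System;
using System.Diagnostics;

// "Demo.Orders" is a hypothetical source name used only for this sketch.
using var source = new ActivitySource("Demo.Orders");

// No listener yet: StartActivity returns null, so Activity.Current stays null.
bool createdBefore;
using (Activity? before = source.StartActivity("NoListener"))
{
    createdBefore = before is not null;
}
Console.WriteLine($"Without listener: created={createdBefore}");

// Register a listener that accepts this source and samples everything.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Orders",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

bool createdAfter;
string? currentName;
using (Activity? after = source.StartActivity("WithListener"))
{
    createdAfter = after is not null;
    currentName = Activity.Current?.DisplayName;
}
Console.WriteLine($"With listener: created={createdAfter}, current={currentName ?? "null"}");
```

If the first case matches what the application shows, the source name passed to `AddSource` most likely does not match the name of the ActivitySource that creates the Span.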


Trace Parent-Child Hierarchy Troubleshooting

This section covers the issue where Adapter Spans are created as siblings of the HTTP request Span instead of children of the Usecase Span, and its resolution.

Expected hierarchy:

HttpRequestIn (ROOT)
└── GetAllProductsQuery.Handle
    └── InMemoryProductRepository.GetAll ← Child of Usecase

Actual hierarchy:

HttpRequestIn (ROOT)
├── GetAllProductsQuery.Handle
└── InMemoryProductRepository.GetAll ← Sibling of Usecase, child of HTTP request (problem!)

Cause: DetermineParentContext checked IObservabilityContext before Activity.Current. IObservabilityContext is Scoped and captures the Activity that was current when the HTTP request started, whereas Activity.Current is the closest parent in the current execution context. Because the captured context matched first, Adapter Spans took the HTTP request Span as their parent.

Solution: Change DetermineParentContext Priority

private static ActivityContext DetermineParentContext(IObservabilityContext? parentContext)
{
    // 1. Activity.Current - closest parent (standard OpenTelemetry behavior)
    Activity? currentActivity = Activity.Current;
    if (currentActivity != null)
        return currentActivity.Context;

    // 2. AsyncLocal - workaround for FinT async context restoration issues
    Activity? traverseActivity = ActivityContextHolder.GetCurrentActivity();
    if (traverseActivity != null)
        return traverseActivity.Context;

    // 3. Explicit parentContext - context injected from outside
    if (parentContext is ObservabilityContext otelContext)
        return otelContext.ActivityContext;

    return default;
}

Priority meaning:

| Priority | Source | Purpose |
|---|---|---|
| 1 | Activity.Current | Closest parent in the current execution context (synchronous flow) |
| 2 | ActivityContextHolder | Workaround for FinT/IO monad AsyncLocal restoration issues |
| 3 | parentContext | Explicitly passed external context (HTTP request level) |

Check in Jaeger or Zipkin that the Adapter Span (InMemoryProductRepository.GetAll) is displayed as a child of the Usecase Span (GetAllProductsQuery.Handle).
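
The same check can be reproduced in code by comparing parent Span IDs. The sketch below uses plain `System.Diagnostics` rather than Functorium's Span factory, with a hypothetical `Demo.Hierarchy` source, to show how the choice of parent context decides between the expected and the broken hierarchy.

```csharp
using System;
using System.Diagnostics;

// "Demo.Hierarchy" is a hypothetical source name; the listener samples every Activity.
using var source = new ActivitySource("Demo.Hierarchy");
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Hierarchy",
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

using Activity http = source.StartActivity("HttpRequestIn")!;
using Activity usecase = source.StartActivity("GetAllProductsQuery.Handle")!;

// Expected case: no explicit parent, so Activity.Current (the Usecase Span) becomes the parent.
ActivitySpanId childParent;
using (Activity adapter = source.StartActivity("InMemoryProductRepository.GetAll")!)
{
    childParent = adapter.ParentSpanId;
}

// Problem case: passing the captured HTTP-level context explicitly makes the
// Adapter Span a sibling of the Usecase Span.
ActivitySpanId siblingParent;
using (Activity adapter = source.StartActivity(
    "InMemoryProductRepository.GetAll", ActivityKind.Internal, http.Context)!)
{
    siblingParent = adapter.ParentSpanId;
}

Console.WriteLine($"Child of Usecase: {childParent == usecase.SpanId}");
Console.WriteLine($"Child of HTTP request: {siblingParent == http.SpanId}");
```

An assertion on `ParentSpanId` like this can serve as a regression test for the priority order, independent of any tracing backend.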

| File | Role |
|---|---|
| OpenTelemetrySpanFactory.cs | Adapter Span creation, parent context determination |
| UsecaseTracingPipeline.cs | Usecase Activity creation |
| ActivityContextHolder.cs | AsyncLocal-based Activity context storage |
| ObservabilityContext.cs | HTTP request-level Activity context wrapper |