Functorium Tracing Manual
Learn how to use distributed tracing in the Functorium framework to visualize the entire journey of a request and identify performance bottlenecks.
Introduction
“Why did this request take 3 seconds?” “Which service caused the delay?” “Where exactly did the failed request fail?”
Modern applications operate through the collaboration of multiple services and components. While a single HTTP request is being processed, various tasks such as database queries, external API calls, and cache accesses are executed sequentially or in parallel. In such an environment, answering the question “where did it slow down?” is extremely difficult with logs or metrics alone.
Distributed Tracing is a technique that visualizes these complex request flows as a single “journey.” It traces the entire path of a request through the system and measures the time spent at each step.
Functorium automatically provides distributed tracing capabilities that follow the OpenTelemetry Tracing standard.
What You Will Learn
This document covers the following topics:
- Core concepts of distributed tracing: the relationship between Trace, Span, and Context
- Structure of the Spans Functorium generates automatically: Span design per architecture layer
- Request flow tracing through Parent-Child relationships: understanding hierarchical structures
- Trace analysis with Jaeger and Grafana Tempo: identifying bottleneck segments
Prerequisites
A basic understanding of the following concepts is needed to understand this document:
- Content from the Functorium Logging Manual (field naming, architecture layers)
- Basic concepts of asynchronous programming
- HTTP request/response model
Core principle: Distributed tracing visualizes the entire journey of a request through the system using Parent-Child Span relationships. Functorium automatically generates Spans at the Application Layer and Adapter Layer, recording even Expected errors with Error status while distinguishing their nature through the error.type tag.
Summary
Key Commands

```
# Search for Spans of a specific handler
{span.request.handler.name="CreateOrderCommandHandler" && span.response.status="failure"}

# Search for slow Spans
{span.response.elapsed > 1.0}

# Search for system error Spans
{span.error.type="exceptional"}
```

Key Procedures
- Activate the Tracing Pipeline with ConfigurePipelines(p => p.UseObservability()). UseObservability() enables CtxEnricher, Metrics, Tracing, and Logging all at once.
- Application Layer: UsecaseTracingPipeline automatically generates Spans (Kind: Internal).
- Adapter Layer: the Source Generator automatically generates the Span code.
- Visualize the request flow and identify bottlenecks via Parent-Child relationships in Jaeger/Tempo.
Key Concepts
| Concept | Description |
|---|---|
| Trace | The entire journey of a single request through the system (unique Trace ID) |
| Span | An individual work unit within a Trace (includes start time, duration, and tags) |
| Parent-Child | The Usecase Span is the parent of the Adapter Span, representing the hierarchical call structure |
| Span Name | Application: {layer} {category}.{cqrs} {handler}.{method}; Adapter: {layer} {category} {handler}.{method} |
| Status | Ok/Error. Even Expected errors have Error status (distinguished by the error.type tag) |
| response.elapsed | Included as a tag in tracing (no cardinality issues, since each Span is stored as an individual document) |
Distributed Tracing Fundamentals
Understanding Trace, Span, and Context
To understand distributed tracing, you need to know three core concepts.
A Trace represents the entire journey of a single request through the system. For example, all operations triggered when a user clicks the “Place Order” button are grouped into a single Trace.
Each Trace has a unique Trace ID:

```
Trace ID: 4bf92f3577b34da6a3ce929d0e0e4736
```

This ID is a randomly generated 128-bit value (32 hex characters), so it is effectively globally unique. All Spans with the same Trace ID belong to a single request processing flow.
A Span is an individual work unit within a Trace. A single Trace consists of multiple Spans. Each Span records “when it started and how long it took.”
Example: Order Processing Trace
```
Trace: Order Processing (Trace ID: 4bf92f...)
|
+-- Span: HTTP POST /api/orders (1.5s)
    |
    +-- Span: CreateOrderCommandHandler.Handle (1.2s)
        |
        +-- Span: OrderRepository.Save (0.3s)
        |
        +-- Span: PaymentGateway.ProcessPayment (0.8s)
        |
        +-- Span: NotificationService.SendEmail (0.1s)
```

Each Span contains the following information:
| Attribute | Description | Example |
|---|---|---|
| Name | What operation is it? | "CreateOrderCommandHandler.Handle" |
| Start Time | When did it start? | 2024-01-15T10:30:45.123Z |
| Duration | How long did it take? | 1.2 seconds |
| Tags | Additional metadata | response.status = "success" |
| Parent Span | Which Span invoked this operation? | HTTP POST /api/orders |
| Status | Success or failure | Ok / Error |
Context
Context is the information that links Spans together into a single Trace. It includes the Trace ID and the current Span ID.
When a request is passed between services, its Context is propagated along with it. For HTTP, it is carried in the W3C Trace Context traceparent header:
```
HTTP Header:
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
             |  |                                |                |
             |  |                                |                +-- Flags
             |  |                                +-- Parent Span ID (16 hex chars)
             |  +-- Trace ID (32 hex chars)
             +-- Version
```

Thanks to this header, Spans created in different services can be linked together into a single Trace.
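To make the layout concrete, the following C# sketch splits a version-00 traceparent value into its four fields and reads the sampled bit from the flags. ParseTraceparent is a hypothetical helper written for this example, not a Functorium or .NET API, and it only checks field count, length, and hex characters; in practice .NET's ActivityContext.TryParse or an OpenTelemetry propagator handles this for you.

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits a version-00 W3C traceparent value into its fields.
// Returns null when the value does not match the expected shape.
static (string Version, string TraceId, string ParentSpanId, string Flags)? ParseTraceparent(string header)
{
    string[] parts = header.Split('-');
    if (parts.Length != 4) return null;

    // Expected lengths: version 2, trace-id 32, parent-id 16, flags 2 (all hex digits).
    int[] expectedLengths = { 2, 32, 16, 2 };
    for (int i = 0; i < parts.Length; i++)
    {
        if (parts[i].Length != expectedLengths[i] || !parts[i].All(Uri.IsHexDigit))
            return null;
    }
    return (parts[0], parts[1], parts[2], parts[3]);
}

var parsed = ParseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
if (parsed is { } p)
{
    Console.WriteLine($"Trace ID:       {p.TraceId}");
    Console.WriteLine($"Parent Span ID: {p.ParentSpanId}");
    // The lowest bit of the flags byte is the "sampled" flag.
    bool sampled = (Convert.ToByte(p.Flags, 16) & 0x01) == 0x01;
    Console.WriteLine($"Sampled:        {sampled}");
}
```

The receiving service uses the Trace ID to join the existing Trace and records the parent-id field as the ParentSpanId of the Span it creates.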
Parent-Child Relationship
Spans have a hierarchical structure. Parent Spans contain child Spans. This relationship is similar to a call stack.
Visualization:
```
Time -->
CreateOrderCommandHandler.Handle (1.2s)
[==========================================================]
OrderRepository.Save (0.3s)
[=============]
               PaymentGateway.ProcessPayment (0.8s)
               [======================================]
                                                       Email (0.1s)
                                                       [===]
|              |                                       |    |
0s             0.3s                                    1.1s 1.2s
```

Looking at this structure:
- The entire request (Handle) took 1.2 seconds.
- PaymentGateway took the longest at 0.8 seconds (about 67% of the total).
- Optimizing PaymentGateway therefore offers the largest potential improvement in overall performance.
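The same reading can be automated. This C# sketch hard-codes the child Span durations from the example above (in practice they would come from your tracing backend), prints each child's share of the parent Span, and picks the largest contributor. It assumes .NET 6 or later for Enumerable.MaxBy.

```csharp
using System;
using System.Linq;

double parentSeconds = 1.2; // CreateOrderCommandHandler.Handle

// Child Span durations taken from the example Trace.
var children = new (string Name, double Seconds)[]
{
    ("OrderRepository.Save", 0.3),
    ("PaymentGateway.ProcessPayment", 0.8),
    ("NotificationService.SendEmail", 0.1),
};

foreach (var (name, seconds) in children)
    Console.WriteLine($"{name,-32} {seconds,4:F1}s  {seconds / parentSeconds:P0}");

// The child with the largest duration is the first optimization candidate.
var bottleneck = children.MaxBy(c => c.Seconds);
Console.WriteLine($"Bottleneck: {bottleneck.Name}");
```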
Activity and Span
In .NET, OpenTelemetry’s Span is implemented using the System.Diagnostics.Activity class. When you see the term “Activity” in Functorium code, it is equivalent to OpenTelemetry’s Span.
| OpenTelemetry Term | .NET Term |
|---|---|
| Span | Activity |
| SpanContext | ActivityContext |
| Tracer | ActivitySource |
| Status.OK | ActivityStatusCode.Ok |
| Status.ERROR | ActivityStatusCode.Error |
This document uses OpenTelemetry terminology (Span) by default. Activity is only used in code examples that directly work with the .NET API.
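The following minimal C# sketch uses only the System.Diagnostics types from the table to show that a child Activity started while its parent is current shares the parent's TraceId and records the parent's SpanId as its ParentSpanId. The source name "Demo.Tracing" is an arbitrary placeholder; an ActivityListener is registered because ActivitySource.StartActivity returns null when nothing is listening.

```csharp
using System;
using System.Diagnostics;

// StartActivity returns null unless a listener samples the source, so register one first.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Tracing",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// ActivitySource plays the role of an OpenTelemetry Tracer.
using var source = new ActivitySource("Demo.Tracing");

// Parent Span: becomes Activity.Current while it is running.
Activity? parent = source.StartActivity("application usecase.command CreateOrderCommandHandler.Handle");

// Child Span: started while the parent is current, so it is linked to it automatically.
Activity? child = source.StartActivity("adapter repository OrderRepository.Save");
child?.Stop();   // Activity.Current goes back to the parent
parent?.Stop();

Console.WriteLine($"parent: TraceId={parent?.TraceId} SpanId={parent?.SpanId}");
Console.WriteLine($"child:  TraceId={child?.TraceId} ParentSpanId={child?.ParentSpanId}");
```

Both lines show the same TraceId, and the child's ParentSpanId matches the parent's SpanId, which is exactly the Parent-Child link that Jaeger and Tempo render as a tree.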
Comparing Logging, Metrics, and Tracing
The three observability tools answer different questions:
| Tool | Question | Data Type |
|---|---|---|
| Logging | What happened? | Individual events |
| Metrics | How much/how fast? | Aggregated numbers |
| Tracing | Where was time spent? | Request paths |
Real-world scenario:
- Alert triggered: Metrics show “P99 response time exceeded 2 seconds”
- Root cause investigation: Tracing reveals “1.8 second delay at PaymentGateway”
- Detailed verification: Logging confirms “PaymentGateway timeout error message”
Now that we understand the relationship between Trace, Span, and Context from the distributed tracing fundamentals, let’s look at how Functorium automates these concepts per architecture layer.
Functorium Tracing Architecture
Functorium automatically generates Spans at two architecture layers.
Architecture Overview
```
HTTP Request Arrives
  |
  v
Application Layer
  Span:   "application usecase.command CreateOrderCommandHandler.Handle"
  Kind:   Internal
  Tags:   request.layer, request.category, etc.
  Status: Ok or Error
    |
    v  (Parent-Child Relationship)
    Adapter Layer
      Span:   "adapter repository OrderRepository.Save"
      Kind:   Internal
      Tags:   request.layer, request.category, etc.
      Status: Ok or Error
```

The Application Layer Span becomes the parent of the Adapter Layer Span. Thanks to this relationship, you can clearly trace the path a request took.
Span Kind
OpenTelemetry defines Kind to indicate the role of a Span:
| Kind | Description | Example |
|---|---|---|
| Server | Receives and processes external requests | HTTP server endpoint |
| Client | Calls external services | HTTP client, DB query |
| Internal | Internal processing | Business logic processing |
| Producer | Publishes asynchronous messages | Message queue publishing |
| Consumer | Receives asynchronous messages | Message queue consumption |
All Spans that Functorium generates automatically use the Internal Kind. Spans for receiving HTTP requests or calling databases are produced separately by the ASP.NET Core and database client instrumentation libraries.
Span Naming Convention
Functorium uses a consistent Span naming pattern:
Application Layer:
{layer} {category}.{cqrs} {handler}.{method}

Examples:
- application usecase.command CreateOrderCommandHandler.Handle
- application usecase.query GetOrderQueryHandler.Handle

Adapter Layer:
{layer} {category} {handler}.{method}

Examples:
- adapter repository OrderRepository.Save
- adapter repository OrderRepository.GetById
- adapter gateway PaymentGateway.ProcessPayment

Benefits of this naming convention:
- Consistency: All Spans follow the same pattern
- Searchability: Enables quick filtering by Span name
- Self-descriptive: You can understand the operation just by looking at the name
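A minimal sketch of the two patterns as C# string builders follows. ApplicationSpanName and AdapterSpanName are hypothetical helpers for illustration, not Functorium APIs; they only show how the name components are concatenated.

```csharp
using System;

// Hypothetical helpers (not Functorium APIs) that assemble the two naming patterns.
// Application Layer: {layer} {category}.{cqrs} {handler}.{method}
static string ApplicationSpanName(string category, string cqrs, string handler, string method) =>
    $"application {category}.{cqrs} {handler}.{method}";

// Adapter Layer: {layer} {category} {handler}.{method} (no CQRS segment)
static string AdapterSpanName(string category, string handler, string method) =>
    $"adapter {category} {handler}.{method}";

Console.WriteLine(ApplicationSpanName("usecase", "command", "CreateOrderCommandHandler", "Handle"));
Console.WriteLine(AdapterSpanName("gateway", "PaymentGateway", "ProcessPayment"));
```

Because the layer always comes first and the handler and method last, a prefix such as "adapter repository" is enough to filter every repository Span in a Trace UI.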
Understanding Span Structure
Span Basic Attributes
Each Span has the following basic attributes:
| Attribute | Description | Example |
|---|---|---|
| TraceId | ID of the Trace the Span belongs to | 4bf92f3577b34da6a3ce929d0e0e4736 |
| SpanId | Unique ID of the Span | 00f067aa0ba902b7 |
| ParentSpanId | ID of the parent Span (empty for a root Span) | 5b8a8f6d3e7c9a1b |
| Name | Span name | application usecase.command… |
| Kind | Span kind | Internal |
| StartTime | Start time | 2024-01-15T10:30:45.123Z |
| EndTime | End time | 2024-01-15T10:30:46.323Z |
| Duration | Elapsed time | 1.2s |
| Status | Status code | Ok / Error |
| Tags | Additional metadata | request.handler.name = "…" |
Status Code
A Span’s Status indicates the success/failure of the operation:
| Status | Description | When to Use |
|---|---|---|
| Unset | Status not set | Default |
| Ok | success | Normal processing complete |
| Error | failure | Error occurred |
In Functorium:
- response.status = "success" → ActivityStatusCode.Ok
- response.status = "failure" → ActivityStatusCode.Error
Important: Even Expected errors (business errors) have Error status. This is because it means “the request did not achieve the desired result.” The nature of the error (Expected vs Exceptional) is distinguished by the error.type tag.
Time Measurement
Time measurement for a Span works as follows:
```
StartTime                                           EndTime
    |                                                  |
    v                                                  v
    +--------------------------------------------------+
    |                 Duration (1.2s)                  |
    |                                                  |
    |  +----------+ +--------------------+ +----+      |
    |  |   0.3s   | |        0.8s        | |0.1s|      |
    |  +----------+ +--------------------+ +----+      |
    |   OrderRepo       PaymentGateway     Email       |
    +--------------------------------------------------+
```

Duration calculation:
- Duration = EndTime - StartTime = 1.2s
- Child Span total = 0.3 + 0.8 + 0.1 = 1.2s

If the sum of the child Span durations equals the parent Span’s Duration, the parent did no work of its own beyond its children. If there is a gap, that time was spent on work performed directly by the parent Span (logic execution, data transformation, etc.). This comparison assumes the child Spans run sequentially; when children run in parallel their durations overlap, so their sum can exceed the parent’s Duration.
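The parent-versus-children comparison can be written as a small C# function. This sketch (the SelfTime helper is hypothetical) takes child Spans as start/end offsets in seconds relative to the parent's StartTime, merges overlapping intervals so parallel children are not double-counted, and returns the time left for the parent's own work.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: parent duration minus the time covered by at least one child Span.
// Children are (Start, End) offsets in seconds from the parent's StartTime.
static double SelfTime(double parentDuration, IEnumerable<(double Start, double End)> children)
{
    double covered = 0;
    double? runStart = null;
    double runEnd = 0;

    // Sort by start and merge overlapping intervals so parallel children count once.
    foreach (var (start, end) in children.OrderBy(c => c.Start))
    {
        if (runStart is null)
        {
            runStart = start;
            runEnd = end;
        }
        else if (start <= runEnd)
        {
            runEnd = Math.Max(runEnd, end);
        }
        else
        {
            covered += runEnd - runStart.Value;
            runStart = start;
            runEnd = end;
        }
    }
    if (runStart is not null) covered += runEnd - runStart.Value;

    return parentDuration - covered;
}

// Sequential children from the diagram: Save, ProcessPayment, SendEmail.
var sequential = new[] { (0.0, 0.3), (0.3, 1.1), (1.1, 1.2) };
Console.WriteLine($"Parent self time: {SelfTime(1.2, sequential):F3}s");
```

With the sequential children from the example the result is zero; with two overlapping children covering 0.0 to 0.8 seconds inside a 1.0 second parent, it returns roughly 0.2 seconds instead of a negative number.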
Tag System Detailed Guide
Tags provide additional context to Spans. Functorium uses the same tag keys as logging and metrics to maintain 3-Pillar consistency.
Application Layer Tag Structure
Tag structure table:
| Tag key | Success | Failure | Description |
|---|---|---|---|
| request.layer | "application" | "application" | Layer identifier |
| request.category.name | "usecase" | "usecase" | Category identifier |
| request.category.type | "command" / "query" | "command" / "query" | CQRS type |
| request.handler.name | Handler name | Handler name | Handler class name |
| request.handler.method | "Handle" | "Handle" | Method name |
| response.elapsed | Processing time | Processing time | In seconds |
| response.status | "success" | "failure" | Response status |
| error.type | - | "expected" / "exceptional" / "aggregate" | Error classification |
| error.code | - | Error code | Domain error code |
| Total tag count | 7 | 9 | |
Example - Command success:

```json
{
  "name": "application usecase.command CreateOrderCommandHandler.Handle",
  "status": "Ok",
  "tags": {
    "request.layer": "application",
    "request.category.name": "usecase",
    "request.category.type": "command",
    "request.handler.name": "CreateOrderCommandHandler",
    "request.handler.method": "Handle",
    "response.elapsed": 0.1234,
    "response.status": "success"
  }
}
```

Example - Command failure:

```json
{
  "name": "application usecase.command CreateOrderCommandHandler.Handle",
  "status": "Error",
  "tags": {
    "request.layer": "application",
    "request.category.name": "usecase",
    "request.category.type": "command",
    "request.handler.name": "CreateOrderCommandHandler",
    "request.handler.method": "Handle",
    "response.elapsed": 0.0567,
    "response.status": "failure",
    "error.type": "expected",
    "error.code": "Order.InsufficientStock"
  }
}
```

Adapter Layer Tag Structure
The Adapter Layer has no CQRS distinction, so there is no request.category.type tag.
Tag structure table:
| Tag key | Success | Failure | Description |
|---|---|---|---|
| request.layer | "adapter" | "adapter" | Layer identifier |
| request.category.name | Category name | Category name | Category identifier |
| request.handler.name | Handler name | Handler name | Handler class name |
| request.handler.method | Method name | Method name | Method name |
| response.elapsed | Processing time | Processing time | In seconds |
| response.status | "success" | "failure" | Response status |
| error.type | - | "expected" / "exceptional" / "aggregate" | Error classification |
| error.code | - | Error code | Domain error code |
| Total tag count | 6 | 8 | |
Example - Repository success:

```json
{
  "name": "adapter repository OrderRepository.GetById",
  "status": "Ok",
  "tags": {
    "request.layer": "adapter",
    "request.category.name": "repository",
    "request.handler.name": "OrderRepository",
    "request.handler.method": "GetById",
    "response.elapsed": 0.0456,
    "response.status": "success"
  }
}
```

Why response.elapsed Is Included in Tracing
In metrics, we explained that response.elapsed is recorded as a Histogram rather than a tag. However, in tracing it is included as a tag. Why is that?
Differences:
| Aspect | Metrics | Tracing |
|---|---|---|
| Purpose | Aggregate analysis | Individual request tracking |
| Cardinality | Need to limit time series count | Spans are individual events |
| Storage method | Time series per tag combination | Span document unit |
In tracing, each Span is stored as an individual document. Different response.elapsed values do not create separate time series. Therefore, there are no cardinality issues when including it as a tag.
Additionally, being able to check the exact processing time as a tag in individual Spans allows you to quickly assess the performance of a specific request.
ctx.* Span Attribute — User-Defined Business Context
CtxEnricherPipeline runs first in the pipeline and automatically sets every ctx.* field that carries the CtxPillar.Tracing flag as a Span Attribute via Activity.Current?.SetTag. With the default setting (CtxPillar.Default = Logging | Tracing), all ctx.* fields are included in Span Attributes.
Pipeline execution order: CtxEnricher → Metrics → Tracing → Logging → ... → Handler

```
CtxEnricherPipeline:
  ctx.customer_id   = "CUST-001" → Activity.Current.SetTag("ctx.customer_id", "CUST-001")
  ctx.region_code   = "us-west"  → Activity.Current.SetTag("ctx.region_code", "us-west")
  ctx.internal_note = "..."      → No SetTag ([CtxTarget(CtxPillar.Logging)] → excluded from Tracing)
```

Since Spans are stored individually, high-cardinality fields (customer_id, Guid) are safe as Span Attributes. OpenTelemetry recommends rich attributes on Spans for debugging purposes.
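The filtering step can be pictured with the C# sketch below. It does not use Functorium's CtxEnricherPipeline or CtxPillar types; instead each ctx.* field carries a hypothetical IncludeInTracing flag standing in for CtxPillar.Tracing, and only flagged fields are copied onto Activity.Current with SetTag. The source name "Demo.Ctx" is arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Sample every Activity from the demo source so StartActivity does not return null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Ctx",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);
using var source = new ActivitySource("Demo.Ctx");

// Hypothetical ctx.* fields; IncludeInTracing stands in for the CtxPillar.Tracing flag.
var ctxFields = new (string Key, string Value, bool IncludeInTracing)[]
{
    ("ctx.customer_id", "CUST-001", true),
    ("ctx.region_code", "us-west", true),
    ("ctx.internal_note", "...", false), // like [CtxTarget(CtxPillar.Logging)]: Logging only
};

Activity? span = source.StartActivity("application usecase.command CreateOrderCommandHandler.Handle");

// Copy only the Tracing-enabled fields onto the current Span.
foreach (var field in ctxFields.Where(f => f.IncludeInTracing))
    Activity.Current?.SetTag(field.Key, field.Value);

span?.Stop();

if (span is not null)
    foreach (var (key, value) in span.Tags)
        Console.WriteLine($"{key} = {value}");
```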
To exclude from Tracing:
```csharp
[CtxTarget(CtxPillar.Logging)]  // Logging only; not included in Tracing Spans
string InternalNote
```

Application Layer Tracing
Application Layer tracing is automatically performed by UsecaseTracingPipeline.
Pipeline Behavior
```csharp
// Simplified view of the pipeline
public class UsecaseTracingPipeline<TRequest, TResponse>
{
    public async ValueTask<TResponse> Handle(TRequest request, ...)  // other parameters elided (next, cancellationToken)
    {
        // 1. Create and start Span
        using var activity = _activitySource.StartActivity(spanName);

        // 2. Add request tags
        activity?.SetTag("request.layer", "application");
        activity?.SetTag("request.category.name", "usecase");
        // ... remaining tags

        // 3. Execute handler, timing it for response.elapsed
        var stopwatch = Stopwatch.StartNew();
        var response = await next(request, cancellationToken);
        var elapsed = stopwatch.Elapsed;

        // 4. Add response tags
        activity?.SetTag("response.status", response.IsSucc ? "success" : "failure");
        activity?.SetTag("response.elapsed", elapsed.TotalSeconds);

        // 5. Add additional tags on error
        if (response.IsFail)
        {
            activity?.SetTag("error.type", GetErrorType(response));
            activity?.SetTag("error.code", GetErrorCode(response));
            activity?.SetStatus(ActivityStatusCode.Error);
        }
        else
        {
            activity?.SetStatus(ActivityStatusCode.Ok);
        }

        // 6. End Span (automatic via using)
        return response;
    }
}
```

Span Name Generation
Application Layer Span names follow this format:
{layer} {category}.{cqrs} {handler}.{method}

Generation logic:

```csharp
var cqrsType = GetCqrsType<TRequest>();  // "command" or "query"
var handlerName = typeof(TRequest).Name.Replace("Request", "Handler");
var spanName = $"application usecase.{cqrsType} {handlerName}.Handle";
```

Examples:
| Request Type | Span Name |
|---|---|
| CreateOrderCommandRequest | application usecase.command CreateOrderCommandHandler.Handle |
| GetOrderQueryRequest | application usecase.query GetOrderQueryHandler.Handle |
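Putting this section together, the C# sketch below mimics what UsecaseTracingPipeline does around a handler call: it names the Span with the Application Layer pattern, sets the request tags, times the call for response.elapsed, and maps success or failure to Status plus error.type and error.code. It is a hypothetical stand-in, not Functorium's implementation: the handler result is a plain tuple instead of FinResponse, the error classification is passed in rather than derived, and the ActivitySource name "Demo.Usecase" is arbitrary.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Sample every Activity from the demo source so StartActivity does not return null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Usecase",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);
using var source = new ActivitySource("Demo.Usecase");

// Hypothetical stand-in for UsecaseTracingPipeline (not Functorium's code).
async Task<Activity?> TraceUsecaseAsync(
    string cqrs,
    string handler,
    Func<Task<(bool IsSucc, string? ErrorType, string? ErrorCode)>> next)
{
    // Span name: {layer} {category}.{cqrs} {handler}.{method}
    Activity? activity = source.StartActivity($"application usecase.{cqrs} {handler}.Handle");
    activity?.SetTag("request.layer", "application");
    activity?.SetTag("request.category.name", "usecase");
    activity?.SetTag("request.category.type", cqrs);
    activity?.SetTag("request.handler.name", handler);
    activity?.SetTag("request.handler.method", "Handle");

    var stopwatch = Stopwatch.StartNew();
    var result = await next();
    stopwatch.Stop();

    activity?.SetTag("response.elapsed", stopwatch.Elapsed.TotalSeconds);
    activity?.SetTag("response.status", result.IsSucc ? "success" : "failure");
    if (result.IsSucc)
    {
        activity?.SetStatus(ActivityStatusCode.Ok);
    }
    else
    {
        // Expected (business) errors still get Error status; error.type tells them apart.
        activity?.SetTag("error.type", result.ErrorType);
        activity?.SetTag("error.code", result.ErrorCode);
        activity?.SetStatus(ActivityStatusCode.Error);
    }
    activity?.Stop();
    return activity;
}

var failed = await TraceUsecaseAsync("command", "CreateOrderCommandHandler",
    () => Task.FromResult<(bool, string?, string?)>((false, "expected", "Order.InsufficientStock")));
Console.WriteLine($"{failed?.DisplayName}: {failed?.Status}, error.type={failed?.GetTagItem("error.type")}");
```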
Custom Tracing Extension (UsecaseTracingCustomPipelineBase)
In addition to the Spans automatically generated by the default UsecaseTracingPipeline, you can add custom Activities (Spans) per Usecase. Inherit from UsecaseTracingCustomPipelineBase<TRequest> to implement fine-grained tracing that fits your business context.
Base Class API
```csharp
public abstract class UsecaseTracingCustomPipelineBase<TRequest>
    : UsecasePipelineBase<TRequest>, ICustomUsecasePipeline
{
    protected Activity? StartCustomActivity(string operationName, ActivityKind kind = ActivityKind.Internal);
    protected string GetActivityName(string operationName);
    protected static void SetStandardRequestTags(Activity activity, string method);
}
```

- StartCustomActivity(operationName, kind): Creates a custom Activity (Span). If a parent Activity.Current exists, the new Activity is created as its child. Activity name format: {layer} {category}.{cqrs} {handler}.{operationName}
- GetActivityName(operationName): Returns the Activity name for the given operationName.
- SetStandardRequestTags(activity, method): Automatically sets the 5 standard request tags:
  - request.layer (application)
  - request.category.name (usecase)
  - request.category.type (command/query)
  - request.handler.name (handler name)
  - request.handler.method (method name)
Implementation Example (PlaceOrderCommand.TracingPipeline)
```csharp
public sealed class PlaceOrderTracingPipeline
    : UsecaseTracingCustomPipelineBase<PlaceOrderCommand.Request>
    , IPipelineBehavior<PlaceOrderCommand.Request, FinResponse<PlaceOrderCommand.Response>>
{
    public PlaceOrderTracingPipeline(ActivitySource activitySource) : base(activitySource) { }

    public async ValueTask<FinResponse<PlaceOrderCommand.Response>> Handle(
        PlaceOrderCommand.Request request,
        MessageHandlerDelegate<PlaceOrderCommand.Request, FinResponse<PlaceOrderCommand.Response>> next,
        CancellationToken ct)
    {
        using Activity? activity = StartCustomActivity("ValidateOrder");
        if (activity != null)
        {
            SetStandardRequestTags(activity, "ValidateOrder");
            activity.SetTag("order.line_count", request.Lines.Count);
            activity.SetTag("order.customer_id", request.CustomerId);
        }

        return await next(request, ct);
    }
}
```

Registration Method
UsecaseTracingCustomPipelineBase<TRequest> implements ICustomUsecasePipeline, so it is explicitly registered using AddCustomPipeline<T>(). Individual registration is used instead of assembly scanning to guarantee deterministic pipeline execution order:
```csharp
.ConfigurePipelines(p => p
    .UseObservability()
    .AddCustomPipeline<PlaceOrderTracingPipeline>())
```

Reference: Custom Extension
Adapter Layer Tracing
Adapter Layer tracing is performed by code automatically generated by the Source Generator.
Source Generated Code
The Source Generator automatically generates tracing code for interfaces annotated with the [ObservabilityPipeline] attribute.
Original interface:
```csharp
[ObservabilityPipeline("repository")]
public interface IOrderRepository
{
    FinT<IO, Order> GetById(Guid id);
    FinT<IO, Unit> Save(Order order);
}
```

Generated code (simplified):
```csharp
public partial class OrderRepositoryPipeline : IOrderRepository
{
    public FinT<IO, Order> GetById(Guid id)
    {
        return FinT<IO, Order>.LiftIO(async () =>
        {
            using var activity = _activitySource.StartActivity(
                "adapter repository OrderRepository.GetById");

            activity?.SetTag("request.layer", "adapter");
            activity?.SetTag("request.category.name", "repository");
            activity?.SetTag("request.handler.name", "OrderRepository");
            activity?.SetTag("request.handler.method", "GetById");

            var stopwatch = Stopwatch.StartNew();
            var result = await _inner.GetById(id).Run().RunAsync();
            stopwatch.Stop();

            activity?.SetTag("response.elapsed", stopwatch.Elapsed.TotalSeconds);
            activity?.SetTag("response.status", result.IsFail ? "failure" : "success");

            if (result.IsFail)
            {
                activity?.SetTag("error.type", GetErrorType(result));
                activity?.SetTag("error.code", GetErrorCode(result));
                activity?.SetStatus(ActivityStatusCode.Error);
            }
            else
            {
                activity?.SetStatus(ActivityStatusCode.Ok);
            }

            return result;
        });
    }
}
```

Span Name Generation
Adapter Layer Span names follow this format:
{layer} {category} {handler}.{method}

Examples:
| Handler | Method | Span Name |
|---|---|---|
| OrderRepository | GetById | adapter repository OrderRepository.GetById |
| OrderRepository | Save | adapter repository OrderRepository.Save |
| PaymentGateway | ProcessPayment | adapter gateway PaymentGateway.ProcessPayment |
DomainEvent Tracing
DomainEvent tracing records the event publishing and handling process as Spans. It forms Parent-Child relationships of Usecase Span → Publisher Span → Handler Span(s).
Parent-Child Relationship
```
application usecase.command CreateProductCommandHandler.Handle  [Parent]
├─ adapter repository InMemoryProductRepository.ExistsByName  [Child]
├─ adapter repository InMemoryProductRepository.Create  [Child]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents  [Child - Publisher]
   └─ application usecase.event OnProductCreated.Handle  [Grandchild - Handler]
```

Publisher Spans belong to the Adapter layer, and Handler Spans belong to the Application layer. When a single Publisher calls multiple Handlers, multiple Handler Spans are generated.
Publisher Span Structure
Span Name:
| Method | Span Name Pattern | Example |
|---|---|---|
| Publish | adapter event {EventType}.Publish | adapter event CreatedEvent.Publish |
| PublishTrackedEvents | adapter event PublishTrackedEvents.PublishTrackedEvents | adapter event PublishTrackedEvents.PublishTrackedEvents |
Kind: Internal
Publisher Tag Structure (Publish)
Tag structure for single event publishing:
| Tag key | Request | Success | Failure |
|---|---|---|---|
| request.layer | "adapter" | "adapter" | "adapter" |
| request.category.name | "event" | "event" | "event" |
| request.handler.name | Event type name | Event type name | Event type name |
| request.handler.method | "Publish" | "Publish" | "Publish" |
| response.elapsed | - | Processing time (sec) | Processing time (sec) |
| response.status | - | "success" | "failure" |
| error.type | - | - | "expected" / "exceptional" |
| error.code | - | - | Error code |
| Total tag count | 4 | 6 | 8 |
Publisher Tag Structure (PublishTrackedEvents)
Tag structure for tracked Aggregate event publishing:
| Tag key | Request | Success | Partial Failure | Total Failure |
|---|---|---|---|---|
| request.layer | "adapter" | "adapter" | "adapter" | "adapter" |
| request.category.name | "event" | "event" | "event" | "event" |
| request.handler.name | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" |
| request.handler.method | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" | "PublishTrackedEvents" |
| request.aggregate.count | Aggregate count | Aggregate count | Aggregate count | Aggregate count |
| request.event.count | Event count | Event count | Event count | Event count |
| response.elapsed | - | Processing time (sec) | Processing time (sec) | Processing time (sec) |
| response.status | - | "success" | "failure" | "failure" |
| response.event.success_count | - | - | Success count | - |
| response.event.failure_count | - | - | Failure count | - |
| error.type | - | - | - | "expected" / "exceptional" |
| error.code | - | - | - | Error code |
| Total tag count | 6 | 8 | 10 | 10 |
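The Publisher/Handler hierarchy and the partial-failure tags can be sketched with plain System.Diagnostics APIs. In this hypothetical C# example, two in-process handlers run under one Publisher Span and the second throws; each Handler Span becomes a child of the Publisher Span, and the Publisher records success and failure counts as in the Partial Failure column. It is not Functorium's publisher: the request tags and the total-failure path (error.type and error.code) are omitted for brevity, and the source name "Demo.Events" is arbitrary.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Sample every Activity from the demo source so StartActivity does not return null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Demo.Events",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) => ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);
using var source = new ActivitySource("Demo.Events");

// Hypothetical handlers for one event; the second one throws to simulate a partial failure.
var handlers = new (string Name, Action Run)[]
{
    ("OnProductCreated", () => { }),
    ("SyncInventoryOnProductCreated", () => throw new InvalidOperationException("boom")),
};

var handlerSpans = new List<Activity>();
Activity? publisher = source.StartActivity("adapter event PublishTrackedEvents.PublishTrackedEvents");
int success = 0, failure = 0;

foreach (var (name, run) in handlers)
{
    // Started while the publisher is current, so each Handler Span is its child.
    Activity? handlerSpan = source.StartActivity($"application usecase.event {name}.Handle");
    try
    {
        run();
        handlerSpan?.SetStatus(ActivityStatusCode.Ok);
        success++;
    }
    catch (Exception ex)
    {
        handlerSpan?.SetTag("error.type", "exceptional");
        handlerSpan?.SetTag("error.code", ex.GetType().Name); // e.g. InvalidOperationException
        handlerSpan?.SetStatus(ActivityStatusCode.Error);
        failure++;
    }
    handlerSpan?.Stop(); // Activity.Current returns to the publisher
    if (handlerSpan is not null) handlerSpans.Add(handlerSpan);
}

publisher?.SetTag("response.status", failure == 0 ? "success" : "failure");
if (failure > 0 && success > 0)
{
    // Partial failure: counts instead of error.type / error.code.
    publisher?.SetTag("response.event.success_count", success);
    publisher?.SetTag("response.event.failure_count", failure);
}
publisher?.SetStatus(failure == 0 ? ActivityStatusCode.Ok : ActivityStatusCode.Error);
publisher?.Stop();
```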
Handler Span Structure
Span Name:
application usecase.event {HandlerName}.Handle
Example: application usecase.event OnProductCreated.Handle
Kind: Internal
Handler Tag Structure
| Tag key | Success | Failure |
|---|---|---|
| request.layer | "application" | "application" |
| request.category.name | "usecase" | "usecase" |
| request.category.type | "event" | "event" |
| request.handler.name | Handler name | Handler name |
| request.handler.method | "Handle" | "Handle" |
| request.event.type | Event type name | Event type name |
| request.event.id | Event id | Event id |
| response.status | "success" | "failure" |
| error.type | - | "expected" / "exceptional" |
| error.code | - | Error code |
| Total tag count | 8 | 10 |
Note: Handler Spans do not record response.elapsed. Since Spans inherently have their own start/end times (duration), a separate elapsed field would be redundant. Logging, on the other hand, has no inherent duration concept, so the response.elapsed field is needed there.
request.event.type and request.event.id Fields
Handler Spans have unique tags called request.event.type and request.event.id:
- request.event.type: The event type name. This is a different value from request.handler.name (the handler name).
  - Example: request.handler.name = "OnProductCreated", request.event.type = "CreatedEvent"
  - The distinction is needed because multiple handlers can be registered for a single event type.
- request.event.id: A GUID per event instance. It tracks the correlation between multiple handlers processing the same event.
  - Example: request.event.id = "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
request.event.type vs request.handler.name relationship:
request.handler.name represents the handler class that processes the event, while request.event.type represents the event type that the handler subscribes to. This distinction is important when multiple handlers exist for a single event:
```
# When ProductCreatedEvent is subscribed to by two handlers:

Span 1: application usecase.event OnProductCreated.Handle
  request.handler.name = "OnProductCreated"               ← handler class
  request.event.type   = "ProductCreatedEvent"            ← event type
  request.event.id     = "a1b2c3d4-..."                   ← same event instance

Span 2: application usecase.event SyncInventoryOnProductCreated.Handle
  request.handler.name = "SyncInventoryOnProductCreated"  ← different handler
  request.event.type   = "ProductCreatedEvent"            ← same event type
  request.event.id     = "a1b2c3d4-..."                   ← same event instance
```

Since request.event.id is the same, you can see that both Spans were triggered by the same event instance.
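When analyzing exported Handler Spans, grouping them by request.event.id recovers which handlers processed the same event instance. The C# sketch below does that with LINQ over a hypothetical in-memory list of (handler name, event type, event id) tuples; the second event id is made up for the example.

```csharp
using System;
using System.Linq;

// Hypothetical Handler Span tags, e.g. exported from a tracing backend.
var handlerSpans = new (string HandlerName, string EventType, string EventId)[]
{
    ("OnProductCreated", "ProductCreatedEvent", "a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
    ("SyncInventoryOnProductCreated", "ProductCreatedEvent", "a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
    ("OnProductCreated", "ProductCreatedEvent", "0f9e8d7c-6b5a-4321-9876-543210fedcba"),
};

// One group per event instance: handlers that processed the same event share request.event.id.
var byEvent = handlerSpans
    .GroupBy(s => s.EventId)
    .ToDictionary(g => g.Key, g => g.Select(s => s.HandlerName).ToArray());

foreach (var (eventId, handlers) in byEvent)
    Console.WriteLine($"{eventId}: {string.Join(", ", handlers)}");
```

Grouping by request.event.type instead would merge different event instances of the same type, which is why the per-instance id is the right correlation key here.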
LayeredArch Trace Visualization
Product creation success (POST /api/products):
```
application usecase.command CreateProductCommand.Handle [Ok]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
├─ adapter repository InMemoryProductRepository.Create [Ok]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents [Ok]
   └─ application usecase.event OnProductCreated.Handle [Ok]
      ├─ request.event.type = "CreatedEvent"
      └─ request.event.id = "515711cd-..."
```

Handler exception (POST /api/products with [handler-error]):
```
application usecase.command CreateProductCommand.Handle [Error]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
├─ adapter repository InMemoryProductRepository.Create [Ok]
└─ adapter event PublishTrackedEvents.PublishTrackedEvents [Error]
   └─ application usecase.event OnProductCreated.Handle [Error]
      ├─ request.event.type = "CreatedEvent"
      ├─ request.event.id = "f385a945-..."
      ├─ error.type = "exceptional"
      └─ error.code = "InvalidOperationException"
```

Note: The Handler’s error.code records the exception type name (InvalidOperationException), while the Publisher’s error.code records the wrapped error code (Application.DomainEventPublisher.PublishFailed).
Adapter exception (POST /api/products with [adapter-error]):
Adapter exceptions occur at the Repository, so they do not reach event publishing:
```
application usecase.command CreateProductCommand.Handle [Error]
├─ adapter repository InMemoryProductRepository.ExistsByName [Ok]
└─ adapter repository InMemoryProductRepository.Create [Error]
   ├─ error.type = "exceptional"
   └─ error.code = "Exceptional"
```

Trace Search Queries
Search for DomainEvent Publisher Spans:
```
{span.request.category.name="event" && span.request.layer="adapter"}
```

Search for DomainEvent Handler Spans:

```
{span.request.category.type="event" && span.request.layer="application"}
```

Handler Spans with errors:

```
{span.request.category.type="event" && span.error.type="exceptional"}
```

Understanding Error Tracing
Status vs error.type
A Span’s Status and error.type tag convey different information:
| attribute | Meaning | Value |
|---|---|---|
| Status | Whether the operation succeeded | Ok / Error |
| error.type | Nature of the error | expected / exceptional / aggregate |
Examples:
| Scenario | Status | error.type | Description |
|---|---|---|---|
| Order success | Ok | - | Normal processing |
| Insufficient stock | Error | expected | Rejection per business rules |
| DB connection failure | Error | exceptional | System issue |
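In code, the two fields are set independently: Status answers whether the operation failed, and error.type answers what kind of failure it was. A minimal sketch with System.Diagnostics.Activity, assuming a hypothetical RecordOutcome helper, a hypothetical source name, and a hypothetical error code Inventory.InsufficientStock; Functorium sets these tags inside its own pipelines.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Sketch.Orders"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

// Hypothetical helper: errorKind is null on success, otherwise "expected" or "exceptional".
void RecordOutcome(Activity span, string? errorKind, string? errorCode)
{
    if (errorKind is null)
    {
        span.SetStatus(ActivityStatusCode.Ok);
        return;
    }
    // Every failure sets Status = Error; error.type says what kind of failure it was.
    span.SetStatus(ActivityStatusCode.Error);
    span.SetTag("error.type", errorKind);
    span.SetTag("error.code", errorCode);
}

Activity Simulate(string name, string? errorKind, string? errorCode)
{
    Activity activity = source.StartActivity(name)!;
    RecordOutcome(activity, errorKind, errorCode);
    activity.Stop();
    return activity;
}

Activity ok = Simulate("Order success", null, null);
Activity stock = Simulate("Insufficient stock", "expected", "Inventory.InsufficientStock");
Activity db = Simulate("DB connection failure", "exceptional", "Database.ConnectionFailed");

foreach (Activity result in new[] { ok, stock, db })
    Console.WriteLine($"{result.DisplayName}: Status={result.Status}, error.type={result.GetTagItem("error.type") ?? "-"}");
```

Keeping the two fields separate is what lets a query filter for system problems (exceptional) without being flooded by ordinary business rejections (expected).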
Error Display in Trace UI
Most Trace UIs (Jaeger, Tempo) display Spans with Status = Error in red. This allows you to quickly identify at which step the problem occurred.
```
CreateOrderCommandHandler.Handle [Error] (1.2s)
+-- OrderRepository.GetById [Ok] (0.1s)
+-- InventoryService.CheckStock [Error] (0.05s)   <-- Failed here
+-- PaymentGateway.Process [Not Started]          <-- Not executed (a Span that never starts is absent from the actual Trace)
```
Error Propagation
When an error occurs in a child Span, the parent Span typically ends with Error status as well, because the child’s failure propagates up and causes the parent operation to fail. OpenTelemetry does not copy the status automatically: each Span records its own Status, so the parent is Error only when its own code records the failure.
```
Application Layer: CreateOrderCommand -> Error (due to child failure)
  |
  +-- Adapter Layer: InventoryRepository.CheckStock -> Error (root cause)
```
However, if the parent handles the child’s error (fallback, retry, etc.), the parent can have Ok status.
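Both cases can be sketched with plain Activities under a hypothetical source name: the child always fails, and the parent either lets the failure stand (Error) or handles it with a fallback (Ok). The fallback itself is hypothetical; the point is only that each Span sets its own Status.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Sketch.Propagation"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

Activity? lastChild = null;

// Child operation that always fails and records Error on its own Span.
void CheckStock()
{
    using Activity? child = source.StartActivity("InventoryRepository.CheckStock");
    lastChild = child;
    child?.SetStatus(ActivityStatusCode.Error, "inventory unavailable");
    throw new TimeoutException("inventory unavailable");
}

// The parent decides its own Status; nothing copies it from the child.
Activity CreateOrder(bool useFallback)
{
    Activity parent = source.StartActivity("CreateOrderCommand")!;
    try
    {
        CheckStock();
        parent.SetStatus(ActivityStatusCode.Ok);
    }
    catch (TimeoutException) when (useFallback)
    {
        // Handled (hypothetical fallback, e.g. a cached stock level): the parent succeeds.
        parent.SetStatus(ActivityStatusCode.Ok);
    }
    catch (TimeoutException ex)
    {
        // Not handled: the child's failure becomes the parent's failure.
        // (A real handler would usually also rethrow or return a failure result.)
        parent.SetStatus(ActivityStatusCode.Error, ex.Message);
    }
    finally
    {
        parent.Stop();
    }
    return parent;
}

Activity failed = CreateOrder(useFallback: false);
Activity recovered = CreateOrder(useFallback: true);
Console.WriteLine($"without fallback: {failed.Status}, with fallback: {recovered.Status}");
```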
Now that we understand the relationship between Status and error.type tags in error tracing, let’s learn practical methods for searching and analyzing Traces in Jaeger and Grafana Tempo.
Analyzing Traces
Jaeger Query Examples
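The Jaeger examples in this section use the parameter names of the search endpoint that the Jaeger UI itself calls on the query service (/api/traces). That endpoint serves the UI rather than being a stable public API, so the following is only a sketch: the base URL http://localhost:16686 (Jaeger’s default UI port) and the helper name JaegerSearchUrl are assumptions. It mainly shows that tags is sent as a URL-encoded JSON object.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Builds a Jaeger query-service search URL from the same fields as the examples.
// Assumption: the UI's internal /api/traces endpoint on the default port 16686.
string JaegerSearchUrl(string service, string? minDuration = null,
                       Dictionary<string, string>? tags = null,
                       string baseUrl = "http://localhost:16686")
{
    var query = new List<string> { "service=" + Uri.EscapeDataString(service) };
    if (minDuration is not null)
        query.Add("minDuration=" + Uri.EscapeDataString(minDuration));
    if (tags is { Count: > 0 })
        query.Add("tags=" + Uri.EscapeDataString(JsonSerializer.Serialize(tags))); // JSON object
    return baseUrl + "/api/traces?" + string.Join("&", query);
}

string errors = JaegerSearchUrl("orderservice",
    tags: new Dictionary<string, string> { ["response.status"] = "failure" });
string slow = JaegerSearchUrl("orderservice", minDuration: "1s");
Console.WriteLine(errors);
Console.WriteLine(slow);
```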
Search Traces by service:
```
service=orderservice
```
Search for slow Traces:
```
service=orderservice minDuration=1s
```
Search for error Traces:
```
service=orderservice tags={"response.status":"failure"}
```
Traces for a specific handler:
```
service=orderservice tags={"request.handler.name":"CreateOrderCommandHandler"}
```
Grafana Tempo Query Examples
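Grafana sends these TraceQL queries to Tempo’s search API. To run one outside Grafana, for example from a script, the query goes URL-encoded in the q parameter of GET /api/search. A minimal sketch; the base URL http://localhost:3200 and the helper names TraceQl and TempoSearchUrl are assumptions.

```csharp
using System;
using System.Linq;

// Quotes a TraceQL string value, escaping backslashes and double quotes.
string Quote(string value) => "\"" + value.Replace("\\", "\\\\").Replace("\"", "\\\"") + "\"";

// Joins attribute equality conditions into one TraceQL span selector: {a="x" && b="y"}.
string TraceQl(params (string Attribute, string Value)[] conditions) =>
    "{" + string.Join(" && ", conditions.Select(c => c.Attribute + "=" + Quote(c.Value))) + "}";

// Assumption: Tempo's HTTP API is reachable at http://localhost:3200.
string TempoSearchUrl(string traceQl, int limit = 20, string baseUrl = "http://localhost:3200") =>
    baseUrl + "/api/search?q=" + Uri.EscapeDataString(traceQl) + "&limit=" + limit;

string query = TraceQl(
    ("span.request.handler.name", "CreateOrderCommandHandler"),
    ("span.response.status", "failure"));
Console.WriteLine(query);
Console.WriteLine(TempoSearchUrl(query));
```

Building the selector from structured conditions keeps quoting consistent, which matters once values come from user input.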
TraceQL basic search:
```
{resource.service.name="orderservice"}
```
Search for specific Spans:
```
{span.request.handler.name="CreateOrderCommandHandler" && span.response.status="failure"}
```
Search for slow Spans (response.elapsed is recorded in seconds):
```
{span.response.elapsed > 1.0}
```
Search by error type:
```
{span.error.type="exceptional"}
```
Trace Analysis Workflow
- Identify the issue: Detect “P99 response time > 2 seconds” from metrics
- Retrieve example Traces: Search for slow Traces in that time range
- Identify bottleneck segments: Compare Duration across Spans
- Determine root cause: Check the longest-running Span
- Detailed investigation: Review the Span’s tags and logs
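Identifying the bottleneck and the root cause amounts to following the slowest child from the root down. A sketch over a hypothetical flattened export of (SpanId, ParentId, Name, Seconds) tuples, filled with the Durations from the slow order creation exercise later in this manual:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flattened export of one Trace: (SpanId, ParentId, Name, Seconds).
var spans = new List<(string Id, string? Parent, string Name, double Seconds)>
{
    ("a", null, "CreateOrderCommandHandler.Handle", 2.8),
    ("b", "a", "OrderRepository.GetCustomer", 0.1),
    ("c", "a", "InventoryService.CheckStock", 0.2),
    ("d", "a", "PaymentGateway.ProcessPayment", 2.3),
    ("e", "d", "HTTP POST payment-provider.com/api/charge", 2.2),
    ("f", "a", "NotificationService.SendEmail", 0.2),
};

// From the root, keep descending into the longest child until reaching a leaf.
List<string> SlowestPath(string rootId)
{
    var current = spans.Single(x => x.Id == rootId);
    var names = new List<string> { current.Name };
    while (true)
    {
        var children = spans.Where(x => x.Parent == current.Id).ToList();
        if (children.Count == 0)
            return names;
        current = children.MaxBy(x => x.Seconds);
        names.Add(current.Name);
    }
}

foreach (string name in SlowestPath("a"))
    Console.WriteLine(name);
```

Following only the longest child is a simplification: with parallel children, the longest one is not necessarily the one that delayed the parent.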
Exercise: Finding Bottlenecks
Scenario: Slow Order Creation
Situation: The “Create Order” API’s P99 response time exceeds 3 seconds.
Step 1: Retrieve slow Trace examples
Search in Jaeger with the following conditions:
```
service=orderservice
operation=application usecase.command CreateOrderCommandHandler.Handle
minDuration=2s
```
Step 2: Detailed Trace analysis
Expand the retrieved Trace to check the Duration of each Span:
```
CreateOrderCommandHandler.Handle (2.8s)
+-- OrderRepository.GetCustomer (0.1s)
+-- InventoryService.CheckStock (0.2s)
+-- PaymentGateway.ProcessPayment (2.3s)   <-- Bottleneck!
+-- NotificationService.SendEmail (0.2s)
```
Step 3: Bottleneck Span analysis
Check the tags of the PaymentGateway.ProcessPayment Span:
```json
{
  "request.handler.name": "PaymentGateway",
  "request.handler.method": "ProcessPayment",
  "response.elapsed": 2.3,
  "response.status": "success"
}
```
Step 4: Further investigation
If the outgoing call from PaymentGateway is instrumented, check its external call Span (SpanKind Client):
```
PaymentGateway.ProcessPayment (2.3s)
+-- HTTP POST payment-provider.com/api/charge (2.2s)   <-- External service delay
```
Conclusion: The root cause is response delay from the external payment service (payment-provider.com).
Remediation options:
- Review payment service timeout settings
- Consider asynchronous processing (create order without waiting for payment completion)
- Inquire with the payment service provider about the delay
Scenario: Intermittent Errors
Situation: “Create Order” errors spike during specific time periods.
Step 1: Retrieve error Traces
```
service=orderservice
tags={"response.status":"failure","error.type":"exceptional"}
```
Step 2: Error pattern analysis
Compare multiple Traces to identify commonalities:
- All failed at DatabaseRepository.Save
- All have error.code = "Database.ConnectionFailed"
Step 3: Time correlation
Compare the error time periods with other events (deployments, traffic spikes, infrastructure changes).
Conclusion: Database connection pool exhaustion is the suspected cause.
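Steps 2 and 3 are grouping operations once the failed Spans are exported. A sketch over hypothetical data (timestamps, Span names, and codes are illustrative): it groups failures by (Span, error.code) to find the common pattern, then buckets them per minute so the spikes can be lined up with deployments or traffic graphs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

DateTime At(int hour, int minute, int second) =>
    new DateTime(2024, 1, 1, hour, minute, second, DateTimeKind.Utc);

// Hypothetical failed Spans exported from the error-Trace search: (start, Span name, error.code).
var failures = new List<(DateTime Start, string Span, string Code)>
{
    (At(12, 0, 10), "DatabaseRepository.Save", "Database.ConnectionFailed"),
    (At(12, 0, 40), "DatabaseRepository.Save", "Database.ConnectionFailed"),
    (At(12, 0, 55), "DatabaseRepository.Save", "Database.ConnectionFailed"),
    (At(12, 1, 5), "DatabaseRepository.Save", "Database.ConnectionFailed"),
    (At(12, 7, 30), "DatabaseRepository.Save", "Database.ConnectionFailed"),
};

// Step 2: which (Span, error.code) pairs do the failed Traces share?
var patterns = failures
    .GroupBy(f => (f.Span, f.Code))
    .Select(g => (g.Key.Span, g.Key.Code, Count: g.Count()))
    .OrderByDescending(x => x.Count)
    .ToList();

// Step 3: failures per minute, to compare with deployments, traffic spikes, etc.
var perMinute = failures
    .GroupBy(f => new DateTime(f.Start.Ticks - f.Start.Ticks % TimeSpan.TicksPerMinute, DateTimeKind.Utc))
    .OrderBy(g => g.Key)
    .Select(g => (Minute: g.Key, Count: g.Count()))
    .ToList();

foreach (var pattern in patterns)
    Console.WriteLine($"{pattern.Span} {pattern.Code}: {pattern.Count} failures");
foreach (var bucket in perMinute)
    Console.WriteLine($"{bucket.Minute:HH:mm} UTC: {bucket.Count}");
```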
Troubleshooting
When Spans Are Not Generated
Symptom: A specific Span is not visible in the Trace.
Check the following:
- Verify Pipeline registration:
```csharp
services.AddMediator(options =>
{
    options.AddOpenBehavior(typeof(UsecaseTracingPipeline<,>));
});
```
- Verify ActivitySource registration:
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing.AddSource("Functorium.*"));
```
- Verify Sampling settings (a Span dropped by the sampler is never exported):
```csharp
.SetSampler(new AlwaysOnSampler()) // Collect all Traces (useful while debugging)
```
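All three checks come down to one rule of System.Diagnostics: ActivitySource.StartActivity returns null unless some ActivityListener listens to that source and its sampling callback keeps the Activity. OpenTelemetry’s AddSource and SetSampler configure such a listener under the hood. The sketch shows the three cases with a hypothetical source name.

```csharp
using System;
using System.Diagnostics;

// Hypothetical source name matching the "Functorium.*" pattern used with AddSource.
var source = new ActivitySource("Functorium.Sketch");

// 1. No listener for this source yet: no Activity (Span) is created at all.
Activity? withoutListener = source.StartActivity("NoListener");

// 2. A listener subscribes, but its sampling callback drops the Activity: still null.
bool keep = false;
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name.StartsWith("Functorium.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) =>
        keep ? ActivitySamplingResult.AllDataAndRecorded : ActivitySamplingResult.None,
};
ActivitySource.AddActivityListener(listener);
Activity? droppedBySampler = source.StartActivity("DroppedBySampler");

// 3. The callback keeps everything (like AlwaysOnSampler): the Activity is created.
keep = true;
using Activity? recorded = source.StartActivity("Recorded");

Console.WriteLine($"no listener: {withoutListener is null}");
Console.WriteLine($"dropped by sampler: {droppedBySampler is null}");
Console.WriteLine($"recorded: {recorded is not null}");
```

This is also why code that calls StartActivity should use the null-conditional operator (activity?.SetTag(...)): a null Activity is the normal result when tracing is off for that source.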
When Parent-Child Relationships Are Broken
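Symptom: Child Spans are displayed as separate Traces.
Cause: Context was not propagated.
Check the following:
- Context propagation in async calls:
```csharp
// Bad example: fire-and-forget. Task.Run itself flows Activity.Current, but the parent
// Span may already have ended when this runs, and if ExecutionContext flow is suppressed
// (or the work is queued to a background worker) it starts without a parent: a new Trace
_ = Task.Run(() => adapter.DoSomething());

// Good example: await in the same async flow so the child Span runs inside the parent
await adapter.DoSomething();
```
- Header propagation for external service calls:
```csharp
// HttpClient on .NET normally injects traceparent from Activity.Current by itself.
// If you must set it manually, set it per request: DefaultRequestHeaders is shared by
// every request sent through the same HttpClient, so the value would go stale.
using var request = new HttpRequestMessage(HttpMethod.Post, requestUri); // requestUri: placeholder
if (Activity.Current is { } activity)
    request.Headers.Add("traceparent", activity.Id);
```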
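Both propagation paths can be observed directly. In-process, Activity.Current lives in an AsyncLocal, so it follows the ExecutionContext: Task.Run captures it, while ExecutionContext.SuppressFlow cuts it. Across processes, the context travels as a W3C traceparent string, which ActivityContext.TryParse turns back into a parent context. A sketch with hypothetical source and Span names:

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

var source = new ActivitySource("Sketch.Context"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

using var parent = source.StartActivity("Usecase.Handle")!;

// In-process: Task.Run captures the ExecutionContext, so Activity.Current flows into it.
ActivitySpanId? seenByTaskRun = await Task.Run(() => Activity.Current?.SpanId);

// With ExecutionContext flow suppressed, the task starts without the request's Activity.
Task<ActivitySpanId?> suppressedTask;
using (ExecutionContext.SuppressFlow())
{
    suppressedTask = Task.Run(() => Activity.Current?.SpanId);
}
ActivitySpanId? seenWhenSuppressed = await suppressedTask;

// Across processes: the context travels as a W3C traceparent header.
// On .NET 5+, Activity.Id uses the W3C format "00-<trace-id>-<span-id>-<flags>".
string traceparent = parent.Id!;
bool parsed = ActivityContext.TryParse(traceparent, null, out ActivityContext remoteContext);

// The receiving side starts its Span with the parsed context as the explicit parent.
using var serverSpan = source.StartActivity("Downstream.Handle", ActivityKind.Server, remoteContext)!;

Console.WriteLine($"Task.Run keeps parent: {seenByTaskRun == parent.SpanId}");
Console.WriteLine($"SuppressFlow keeps parent: {seenWhenSuppressed is not null}");
Console.WriteLine($"traceparent: {traceparent}");
```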
When Duration Is Longer Than Expected
Symptom: A Span’s Duration is much larger than the sum of its child Spans.
Possible causes:
- Work in the Span that no child Span covers, such as blocking calls before a child Span starts:
```csharp
Thread.Sleep(1000); // Counted in the parent's Duration, but in no child Span
using var activity = source.StartActivity("...");
// Actual work
```
- Async waiting:
```csharp
using var activity = source.StartActivity("...");
await Task.Delay(1000); // Waiting within the Span
// Only waiting, with no child Spans to account for it
```
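To measure how much of a Span’s Duration its children do not explain, compute its self time: the Duration minus the time covered by at least one direct child. Overlapping children (parallel calls) must be merged first, or the shared time is subtracted twice. A sketch, assuming child offsets in milliseconds relative to the parent start and children that lie within the parent; the helper name SelfTimeMs is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Self time of a Span: its Duration minus the time covered by at least one direct child.
// Children are (Start, End) offsets in ms from the parent's start, assumed to lie within
// the parent. Overlapping children are merged so that shared time is counted once.
double SelfTimeMs(double parentDurationMs, IEnumerable<(double Start, double End)> children)
{
    double covered = 0;
    double runStart = 0, runEnd = double.NegativeInfinity;
    foreach (var (start, end) in children.OrderBy(c => c.Start))
    {
        if (start > runEnd)
        {
            // Gap before this child: close the current merged run.
            if (runEnd > runStart) covered += runEnd - runStart;
            (runStart, runEnd) = (start, end);
        }
        else
        {
            // Overlaps or touches the current run: extend it.
            runEnd = Math.Max(runEnd, end);
        }
    }
    if (runEnd > runStart) covered += runEnd - runStart;
    return parentDurationMs - covered;
}

// Hypothetical 1200 ms Span with children at 0-100, 100-300 and 250-400 ms (the last in parallel).
double selfMs = SelfTimeMs(1200, new[] { (0.0, 100.0), (100.0, 300.0), (250.0, 400.0) });
Console.WriteLine($"self time: {selfMs} ms");
```

A large self time points at one of the two causes above; a small one means the missing time is inside a child and the investigation should move down a level.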
Q: Should all requests be traced?
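A: In most production environments, sampling is applied. Tracing all requests incurs significant storage costs and performance overhead.
Common sampling strategies:
- Error requests: 100% collection
- Success requests: 1-10% collection
- Specific conditions: 100% collection (e.g., specific users, specific APIs)
Rules that depend on the outcome, such as keeping every error, need tail-based sampling (for example the OpenTelemetry Collector’s tail sampling processor), because a sampler inside the application decides when the root Span starts, before the outcome is known.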
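The success-request percentage is a head-sampling decision. Its parent-based, ratio-based form can be sketched directly on the ActivityListener.Sample callback, the hook the OpenTelemetry .NET SDK plugs its samplers into. This is not the SDK’s exact algorithm: roots are kept when the first 8 bytes of the trace ID, read as an unsigned number, fall below Ratio * ulong.MaxValue, and children follow their parent’s Recorded flag so a Trace is never half-sampled. The names RatioSampler and Ratio are hypothetical.

```csharp
using System;
using System.Diagnostics;

const double Ratio = 0.1;

ActivitySamplingResult RatioSampler(ref ActivityCreationOptions<ActivityContext> options)
{
    // Child Span: follow the parent's decision.
    if (options.Parent != default)
        return (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) != 0
            ? ActivitySamplingResult.AllDataAndRecorded
            : ActivitySamplingResult.PropagationData;

    // Root Span: derive a number from the trace ID so the decision is deterministic per Trace.
    Span<byte> bytes = stackalloc byte[16];
    options.TraceId.CopyTo(bytes);
    ulong value = BitConverter.ToUInt64(bytes[..8]);
    // Not sampled: still create the Activity so IDs propagate, but do not record it.
    return value < Ratio * ulong.MaxValue
        ? ActivitySamplingResult.AllDataAndRecorded
        : ActivitySamplingResult.PropagationData;
}

var source = new ActivitySource("Sketch.Sampling"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = RatioSampler,
});

int recordedRoots = 0;
for (int i = 0; i < 10_000; i++)
{
    using var root = source.StartActivity("Root")!;
    if (root.Recorded) recordedRoots++;
}

bool childrenFollowParent = true;
for (int i = 0; i < 200; i++)
{
    using var parentSpan = source.StartActivity("Parent")!;
    using var childSpan = source.StartActivity("Child")!;
    childrenFollowParent &= childSpan.Recorded == parentSpan.Recorded;
}

Console.WriteLine($"recorded {recordedRoots} of 10000 root Traces (roughly 10% expected)");
```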
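```csharp
.SetSampler(new ParentBasedSampler(new TraceIdRatioBasedSampler(0.1))) // 10% of root Traces; child Spans follow their parent
```
Q: How do you set the Trace retention period?
A: It depends on the storage backend:
- Jaeger: retention is configured per storage backend. With Elasticsearch, old indices are deleted by the jaeger-es-index-cleaner (or index lifecycle management); the --es.max-span-age flag only limits how far back queries look. With Badger, use --badger.span-store-ttl.
- Tempo: compactor.compaction.block_retention
A retention period of 7-30 days is common. Important Traces can be stored separately.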
Q: How do you connect logging and tracing?
A: You can connect them by including the Trace ID in logs:
```csharp
Log.ForContext("TraceId", Activity.Current?.TraceId.ToString())
    .Information("Order created");
```
If you set up Trace → Log integration in Grafana, you can view related logs with a single click.
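Without Serilog, the same correlation only needs Activity.Current’s TraceId and SpanId at the moment the log entry is written. The sketch uses a hypothetical LogLine helper that writes plain text; a real logger would put the IDs in structured fields so the backend can index them.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Sketch.Logging"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

// Hypothetical helper: stamps the current TraceId/SpanId on a log message so the log
// backend can link the entry to its Span (and a Trace UI can jump to the related logs).
string LogLine(string message)
{
    Activity? current = Activity.Current;
    return current is null
        ? $"TraceId=- SpanId=- {message}"
        : $"TraceId={current.TraceId} SpanId={current.SpanId} {message}";
}

string line, expectedTraceId;
using (Activity span = source.StartActivity("CreateOrderCommand.Handle")!)
{
    expectedTraceId = span.TraceId.ToString();
    line = LogLine("Order created");
}
Console.WriteLine(line);
Console.WriteLine(LogLine("Outside any Span"));
```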
Q: Are external service calls also traced?
A: You need to add instrumentation for HttpClient, database drivers, etc. Each one comes from its own package (for the calls below: OpenTelemetry.Instrumentation.Http, OpenTelemetry.Instrumentation.SqlClient, and Npgsql.OpenTelemetry):
```csharp
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddHttpClientInstrumentation()
        .AddSqlClientInstrumentation()
        .AddNpgsql());
```
With this configuration, HTTP calls and DB queries are automatically recorded as Spans.
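For a dependency that no instrumentation package covers, such as a vendor SDK, you can wrap the call in a Span of kind Client yourself, so it appears as an external call Span like the one in the bottleneck exercise. A sketch with a hypothetical source name and payment call; server.address is the OpenTelemetry semantic-convention attribute for the remote host.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

var source = new ActivitySource("Sketch.Payments"); // hypothetical source name
var finished = new List<Activity>();
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = finished.Add,
});

// Hypothetical vendor SDK call that no instrumentation package covers.
async Task<bool> ChargeAsync()
{
    // ActivityKind.Client marks an outgoing call to a remote service.
    using Activity? client = source.StartActivity("PaymentProvider.Charge", ActivityKind.Client);
    client?.SetTag("server.address", "payment-provider.com");
    await Task.Delay(10); // stands in for the remote call
    client?.SetStatus(ActivityStatusCode.Ok);
    return true;
}

using (source.StartActivity("PaymentGateway.ProcessPayment"))
{
    await ChargeAsync();
}

foreach (Activity span in finished)
    Console.WriteLine($"{span.DisplayName} kind={span.Kind}");
```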
Q: How much performance overhead is there?
A: OpenTelemetry’s overhead is generally low, although it grows with the number of Spans per request and depends on the exporter. Rough figures:
- CPU: 1-5% additional
- Memory: A few MB additional
- Latency: < 1ms additional
However, exporting all Spans increases network bandwidth costs. Applying sampling can minimize the overhead.
Q: What if Activity.Current is null?
A: This occurs when a Span has not been started or Context was not propagated.
Check the following:
- Verify that ActivitySource is registered
- Verify that ActivityListener is listening to the relevant source
- Verify that the Sampler is not excluding the Activity
```csharp
// Debugging code
Console.WriteLine($"Current Activity: {Activity.Current?.DisplayName ?? "null"}");
Console.WriteLine($"TraceId: {Activity.Current?.TraceId}");
```
References
- OpenTelemetry Tracing Specification
- W3C Trace Context
- Jaeger Documentation
- Grafana Tempo Documentation
- .NET Activity and DiagnosticSource
Internal documents:
- 08-observability.md — Observability Specification (Field/Tag, Meter, Message Template)
- 18b-observability-naming.md — Observability Naming Guide
- 19-observability-logging.md — Observability Logging Details
- 20-observability-metrics.md — Observability Metrics Details
Trace Parent-Child Hierarchy Troubleshooting
This section covers the issue where Adapter Spans were created as siblings of the Usecase Span (directly under the HTTP request Span) instead of as its children, and how it was resolved.
Symptom
Expected hierarchy:
```
HttpRequestIn (ROOT)
└── GetAllProductsQuery.Handle
    └── InMemoryProductRepository.GetAll   ← Child of Usecase
```
Actual hierarchy:
```
HttpRequestIn (ROOT)
├── GetAllProductsQuery.Handle
└── InMemoryProductRepository.GetAll       ← Sibling of Usecase, attached to the HTTP request (problem!)
```
In the priority order of DetermineParentContext, IObservabilityContext (Scoped, capturing the Activity that was current when the HTTP request started) was matched before Activity.Current (the closest parent in the current execution context), so Adapter Spans used the HTTP request Span as their parent.
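The symptom can be reproduced with plain Activities. In the sketch, capturedAtRequestStart stands in for what the Scoped IObservabilityContext held (a hypothetical simplification, not the Functorium type): using it as the explicit parent attaches the adapter Span to HttpRequestIn, while checking Activity.Current first attaches it to the Usecase Span.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Sketch.Hierarchy"); // hypothetical source name
ActivitySource.AddActivityListener(new ActivityListener
{
    ShouldListenTo = s => s.Name == source.Name,
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
});

using var http = source.StartActivity("HttpRequestIn", ActivityKind.Server)!;
// Stand-in for the Scoped context: the Activity that was current when the request started.
ActivityContext capturedAtRequestStart = http.Context;

using var usecase = source.StartActivity("GetAllProductsQuery.Handle")!;

// Old priority: the captured request-level context is checked first.
ActivityContext oldChoice = capturedAtRequestStart;
// New priority: Activity.Current (here the Usecase Activity) is checked first.
ActivityContext newChoice = Activity.Current?.Context ?? capturedAtRequestStart;

ActivitySpanId oldParent, newParent;
using (var adapter = source.StartActivity("InMemoryProductRepository.GetAll", ActivityKind.Internal, oldChoice)!)
    oldParent = adapter.ParentSpanId;
using (var adapter = source.StartActivity("InMemoryProductRepository.GetAll", ActivityKind.Internal, newChoice)!)
    newParent = adapter.ParentSpanId;

Console.WriteLine($"old priority: adapter parent is HttpRequestIn = {oldParent == http.SpanId}");
Console.WriteLine($"new priority: adapter parent is Usecase = {newParent == usecase.SpanId}");
```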
Solution: Change DetermineParentContext Priority
```csharp
private static ActivityContext DetermineParentContext(IObservabilityContext? parentContext)
{
    // 1. Activity.Current - closest parent (standard OpenTelemetry behavior)
    Activity? currentActivity = Activity.Current;
    if (currentActivity != null)
        return currentActivity.Context;

    // 2. AsyncLocal - workaround for FinT async context restoration issues
    Activity? traverseActivity = ActivityContextHolder.GetCurrentActivity();
    if (traverseActivity != null)
        return traverseActivity.Context;

    // 3. Explicit parentContext - context injected from outside
    if (parentContext is ObservabilityContext otelContext)
        return otelContext.ActivityContext;

    return default;
}
```
Priority meaning:
| Priority | Source | Purpose |
|---|---|---|
| 1 | Activity.Current | Closest parent in the current execution context (synchronous flow) |
| 2 | ActivityContextHolder | Workaround for FinT/IO monad AsyncLocal restoration issues |
| 3 | parentContext | Explicitly passed external context (HTTP request level) |
Verification Method
Check in Jaeger or Zipkin that the Adapter Span (InMemoryProductRepository.GetAll) is displayed as a child of the Usecase Span (GetAllProductsQuery.Handle).
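The same check can be automated in an integration test: register an ActivityListener for the Functorium sources, send one request, and compare ParentSpanId values. In the sketch the request is simulated with plain Activities under a hypothetical Functorium.Sketch source, and an await is crossed before the adapter Span starts, as in asynchronous pipelines.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

var stopped = new List<Activity>();
using var listener = new ActivityListener
{
    // In a real test, match the same sources as AddSource("Functorium.*").
    ShouldListenTo = s => s.Name.StartsWith("Functorium.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = a => { lock (stopped) stopped.Add(a); },
};
ActivitySource.AddActivityListener(listener);

// Stand-in for one request through the pipeline (hypothetical source name).
var source = new ActivitySource("Functorium.Sketch");
using (source.StartActivity("HttpRequestIn", ActivityKind.Server))
using (source.StartActivity("GetAllProductsQuery.Handle"))
{
    await Task.Yield(); // continue after an await, as asynchronous pipelines do
    using (source.StartActivity("InMemoryProductRepository.GetAll")) { }
}

// True when the Span named child has the Span named parent as its direct parent.
bool IsChildOf(string child, string parent)
{
    Activity c = stopped.Single(x => x.DisplayName == child);
    Activity p = stopped.Single(x => x.DisplayName == parent);
    return c.TraceId == p.TraceId && c.ParentSpanId == p.SpanId;
}

Console.WriteLine($"adapter under usecase: {IsChildOf("InMemoryProductRepository.GetAll", "GetAllProductsQuery.Handle")}");
```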
Related Files
| File | Role |
|---|---|
| OpenTelemetrySpanFactory.cs | Adapter Span creation, parent context determination |
| UsecaseTracingPipeline.cs | Usecase Activity creation |
| ActivityContextHolder.cs | AsyncLocal-based Activity context storage |
| ObservabilityContext.cs | HTTP request-level Activity context wrapper |