OpenTelemetry in Production: Unified Observability Strategies

Fragmented observability (separate logs, metrics, tracing) makes debugging difficult. OpenTelemetry unifies collection, standardizes data, and facilitates integration with multiple tools.

3/16/2026 · 8 min read · Cloud
Last updated: 3/16/2026

Executive summary

The traditional observability approach—implementing logs in one place, metrics in another, tracing elsewhere—creates fragmented data, inconsistent instrumentation, and operational complexity. Debugging a single failed request means jumping between separate dashboards to piece together the related logs, metrics, and traces.

OpenTelemetry (OTel) solves this fragmentation by providing a unified specification and SDKs for collecting logs, metrics, and traces. As of 2026, OpenTelemetry has become the de facto standard for observability, with native support from all major cloud providers and APM vendors.

Organizations that implement OpenTelemetry correctly reduce instrumentation effort, eliminate data fragmentation, and facilitate migration between observability tools without changing application code.

The fragmented observability problem

Symptoms of separate implementation

1. Lost context between signals

// ❌ Fragmented: each signal carries its own ad-hoc context
// logging.ts
logger.info('User created', { userId: '123' });

// metrics.ts
metrics.increment('user.created', { userId: '123' });

// tracing.ts
tracer.startSpan('createUser', { userId: '123' });

// ✅ Unified: the active span context is shared by every signal
import { trace } from '@opentelemetry/api';

const tracer = trace.getTracer('user-service');

tracer.startActiveSpan('createUser', (span) => {
  span.setAttributes({ 'user.id': '123', 'user.role': 'admin' });

  // Logs and metrics emitted inside this callback are recorded
  // within the active span's context, so all three signals share
  // the same trace and span IDs
  logger.info('User created');
  userCreatedCounter.add(1);

  span.end();
});

2. Inconsistent attributes

// ❌ Different names for the same concept
logger.info({ 'user_id': '123' });
metrics.record('user_created', { 'userId': '123' });
tracer.recordEvent('UserCreate', { 'UserID': '123' });

// ✅ Standardized: semantic attributes
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

logger.info('User created', {
  [SemanticAttributes.ENDUSER_ID]: '123',
  [SemanticAttributes.ENDUSER_ROLE]: 'admin',
});
metrics.record('user.created', {
  [SemanticAttributes.ENDUSER_ID]: '123',
  [SemanticAttributes.ENDUSER_ROLE]: 'admin',
});

3. Difficult migration

When you switch from one APM vendor to another, you need to reimplement all instrumentation.

// ❌ Vendor lock-in
import { DataDogLogger } from '@datadog/browser-logs';
import { DataDogMetrics } from '@datadog/browser-metrics';
import { DataDogTracer } from '@datadog/browser-tracing';

// If switching to New Relic, everything changes
import { NewRelicLogger } from 'newrelic/browser-logs';
import { NewRelicMetrics } from 'newrelic/browser-metrics';

// ✅ Vendor-agnostic: just swap the exporter
import { trace, metrics, logger } from '@opentelemetry/api';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';

// Switching from Jaeger to Prometheus = just change exporter
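
In practice the swap is a configuration change, not a code change. A minimal sketch, assuming the standard `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable; the helper name is hypothetical:

```typescript
// Hypothetical helper: pick the OTLP endpoint from configuration.
// Instrumentation code keeps using the vendor-neutral @opentelemetry/api;
// only this wiring changes when the backend does.
export function resolveOtlpEndpoint(
  env: Record<string, string | undefined>
): string {
  // OTEL_EXPORTER_OTLP_ENDPOINT is the standard OTel environment variable
  return env.OTEL_EXPORTER_OTLP_ENDPOINT ?? 'http://localhost:4317';
}

// Wiring sketch (assumes the OTLP gRPC exporter package is installed):
// import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
// import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
// import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
//
// const exporter = new OTLPTraceExporter({ url: resolveOtlpEndpoint(process.env) });
// const provider = new NodeTracerProvider({
//   spanProcessors: [new BatchSpanProcessor(exporter)],
// });
// provider.register();
```

Pointing the endpoint at a different collector (or a vendor's OTLP intake) is the entire migration.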

OpenTelemetry architecture

OTel specification layers

┌─────────────────────────────────────────────────────────────┐
│                 OPENTELEMETRY SPECIFICATION                 │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  API Layer                                                  │
│  ├── Tracing API               (spans, context, links)      │
│  ├── Metrics API               (counter, gauge, histogram)  │
│  ├── Logs API                  (structured logs)            │
│  └── Baggage API               (context propagation)        │
│                                                             │
│  SDK Layer                                                  │
│  ├── Instrumentation Libraries (auto-instrumentation)       │
│  ├── Language SDKs             (JavaScript, Python, Go)     │
│  └── Semantic Conventions      (standard attribute names)   │
│                                                             │
│  Collector Layer                                            │
│  ├── OTLP Receiver             (receives telemetry data)    │
│  ├── Processors                (batch, transform, filter)   │
│  └── Exporters                 (Jaeger, Prometheus, vendors)│
│                                                             │
└─────────────────────────────────────────────────────────────┘

Essential components

1. Tracer Provider

// src/tracing.ts
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { trace } from '@opentelemetry/api';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
});

const provider = new NodeTracerProvider({
  resource,
  spanProcessors: [
    new BatchSpanProcessor(
      new OTLPTraceExporter({
        // For gRPC the URL is host:port only; signal paths like
        // /v1/traces apply to the HTTP exporter on 4318
        url: 'http://otel-collector:4317',
      })
    ),
  ],
});

provider.register();

export { trace };

2. Metrics Provider

// src/metrics.ts
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { metrics } from '@opentelemetry/api';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
});

const provider = new MeterProvider({
  resource,
  readers: [
    new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({
        url: 'http://otel-collector:4317',
      }),
      exportIntervalMillis: 10_000,
    }),
  ],
});

// MeterProvider has no register(); install it as the global provider
metrics.setGlobalMeterProvider(provider);

export const meter = metrics.getMeter('my-app');

3. Logger Provider

// src/logging.ts
import { LoggerProvider, SimpleLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-grpc';
import { logs } from '@opentelemetry/api-logs';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
});

const provider = new LoggerProvider({ resource });

// SimpleLogRecordProcessor exports each record immediately;
// prefer BatchLogRecordProcessor in production
provider.addLogRecordProcessor(
  new SimpleLogRecordProcessor(
    new OTLPLogExporter({
      url: 'http://otel-collector:4317',
    })
  )
);

logs.setGlobalLoggerProvider(provider);

export const logger = logs.getLogger('my-app');

Manual instrumentation

Instrumenting HTTP handlers

// src/instrumentation/http.ts
import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
// Assumes Express-style handlers
import type { Request, Response, NextFunction, RequestHandler } from 'express';

export function instrumentHttpHandler(handler: RequestHandler) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // Create a server span for the HTTP request; name it after the
    // route template (low cardinality), not the concrete URL
    const tracer = trace.getTracer('http');
    const span = tracer.startSpan(`${req.method} ${req.route?.path ?? req.path}`, {
      kind: SpanKind.SERVER,
      attributes: {
        [SemanticAttributes.HTTP_METHOD]: req.method,
        [SemanticAttributes.HTTP_URL]: req.url,
        [SemanticAttributes.HTTP_ROUTE]: req.route?.path ?? req.path,
      },
    });

    try {
      const result = await handler(req, res, next);

      // Add response attributes; treat 5xx responses as errors
      span.setAttribute(SemanticAttributes.HTTP_STATUS_CODE, res.statusCode);
      span.setStatus({
        code: res.statusCode >= 500 ? SpanStatusCode.ERROR : SpanStatusCode.OK,
      });

      return result;
    } catch (error) {
      // Record the exception on the span
      span.recordException(error as Error);
      span.setStatus({
        code: SpanStatusCode.ERROR,
      });
      throw error;
    } finally {
      // End span
      span.end();
    }
  };
}
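
One cardinality pitfall when wiring a wrapper like this: span names must come from the route template, never the concrete URL, or every user ID mints a new span name. A small helper (the name is illustrative) makes the convention explicit:

```typescript
// Build a low-cardinality span name: "GET /users/:id", not "GET /users/42"
export function spanNameFor(method: string, routeTemplate: string): string {
  return `${method.toUpperCase()} ${routeTemplate}`;
}

// Express-style wiring sketch (assumes the instrumentHttpHandler wrapper above):
// app.get('/users/:id', instrumentHttpHandler(async (req, res) => {
//   res.json(await getUser(req.params.id));
// }));
```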

Instrumenting database operations

// src/instrumentation/database.ts
import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { meter } from '../metrics';

// Create the instrument once at module load, not on every call
const dbDurationHistogram = meter.createHistogram('db.operation.duration', {
  unit: 'ms',
  description: 'Duration of database operations',
});

export function instrumentDatabase<T>(
  operation: string,
  dbSystem: string,
  fn: () => Promise<T>
): Promise<T> {
  const tracer = trace.getTracer('database');

  return tracer.startActiveSpan(operation, {
    kind: SpanKind.CLIENT,
    attributes: {
      [SemanticAttributes.DB_SYSTEM]: dbSystem,
      [SemanticAttributes.DB_OPERATION]: operation,
    },
  }, async (span) => {
    try {
      const startTime = Date.now();
      const result = await fn();
      const duration = Date.now() - startTime;

      // Record duration as a metric alongside the span
      dbDurationHistogram.record(duration, {
        [SemanticAttributes.DB_SYSTEM]: dbSystem,
        [SemanticAttributes.DB_OPERATION]: operation,
      });

      // Success/failure belongs in the span status, not DB_STATEMENT
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      span.end();
    }
  });
}

Instrumenting with context propagation

// src/api/user.ts
import { trace } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { instrumentDatabase } from '../instrumentation/database';

export async function getUser(userId: string) {
  // The HTTP request already established a trace context;
  // this span becomes a child of it
  const tracer = trace.getTracer('api');
  const span = tracer.startSpan('getUser');

  try {
    // Propagate context into the database call
    const user = await instrumentDatabase(
      'SELECT',
      'postgresql',
      () => db.users.findUnique({ where: { id: userId } })
    );

    // ENDUSER_ID identifies the end user; DB_USER is the database account
    span.setAttribute(SemanticAttributes.ENDUSER_ID, userId);

    return user;
  } catch (error) {
    span.recordException(error as Error);
    throw error;
  } finally {
    span.end();
  }
}

Semantic conventions

Standard service attributes

import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

const serviceAttributes = {
  // Service identification
  [SemanticResourceAttributes.SERVICE_NAME]: 'api-gateway',
  [SemanticResourceAttributes.SERVICE_VERSION]: '2.1.0',
  [SemanticResourceAttributes.SERVICE_INSTANCE_ID]: 'gateway-1',

  // Deployment environment and host
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
  [SemanticResourceAttributes.HOST_NAME]: 'api-gateway-prod',

  // Process identification
  [SemanticResourceAttributes.PROCESS_PID]: process.pid,
  [SemanticResourceAttributes.PROCESS_EXECUTABLE_PATH]: process.execPath,
  [SemanticResourceAttributes.PROCESS_COMMAND]: process.argv[0],
  [SemanticResourceAttributes.PROCESS_COMMAND_ARGS]: process.argv.slice(1),
};

Standard HTTP attributes

import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

const httpAttributes = {
  // Request
  [SemanticAttributes.HTTP_METHOD]: 'POST',
  [SemanticAttributes.HTTP_URL]: 'https://api.example.com/users',
  [SemanticAttributes.HTTP_TARGET]: '/api/v1/users',
  [SemanticAttributes.HTTP_SCHEME]: 'https',
  [SemanticAttributes.HTTP_HOST]: 'api.example.com',

  // Response
  [SemanticAttributes.HTTP_STATUS_CODE]: 200,
  [SemanticAttributes.HTTP_FLAVOR]: '1.1',

  // Client
  [SemanticAttributes.HTTP_CLIENT_IP]: '192.168.1.1',
  [SemanticAttributes.HTTP_USER_AGENT]: 'Mozilla/5.0...',
};

Standard database attributes

import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

const dbAttributes = {
  // Database system
  [SemanticAttributes.DB_SYSTEM]: 'postgresql',

  // Operation
  [SemanticAttributes.DB_OPERATION]: 'SELECT',
  [SemanticAttributes.DB_STATEMENT]: 'SELECT * FROM users WHERE id = $1',

  // Database name and user
  [SemanticAttributes.DB_NAME]: 'app_db',
  [SemanticAttributes.DB_USER]: 'app_user',

  // Connection (never include credentials in the connection string)
  [SemanticAttributes.DB_CONNECTION_STRING]: 'postgresql://localhost:5432/app_db',
  [SemanticAttributes.DB_JDBC_DRIVER_CLASSNAME]: 'org.postgresql.Driver',
};

Production patterns

Pattern 1: Intelligent sampling

// src/tracing/sampler.ts
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

// Trace-ID-based sampling is consistent: the same trace ID always
// yields the same decision across services
const idBasedSampler = new TraceIdRatioBasedSampler(0.1); // 10% of traces

// Parent-based sampling: if the parent span is sampled, children are too
const parentBasedSampler = new ParentBasedSampler({
  root: idBasedSampler,
  remoteParentSampled: new TraceIdRatioBasedSampler(0.05),
});

const provider = new NodeTracerProvider({
  sampler: parentBasedSampler,
  // ...rest of configuration
});

Pattern 2: Batching to reduce overhead

// src/telemetry/processor.ts
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

// Batching is configured on the processor, not the exporter
const batchProcessor = new BatchSpanProcessor(
  new OTLPTraceExporter({
    url: 'http://otel-collector:4317',
  }),
  {
    maxExportBatchSize: 100,    // send in batches of up to 100 spans
    scheduledDelayMillis: 5000, // or every 5 seconds, whichever comes first
  }
);

Pattern 3: Centralized resource attributes

// src/telemetry/resource.ts
import {
  detectResourcesSync,
  envDetector,
  hostDetector,
  processDetector,
} from '@opentelemetry/resources';

// Cloud-specific detectors (AWS, GCP, Azure, containers) ship as
// separate packages, e.g. @opentelemetry/resource-detector-aws
export const resource = detectResourcesSync({
  detectors: [envDetector, hostDetector, processDetector],
});

Integration with OTel Collector

Collector configuration

# otel-collector-config.yaml
receivers:
  # Receive telemetry via OTLP
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

  # Receive traces from Jaeger clients (legacy)
  jaeger:
    protocols:
      thrift_http:
        endpoint: 0.0.0.0:14268

processors:
  # Batching of spans
  batch:
    timeout: 5s
    send_batch_size: 1000

  # Add resource attributes to all telemetry; per-service attributes
  # like service.name belong in each SDK's resource, not here
  resource:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  # Send traces to Jaeger (gRPC endpoint, no path)
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true

  # Expose metrics for Prometheus to scrape
  prometheus:
    endpoint: "0.0.0.0:8889"

  # Send logs to Loki
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

Collector deployment

# k8s/otel-collector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:latest  # pin a version in production
        args:
          - --config=/etc/otel-collector-config.yaml
        ports:
        - containerPort: 4317  # OTLP gRPC
        - containerPort: 4318  # OTLP HTTP
        - containerPort: 14268 # Jaeger thrift HTTP
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        volumeMounts:
        - name: config-volume
          mountPath: /etc/otel-collector-config.yaml
          subPath: otel-collector-config.yaml
          readOnly: true
      volumes:
      - name: config-volume
        configMap:
          name: otel-collector-config
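
The Deployment mounts a ConfigMap named otel-collector-config. A minimal sketch of that manifest; the data key holds the collector configuration shown earlier:

```yaml
# k8s/otel-collector-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  otel-collector-config.yaml: |
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
    # ...remaining receivers, processors, and exporters as shown earlier
```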

Monitoring OTel overhead

Collector self-metrics

# otel-collector-config.yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        # The collector exposes its own metrics on port 8888
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['localhost:8888']

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [batch]
      exporters: [jaeger]

    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus]

Overhead alerts

# alerts/otel-overhead.yaml
groups:
  - name: otel-overhead
    interval: 30s
    rules:
      - alert: HighSpanExportFailureRate
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 10
        annotations:
          summary: "Collector is failing to export spans"
          description: "More than 10 spans/second are failing to export"

      - alert: SpansRefusedAtReceiver
        expr: rate(otelcol_receiver_refused_spans[5m]) > 0
        annotations:
          summary: "Collector is refusing incoming spans"
          description: "The receiver is rejecting spans, typically due to backpressure"

      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{container="otel-collector"} / container_spec_memory_limit_bytes{container="otel-collector"} > 0.8
        annotations:
          summary: "High memory usage on the OTel Collector"
          description: "The collector is using more than 80% of its memory limit"

Implementation plan in 60 days

Weeks 1-2: Foundation

  • Install OTel Collector in cluster
  • Configure exporters (Jaeger, Prometheus, Loki)
  • Define standard semantic conventions
  • Create instrumentation packages

Weeks 3-4: Instrumentation

  • Instrument API HTTP handlers
  • Instrument database operations
  • Add custom spans for business logic
  • Implement context propagation

Weeks 5-6: Optimization

  • Configure intelligent sampling
  • Implement batching to reduce overhead
  • Create unified observability dashboards
  • Validate performance gains and debugging quality

Conclusion

OpenTelemetry in 2026 is the unified standard for observability that eliminates fragmentation of logs, metrics, and traces. By implementing OTel, organizations reduce instrumentation effort, standardize observability data, and facilitate migration between tools without changing application code.

OpenTelemetry's maturity is well established: well-defined semantic conventions, mature SDKs for all major languages, native support from major cloud providers, and a robust ecosystem of visualization tools.

The key is to start with structured instrumentation: use semantic conventions, implement intelligent sampling to reduce overhead, and configure appropriate collectors for your specific requirements.

Closing practical question: can your organization debug production issues by tracing a complete request through correlated logs, metrics, and spans in a unified context, or do you still have to access multiple fragmented dashboards?


Need to implement OpenTelemetry to unify observability and improve production debugging capabilities? Talk to Imperialis specialists about observability architecture, OTel implementation, and instrumentation strategies.
