OpenTelemetry in Production: Unified Observability Strategies
Fragmented observability (separate logs, metrics, tracing) makes debugging difficult. OpenTelemetry unifies collection, standardizes data, and facilitates integration with multiple tools.
Last updated: 3/16/2026
Executive summary
The traditional observability approach—implementing logs in one place, metrics in another, tracing elsewhere—creates fragmented data, inconsistent instrumentation, and operational complexity. Debugging a failed request means jumping between dashboards to correlate related logs, metrics, and traces.
OpenTelemetry (OTel) solves this fragmentation by providing a unified specification and SDKs for collecting logs, metrics, and traces. In 2026, OpenTelemetry is firmly established as the de facto standard for observability, with native support in all major cloud providers and APM vendors.
Organizations that implement OpenTelemetry correctly reduce instrumentation effort, eliminate data fragmentation, and facilitate migration between observability tools without changing application code.
The fragmented observability problem
Symptoms of separate implementation
1. Lost context between signals
```typescript
// ❌ Fragmented: each signal carries its own ad-hoc context
// logging.ts
logger.info('User created', { userId: '123' });
// metrics.ts
metrics.increment('user.created', { userId: '123' });
// tracing.ts
tracer.startSpan('createUser', { userId: '123' });

// ✅ Unified: one attribute set shared by all three signals
// (logger, userCreatedCounter, and tracer come from the OTel providers shown later)
const attributes = {
  'user.id': '123',
  'user.role': 'admin',
};

logger.emit({ body: 'User created', attributes });
userCreatedCounter.add(1, attributes);
tracer.startSpan('createUser', { attributes });
```
2. Inconsistent attributes
```typescript
// ❌ Different names for the same concept
logger.info({ 'user_id': '123' });
metrics.record('user_created', { 'userId': '123' });
tracer.recordEvent('UserCreate', { 'UserID': '123' });

// ✅ Standardized: semantic attributes
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

logger.info('User created', {
  [SemanticAttributes.ENDUSER_ID]: '123',
  [SemanticAttributes.ENDUSER_ROLE]: 'admin',
});
metrics.record('user.created', {
  [SemanticAttributes.ENDUSER_ID]: '123',
  [SemanticAttributes.ENDUSER_ROLE]: 'admin',
});
```
3. Difficult migration
When you switch from one APM vendor to another, you need to reimplement all instrumentation.
```typescript
// ❌ Vendor lock-in (illustrative package names)
import { DataDogLogger } from '@datadog/browser-logs';
import { DataDogMetrics } from '@datadog/browser-metrics';
import { DataDogTracer } from '@datadog/browser-tracing';
// If switching to New Relic, everything changes
import { NewRelicLogger } from 'newrelic/browser-logs';
import { NewRelicMetrics } from 'newrelic/browser-metrics';

// ✅ Vendor-agnostic: the application codes against the OTel API;
// switching backends means swapping the exporter, nothing else
import { trace, metrics } from '@opentelemetry/api';
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';
import { PrometheusExporter } from '@opentelemetry/exporter-prometheus';
```
OpenTelemetry architecture
OTel specification layers
```
OpenTelemetry specification
├── API Layer
│   ├── Tracing API (spans, context, links)
│   ├── Metrics API (counter, gauge, histogram)
│   ├── Logs API (structured logs)
│   └── Baggage API (context propagation)
├── SDK Layer
│   ├── Instrumentation Libraries (auto-instrumentation)
│   ├── Language SDKs (JavaScript, Python, Go)
│   └── Semantic Conventions (standard attribute names)
└── Collector Layer
    ├── OTLP Receiver (receives telemetry data)
    ├── Processors (batch, transform, filter)
    └── Exporters (Jaeger, Prometheus, vendors)
```
Essential components
1. Tracer Provider
```typescript
// src/tracing.ts
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';
import { trace } from '@opentelemetry/api';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
});

const provider = new NodeTracerProvider({
  resource,
  spanProcessors: [
    new BatchSpanProcessor(
      // The gRPC exporter takes the collector's base endpoint; the
      // /v1/traces path is only used by the OTLP/HTTP exporter
      new OTLPTraceExporter({ url: 'http://otel-collector:4317' })
    ),
  ],
});

provider.register();

export { trace };
```
2. Metrics Provider
```typescript
// src/metrics.ts
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPMetricExporter } from '@opentelemetry/exporter-metrics-otlp-grpc';
import { metrics } from '@opentelemetry/api';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
});

const provider = new MeterProvider({
  resource,
  readers: [
    new PeriodicExportingMetricReader({
      exporter: new OTLPMetricExporter({ url: 'http://otel-collector:4317' }),
      exportIntervalMillis: 60_000,
    }),
  ],
});

// MeterProvider has no register(); install it as the global provider instead
metrics.setGlobalMeterProvider(provider);

export const meter = metrics.getMeter('my-app');
```
3. Logger Provider
```typescript
// src/logging.ts
import { LoggerProvider, SimpleLogRecordProcessor } from '@opentelemetry/sdk-logs';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { OTLPLogExporter } from '@opentelemetry/exporter-logs-otlp-grpc';
import { logs } from '@opentelemetry/api-logs';

const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: 'my-app',
  [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
});

const provider = new LoggerProvider({ resource });

// SimpleLogRecordProcessor exports every record immediately; prefer
// BatchLogRecordProcessor in production to reduce overhead
provider.addLogRecordProcessor(
  new SimpleLogRecordProcessor(
    new OTLPLogExporter({ url: 'http://otel-collector:4317' })
  )
);

logs.setGlobalLoggerProvider(provider);

export const logger = logs.getLogger('my-app');
```
Manual instrumentation
Instrumenting HTTP handlers
```typescript
// src/instrumentation/http.ts
import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import type { Request, Response, NextFunction, RequestHandler } from 'express';

export function instrumentHttpHandler(handler: RequestHandler) {
  return async (req: Request, res: Response, next: NextFunction) => {
    // Create a span for the HTTP request
    const tracer = trace.getTracer('http');
    const span = tracer.startSpan(`${req.method} ${req.path}`, {
      kind: SpanKind.SERVER,
      attributes: {
        [SemanticAttributes.HTTP_METHOD]: req.method,
        [SemanticAttributes.HTTP_URL]: req.url,
        [SemanticAttributes.HTTP_ROUTE]: req.path,
      },
    });
    try {
      const result = await handler(req, res, next);
      // Add response attributes
      span.setAttribute(SemanticAttributes.HTTP_STATUS_CODE, res.statusCode);
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
    } catch (error) {
      // Record the error on the span
      span.recordException(error as Error);
      span.setStatus({ code: SpanStatusCode.ERROR });
      throw error;
    } finally {
      // Always end the span
      span.end();
    }
  };
}
```
Instrumenting database operations
```typescript
// src/instrumentation/database.ts
import { trace, SpanKind, SpanStatusCode } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { meter } from '../metrics';

// Create the histogram once at module load, not per operation
const dbDurationHistogram = meter.createHistogram('db.operation.duration', {
  unit: 'ms',
  description: 'Duration of database operations',
});

export function instrumentDatabase<T>(
  operation: string,
  dbSystem: string,
  fn: () => Promise<T>
): Promise<T> {
  const tracer = trace.getTracer('database');
  return tracer.startActiveSpan(
    operation,
    {
      kind: SpanKind.CLIENT,
      attributes: {
        [SemanticAttributes.DB_SYSTEM]: dbSystem,
        [SemanticAttributes.DB_OPERATION]: operation,
      },
    },
    async (span) => {
      try {
        const startTime = Date.now();
        const result = await fn();
        // Record the duration as a metric with the same attributes as the span
        dbDurationHistogram.record(Date.now() - startTime, {
          [SemanticAttributes.DB_SYSTEM]: dbSystem,
          [SemanticAttributes.DB_OPERATION]: operation,
        });
        // db.statement is reserved for the query text; use span status for outcomes
        span.setStatus({ code: SpanStatusCode.OK });
        return result;
      } catch (error) {
        span.recordException(error as Error);
        span.setStatus({ code: SpanStatusCode.ERROR });
        throw error;
      } finally {
        span.end();
      }
    }
  );
}
```
Instrumenting with context propagation
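Before looking at the code, it helps to know what actually crosses process boundaries: OTel's default propagator serializes the active span context into the W3C `traceparent` HTTP header. A simplified sketch of that format, for illustration only (this is not the SDK's implementation):

```typescript
// W3C traceparent: version "00", 32-hex trace id, 16-hex span id, 2-hex flags
interface TraceParent {
  traceId: string;
  spanId: string;
  sampled: boolean;
}

function buildTraceparent(tp: TraceParent): string {
  // Flags byte: bit 0 = sampled
  const flags = tp.sampled ? '01' : '00';
  return `00-${tp.traceId}-${tp.spanId}-${flags}`;
}

function parseTraceparent(header: string): TraceParent | null {
  const match = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  return {
    traceId: match[1],
    spanId: match[2],
    sampled: (parseInt(match[3], 16) & 1) === 1,
  };
}
```

In real code you never build this header by hand: `propagation.inject()` writes it into outgoing requests and `propagation.extract()` reads it on the receiving side, which is what keeps the in-process spans below connected across services.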
```typescript
// src/api/user.ts
import { trace, SpanStatusCode } from '@opentelemetry/api';
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';
import { instrumentDatabase } from '../instrumentation/database';
import { db } from '../db'; // your database client

export async function getUser(userId: string) {
  // The active context already exists from the HTTP request span
  const tracer = trace.getTracer('api');
  const span = tracer.startSpan('getUser');
  try {
    // The database span becomes a child of this one
    const user = await instrumentDatabase(
      'SELECT',
      'postgresql',
      () => db.users.findUnique({ where: { id: userId } })
    );
    // enduser.id identifies the application user (db.user is the DB account)
    span.setAttribute(SemanticAttributes.ENDUSER_ID, userId);
    span.setStatus({ code: SpanStatusCode.OK });
    return user;
  } catch (error) {
    span.recordException(error as Error);
    span.setStatus({ code: SpanStatusCode.ERROR });
    throw error;
  } finally {
    span.end();
  }
}
```
Semantic conventions
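Semantic conventions boil down to agreed attribute names: lowercase, namespaced, dot-separated (`user.id`, `http.request.method`). For custom attributes that have no predefined constant, a hypothetical lint helper can enforce the naming shape before it ships (the regex below is an assumption based on OTel's naming guidance, not part of any SDK):

```typescript
// Valid: lowercase dot-separated segments, e.g. "user.id", "http.request.method".
// A bare single segment ("success") is rejected on purpose: attributes should
// be namespaced to avoid collisions.
function isConventionalAttributeName(name: string): boolean {
  return /^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+$/.test(name);
}
```

Running such a check in tests or a pre-commit hook catches `UserID`-style names before they fragment your telemetry.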
Standard service attributes
```typescript
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

const serviceAttributes = {
  // Service identification
  [SemanticResourceAttributes.SERVICE_NAME]: 'api-gateway',
  [SemanticResourceAttributes.SERVICE_VERSION]: '2.1.0',
  [SemanticResourceAttributes.SERVICE_INSTANCE_ID]: 'gateway-1',
  // Deployment environment
  [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: process.env.NODE_ENV,
  [SemanticResourceAttributes.HOST_NAME]: 'api-gateway-prod',
  // Process identification
  [SemanticResourceAttributes.PROCESS_PID]: process.pid,
  [SemanticResourceAttributes.PROCESS_EXECUTABLE_PATH]: process.execPath,
  [SemanticResourceAttributes.PROCESS_COMMAND]: process.argv[0],
  [SemanticResourceAttributes.PROCESS_COMMAND_ARGS]: process.argv.slice(1),
};
```
Standard HTTP attributes
```typescript
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

const httpAttributes = {
  // Request
  [SemanticAttributes.HTTP_METHOD]: 'POST',
  [SemanticAttributes.HTTP_URL]: 'https://api.example.com/users',
  [SemanticAttributes.HTTP_TARGET]: '/api/v1/users',
  [SemanticAttributes.HTTP_SCHEME]: 'https',
  [SemanticAttributes.HTTP_HOST]: 'api.example.com',
  // Response
  [SemanticAttributes.HTTP_STATUS_CODE]: 200,
  [SemanticAttributes.HTTP_FLAVOR]: '1.1',
  // Client
  [SemanticAttributes.HTTP_CLIENT_IP]: '192.168.1.1',
  [SemanticAttributes.HTTP_USER_AGENT]: 'Mozilla/5.0...',
};
```
Standard database attributes
```typescript
import { SemanticAttributes } from '@opentelemetry/semantic-conventions';

const dbAttributes = {
  // Database system
  [SemanticAttributes.DB_SYSTEM]: 'postgresql',
  // Operation
  [SemanticAttributes.DB_OPERATION]: 'SELECT',
  [SemanticAttributes.DB_STATEMENT]: 'SELECT * FROM users WHERE id = $1',
  // Database and account
  [SemanticAttributes.DB_NAME]: 'app_db',
  [SemanticAttributes.DB_USER]: 'app_user',
  // Connection (never include credentials here)
  [SemanticAttributes.DB_CONNECTION_STRING]: 'postgresql://localhost:5432/app_db',
  [SemanticAttributes.DB_JDBC_DRIVER_CLASSNAME]: 'org.postgresql.Driver',
};
```
Production patterns
Pattern 1: Intelligent sampling
```typescript
// src/tracing/sampler.ts
import { ParentBasedSampler, TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

// Ratio sampling keyed on the trace ID: the same trace is consistently
// sampled (or not) by every service that sees it
const idBasedSampler = new TraceIdRatioBasedSampler(0.1); // 10% of traces

// Parent-based: if the parent span was sampled, children are too
const parentBasedSampler = new ParentBasedSampler({
  root: idBasedSampler,
  remoteParentSampled: new TraceIdRatioBasedSampler(0.05),
});

const provider = new NodeTracerProvider({
  sampler: parentBasedSampler,
  // ...rest of the configuration
});
```
Pattern 2: Batching to reduce overhead
```typescript
// src/telemetry/processor.ts
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

// Batching is configured on the processor, not the exporter:
// export whenever 100 spans accumulate or 5 seconds elapse
const batchProcessor = new BatchSpanProcessor(
  new OTLPTraceExporter({ url: 'http://otel-collector:4317' }),
  {
    maxExportBatchSize: 100,
    scheduledDelayMillis: 5000,
  }
);
```
Pattern 3: Centralized resource attributes
```typescript
// src/telemetry/resource.ts
import {
  detectResourcesSync,
  envDetectorSync,
  hostDetectorSync,
  processDetectorSync,
} from '@opentelemetry/resources';

// Detects host, process, and OTEL_RESOURCE_ATTRIBUTES automatically.
// Cloud detectors (AWS, GCP, Azure) live in separate packages such as
// @opentelemetry/resource-detector-aws.
export const resource = detectResourcesSync({
  detectors: [envDetectorSync, hostDetectorSync, processDetectorSync],
});
```
Integration with OTel Collector
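On the application side, exporter settings are usually supplied through the standard `OTEL_*` environment variables rather than hard-coded URLs, which keeps the code identical across environments. A Kubernetes container fragment as a sketch (the service name and endpoint values are examples):

```yaml
# Fragment of a Deployment's container spec
env:
  - name: OTEL_SERVICE_NAME
    value: my-app
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: http://otel-collector:4317
  - name: OTEL_EXPORTER_OTLP_PROTOCOL
    value: grpc
  - name: OTEL_TRACES_SAMPLER
    value: parentbased_traceidratio
  - name: OTEL_TRACES_SAMPLER_ARG
    value: "0.1"
```

SDKs read these variables automatically, so promoting a service from staging to production is a configuration change, not a code change.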
Collector configuration
```yaml
# otel-collector-config.yaml
receivers:
  # Receive telemetry via OTLP
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318
  # Receive traces from legacy Jaeger clients
  jaeger:
    protocols:
      thrift_http:
        endpoint: 0.0.0.0:14268

processors:
  # Batch telemetry before export
  batch:
    timeout: 5s
    send_batch_size: 1000
  # Add resource attributes to all telemetry
  resource:
    attributes:
      - key: service.name
        value: api-gateway
        action: upsert
      - key: service.version
        value: 2.1.0
        action: upsert
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  # Send traces to Jaeger (gRPC endpoint, no URL path)
  jaeger:
    endpoint: jaeger:14250
    tls:
      insecure: true
  # Expose metrics for Prometheus to scrape
  prometheus:
    endpoint: "0.0.0.0:8889"
  # Send logs to Loki
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [resource, batch]
      exporters: [jaeger]
    metrics:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [resource, batch]
      exporters: [loki]
```
Collector deployment
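Before committing to Kubernetes manifests, the same collector configuration can be exercised locally. A minimal docker-compose sketch (the image tag is an example; pin the version you have tested):

```yaml
services:
  otel-collector:
    # Example tag; pin to your tested version
    image: otel/opentelemetry-collector-contrib:0.98.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml:ro
    ports:
      - "4317:4317"   # OTLP gRPC
      - "4318:4318"   # OTLP HTTP
```

Pointing a locally running service at `http://localhost:4317` then lets you validate pipelines and processors before any cluster rollout.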
```yaml
# k8s/otel-collector-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 3
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
        - name: otel-collector
          # Pin a tested version instead of :latest in production
          image: otel/opentelemetry-collector-contrib:latest
          args:
            - --config=/etc/otel-collector-config.yaml
          ports:
            - containerPort: 4317   # OTLP gRPC
            - containerPort: 4318   # OTLP HTTP
            - containerPort: 14268  # Jaeger thrift HTTP
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
          volumeMounts:
            - name: config-volume
              mountPath: /etc/otel-collector-config.yaml
              subPath: otel-collector-config.yaml  # mount a single file, not a directory
              readOnly: true
      volumes:
        - name: config-volume
          configMap:
            name: otel-collector-config
```
Monitoring OTel overhead
Collector self-metrics
```yaml
# otel-collector-config.yaml
receivers:
  # Scrape the collector's own internal metrics
  prometheus:
    config:
      scrape_configs:
        - job_name: 'otel-collector'
          static_configs:
            - targets: ['0.0.0.0:8888']

service:
  # The collector serves its internal metrics on :8888
  telemetry:
    metrics:
      address: 0.0.0.0:8888
  pipelines:
    traces:
      receivers: [otlp, jaeger]
      processors: [batch]
      exporters: [jaeger]
    metrics:
      receivers: [prometheus]
      processors: [batch]
      exporters: [prometheus]
```
Overhead alerts
```yaml
# alerts/otel-overhead.yaml
# Note: internal metric names (otelcol_*) vary across collector versions;
# verify them against your collector's /metrics endpoint before deploying.
groups:
  - name: otel-overhead
    interval: 30s
    rules:
      - alert: HighExportFailureRate
        expr: rate(otelcol_exporter_send_failed_spans[5m]) > 10
        annotations:
          summary: "Collector is failing to export spans"
          description: "More than 10 spans/second are failing to export"
      - alert: HighDropRate
        expr: rate(otelcol_processor_dropped_spans[5m]) > 10
        annotations:
          summary: "High span drop rate"
          description: "The collector is dropping more than 10 spans/second"
      - alert: HighMemoryUsage
        expr: container_memory_usage_bytes{container="otel-collector"} / container_spec_memory_limit_bytes{container="otel-collector"} > 0.8
        annotations:
          summary: "High OTel Collector memory usage"
          description: "The collector is using more than 80% of its memory limit"
```
Six-week implementation plan
Weeks 1-2: Foundation
- Install OTel Collector in cluster
- Configure exporters (Jaeger, Prometheus, Loki)
- Define standard semantic conventions
- Create instrumentation packages
Weeks 3-4: Instrumentation
- Instrument API HTTP handlers
- Instrument database operations
- Add custom spans for business logic
- Implement context propagation
Weeks 5-6: Optimization
- Configure intelligent sampling
- Implement batching to reduce overhead
- Create unified observability dashboards
- Validate performance gains and debugging quality
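When validating overhead in weeks 5-6, a back-of-envelope estimate of span volume helps size the collector and choose a sampling ratio. A hypothetical helper (the function and its numbers are illustrative, not from any SDK):

```typescript
// Estimate exported spans/second given traffic, instrumentation density,
// and head-based sampling ratio.
function estimateSpanRate(
  requestsPerSecond: number,
  spansPerRequest: number,
  samplingRatio: number // 0..1, e.g. 0.1 for 10% sampling
): number {
  return requestsPerSecond * spansPerRequest * samplingRatio;
}

// e.g. 1000 rps with ~5 spans per request at 10% sampling ≈ 500 spans/s to export
```

Comparing this estimate against the collector's observed `otelcol_*` throughput metrics tells you whether your sampling ratio and replica count are in the right ballpark.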
Conclusion
OpenTelemetry in 2026 is the unified standard for observability that eliminates fragmentation of logs, metrics, and traces. By implementing OTel, organizations reduce instrumentation effort, standardize observability data, and facilitate migration between tools without changing application code.
OpenTelemetry's maturity is no longer in question: well-defined semantic conventions, mature SDKs for all major languages, native support from major cloud providers, and a robust ecosystem of visualization tools.
The key is to start with structured instrumentation: use semantic conventions, implement intelligent sampling to reduce overhead, and configure appropriate collectors for your specific requirements.
Closing practical question: Can your organization debug a production issue by tracing a complete request through correlated logs, metrics, and spans in a unified context, or do you still have to jump between fragmented dashboards?
Need to implement OpenTelemetry to unify observability and improve production debugging capabilities? Talk to Imperialis specialists about observability architecture, OTel implementation, and instrumentation strategies.
Sources
- OpenTelemetry Documentation — Official OpenTelemetry documentation
- OpenTelemetry Collector — Collector specification
- Semantic Conventions — Standard semantic attributes
- OpenTelemetry JavaScript SDK — JavaScript SDK
- OpenTelemetry Registry — Available instrumentation and exporters