Tracking Microservices Performance
Scenario: You manage a distributed microservices architecture and need to track requests across multiple services, identify performance bottlenecks, correlate failures, and maintain system health.
Features Used:
- Trace (Request correlation across services)
- API Dashboard (Performance metrics)
- Metadata (Business context)
- Requests (Detailed request analysis)
Overview
In microservices architectures, a single user action often triggers requests across multiple services. Understanding the complete flow and identifying where issues occur requires:
- Request Correlation: Track a request as it flows through multiple services
- Performance Visibility: Monitor latency, errors, and throughput across all services
- Business Context: Understand which customers, features, or regions are affected
- Root Cause Analysis: Quickly identify which service in the chain is causing problems
This workflow demonstrates how to use Treblle’s tracing and monitoring features to gain complete visibility into your distributed system.
Step 1: Implement Trace ID Propagation
To track requests across microservices, you need to propagate a trace ID through your entire request chain.
Understanding Trace IDs
A trace ID is a unique identifier that follows a request through your entire system:
- User makes request to API Gateway (trace ID created)
- API Gateway calls Auth Service (trace ID passed)
- Auth Service calls User Service (trace ID passed)
- User Service calls Database (trace ID logged)
All these requests share the same trace ID, allowing you to see the complete picture.
Implementation Methods
Treblle supports trace ID propagation through the treblle-metadata header:
Note
Recommended Approach: Include trace-id inside the treblle-metadata header as a flat key-value pair. This approach provides the best integration with Treblle’s Trace feature.
Method 1: Using treblle-metadata Header (Recommended)
API Gateway (Request Entry Point)
```javascript
// Node.js / Express - Generate and propagate trace ID
const { v4: uuidv4 } = require('uuid');

app.use((req, res, next) => {
  // Generate a trace ID if the caller didn't supply one
  const traceId = req.headers['x-trace-id'] || uuidv4();

  // Add to treblle-metadata for Treblle tracking
  req.headers['treblle-metadata'] = JSON.stringify({
    'trace-id': traceId,
    'service': 'api-gateway',
    'environment': process.env.NODE_ENV
  });

  // Also pass as a standard header to downstream services
  req.headers['x-trace-id'] = traceId;
  next();
});
```

Downstream Microservices
```javascript
// Auth Service, User Service, etc.
const axios = require('axios');

app.use((req, res, next) => {
  // Extract the trace ID from the incoming request
  const traceId = req.headers['x-trace-id'];

  // Propagate in treblle-metadata for this service
  req.headers['treblle-metadata'] = JSON.stringify({
    'trace-id': traceId,
    'service': 'auth-service',
    'environment': process.env.NODE_ENV
  });
  next();
});

// When calling other services, pass the trace ID along explicitly
async function callUserService(userId, traceId) {
  const response = await axios.get(`https://user-service/users/${userId}`, {
    headers: {
      'x-trace-id': traceId,
      'treblle-metadata': JSON.stringify({
        'trace-id': traceId,
        'service': 'user-service',
        'caller': 'auth-service'
      })
    }
  });
  return response.data;
}
```

Method 2: Using Alternative Tracing Headers
Treblle also supports the treblle-tag-id header for tracing:
```javascript
// Alternative approach
req.headers['treblle-tag-id'] = traceId;
```

Caution
Important: If both treblle-metadata with trace-id and treblle-tag-id are present, the trace-id in treblle-metadata takes precedence. Choose one approach and use it consistently across all services.
Python Implementation
```python
# Flask - API Gateway
import json
import os
import uuid

import requests
from flask import request

@app.before_request
def add_trace_id():
    trace_id = request.headers.get('x-trace-id', str(uuid.uuid4()))
    request.environ['treblle-metadata'] = json.dumps({
        'trace-id': trace_id,
        'service': 'api-gateway',
        'environment': os.getenv('ENVIRONMENT')
    })
    request.environ['x-trace-id'] = trace_id

# Downstream service call
def call_downstream_service(endpoint):
    trace_id = request.headers.get('x-trace-id')
    response = requests.get(
        f'https://downstream-service{endpoint}',
        headers={
            'x-trace-id': trace_id,
            'treblle-metadata': json.dumps({
                'trace-id': trace_id,
                'service': 'downstream-service',
                'caller': 'api-gateway'
            })
        }
    )
    return response.json()
```

PHP Implementation
```php
// Laravel - Middleware
namespace App\Http\Middleware;

use Closure;
use Illuminate\Support\Str;

class TraceIdMiddleware
{
    public function handle($request, Closure $next)
    {
        $traceId = $request->header('x-trace-id', Str::uuid()->toString());

        $request->headers->set('treblle-metadata', json_encode([
            'trace-id' => $traceId,
            'service' => 'api-gateway',
            'environment' => env('APP_ENV')
        ]));
        $request->headers->set('x-trace-id', $traceId);

        return $next($request);
    }
}

// Downstream service call
use Illuminate\Support\Facades\Http;

function callDownstreamService($endpoint) {
    $traceId = request()->header('x-trace-id');

    $response = Http::withHeaders([
        'x-trace-id' => $traceId,
        'treblle-metadata' => json_encode([
            'trace-id' => $traceId,
            'service' => 'user-service',
            'caller' => 'api-gateway'
        ])
    ])->get("https://user-service{$endpoint}");

    return $response->json();
}
```

Verification Checklist

Before moving on, confirm that:
- The entry point (API Gateway) generates a trace ID whenever no x-trace-id header is present
- Every service includes trace-id as a flat key-value pair inside treblle-metadata
- The trace ID is forwarded on every downstream call
- All services use the same approach (treblle-metadata or treblle-tag-id), never a mix
Tip
Testing Tip: Use curl to test trace ID propagation: curl -H "x-trace-id: test-123" https://your-api.com/endpoint and verify the trace ID appears in Treblle for all services involved in handling the request.
Step 2: View Complete Request Traces
Once trace ID propagation is implemented, you can view the complete flow of requests through your system.
Navigate to Trace Dashboard
- Go to Trace in the left navigation bar
- You’ll see a list of all traces with their associated requests
- Switch between List and Table views
The Trace dashboard in List view displays each trace as a card showing:
- Trace ID: Unique identifier (e.g., b52a4bc0-210a-4501-9ab3-7c50236f7eaa)
- Duration: Total time (e.g., 0ms, 1000ms)
- Status: Success (green indicator) or Failed
- Requests: Number of API calls in the trace (e.g., 2)
- APIs: Number of unique APIs involved (e.g., 2)
- Parent API: The first API called (e.g., “Platform API (Forge)”, “Identity API”)
- Timestamp: When the trace was created
Table View
The Table view provides a compact, tabular format showing:
- Trace Name: The trace ID
- Requests #: Number of requests
- Api #: Number of APIs
- Parent Api Name: First API in the chain
- Environment: Environment indicator (pink “P” badge)
- Status: Pass/Fail indicator
- Duration: Time taken
- Time: Timestamp
Filtering Traces
Use filters to find specific traces:
Available Filters:
Status:
- Filter by Success or Failed traces
- Quickly isolate problematic request flows
Duration:
Filter by time ranges:
- 0ms - 200ms (very fast)
- 200ms - 500ms (fast)
- 500ms - 1s (moderate)
- 1s - 2s (slow)
- 2s - 3s (very slow)
- 3s - 5s (extremely slow)
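If you post-process exported request data yourself, the same buckets can be reproduced with a small helper. This is a sketch for local analysis, not part of any Treblle SDK; the labels simply mirror the filter options above:

```javascript
// Map a request duration in milliseconds to the Trace dashboard's
// duration filter buckets.
function durationBucket(ms) {
  if (ms < 200) return '0ms - 200ms (very fast)';
  if (ms < 500) return '200ms - 500ms (fast)';
  if (ms < 1000) return '500ms - 1s (moderate)';
  if (ms < 2000) return '1s - 2s (slow)';
  if (ms < 3000) return '2s - 3s (very slow)';
  return '3s - 5s (extremely slow)';
}
```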
APIs:
- Filter traces involving specific microservices
- Search for particular APIs in the trace chain
- Analyze cross-service dependencies
Note
Performance Baseline: After implementing tracing, monitor for a week to establish baseline performance. This helps you identify anomalies when they occur.
Step 3: View Metadata in Requests
Deep dive into specific requests to see the business context and trace information.
Navigate to Request Details
- Go to Requests in the left navigation
- Click on any request to open detailed view
- Click on the Metadata tab
The Metadata tab displays all custom metadata fields you’ve added:
- Customer: Shows the user-id field (e.g., “I5paNJD0miydDAJ”)
- Trace ID: Displays the trace ID for correlation (e.g., “5adf904e-3e14-4571-8866-7b76494790da”)
- Custom Fields: All other metadata like company_SAqzO, treblle-username, x-customer-id
Viewing Metadata in Headers
You can also see the raw treblle-metadata header:
- Click on any request
- Go to the General tab
- Click on Headers sub-tab
The treblle-metadata header (line 11 in the screenshot) shows the escaped JSON string containing:

```
{
  "tag-id": "5adf904e-3e14-4571-8866-7b76494790da",
  ...other metadata fields
}
```

Understanding Metadata Structure
Based on the documentation, metadata must be:
- Flat key-value pairs: No nested objects
- JSON stringified: Use JSON.stringify() when sending
- Maximum 2000 characters: Total size limit

Special Fields:
- trace-id or tag-id: Used for grouping requests in the Trace section
- user-id: Links to the Customer Dashboard
- All other fields: Available for filtering and custom analysis
Step 4: Add Business Context with Metadata
Metadata enriches traces with business information, making it easier to understand impact and prioritize fixes.
Implementing Metadata
Beyond trace IDs, add contextual information to every request:
```javascript
// Example: E-commerce API
req.headers['treblle-metadata'] = JSON.stringify({
  'trace-id': traceId,
  'customer-id': user.customerId,
  'customer-tier': user.tier, // 'free', 'pro', 'enterprise'
  'feature': 'checkout',
  'region': 'us-east-1',
  'environment': 'production',
  'version': 'v2.1.0',
  'session-id': sessionId
});
```

```python
# Example: SaaS Platform
request.environ['treblle-metadata'] = json.dumps({
    'trace-id': trace_id,
    'organization-id': org.id,
    'plan': org.subscription_plan,
    'user-role': current_user.role,
    'feature-flag': 'new-dashboard-v2',
    'region': get_region(),
    'tenant': org.tenant_id
})
```

Useful Metadata Fields for DevOps
Customer Identification
customer-id, organization-id, tenant-id - Track which customers are experiencing issues. Prioritize fixes for high-value customers.
Deployment Context
version, build, commit-sha - Correlate performance issues with specific deployments. Quickly identify if a new release introduced problems.
Infrastructure Details
region, availability-zone, container-id, instance-id - Identify if issues are isolated to specific infrastructure. Useful for cloud provider outages.
Feature Flags
feature-flag-name, experiment-id, variant - Track performance of new features behind flags. Rollback if new feature causes degradation.
Viewing Metadata in Traces
Once metadata is implemented, you can:
- Filter traces by metadata: Find all traces for a specific customer or feature
- Group by metadata values: See performance across different customer tiers
- Correlate issues: Identify if problems affect specific regions or versions
Caution
PII Warning: Never include sensitive personal information (passwords, credit card numbers, SSN) in metadata. Use anonymized IDs and aggregate categories only.
Step 5: Monitor API Performance Dashboard
The API Dashboard provides high-level performance metrics across all services.
Navigate to API Dashboard
- Click on APIs in the left navigation
- Select the API you want to monitor
- View the Dashboard with performance widgets
The API Dashboard shows multiple widgets:
- DDoS Threat Level: None (+0.97% vs avg)
- Missing Security Headers: Bar chart showing security header compliance
- SQL Injection: Donut chart (0% Failed, 100% Pass)
- API compliance: 62% compliance score
- New Requests: 893.0K total requests
- New Endpoints: 8 endpoints
- New Customers: 20 customers
- Governance Score: D (65)
- Zombie Endpoints: 0
- CO2 Emissions: 935.36 kg
- Recent requests table with Method, Response, Name, Load time, Threat, and Time columns
- New Problems: 1 problem detected
Key Performance Metrics
Request Volume:
- Requests per second (RPS)
- Requests per minute (RPM)
- Daily/weekly trends
- Peak traffic times
Latency Metrics:
- Average response time
- P50 (median), P95, P99 latency
- Slowest endpoints
- Latency distribution graph
Error Rates:
- 4xx errors (client errors)
- 5xx errors (server errors)
- Error rate percentage
- Error trends over time
Success Rate:
- Percentage of successful requests (2xx responses)
- Availability (uptime based on successful responses)
- Success rate by endpoint
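The same exported data can drive a quick error-rate calculation. A sketch that classifies responses by status class, matching the 2xx/4xx/5xx breakdown above:

```javascript
// Summarize status codes into success (2xx), client error (4xx), and
// server error (5xx) rates, as percentages of total requests.
function errorRates(statusCodes) {
  const total = statusCodes.length;
  const count = (lo, hi) => statusCodes.filter((c) => c >= lo && c < hi).length;
  const pct = (n) => Math.round((n / total) * 10000) / 100;
  return {
    success: pct(count(200, 300)),
    clientError: pct(count(400, 500)),
    serverError: pct(count(500, 600)),
  };
}

console.log(errorRates([200, 200, 201, 404, 500, 200, 200, 503]));
```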
Setting Up Dashboard Widgets
Customize your dashboard to focus on critical metrics:
- Click Customize Dashboard (grid icon)
- Enable relevant widgets for your monitoring needs
- Toggle individual widgets on/off
- Click Save Changes
Available Dashboard Widgets (Part 1):
- Recent Requests: List of recent requests made to your API
- Top Cities: List of top cities from which users access your API
- Top Countries: List of top countries from which users access your API
- Requests Per Day: Overview of request volume per time period
- Recent Requests Map: Recent requests on a live map
- Top Devices: List of top devices used to access your API
- Client App Versions: Which versions of apps access your API
- Average Load Time: The average load time on your API
- Average Response Size: The average response size on your API
Available Dashboard Widgets (Part 2):
- Performance Per Day: Overview of request load time per time period
- Top Customers: List of top customers accessing your API
- Recent Questions: List of recent questions people asked Alfred AI
- Top Questions: List of top questions people asked Alfred AI
- Problems Heartbeat: Average health of your API
- Total Requests: Number of requests in your API in the selected period
- Total Endpoints: Number of endpoints in your API in the selected period
- Compliance: Average compliance percentage of your API
- Governance: Average Governance score of your API
- Total Customers: Number of customers in your API in the selected period
Available Dashboard Widgets (Part 3):
- Co2 Emissions: Gain insight into the CO2 emissions generated by your APIs
- Security Headers: Percentage of requests that failed security header check
- Denial Of Service: Monitor your APIs threat level based on real-time traffic
- SQL Injection: Percentage of requests that failed or passed SQL injection check
- Zombie Endpoints: Number of endpoints with no activity in last 30 days
Recommended Widgets for DevOps:
- Total Requests: Track request volume changes
- Average Load Time: Monitor average latency trends
- Performance Per Day: See performance trends over time
- Recent Requests: Quick access to latest activity
- Problems Heartbeat: Overall API health status
Step 6: Filter and Analyze Requests
When performance issues occur, use the Requests section to drill down into specific requests.
Navigate to Requests
- Click Requests in the left navigation
- Use the Filter button to narrow down to problematic requests
Available Filters
The filter panel provides multiple options:
REQUEST Filters:
- Method: GET, POST, PUT, DELETE, PATCH, etc.
- Response code: Filter by status codes (200, 404, 500, etc.)
- Endpoints: Search for specific endpoints
- Request Parameters: Filter by query parameters
- Has Problems: Filter requests with detected issues (Any dropdown)
METADATA Filters:
- Customer: Search for specific customer IDs
- Trace ID: Enter trace ID to see all requests in a trace
- IP Address: Filter by client IP
- Parameter/Value: Custom parameter filtering
Saved Searches
Click Save search to save frequently used filter combinations for quick access later.
Tip
Pro Tip: Create saved searches for common investigation patterns (e.g., “Slow Requests”, “Customer Errors”). This speeds up recurring troubleshooting tasks.
Analyzing Request Details
Click on any request to see complete details across multiple tabs:
General Tab:
- Request body, headers (including treblle-metadata), and response data
- HTTP method, path, status code
- Request/response body content
Info Tab:
- User data (IP, location, device, AI Agent detection)
- Server data (timezone, OS, software)
- Geographic map of request origin
Security Tab:
- 13 OWASP security checks
- Threat level assessment
- IP reputation analysis
API Compliance Tab:
- Compliance standards validation
- API governance metrics
Metadata Tab:
- All custom metadata fields
- Customer ID
- Trace ID for correlation
- Business context (company, environment, custom fields)
Note
Complete Visibility: The combination of trace ID in metadata, request details, and performance metrics gives you end-to-end visibility into your distributed system’s behavior.
Complete DevOps Monitoring Workflow
Here’s how all the features work together for comprehensive monitoring:
1. Proactive Monitoring
API Dashboard shows overall system health. DevOps team monitors request volume, latency trends, and error rates throughout the day.
2. Issue Detection
Spike in latency or errors detected on dashboard. Team receives alert and begins investigation immediately.
3. Trace Analysis
Navigate to Trace dashboard, filter for slow/failed requests during incident window. Identify which service in the chain is causing the problem.
4. Context Understanding
Review metadata to understand impact. Is it affecting all customers or just one tier? Specific region? New deployment version?
5. Deep Dive Investigation
Click into specific requests, review request/response details, analyze timing breakdown. Identify root cause (database query, external API, resource exhaustion).
6. Resolution & Validation
Deploy fix, monitor dashboard and traces to confirm issue resolved. Performance returns to baseline, error rate drops to normal.
Troubleshooting Common Issues
Issue 1: Traces Not Appearing
Problem: Implemented trace ID propagation but not seeing traces in Treblle
Checklist:
- Confirm the Treblle SDK is installed and initialized on each service
- Verify the treblle-metadata header is set and contains trace-id as a flat key-value pair
- Make sure the metadata value is JSON stringified and under 2000 characters
- Check that the same trace ID value is sent by every service in the chain
Issue 2: Incomplete Traces
Problem: Traces show some services but missing others
Causes:
- Service not instrumented with Treblle
- Trace ID not propagated to that service
- Service using different header name
Solution:
- Verify Treblle SDK installed on missing service
- Log headers at service entry point
- Confirm trace ID matches across all services
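For "log headers at service entry point", a throwaway Express-style middleware is enough. A debugging sketch to drop into each suspect service and remove once the trace is confirmed; the service name parameter is illustrative:

```javascript
// Temporary diagnostic middleware: log the tracing headers each request
// arrives with, so a missing propagation step is visible per service.
function traceDebugMiddleware(serviceName) {
  return (req, res, next) => {
    console.log(JSON.stringify({
      service: serviceName,
      path: req.path,
      'x-trace-id': req.headers['x-trace-id'] || null,
      'treblle-metadata': req.headers['treblle-metadata'] || null,
    }));
    next();
  };
}

// app.use(traceDebugMiddleware('auth-service'));
```

Comparing these log lines across services immediately shows which hop drops or renames the trace header.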
Issue 3: Performance Overhead Concerns
Problem: Worried about Treblle SDK adding latency
Reality:
- SDK overhead: < 5ms per request
- Async data transmission (non-blocking)
- Sampling available for high-traffic APIs
Configuration:
```javascript
// Sample 10% of requests in production
treblle.init({
  apiKey: process.env.TREBLLE_API_KEY,
  projectId: process.env.TREBLLE_PROJECT_ID,
  sampling: process.env.NODE_ENV === 'production' ? 0.1 : 1.0
});
```

Next Steps
Now that you’ve implemented comprehensive microservices monitoring:
- Create runbooks: Document common trace patterns and their solutions
- Train your team: Ensure all engineers know how to read traces
- Set up dashboards: Create team-specific views (frontend, backend, infrastructure)
- Automate responses: Build automation for common issues (auto-scaling, circuit breakers)
- Regular reviews: Weekly performance review meetings using Treblle data
Your microservices architecture is now fully observable, enabling rapid troubleshooting and continuous performance optimization.