SkySignal Agent
Official APM agent for monitoring Meteor.js applications with SkySignal.
Features
- System Metrics Monitoring - CPU, memory, disk, network, V8 heap, event loop utilization, and process resource usage
- Method Performance Traces - Track Meteor Method execution with operation-level profiling and COLLSCAN detection
- Publication Monitoring - Monitor publication performance and subscriptions
- Publication Efficiency Analysis - Detect over-fetching (missing field projections) and unbounded cursors
- Error Tracking - Automatic server-side and client-side error capture with browser context
- Log Collection - Capture `console.*` and Meteor `Log.*` output with structured metadata and sampling
- HTTP Request Monitoring - Track outgoing HTTP requests
- Outbound HTTP Instrumentation - Zero-patch `diagnostics_channel` tracing for outbound HTTP/HTTPS and `fetch` requests
- Database Query Monitoring - MongoDB query performance tracking with COLLSCAN flagging
- Live Query Monitoring - Per-observer driver detection for Change Streams (Meteor 3.5+), oplog, and polling
- DNS Timing - Measure DNS resolution latency by wrapping `dns.lookup` and `dns.resolve`
- CPU Profiling - On-demand inspector-based CPU profiling when CPU exceeds a configurable threshold
- Deprecated API Detection - Track sync vs async Meteor API usage to guide Meteor 3.x migration
- Environment Snapshots - Periodic capture of package versions, Node.js flags, and OS metadata
- Vulnerability Scanning - Hourly `npm audit` with severity reporting for high/critical CVEs
- Real User Monitoring (RUM) - Browser-side Core Web Vitals (LCP, FID, CLS, TTFB, FCP, TTI) with automatic performance warnings
- SPA Route Tracking - Automatic performance collection on every route change
- Session Tracking - 30-minute user sessions with localStorage persistence
- Browser Context - Automatic device, browser, OS, and network information collection
- Batch Processing - Efficient batching and async delivery to minimize performance impact
- Worker Thread Offloading - Optional `worker_threads` pool for compression to keep the host event loop clear
Installation
Add the package to your Meteor application:
```bash
meteor add skysignal:agent
```
Quick Start
1. Get Your API Key
Sign up at SkySignal and create a new site to get your API key.
2. Configure the Agent
In your Meteor server startup code (e.g., server/main.js):
```javascript
import { Meteor } from 'meteor/meteor';
import { SkySignalAgent } from 'meteor/skysignal:agent';

Meteor.startup(() => {
  // Configure the agent
  SkySignalAgent.configure({
    apiKey: process.env.SKYSIGNAL_API_KEY || 'your-api-key-here',
    enabled: true,
    host: 'my-app-server-1', // Optional: defaults to hostname
    appVersion: '1.2.3', // Optional: auto-detected from package.json

    // Optional: Customize collection intervals
    systemMetricsInterval: 60000, // 1 minute (default)
    flushInterval: 10000, // 10 seconds (default)
    batchSize: 50, // Max items per batch (default)

    // Optional: Sampling for high-traffic apps
    traceSampleRate: 1.0, // 100% of traces (reduce for high volume)

    // Optional: Feature toggles
    collectTraces: true,
    collectMongoPool: true,
    collectDDPConnections: true,
    collectJobs: true
  });

  // Start monitoring
  SkySignalAgent.start();
});
```
3. Add to Settings File
For production, use Meteor settings. The agent auto-initializes from settings if configured:
settings-production.json:
```json
{
  "skysignal": {
    "apiKey": "sk_your_api_key_here",
    "enabled": true,
    "host": "production-server-1",
    "appVersion": "1.2.3",
    "traceSampleRate": 0.5,
    "collectTraces": true,
    "collectMongoPool": true,
    "collectDDPConnections": true,
    "collectJobs": true,
    "collectLogs": true,
    "logLevels": ["warn", "error", "fatal"],
    "logSampleRate": 0.5,
    "captureIndexUsage": true,
    "indexUsageSampleRate": 0.05,
    "collectDnsTimings": true,
    "collectOutboundHttp": true,
    "collectCpuProfiles": true,
    "cpuProfileThreshold": 80,
    "collectDeprecatedApis": true,
    "collectPublications": true,
    "collectEnvironment": true,
    "collectVulnerabilities": true
  },
  "public": {
    "skysignal": {
      "publicKey": "pk_your_public_key_here",
      "rum": {
        "enabled": true,
        "sampleRate": 0.5
      },
      "errorTracking": {
        "enabled": true,
        "captureUnhandledRejections": true
      }
    }
  }
}
```
The agent auto-starts when it finds valid configuration in Meteor.settings.skysignal.
Manual initialization (optional):
```javascript
import { Meteor } from 'meteor/meteor';
import { SkySignalAgent } from 'meteor/skysignal:agent';

Meteor.startup(() => {
  // Only needed if not using settings auto-initialization
  const config = Meteor.settings.skysignal;

  if (config && config.apiKey) {
    SkySignalAgent.configure(config);
    SkySignalAgent.start();
  } else {
    console.warn('⚠️ SkySignal not configured - monitoring disabled');
  }
});
```
Configuration Options
API Configuration
| Option | Type | Default | Description |
|---|---|---|---|
apiKey | String | required | Your SkySignal API key (sk_ prefix) |
endpoint | String | https://dash.skysignal.app | SkySignal API endpoint |
enabled | Boolean | true | Enable/disable the agent |
Host & Version Identification
| Option | Type | Default | Description |
|---|---|---|---|
host | String | os.hostname() | Host identifier for this instance |
appVersion | String | Auto-detect | App version from package.json or manually configured |
buildHash | String | Auto-detect | Build hash for source map lookup. Auto-detects from BUILD_HASH or GIT_SHA environment variables |
Batching Configuration
| Option | Type | Default | Description |
|---|---|---|---|
batchSize | Number | 50 | Max items per batch before auto-flush |
batchSizeBytes | Number | 262144 | Max bytes (256KB) per batch |
flushInterval | Number | 10000 | Interval (ms) to flush batched data |
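The three batching limits work together: a batch goes out when it reaches `batchSize` items, when adding an item would push it past `batchSizeBytes`, or when the `flushInterval` timer fires. Below is a minimal sketch of that policy. The `Batcher` class and its caller-supplied `send` function are hypothetical illustrations, not the agent's internal code, and the "flush before exceeding the byte limit" ordering is an assumption.

```javascript
// Hypothetical batcher illustrating batchSize / batchSizeBytes / flushInterval.
// Not the agent's implementation; `send` is supplied by the caller.
class Batcher {
  constructor({ batchSize = 50, batchSizeBytes = 262144, flushInterval = 10000, send }) {
    this.batchSize = batchSize;
    this.batchSizeBytes = batchSizeBytes;
    this.flushInterval = flushInterval;
    this.send = send;
    this.items = [];
    this.bytes = 0;
    this.timer = null;
  }

  start() {
    // Periodic flush so data still goes out during quiet periods.
    this.timer = setInterval(() => this.flush(), this.flushInterval);
    // Don't keep the process alive just for this timer.
    if (this.timer.unref) this.timer.unref();
  }

  stop() {
    clearInterval(this.timer);
    this.flush(); // deliver whatever is left
  }

  add(item) {
    const size = Buffer.byteLength(JSON.stringify(item));
    // Flush first if this item would push the batch over the byte limit.
    if (this.items.length > 0 && this.bytes + size > this.batchSizeBytes) {
      this.flush();
    }
    this.items.push(item);
    this.bytes += size;
    // Flush as soon as the item-count limit is reached.
    if (this.items.length >= this.batchSize) this.flush();
  }

  flush() {
    if (this.items.length === 0) return;
    const batch = this.items;
    this.items = [];
    this.bytes = 0;
    this.send(batch);
  }
}
```

Lowering `batchSize` or `batchSizeBytes` in this model trades more frequent, smaller requests for less data held in memory.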
Sampling Rates
| Option | Type | Default | Description |
|---|---|---|---|
traceSampleRate | Number | 1.0 | Server trace sample rate (0-1). Set to 0.1 for 10% |
rumSampleRate | Number | 0.5 | RUM sample rate (0-1). Defaults to 50% to limit data volume on high-traffic apps |
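A sample rate is usually applied as an independent per-item decision, so `0.1` keeps about one trace in ten. Here is a minimal sketch, assuming a hypothetical `shouldSample` helper. The random source is injectable so the decision can be tested.

```javascript
// Hypothetical helper showing how a 0-1 sample rate is typically applied.
// `random` is injectable so the decision can be tested deterministically.
function shouldSample(rate, random = Math.random) {
  if (rate >= 1) return true;   // 1.0 keeps everything
  if (rate <= 0) return false;  // 0 drops everything
  return random() < rate;       // e.g. 0.1 keeps roughly 10% of items
}
```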
Collection Intervals
| Option | Type | Default | Description |
|---|---|---|---|
systemMetricsInterval | Number | 60000 | System metrics collection interval (1 minute) |
mongoPoolInterval | Number | 60000 | MongoDB pool metrics interval (1 minute) |
collectionStatsInterval | Number | 300000 | Collection stats interval (5 minutes) |
ddpConnectionsInterval | Number | 30000 | DDP connection updates interval (30 seconds) |
jobsInterval | Number | 30000 | Background job stats interval (30 seconds) |
dnsTimingsInterval | Number | 60000 | DNS timing aggregation interval (1 minute) |
outboundHttpInterval | Number | 60000 | Outbound HTTP aggregation interval (1 minute) |
cpuProfileCheckInterval | Number | 30000 | CPU check interval for threshold profiling (30 seconds) |
deprecatedApisInterval | Number | 300000 | Deprecated API usage reporting interval (5 minutes) |
publicationsInterval | Number | 300000 | Publication efficiency reporting interval (5 minutes) |
environmentInterval | Number | 1800000 | Environment snapshot interval (30 minutes) |
vulnerabilitiesInterval | Number | 3600000 | Vulnerability scan interval (1 hour) |
Feature Flags
| Option | Type | Default | Description |
|---|---|---|---|
collectSystemMetrics | Boolean | true | Collect system metrics (CPU, memory, disk, network) |
collectTraces | Boolean | true | Collect method/publication traces |
collectErrors | Boolean | true | Collect errors and exceptions |
collectHttpRequests | Boolean | true | Collect HTTP request metrics |
collectMongoPool | Boolean | true | Collect MongoDB connection pool metrics |
collectCollectionStats | Boolean | true | Collect MongoDB collection statistics |
collectDDPConnections | Boolean | true | Collect DDP/WebSocket connection metrics |
collectLiveQueries | Boolean | true | Collect Meteor live query metrics (change streams, oplog, polling) |
collectJobs | Boolean | true | Collect background job metrics |
collectLogs | Boolean | true | Collect server-side logs from console and Meteor Log |
collectRUM | Boolean | false | Client-side RUM (disabled by default, requires publicKey) |
collectDnsTimings | Boolean | true | Collect DNS resolution latency by wrapping dns.lookup/dns.resolve |
collectOutboundHttp | Boolean | true | Collect outbound HTTP metrics via diagnostics_channel (Node 16+) |
collectCpuProfiles | Boolean | true | Enable on-demand CPU profiling when CPU exceeds threshold |
collectDeprecatedApis | Boolean | true | Track sync vs async Meteor API usage (migration readiness) |
collectPublications | Boolean | true | Detect publication over-fetching and missing projections |
collectEnvironment | Boolean | true | Capture environment metadata (packages, flags, OS info) |
collectVulnerabilities | Boolean | true | Run npm audit scans and report high/critical CVEs |
MongoDB Pool Configuration
| Option | Type | Default | Description |
|---|---|---|---|
mongoPoolFixedConnectionMemory | Number | null | Optional: fixed bytes per connection for memory estimation |
Method Tracing Configuration
| Option | Type | Default | Description |
|---|---|---|---|
traceMethodArguments | Boolean | true | Capture method arguments (sanitized) |
maxArgLength | Number | 1000 | Max string length for arguments |
traceMethodOperations | Boolean | true | Capture detailed operation timeline |
Index Usage Tracking
| Option | Type | Default | Description |
|---|---|---|---|
captureIndexUsage | Boolean | true | Capture MongoDB index usage via explain() |
indexUsageSampleRate | Number | 0.05 | Sample 5% of queries for explain() |
explainVerbosity | String | executionStats | One of: queryPlanner, executionStats, allPlansExecution |
explainSlowQueriesOnly | Boolean | false | Only explain queries exceeding slow threshold |
Performance Safeguards
| Option | Type | Default | Description |
|---|---|---|---|
maxBatchRetries | Number | 3 | Max retries for failed batches |
requestTimeout | Number | 3000 | API request timeout (3 seconds) |
maxMemoryMB | Number | 50 | Max memory (MB) for batches |
CPU Profiling Configuration
| Option | Type | Default | Description |
|---|---|---|---|
cpuProfileThreshold | Number | 80 | CPU usage percentage to trigger an on-demand profile |
cpuProfileDuration | Number | 10000 | Duration (ms) of the CPU profile sample |
cpuProfileCooldown | Number | 300000 | Minimum time (ms) between consecutive profiles (5 minutes) |
Worker Offload (Large Pools)
| Option | Type | Default | Description |
|---|---|---|---|
useWorkerThread | Boolean | false | Enable worker thread for large pools |
workerThreshold | Number | 50 | Spawn worker if pool size exceeds this |
Background Job Monitoring
| Option | Type | Default | Description |
|---|---|---|---|
collectJobs | Boolean | true | Enable background job monitoring |
jobsInterval | Number | 30000 | Job stats collection interval (30 seconds) |
jobsPackage | String | null | Auto-detect, or specify: "msavin:sjobs" |
Log Collection
| Option | Type | Default | Description |
|---|---|---|---|
collectLogs | Boolean | true | Enable log capturing |
logLevels | Array | ["info", "warn", "error", "fatal"] | Log levels to capture (excludes debug by default) |
logSampleRate | Number | 1.0 | Sample rate (0-1). Reduce for high-volume apps |
logMaxMessageLength | Number | 10000 | Max characters per log message before truncation |
logCaptureConsole | Boolean | true | Intercept console.log, console.info, console.warn, console.error, console.debug |
logCaptureMeteorLog | Boolean | true | Intercept Meteor Log.info, Log.warn, Log.error, Log.debug |
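Each captured log line is subject to three options: the level must appear in `logLevels`, the entry must pass `logSampleRate`, and the message is cut at `logMaxMessageLength`. Below is a minimal sketch of that decision. `prepareLogEntry` is a hypothetical name, and the order of the checks is an assumption.

```javascript
// Hypothetical sketch of how logLevels, logSampleRate and logMaxMessageLength
// could be applied to one captured log line. Returns null if the entry is dropped.
function prepareLogEntry(level, message, config, random = Math.random) {
  const {
    logLevels = ['info', 'warn', 'error', 'fatal'],
    logSampleRate = 1.0,
    logMaxMessageLength = 10000,
  } = config;

  if (!logLevels.includes(level)) return null; // level not captured (e.g. debug)
  if (random() >= logSampleRate) return null;  // dropped by sampling

  let text = String(message);
  const truncated = text.length > logMaxMessageLength;
  if (truncated) text = text.slice(0, logMaxMessageLength);

  return { level, message: text, truncated, timestamp: new Date() };
}
```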
Client-Side Error Tracking
Client-side error tracking is configured in Meteor.settings.public.skysignal.errorTracking and auto-initializes alongside RUM.
| Option | Type | Default | Description |
|---|---|---|---|
errorTracking.enabled | Boolean | true | Enable client-side error capture |
errorTracking.captureUnhandledRejections | Boolean | true | Capture unhandled Promise rejections |
errorTracking.debug | Boolean | false | Log error tracker activity to the browser console |
What Gets Monitored
System Metrics (Automatic)
The agent automatically collects:
- CPU Usage - Overall CPU utilization percentage
- CPU Cores - Number of CPU cores available
- Load Average - 1m, 5m, 15m load averages
- Memory Usage - Total, used, free, and percentage (heap, external, RSS)
- Event Loop Utilization - 0-1 ratio of how busy the event loop is (Node 14.10+)
- V8 Heap Statistics - Per-space breakdown (new_space, old_space, code_space, etc.), native context count, detached context leak detection
- Process Resource Usage - User/system CPU time, voluntary/involuntary context switches, filesystem reads/writes (via `process.resourceUsage()`)
- Active Resources - Handle/request counts by type (Timer, TCPWrap, FSReqCallback) for resource leak detection (Node 17+)
- Container Memory Limit - cgroup memory constraint for containerized deployments (Node 19+)
- Disk Usage - Disk space utilization (platform-dependent)
- Network Traffic - Bytes in/out (platform-dependent)
- Process Count - Number of running processes (platform-dependent)
- Agent Version - Tracks the installed agent version for compatibility checks
Collected every 60 seconds by default.
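Most of the Node-level metrics above come from built-in APIs. The sketch below samples a few of them with Node.js built-ins only. It assumes Node 14.10+ and feature-checks the newer APIs. `createSystemSampler` is a hypothetical name, not the agent's collector.

```javascript
const { performance } = require('perf_hooks');
const v8 = require('v8');

// Hypothetical sampler for a few of the metrics listed above, using only
// Node.js built-ins. APIs added in later Node versions are feature-checked.
function createSystemSampler() {
  let lastElu = performance.eventLoopUtilization();

  return function sample() {
    // Passing the previous reading returns the delta since then.
    const elu = performance.eventLoopUtilization(lastElu);
    lastElu = performance.eventLoopUtilization();

    const heap = v8.getHeapStatistics();
    const metrics = {
      eventLoopUtilization: elu.utilization, // 0-1
      memory: process.memoryUsage(),         // rss, heapTotal, heapUsed, external
      heapSpaces: v8.getHeapSpaceStatistics().map((s) => ({
        name: s.space_name,
        used: s.space_used_size,
        size: s.space_size,
      })),
      nativeContexts: heap.number_of_native_contexts,
      detachedContexts: heap.number_of_detached_contexts, // growth can hint at leaks
      resourceUsage: process.resourceUsage(), // CPU time, context switches, fs ops
    };

    // Node 17+: counts of active handles/requests by type.
    if (typeof process.getActiveResourcesInfo === 'function') {
      metrics.activeResources = {};
      for (const type of process.getActiveResourcesInfo()) {
        metrics.activeResources[type] = (metrics.activeResources[type] || 0) + 1;
      }
    }
    // Node 19+: cgroup memory limit (may be undefined or 0 when no limit is set).
    if (typeof process.constrainedMemory === 'function') {
      metrics.constrainedMemory = process.constrainedMemory();
    }
    return metrics;
  };
}
```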
Method Traces
Automatic instrumentation of Meteor Methods:
- Method name and execution time
- Operation-level breakdown (DB queries, async operations, compute time)
- Detailed MongoDB operation tracking with explain() support
- COLLSCAN detection - Flags queries performing full collection scans (no index used)
- Slow aggregation pipeline capture - Captures sanitized pipeline stages for slow aggregations
- N+1 query detection and slow query analysis
- `this.unblock()` analysis with optimization recommendations
- Wait time tracking (DDP queue, connection pool)
- Error tracking with stack traces
- User context and session correlation
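COLLSCAN detection amounts to inspecting the winning plan that MongoDB's `explain()` returns. A minimal sketch follows. The helper name `hasCollscan` is hypothetical. The field names (`queryPlanner.winningPlan`, `stage`, `inputStage`, `inputStages`) follow MongoDB's explain output. The `queryPlan` nesting is how newer servers running the slot-based engine report the plan.

```javascript
// Hypothetical helper: returns true if any stage of an explain() winning plan
// is a full collection scan. Field names follow MongoDB's explain output.
function hasCollscan(explainResult) {
  const root = explainResult && explainResult.queryPlanner
    ? explainResult.queryPlanner.winningPlan
    : explainResult;

  const visit = (stage) => {
    if (!stage || typeof stage !== 'object') return false;
    if (stage.stage === 'COLLSCAN') return true;
    // Slot-based execution on newer servers nests the plan under `queryPlan`.
    if (stage.queryPlan && visit(stage.queryPlan)) return true;
    if (stage.inputStage && visit(stage.inputStage)) return true;
    // Stages such as OR have several children.
    return Array.isArray(stage.inputStages) && stage.inputStages.some(visit);
  };
  return visit(root);
}
```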
Publication Monitoring
Track publication performance:
- Publication name and execution time
- Subscription lifecycle tracking
- Document counts (added, changed, removed)
- Data transfer size estimation
- Live query efficiency (oplog vs polling)
DDP Connection Monitoring
Real-time WebSocket connection tracking:
- Active connection count and status
- Message volume (sent/received) by type
- Bandwidth usage per connection
- Latency measurements (ping/pong)
- Subscription tracking per connection
MongoDB Pool Monitoring
Connection pool health and performance:
- Pool configuration (min/max size, timeouts)
- Active vs available connections
- Checkout wait times (avg, max, P95)
- Queue length and timeout tracking
- Memory usage estimation
Live Query Monitoring
Meteor reactive query tracking with per-observer driver detection:
- Change Stream detection (Meteor 3.5+), oplog, and polling observer types
- Per-observer introspection via `handle._multiplexer._observeDriver.constructor.name`
- Fallback to the `MONGO_OPLOG_URL` heuristic for pre-3.5 Meteor apps
- Reactive efficiency metric: `(changeStream + oplog) / total observers`
- Observer count by collection
- Document update rates
- Performance ratings (optimal/good/slow)
- Query signature deduplication
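The reactive efficiency metric can be computed from the observer driver class names. A minimal sketch follows, with `reactiveEfficiency` as a hypothetical helper. `OplogObserveDriver` and `PollingObserveDriver` are Meteor's driver class names. The change-stream driver name used here is an assumption.

```javascript
// Hypothetical: compute the reactive efficiency ratio from observer driver names.
// 'OplogObserveDriver' and 'PollingObserveDriver' are Meteor's class names;
// CHANGE_STREAM_DRIVER is an assumed name for the Meteor 3.5+ driver.
const CHANGE_STREAM_DRIVER = 'ChangeStreamObserveDriver';

function reactiveEfficiency(driverNames) {
  const counts = { changeStream: 0, oplog: 0, polling: 0, other: 0 };
  for (const name of driverNames) {
    if (name === CHANGE_STREAM_DRIVER) counts.changeStream++;
    else if (name === 'OplogObserveDriver') counts.oplog++;
    else if (name === 'PollingObserveDriver') counts.polling++;
    else counts.other++;
  }
  const total = driverNames.length;
  // (changeStream + oplog) / total observers; null when there are no observers.
  const efficiency = total === 0 ? null : (counts.changeStream + counts.oplog) / total;
  return { counts, efficiency };
}
```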
Background Job Monitoring
Track msavin:sjobs (Steve Jobs) and other job packages:
- Job execution times and status
- Queue length and worker utilization
- Failed job tracking with error details
- Job type categorization
DNS Timing
Measure DNS resolution latency to detect slow or misconfigured resolvers:
- Wraps `dns.lookup()` and `dns.resolve()` without replacing them
- Per-hostname resolution times with avg, P95, and max latency
- Failure counts and error tracking
- Ring buffer (last 500 samples) to bound memory
- Particularly useful in Docker/K8s environments where DNS is a common latency source
Reported every 60 seconds by default.
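A fixed-size ring buffer keeps memory bounded because each new sample overwrites the oldest one once the buffer is full. The sketch below computes the avg/P95/max summary from it. `TimingRing` is a hypothetical class, and the nearest-rank P95 is one common percentile definition. The agent may compute it differently.

```javascript
// Hypothetical fixed-size ring buffer of DNS timings (ms) with summary stats.
// Once full, each new sample overwrites the oldest, so memory stays bounded.
class TimingRing {
  constructor(capacity = 500) {
    this.capacity = capacity;
    this.samples = new Array(capacity);
    this.next = 0;  // index the next sample will be written to
    this.count = 0; // number of valid samples (<= capacity)
  }

  push(ms) {
    this.samples[this.next] = ms;
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  stats() {
    if (this.count === 0) return null;
    // Copy before sorting so the ring itself is left untouched.
    const values = this.samples.slice(0, this.count).sort((a, b) => a - b);
    const sum = values.reduce((acc, v) => acc + v, 0);
    // Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    const p95 = values[Math.ceil(0.95 * values.length) - 1];
    return { count: values.length, avg: sum / values.length, p95, max: values[values.length - 1] };
  }
}
```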
Outbound HTTP Instrumentation
Track outbound HTTP/HTTPS requests using Node.js diagnostics_channel (Node 16+):
- Zero monkey-patching — uses the same mechanism as OpenTelemetry and Undici
- Request timing breakdown: DNS, connect, TLS handshake, TTFB, total duration
- Request/response metadata: method, host, path, status code, content-length
- Error rates for external API dependencies
- Aggregated per endpoint to minimize cardinality
Reported every 60 seconds by default.
CPU Profiling (On-Demand)
Automatic CPU profiling when CPU usage spikes above a configurable threshold:
- Uses the built-in `inspector` module (same as Chrome DevTools), with zero dependencies
- Triggered automatically when CPU exceeds the threshold (default: 80%)
- Sends a summary (top functions by self-time), not raw profile data
- Configurable duration (default: 10s) and cooldown (default: 5 min between profiles)
- Minimal overhead when not actively profiling
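The trigger logic combines a threshold, a single-run guard, and a cooldown. The summary reduces a V8 CPU profile to its hottest functions. Below is a minimal sketch of both. `createProfileGate` and `topFunctions` are hypothetical names. Measuring the cooldown from the start of the previous profile is an assumption. The profile shape (`nodes[].callFrame`, `hitCount`) matches what the inspector's `Profiler.stop` returns, and sample counts stand in for self-time.

```javascript
// Hypothetical gate deciding when an on-demand CPU profile may start.
// Mirrors cpuProfileThreshold (percent), cpuProfileDuration and cpuProfileCooldown (ms).
function createProfileGate({ threshold = 80, duration = 10000, cooldown = 300000, now = Date.now } = {}) {
  let lastStartedAt = -Infinity;
  let running = false;

  return {
    duration,
    // Called on every CPU check (cpuProfileCheckInterval).
    shouldStart(cpuPercent) {
      const t = now();
      if (running) return false;                      // one profile at a time
      if (cpuPercent <= threshold) return false;      // must exceed the threshold
      if (t - lastStartedAt < cooldown) return false; // still cooling down
      running = true;
      lastStartedAt = t;
      return true;
    },
    // Called when the profile finishes (after `duration` ms).
    finished() { running = false; },
  };
}

// Summarize a V8 CPU profile into the top functions by sample count,
// used here as a proxy for self-time.
function topFunctions(profile, limit = 10) {
  const totals = new Map();
  for (const node of profile.nodes) {
    const { functionName, url, lineNumber } = node.callFrame;
    // callFrame.lineNumber is 0-based; display it 1-based.
    const key = `${functionName || '(anonymous)'} ${url}:${lineNumber + 1}`;
    totals.set(key, (totals.get(key) || 0) + (node.hitCount || 0));
  }
  const totalHits = [...totals.values()].reduce((a, b) => a + b, 0) || 1;
  return [...totals.entries()]
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([fn, hits]) => ({ fn, hits, selfPercent: (100 * hits) / totalHits }));
}
```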
Deprecated API Detection
Track synchronous vs asynchronous Meteor API usage to measure migration readiness:
- Wraps `Mongo.Collection` prototype methods to count sync vs async calls
- Tracks `Collection.find().fetch()` vs `fetchAsync()`, `findOne()` vs `findOneAsync()`, etc.
- Tracks `Meteor.call()` vs `Meteor.callAsync()`
- Per-collection counters with negligible overhead (just increments)
- Helps prioritize Meteor 3.x async migration efforts
Reported every 5 minutes by default.
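The counting technique wraps each prototype method, increments a counter, and then calls the original unchanged. Below is a minimal sketch. Because `Mongo.Collection` only exists inside Meteor, a hypothetical `FakeCollection` stands in for it. `instrumentCounts` and `syncShare` are also hypothetical names. Reading the collection name from `this._name` assumes Meteor's internal field.

```javascript
// Hypothetical sketch of counting sync vs async calls by wrapping prototype
// methods. FakeCollection stands in for Mongo.Collection outside Meteor.
class FakeCollection {
  constructor(name) { this._name = name; }
  findOne() { return { _id: '1' }; }
  async findOneAsync() { return { _id: '1' }; }
  insert(doc) { return 'id'; }
  async insertAsync(doc) { return 'id'; }
}

const counters = {}; // { [collection]: { [method]: count } }

function instrumentCounts(proto, methodNames) {
  for (const name of methodNames) {
    const original = proto[name];
    if (typeof original !== 'function') continue;
    proto[name] = function (...args) {
      const coll = this._name || 'unknown';
      counters[coll] = counters[coll] || {};
      counters[coll][name] = (counters[coll][name] || 0) + 1; // just an increment
      return original.apply(this, args); // behavior and return value unchanged
    };
  }
}

// Share of calls still using sync variants (methods not ending in "Async").
function syncShare(collCounts) {
  let syncCalls = 0;
  let asyncCalls = 0;
  for (const [method, n] of Object.entries(collCounts)) {
    if (method.endsWith('Async')) asyncCalls += n;
    else syncCalls += n;
  }
  return syncCalls + asyncCalls === 0 ? 0 : syncCalls / (syncCalls + asyncCalls);
}

instrumentCounts(FakeCollection.prototype, ['findOne', 'findOneAsync', 'insert', 'insertAsync']);
```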
Publication Efficiency Analysis
Detect over-fetching and unbounded publications:
- Wraps `Meteor.publish` to intercept returned cursors
- Checks `_cursorDescription.options.fields` for missing projections (over-fetching flag)
- Tracks document counts per publication (average and max)
- Flags publications returning large result sets without limits
- Per-publication call counts and efficiency scores
Reported every 5 minutes by default.
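Once the returned cursor is intercepted, the two flags reduce to inspecting its options and the observed document count. Below is a minimal sketch. `analyzePublicationCursor` and the `largeResultThreshold` value are hypothetical. The code also checks `projection` alongside `fields`, on the assumption that either option may be used.

```javascript
// Hypothetical check of a cursor returned from a publish function.
// Reads Meteor's internal _cursorDescription; `projection` is checked as well
// as `fields` on the assumption that either may be used.
function analyzePublicationCursor(cursor, docCount, largeResultThreshold = 1000) {
  const desc = cursor._cursorDescription || {};
  const options = desc.options || {};
  const projection = options.fields || options.projection;
  const hasProjection = !!projection && Object.keys(projection).length > 0;
  const hasLimit = typeof options.limit === 'number' && options.limit > 0;

  return {
    collection: desc.collectionName,
    overFetching: !hasProjection,                            // every field is sent
    unbounded: !hasLimit && docCount > largeResultThreshold, // large set, no limit
  };
}
```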
Environment Snapshots
Periodic capture of application environment metadata:
- Installed package versions from `process.versions` and `package.json`
- Node.js flags (`process.execArgv`)
- Environment variable keys only, never values (for security)
- OS platform, release, CPU count, total memory
- Collected immediately on start, then refreshed periodically
Reported every 30 minutes by default.
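A snapshot like the one described above can be built from Node.js built-ins alone. The key safeguard is keeping only `Object.keys(process.env)`. Below is a minimal sketch, with `snapshotEnvironment` as a hypothetical name. Reading `dependencies` from `package.json` in the working directory is an assumption about where the app's manifest lives.

```javascript
const os = require('os');
const fs = require('fs');
const path = require('path');

// Hypothetical snapshot builder. Environment variables are reduced to their
// names so secrets held in values are never collected.
function snapshotEnvironment(appDir = process.cwd()) {
  let dependencies = {};
  try {
    const pkg = JSON.parse(fs.readFileSync(path.join(appDir, 'package.json'), 'utf8'));
    dependencies = { ...pkg.dependencies };
  } catch (_) {
    // No readable package.json: report runtime versions only.
  }
  return {
    runtimeVersions: { ...process.versions }, // node, v8, openssl, ...
    dependencies,                             // declared ranges, not resolved versions
    nodeFlags: [...process.execArgv],
    envKeys: Object.keys(process.env).sort(), // keys only, never values
    os: {
      platform: os.platform(),
      release: os.release(),
      cpuCount: os.cpus().length,
      totalMemory: os.totalmem(),
    },
    capturedAt: new Date(),
  };
}
```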
Vulnerability Scanning
Automated security scanning for known package vulnerabilities:
- Runs `npm audit --json` on a configurable schedule
- Supports both npm audit v6 and v7+ JSON formats
- Only reports high and critical severity vulnerabilities to reduce noise
- Tracks: package name, severity, advisory title, fix availability
- Deduplicates results (skips reporting if unchanged since last scan)
- 30-second timeout on `npm audit` to prevent blocking
Reported hourly by default. The initial scan runs 60 seconds after startup.
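The two audit formats differ in shape. npm 7+ reports `vulnerabilities` keyed by package name. npm 6 reports `advisories` keyed by advisory id. Below is a minimal sketch of a parser that normalizes both, keeps only high/critical findings, and skips unchanged results. `parseAudit` and `changedSinceLastScan` are hypothetical names. Treating `patched_versions: '<0.0.0'` as "no fix" follows the npm 6 advisory convention.

```javascript
// Hypothetical parser for `npm audit --json` output. npm 7+ emits
// { auditReportVersion: 2, vulnerabilities: {...} }; npm 6 emits { advisories: {...} }.
const REPORTED = new Set(['high', 'critical']);

function parseAudit(json) {
  const report = typeof json === 'string' ? JSON.parse(json) : json;
  const findings = [];

  if (report.vulnerabilities) {
    // npm 7+ format: keyed by package name.
    for (const [name, v] of Object.entries(report.vulnerabilities)) {
      if (!REPORTED.has(v.severity)) continue;
      const via = v.via || [];
      // `via` mixes advisory objects and names of other vulnerable packages.
      const advisory = via.find((entry) => typeof entry === 'object');
      findings.push({
        package: name,
        severity: v.severity,
        title: advisory ? advisory.title : `via ${via.join(', ')}`,
        fixAvailable: Boolean(v.fixAvailable), // may be true, false, or an object
      });
    }
  } else if (report.advisories) {
    // npm 6 format: keyed by advisory id.
    for (const a of Object.values(report.advisories)) {
      if (!REPORTED.has(a.severity)) continue;
      findings.push({
        package: a.module_name,
        severity: a.severity,
        title: a.title,
        // '<0.0.0' is the conventional "no patched version" marker.
        fixAvailable: Boolean(a.patched_versions) && a.patched_versions !== '<0.0.0',
      });
    }
  }
  return findings;
}

// Skip reporting when the findings are identical to the previous scan.
let lastSignature = null;
function changedSinceLastScan(findings) {
  const signature = JSON.stringify(
    findings.map((f) => `${f.package}|${f.severity}|${f.title}`).sort()
  );
  const changed = signature !== lastSignature;
  lastSignature = signature;
  return changed;
}
```

`npm audit` exits with a non-zero status when it finds vulnerabilities, so a wrapper that runs it should still parse stdout in that case.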
Error Tracking
Automatic error capture on both server and client:
- Server-side errors with stack traces
- Client-side errors via `window.onerror` and `unhandledrejection` handlers
- Browser context (URL, user agent, viewport, user ID)
- Error grouping and fingerprinting
- Affected users and methods
- Build hash correlation for source maps
- Batched delivery to `/api/v1/errors` with public key authentication
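Error grouping usually relies on a stable fingerprint, so repeated occurrences of the same failure collapse into one issue. Below is a minimal sketch. `fingerprintError` is hypothetical, and the normalization rules (masking numbers and long hex ids, dropping line:column from the top frame) are illustrative choices, not SkySignal's documented algorithm.

```javascript
const crypto = require('crypto');

// Hypothetical fingerprint for grouping errors: hash of the error type,
// a normalized message, and the first stack frame. Numbers and hex ids are
// masked so "User 42 not found" and "User 97 not found" group together.
function fingerprintError(err) {
  const name = err.name || 'Error';
  const message = String(err.message || '')
    .replace(/\b[0-9a-f]{8,}\b/gi, '<id>')
    .replace(/\d+/g, '<n>');
  const stackLines = String(err.stack || '').split('\n');
  const topFrame = (stackLines.find((line) => line.trim().startsWith('at ')) || '')
    .trim()
    .replace(/:\d+:\d+\)?$/, ''); // drop line:column so small edits don't split groups
  return crypto.createHash('sha1').update(`${name}|${message}|${topFrame}`).digest('hex');
}
```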
Log Collection
Server-side log capture with structured metadata:
- Intercepts `console.log`, `console.info`, `console.warn`, `console.error`, `console.debug`
- Intercepts Meteor `Log.info`, `Log.warn`, `Log.error`, `Log.debug`
- Configurable log levels (default: info, warn, error, fatal)
- Sampling support for high-volume apps
- Message truncation to prevent oversized payloads
- Automatic host and timestamp enrichment
- Correlation with Meteor Method traces via `methodName` and `traceId`
- Programmatic log submission via `SkySignalAgent.addLog()`
Real User Monitoring (RUM) - Client-Side
Automatic browser-side performance monitoring collecting Core Web Vitals and providing PageSpeed-style performance warnings.
What Gets Collected
Core Web Vitals:
- LCP (Largest Contentful Paint) - Measures loading performance
- Good: <2.5s | Needs Improvement: 2.5-4s | Poor: >4s
- FID (First Input Delay) - Measures interactivity
- Good: <100ms | Needs Improvement: 100-300ms | Poor: >300ms
- CLS (Cumulative Layout Shift) - Measures visual stability
- Good: <0.1 | Needs Improvement: 0.1-0.25 | Poor: >0.25
- TTFB (Time to First Byte) - Measures server response time
- Good: <800ms | Needs Improvement: 800-1800ms | Poor: >1800ms
- FCP (First Contentful Paint) - Measures perceived load speed
- Good: <1.8s | Needs Improvement: 1.8-3s | Poor: >3s
- TTI (Time to Interactive) - Measures time until page is fully interactive
- Good: <3.8s | Needs Improvement: 3.8-7.3s | Poor: >7.3s
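The thresholds above translate directly into a rating function. Below is a minimal sketch using exactly the bounds listed. `rateVital` is a hypothetical name. Following the ranges as written, a value equal to a bound (such as LCP at 2.5s) rates as "needs improvement".

```javascript
// Thresholds from the list above: [goodBelow, poorAbove]. Times in ms, CLS unitless.
const THRESHOLDS = {
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
  FCP: [1800, 3000],
  TTI: [3800, 7300],
};

// Hypothetical rating helper: 'good' below the first bound, 'poor' above the
// second, 'needs-improvement' in between (bounds inclusive).
function rateVital(name, value) {
  const bounds = THRESHOLDS[name];
  if (!bounds) throw new Error(`Unknown metric: ${name}`);
  const [good, poor] = bounds;
  if (value < good) return 'good';
  if (value <= poor) return 'needs-improvement';
  return 'poor';
}
```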
Additional Context:
- Browser name and version
- Device type (mobile, tablet, desktop)
- Operating system
- Network connection type, downlink speed, RTT
- Viewport and screen dimensions
- User ID (via Meteor.userId() for correlation with server-side traces)
- Session ID (30-minute sessions with localStorage persistence)
- Page route and referrer
- Top 10 slowest resources
Configuration
RUM monitoring auto-initializes from your Meteor settings.
settings-development.json:
```json
{
  "skysignal": {
    "apiKey": "sk_your_server_api_key_here",
    "endpoint": "http://localhost:3000"
  },
  "public": {
    "skysignal": {
      "publicKey": "pk_your_public_key_here",
      "endpoint": "http://localhost:3000",
      "rum": {
        "enabled": true,
        "sampleRate": 1.0,
        "debug": false
      },
      "errorTracking": {
        "enabled": true,
        "captureUnhandledRejections": true,
        "debug": false
      }
    }
  }
}
```
Configuration Options:
| Option | Type | Default | Description |
|---|---|---|---|
publicKey | String | required | SkySignal Public Key (pk_ prefix) - Safe for client-side use |
endpoint | String | (same origin) | Base URL of SkySignal API (e.g., http://localhost:3000 or https://dash.skysignal.app) |
rum.enabled | Boolean | true | Enable/disable RUM collection |
rum.sampleRate | Number | Auto | Sample rate (0-1). Auto: 100% for localhost, 50% for production |
rum.debug | Boolean | false | Enable console logging for debugging |
errorTracking.enabled | Boolean | true | Enable client-side error capture via window.onerror and unhandledrejection |
errorTracking.captureUnhandledRejections | Boolean | true | Capture unhandled Promise rejections |
errorTracking.debug | Boolean | false | Log error tracker activity to the browser console |
Key Security Note:
- API Key (sk_ prefix): Server-side only; keep it in private `settings.skysignal`. Used for server-to-server communication.
- Public Key (pk_ prefix): Client-side safe; can go in `settings.public.skysignal`. Used for browser RUM collection.
- This follows the Stripe pattern of separating public and private keys for security.
The agent automatically:
- Collects Core Web Vitals using Google's `web-vitals` library
- Tracks SPA route changes and collects metrics for each route
- Batches measurements and sends them via fire-and-forget HTTP with `keepalive: true`
- Provides PageSpeed-style console warnings for poor performance
- Correlates metrics with server-side traces via Meteor.userId()
SPA Route Change Tracking
The RUM client automatically detects route changes in single-page applications by:
- Overriding `history.pushState` and `history.replaceState`
- Listening for `popstate` events (browser back/forward)
- Listening for `hashchange` events (hash-based routing)
Each route change triggers a new performance collection, allowing you to track performance across your entire application navigation flow.
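The detection approach above wraps the two history methods and listens for the two navigation events. Below is a minimal sketch. `watchRouteChanges` is a hypothetical name. The window object is injected so the logic can run outside a browser (in a page it would be `window`). Ignoring calls that leave the URL unchanged is a design choice of this sketch.

```javascript
// Hypothetical sketch of SPA route detection: wrap pushState/replaceState and
// listen for popstate/hashchange. `win` is injected; in a page it is `window`.
function watchRouteChanges(win, onRouteChange) {
  const { history } = win;
  const currentPath = () => win.location.pathname + win.location.search + win.location.hash;
  let lastPath = currentPath();

  const notify = () => {
    const path = currentPath();
    if (path !== lastPath) { // ignore calls that leave the URL unchanged
      lastPath = path;
      onRouteChange(path);
    }
  };

  for (const method of ['pushState', 'replaceState']) {
    const original = history[method];
    history[method] = function (...args) {
      const result = original.apply(this, args);
      notify(); // location has already been updated by the original call
      return result;
    };
  }
  win.addEventListener('popstate', notify);   // back/forward
  win.addEventListener('hashchange', notify); // hash-based routers
}
```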
Performance Warnings
When Core Web Vitals exceed recommended thresholds, the RUM collector logs PageSpeed-style warnings to the console:
```
[SkySignal RUM] Largest Contentful Paint (LCP) is slow: 4200ms. LCP should be under 2.5s for good user experience. Consider optimizing images, removing render-blocking resources, and improving server response times.
```
These warnings help developers identify performance issues during development and testing.
Manual Usage (Advanced)
While RUM auto-initializes, you can also use it manually:
```javascript
import { SkySignalRUM } from 'meteor/skysignal:agent';

// Check if initialized
if (SkySignalRUM.isInitialized()) {
  // Get current session ID
  const sessionId = SkySignalRUM.getSessionId();

  // Get current metrics (for debugging)
  const metrics = SkySignalRUM.getMetrics();

  // Get performance warnings (for debugging)
  const warnings = SkySignalRUM.getWarnings();

  // Manually track a page view (for custom routing)
  SkySignalRUM.trackPageView('/custom-route');
}
```
How It Works
- Session Management - Creates a 30-minute session in localStorage, renews on user activity
- Core Web Vitals Collection - Uses Google's `web-vitals` library for accurate measurements
- Browser Context Collection - Detects browser, device, OS, and network info from the user agent and Navigator API
- Performance Warnings - Compares metrics against PageSpeed thresholds and logs warnings
- Batching - Batches measurements (default: 10 per batch, 5-second flush interval)
- HTTP Transmission - Sends to the `/api/v1/rum` endpoint with `keepalive: true` for reliability
- SPA Detection - Automatically resets and re-collects metrics on route changes
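The session behavior described above is a sliding 30-minute window: activity within the window extends it, and a longer idle gap starts a new session. Below is a minimal sketch. `getOrRenewSession`, the storage key, and the ID format are all hypothetical. `storage` follows the Web Storage `getItem`/`setItem` shape, so `localStorage` fits in a browser and a Map-backed stand-in works in tests.

```javascript
// Hypothetical 30-minute sliding session. `storage` follows the Web Storage
// getItem/setItem shape (localStorage in the browser); `now` is injectable.
const SESSION_KEY = 'skysignal_session'; // assumed key name
const SESSION_TTL_MS = 30 * 60 * 1000;

function getOrRenewSession(storage, now = Date.now) {
  const t = now();
  let session = null;
  try {
    session = JSON.parse(storage.getItem(SESSION_KEY)); // null when absent
  } catch (_) {
    session = null; // corrupt value: start over
  }
  if (!session || t - session.lastActivity > SESSION_TTL_MS) {
    // No session, or idle for more than 30 minutes: start a new one.
    const id = `${t.toString(36)}-${Math.random().toString(36).slice(2, 10)}`;
    session = { id, startedAt: t, lastActivity: t };
  } else {
    session.lastActivity = t; // activity extends the session
  }
  storage.setItem(SESSION_KEY, JSON.stringify(session));
  return session;
}
```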
Advanced Usage
Custom Metrics
Track business-specific KPIs and performance indicators with the custom metrics API:
Counter Metrics
Use counters for values that only increment (orders placed, emails sent, API calls):
```javascript
import { SkySignalAgent } from 'meteor/skysignal:agent';

// Simple counter increment
SkySignalAgent.counter('orders.completed');

// Counter with custom value and tags
SkySignalAgent.counter('items.sold', 5, {
  tags: { category: 'electronics', store: 'NYC' }
});

// Track API requests by endpoint
SkySignalAgent.counter('api.requests', 1, {
  tags: { endpoint: '/users', method: 'GET', status: '200' }
});
```
Timer Metrics
Use timers for measuring durations (API response times, job execution, processing time):
```javascript
import { SkySignalAgent } from 'meteor/skysignal:agent';

// Run inside an async function (e.g. an async Meteor Method);
// processPayment and order stand in for your own code.

// Track payment processing time
const paymentStart = Date.now();
await processPayment(order);
SkySignalAgent.timer('payment.processing', Date.now() - paymentStart, {
  tags: { provider: 'stripe', currency: 'USD' }
});

// Track external API call duration
const apiStart = Date.now();
const result = await fetch('https://api.example.com/data');
SkySignalAgent.timer('external.api.call', Date.now() - apiStart, {
  tags: { service: 'example', endpoint: '/data', status: result.status }
});
```
Gauge Metrics
Use gauges for point-in-time values that go up or down (queue size, active users, inventory):
```javascript
// Track queue depth
const queueSize = await getQueueSize('email-queue');
SkySignalAgent.gauge('queue.size', queueSize, {
  unit: 'items',
  tags: { queue: 'email' }
});

// Track active users
const activeUsers = Meteor.server.sessions.size;
SkySignalAgent.gauge('users.active', activeUsers, {
  unit: 'users'
});

// Track inventory levels
SkySignalAgent.gauge('inventory.stock', 150, {
  unit: 'items',
  tags: { product: 'widget-123', warehouse: 'NYC' }
});
```
Generic trackMetric Method
For full control, use the generic trackMetric() method:
```javascript
SkySignalAgent.trackMetric({
  name: 'checkout.flow',
  type: 'counter', // 'counter' | 'timer' | 'gauge'
  value: 1,
  unit: 'conversions', // optional
  tags: { // optional - for filtering in dashboard
    product: 'premium',
    region: 'us-east-1'
  }
});
```
Manual Trace Submission
Track custom operations:
```javascript
import { SkySignalAgent } from 'meteor/skysignal:agent';

const startTime = Date.now();

// Your code here...

SkySignalAgent.client.addTrace({
  traceType: 'method',
  methodName: 'myCustomOperation',
  timestamp: new Date(startTime),
  duration: Date.now() - startTime,
  // this.userId is only defined inside a Method or publication;
  // elsewhere, pass the user ID explicitly
  userId: this.userId,
  operations: [
    { type: 'start', time: 0, details: {} },
    { type: 'db', time: 50, details: { collection: 'users', func: 'findOne' } },
    { type: 'complete', time: 150, details: {} }
  ]
});
```
Manual Log Submission
Send structured logs programmatically, bypassing console.* / Meteor Log.* interception:
```javascript
import { SkySignalAgent } from 'meteor/skysignal:agent';

// Simple log
SkySignalAgent.addLog('info', 'User signed up', { userId: 'abc123' });

// Error log with context
SkySignalAgent.addLog('error', 'Payment failed', {
  orderId: 'xyz-789',
  provider: 'stripe',
  errorCode: 'card_declined'
});

// Warning with structured metadata
SkySignalAgent.addLog('warn', 'Rate limit approaching', {
  endpoint: '/api/search',
  currentRate: 450,
  limit: 500
});
```
Log levels: debug, info, warn, error, fatal
Logs submitted via addLog() are tagged with source: "api" to distinguish them from auto-captured console/Meteor logs.
Stopping the Agent
To gracefully stop the agent (e.g., during shutdown):
```javascript
SkySignalAgent.stop();
```
This will:
- Stop all collectors
- Flush any remaining batched data
- Clear all intervals
Performance Impact
The agent is designed to have minimal performance impact on your application:
Built-in Optimizations
- Fire-and-forget batching - Data is batched and sent asynchronously using `setImmediate()`, so sending is deferred instead of running inline with your code
- HTTP connection pooling - Reuses TCP connections with `keepAlive` to reduce handshake overhead
- Gzip compression - Large payloads (>1KB) are compressed before sending to reduce bandwidth
- Non-blocking collection - System metrics use async commands to avoid blocking the event loop
- Object pooling - HTTP request tracking reuses pre-allocated objects to reduce GC pressure
- Optimized URL matching - Exclude patterns are combined into a single regex, so each URL is tested once instead of against every pattern
- Staggered startup - Collectors start with 500ms intervals to avoid CPU spikes at boot
- Configurable intervals - Adjust collection frequency based on your needs
- Automatic retries - Failed requests are re-queued with exponential backoff and jitter
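Retrying with exponential backoff and jitter spreads retries out over time, so many hosts recovering at once do not hit the API together. Below is a minimal sketch using "full jitter", capped by `maxBatchRetries`. `retryDelay`, `scheduleRetry`, `baseDelay`, and `maxDelay` are hypothetical, and their default values are illustrative rather than the agent's.

```javascript
// Hypothetical retry delay: exponential backoff with "full jitter".
// attempt 0 -> up to 1s, 1 -> up to 2s, 2 -> up to 4s, capped at maxDelay.
function retryDelay(attempt, { baseDelay = 1000, maxDelay = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxDelay, baseDelay * 2 ** attempt);
  return Math.floor(random() * ceiling); // jitter spreads retries from many hosts
}

// Re-queue a failed batch until maxBatchRetries is reached, then drop it.
function scheduleRetry(batch, send, { maxBatchRetries = 3, ...opts } = {}) {
  const attempt = batch.attempt || 0;
  if (attempt >= maxBatchRetries) return false; // give up so memory stays bounded
  const timer = setTimeout(() => send({ ...batch, attempt: attempt + 1 }), retryDelay(attempt, opts));
  if (timer.unref) timer.unref(); // don't hold the process open for a retry
  return true;
}
```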
Typical Overhead
- CPU: < 1% additional usage
- Memory: ~10-20MB for batching queues
- Network: ~1KB per metric (less with compression), sent in batches
- Event loop: < 1ms impact per collection cycle
Troubleshooting
Agent Not Sending Data
- Check that your API key is correct
- Verify `enabled: true` in configuration
- Check server logs for error messages
- Verify network connectivity to SkySignal API
High Memory Usage
If you notice high memory usage:
- Reduce `batchSize` to flush data more frequently
- Increase collection intervals so data is collected less often
- Disable collectors you don't need
Missing System Metrics
Some system metrics (disk, network, process count) require platform-specific APIs:
- Use the `systeminformation` npm package for comprehensive cross-platform metrics
- These metrics may return `null` on certain platforms
API Reference
SkySignalAgent
Main agent singleton instance.
Configuration Methods
- `configure(options)` - Configure the agent with options
- `start()` - Start all collectors and monitoring
- `stop()` - Stop all collectors and flush data
Custom Metrics Methods
| Method | Description |
|---|---|
counter(name, value?, options?) | Track incremental values (default value: 1) |
timer(name, duration, options?) | Track durations in milliseconds |
gauge(name, value, options?) | Track point-in-time values |
trackMetric(options) | Generic method with full control |
Log Methods
| Method | Description |
|---|---|
addLog(level, message, metadata?) | Submit a structured log entry. Level: debug, info, warn, error, fatal |
Options object (for `counter`, `timer`, and `gauge`):
- `tags` - Object with key-value pairs for filtering
- `unit` - Unit of measurement (e.g., 'ms', 'items', 'percent')
- `timestamp` - Optional Date (defaults to now)
Properties
- `client` - HTTP client instance for manual data submission
- `config` - Current configuration object
- `collectors` - Active collector instances
- `started` - Boolean indicating whether the agent is running
Support
Changelog
v1.0.15 (New Features)
7 new collectors, enhanced system metrics, COLLSCAN detection, sendBeacon transport, and worker thread offloading.
New Collectors
- DNS Timing (`DnsTimingCollector`) - Wraps `dns.lookup` and `dns.resolve` to measure DNS resolution latency. Tracks per-hostname timing, P95/max latency, and failure counts. Identifies slow resolvers in Docker/K8s environments.
- Outbound HTTP (`DiagnosticsChannelCollector`) - Uses the Node.js `diagnostics_channel` API (Node 16+) to instrument outbound HTTP/HTTPS requests without monkey-patching. Captures timing breakdown (DNS, connect, TLS, TTFB), status codes, and error rates for external dependencies.
- CPU Profiling (`CpuProfiler`) - On-demand CPU profiling via the built-in `inspector` module. Automatically triggers when CPU exceeds a configurable threshold (default: 80%), captures a 10-second profile, and sends a summary of top functions by self-time. A configurable cooldown prevents over-profiling.
- Deprecated API Detection (`DeprecatedApiCollector`) - Wraps `Mongo.Collection` prototype methods and `Meteor.call` to count sync vs async invocations. Tracks `find().fetch()` vs `fetchAsync()`, `findOne()` vs `findOneAsync()`, and `insert`/`update`/`remove` vs their async variants. Helps measure Meteor 3.x migration readiness.
- Publication Efficiency (`PublicationTracer`) - Wraps `Meteor.publish` to intercept returned cursors. Detects publications missing field projections (over-fetching) and those returning large document sets without limits. Reports per-publication call counts, document averages, and efficiency scores.
- Environment Snapshots (`EnvironmentCollector`) - Captures installed package versions (`process.versions` + `package.json`), Node.js flags, environment variable keys (not values), and OS metadata. Collected immediately on start, then refreshed every 30 minutes.
- Vulnerability Scanning (`VulnerabilityCollector`) - Runs `npm audit --json` hourly (with a 30s timeout). Parses both v6 and v7+ audit formats. Reports high/critical vulnerabilities with package name, severity, advisory title, and fix availability. Deduplicates unchanged results.
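Wrapping `dns.lookup` amounts to timing the interval between the call and its callback. A minimal sketch, assuming per-hostname samples are kept in memory and P95 uses the nearest-rank method; `instrumentDnsLookup`, `percentile`, and `dnsStats` are hypothetical names, and the collector described above also wraps `dns.resolve`.

```javascript
const dns = require("node:dns");
const { performance } = require("node:perf_hooks");

// Hypothetical sketch: wrap dns.lookup (callback style) to record latency per hostname.
function instrumentDnsLookup(dnsModule = dns) {
  const stats = new Map(); // hostname -> { samples: number[], failures: number }
  const original = dnsModule.lookup;

  dnsModule.lookup = function wrappedLookup(hostname, ...args) {
    const callback = args[args.length - 1];
    if (typeof callback !== "function") {
      return original.call(this, hostname, ...args); // unexpected shape: pass through
    }
    const start = performance.now();
    args[args.length - 1] = function timedCallback(err, ...results) {
      const entry = stats.get(hostname) || { samples: [], failures: 0 };
      entry.samples.push(performance.now() - start);
      if (err) entry.failures += 1;
      stats.set(hostname, entry);
      return callback.call(this, err, ...results);
    };
    return original.call(this, hostname, ...args);
  };

  return {
    stats,
    restore() {
      dnsModule.lookup = original;
    },
  };
}

// Nearest-rank percentile: smallest sample with at least p% of samples <= it.
function percentile(samples, p) {
  if (samples.length === 0) return 0;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Summarize per-hostname timing into count, P95, max, and failures.
function dnsStats(stats) {
  const out = {};
  for (const [host, { samples, failures }] of stats) {
    out[host] = {
      count: samples.length,
      p95: percentile(samples, 95),
      max: Math.max(...samples),
      failures,
    };
  }
  return out;
}
```

Only callback-style `dns.lookup` calls are timed here; `dns.promises` and the `dns.resolve*` family would need their own wrappers.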
Enhanced System Metrics
- Event Loop Utilization (ELU) - 0-1 ratio of event loop busyness via `performance.eventLoopUtilization()` (Node 14.10+)
- V8 Heap Statistics - Per-heap-space breakdown (`new_space`, `old_space`, `code_space`, etc.) via `v8.getHeapStatistics()` and `v8.getHeapSpaceStatistics()`. Includes native context count and detached context leak detection.
- Process Resource Usage - User/system CPU time, voluntary/involuntary context switches, and filesystem reads/writes via `process.resourceUsage()`
- Active Resources - Handle/request counts by type (`Timer`, `TCPWrap`, `FSReqCallback`, etc.) via `process.getActiveResourcesInfo()` (Node 17+) for resource leak detection
- Container Memory Limit - cgroup memory constraint via `process.constrainedMemory()` (Node 19+) for containerized deployments
- Agent Version - `agentVersion` field added to every system metrics payload for compatibility tracking
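Most of these figures come straight from Node.js built-ins. A minimal sketch of one sample, assuming a collector calls it on a fixed interval; `sampleSystemMetrics`, `countBy`, and the payload field names are illustrative rather than the agent's schema, and the two newer APIs are feature-detected.

```javascript
const v8 = require("node:v8");
const { performance } = require("node:perf_hooks");

// Sketch: one system-metrics sample built from the Node.js APIs listed above.
// Field names are illustrative, not the agent's payload schema.
let lastElu = performance.eventLoopUtilization();

function countBy(items) {
  const counts = {};
  for (const item of items) counts[item] = (counts[item] || 0) + 1;
  return counts;
}

function sampleSystemMetrics() {
  // Delta since the previous sample, so utilization covers only this interval.
  const currentElu = performance.eventLoopUtilization();
  const delta = performance.eventLoopUtilization(currentElu, lastElu);
  lastElu = currentElu;

  const heap = v8.getHeapStatistics();
  const usage = process.resourceUsage();

  return {
    // Guard against 0/0 when no loop time has elapsed yet.
    eventLoopUtilization: Number.isFinite(delta.utilization) ? delta.utilization : 0,
    heap: {
      used: heap.used_heap_size,
      total: heap.total_heap_size,
      nativeContexts: heap.number_of_native_contexts,
      detachedContexts: heap.number_of_detached_contexts, // persistently > 0 hints at a leak
    },
    heapSpaces: v8.getHeapSpaceStatistics().map((s) => ({
      name: s.space_name,
      used: s.space_used_size,
      size: s.space_size,
    })),
    resourceUsage: {
      userCpuMicros: usage.userCPUTime,
      systemCpuMicros: usage.systemCPUTime,
      voluntaryContextSwitches: usage.voluntaryContextSwitches,
      involuntaryContextSwitches: usage.involuntaryContextSwitches,
      fsReads: usage.fsRead,
      fsWrites: usage.fsWrite,
    },
    // Newer APIs: feature-detect so older Node versions still produce a sample.
    activeResources:
      typeof process.getActiveResourcesInfo === "function"
        ? countBy(process.getActiveResourcesInfo())
        : null,
    constrainedMemory:
      typeof process.constrainedMemory === "function" ? process.constrainedMemory() || null : null,
  };
}
```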
Method Tracer Enhancements
- COLLSCAN flagging - Slow queries are now flagged with `collscan: true` when `explain()` data indicates a full collection scan (no index used, or `totalDocsExamined > 0` with `totalKeysExamined === 0`). Applied both at initial detection time and retroactively after the async explain completes.
- Slow aggregation pipeline capture - Slow aggregation operations now include the sanitized pipeline stages in the slow query entry for debugging.
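The COLLSCAN rule above can be expressed as a small check over a MongoDB `explain()` result run with `executionStats` verbosity. This is a sketch, not the agent's code; `planHasStage`, `isCollscan`, and `flagSlowQuery` are hypothetical names, and the plan is walked generically because newer servers may nest the winning plan differently (for example under `queryPlan`).

```javascript
// Recursively search a plan tree (inputStage, inputStages, queryPlan, ...) for a stage name.
function planHasStage(node, stageName) {
  if (!node || typeof node !== "object") return false;
  if (node.stage === stageName) return true;
  return Object.values(node).some((child) => planHasStage(child, stageName));
}

// Hypothetical helper applying the two conditions described above.
function isCollscan(explain) {
  const winningPlan = explain && explain.queryPlanner && explain.queryPlanner.winningPlan;
  if (planHasStage(winningPlan, "COLLSCAN")) return true; // no index used

  const stats = (explain && explain.executionStats) || {};
  // Documents were read but no index keys were: also treated as a full scan.
  return stats.totalDocsExamined > 0 && stats.totalKeysExamined === 0;
}

// Attach the flag to a slow-query entry (entry shape is illustrative).
function flagSlowQuery(entry, explain) {
  return { ...entry, collscan: isCollscan(explain) };
}
```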
Client-Side Transport Improvements
- `sendBeacon` primary transport - `ErrorTracker` and `RUMClient` now use `navigator.sendBeacon()` as the primary transport for small payloads (<60KB for errors, all RUM batches). This is fire-and-forget: no promises, no callbacks, and no pending work left on the page's event loop. Falls back to `fetch` with `keepalive` for large payloads or when `sendBeacon` returns `false`.
- Public key via query param - `sendBeacon` cannot set custom headers, so the public key is passed as a `?pk=` query parameter (lazily cached URL). The `X-SkySignal-Public-Key` header is still sent on the fetch fallback for backward compatibility.
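The beacon-first strategy can be sketched as follows. `sendPayload` and `withPublicKey` are hypothetical helpers, and the browser APIs are injected as `beacon` and `fetchImpl` so the logic also runs under Node. One assumption differs from a naive fallback: the Fetch spec caps in-flight `keepalive` request bodies at 64 KiB, so this sketch only sets `keepalive` for payloads under the beacon limit.

```javascript
// Sketch of beacon-first delivery with a fetch fallback (not the agent's code).
// In a page, pass beacon = navigator.sendBeacon.bind(navigator) and fetchImpl = window.fetch.
const BEACON_LIMIT_BYTES = 60 * 1024; // mirrors the <60KB figure above

function withPublicKey(endpoint, publicKey) {
  const url = new URL(endpoint);
  url.searchParams.set("pk", publicKey); // sendBeacon cannot set custom headers
  return url.toString();
}

function sendPayload(endpoint, publicKey, payload, { beacon, fetchImpl }) {
  const body = JSON.stringify(payload);
  const size = new TextEncoder().encode(body).length;
  const small = size < BEACON_LIMIT_BYTES;

  if (beacon && small) {
    // sendBeacon returns false when the browser refuses to queue the request.
    if (beacon(withPublicKey(endpoint, publicKey), body)) return "beacon";
  }

  // Fallback: keepalive lets small requests outlive page unload; header kept for compatibility.
  fetchImpl(endpoint, {
    method: "POST",
    keepalive: small,
    headers: { "Content-Type": "application/json", "X-SkySignal-Public-Key": publicKey },
    body,
  }).catch(() => {}); // fire-and-forget: ignore network errors
  return "fetch";
}
```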
Batching & Infrastructure
- 7 new batch types in `SkySignalClient`: `dnsMetrics`, `outboundHttp`, `cpuProfiles`, `deprecatedApis`, `publications`, `environment`, `vulnerabilities`, each with dedicated REST endpoints and payload keys
- Worker thread pool (`WorkerPool` + `compressionWorker`) - Optional `worker_threads`-based compression offloading to prevent gzip work from blocking the host application's event loop. Lazy initialization, auto-restart on crash, and graceful main-thread fallback.
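A minimal sketch of offloading gzip to a worker thread with lazy start, restart after a crash, and main-thread fallback. It uses a single worker evaluated from a string rather than a pool, and `CompressionWorker` is a hypothetical name; the agent's `WorkerPool` and `compressionWorker` may be structured differently.

```javascript
const { Worker } = require("node:worker_threads");
const zlib = require("node:zlib");

// Worker body, evaluated from a string so the sketch fits in one file.
const WORKER_SOURCE = `
const { parentPort } = require("node:worker_threads");
const zlib = require("node:zlib");
parentPort.on("message", ({ id, data }) => {
  try {
    parentPort.postMessage({ id, data: zlib.gzipSync(Buffer.from(data)) });
  } catch (err) {
    parentPort.postMessage({ id, error: String(err) });
  }
});
`;

// Hypothetical single-worker compressor (not the agent's WorkerPool).
class CompressionWorker {
  constructor() {
    this.worker = null; // created lazily on first use
    this.pending = new Map(); // job id -> { resolve, reject, data }
    this.nextId = 0;
  }

  _ensureWorker() {
    if (this.worker) return this.worker;
    try {
      const w = new Worker(WORKER_SOURCE, { eval: true });
      w.on("message", ({ id, data, error }) => {
        const job = this.pending.get(id);
        if (!job) return;
        this.pending.delete(id);
        if (error) job.reject(new Error(error));
        else job.resolve(Buffer.from(data));
      });
      // Only react to failures of the worker we currently own.
      const onFailure = () => {
        if (this.worker === w) this._reset();
      };
      w.on("error", onFailure);
      w.on("exit", onFailure);
      this.worker = w;
    } catch {
      this.worker = null; // worker_threads unavailable: gzip() falls back below
    }
    return this.worker;
  }

  // Drop the dead worker and finish in-flight jobs on the main thread.
  // The next gzip() call lazily starts a fresh worker.
  _reset() {
    this.worker = null;
    for (const [id, job] of this.pending) {
      this.pending.delete(id);
      try {
        job.resolve(zlib.gzipSync(Buffer.from(job.data)));
      } catch (err) {
        job.reject(err);
      }
    }
  }

  gzip(data) {
    const worker = this._ensureWorker();
    if (!worker) return Promise.resolve(zlib.gzipSync(Buffer.from(data)));
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject, data });
      worker.postMessage({ id, data });
    });
  }

  close() {
    return this.worker ? this.worker.terminate() : Promise.resolve();
  }
}
```

Running gzip in a worker trades a small message-passing cost for keeping large compressions off the host application's event loop.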
Configuration
- 18 new config fields added to `DEFAULT_CONFIG` and `validateConfig()` for all new collectors: `collectDnsTimings`, `dnsTimingsInterval`, `collectOutboundHttp`, `outboundHttpInterval`, `collectCpuProfiles`, `cpuProfileThreshold`, `cpuProfileDuration`, `cpuProfileCooldown`, `cpuProfileCheckInterval`, `collectDeprecatedApis`, `deprecatedApisInterval`, `collectPublications`, `publicationsInterval`, `collectEnvironment`, `environmentInterval`, `collectVulnerabilities`, `vulnerabilitiesInterval`
- All new collectors are enabled by default and use staggered startup to avoid CPU spikes at boot
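Staggered startup can be sketched as assigning each collector its own start delay instead of starting everything in the same tick. The step size and jitter are assumptions, and `planStaggeredStart` and `startStaggered` are hypothetical helpers, not the agent's code.

```javascript
// Compute a start delay per collector: a fixed step per position plus optional jitter,
// so expensive first runs (e.g. npm audit, environment snapshot) don't coincide at boot.
function planStaggeredStart(names, { stepMs = 2000, jitterMs = 0, random = Math.random } = {}) {
  return names.map((name, index) => ({
    name,
    delayMs: index * stepMs + Math.floor(random() * jitterMs),
  }));
}

// Start each collector after its planned delay; returns a function that cancels pending starts.
function startStaggered(collectors, options) {
  const plan = planStaggeredStart(Object.keys(collectors), options);
  const timers = plan.map(({ name, delayMs }) =>
    setTimeout(() => collectors[name].start(), delayMs)
  );
  return () => timers.forEach(clearTimeout);
}
```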
v1.0.14 (Bug Fix)
- Silent production logging - Replaced bare `console.log()` calls with debug-guarded `_log()` helpers across all collectors (`HTTPCollector`, `DDPCollector`, `DDPQueueCollector`, `LiveQueriesCollector`, `MongoCollectionStatsCollector`, `BaseJobMonitor`, `SteveJobsMonitor`, `JobCollector`). Previously, operational messages like "Batched 1 HTTP requests", "Sent 18 subscription records", and job lifecycle events were unconditionally printed to stdout regardless of the `debug` setting. All informational logs are now silent by default and only appear when `debug: true` is set in the agent configuration.
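The debug-guarded pattern is small enough to show in full. A sketch with a hypothetical `createDebugLogger` factory and an injectable `sink` (for testing); the `[SkySignal:...]` prefix format is an assumption.

```javascript
// Hypothetical debug-guarded logger: messages are dropped unless config.debug is true.
// The flag is read on every call, so toggling it at runtime takes effect immediately.
function createDebugLogger(config, prefix, sink = console.log) {
  return function _log(...args) {
    if (!config.debug) return; // silent by default in production
    sink(`[SkySignal:${prefix}]`, ...args);
  };
}
```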
v1.0.13 (Bug Fix)
- Trace context isolation - Replaced the shared `_currentMethodContext` variable with Node.js `AsyncLocalStorage` to properly isolate method trace contexts across concurrent async operations. Fixes a bug where background job database queries (e.g., `jobs_data.findOneAsync()`) would leak into unrelated Meteor method traces when both executed concurrently on the same event loop.
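The difference between a shared variable and `AsyncLocalStorage` shows up as soon as two traced calls interleave across `await`s. A minimal sketch using the real `node:async_hooks` API; `traceMethod`, `recordOperation`, and `demo` are hypothetical stand-ins for the agent's method wrapper and its MongoDB instrumentation.

```javascript
const { AsyncLocalStorage } = require("node:async_hooks");

// Each traced method runs inside its own store, so operations recorded after an
// `await` are attributed to the method that started them, not to whichever ran last.
const traceStorage = new AsyncLocalStorage();

function traceMethod(name, fn) {
  const context = { name, operations: [] };
  return traceStorage.run(context, async () => {
    await fn();
    return context;
  });
}

// Called from instrumented DB wrappers; outside a traced method getStore() is undefined.
function recordOperation(op) {
  const context = traceStorage.getStore();
  if (context) context.operations.push(op);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Two concurrent methods plus an untraced background job.
async function demo() {
  const job = (async () => {
    await sleep(5);
    recordOperation("jobs_data.findOneAsync"); // no store here: not attributed to any method
  })();
  const [a, b] = await Promise.all([
    traceMethod("methodA", async () => { await sleep(10); recordOperation("users.findOneAsync"); }),
    traceMethod("methodB", async () => { await sleep(1); recordOperation("orders.insertAsync"); }),
  ]);
  await job;
  return { a, b };
}
```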
v1.0.12 (New Features & Bug Fixes)
- Change Streams support - Live query observer detection now identifies Change Stream drivers (Meteor 3.5+) alongside oplog and polling, with per-observer introspection instead of global heuristic
- Log collection - New `LogsCollector` captures `console.*` and Meteor `Log.*` output with structured metadata, configurable levels, and sampling support. Includes a public `SkySignalAgent.addLog()` API for programmatic log submission
- Silent failure for optional packages - HTTP and Email package instrumentation no longer logs warnings when packages aren't installed; errors are suppressed to debug-only output (fixes #1)
- Client-side error tracking fix - Fixed a 400 "Invalid JSON" response when the agent sends batched client errors to `/api/v1/errors`. The server endpoint now correctly reads the pre-parsed request body and supports both batched `{ errors: [...] }` and single error formats (fixes #2)
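Level filtering and sampling for collected logs might look like the sketch below. The level order follows `addLog()`; `createLogFilter`, its options, and the rule that `error` and `fatal` entries bypass sampling are assumptions for illustration.

```javascript
// Level order taken from addLog(); everything else here is hypothetical.
const LEVELS = ["debug", "info", "warn", "error", "fatal"];

function createLogFilter({ minLevel = "info", sampleRate = 1, random = Math.random } = {}) {
  const threshold = LEVELS.indexOf(minLevel);
  if (threshold === -1) throw new Error(`Unknown log level: ${minLevel}`);

  return function shouldKeep(level) {
    const rank = LEVELS.indexOf(level);
    if (rank === -1 || rank < threshold) return false; // unknown or below minimum level
    if (rank >= LEVELS.indexOf("error")) return true; // assumption: never sample away errors
    return random() < sampleRate; // keep roughly sampleRate of the remaining entries
  };
}
```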
v1.0.11 (New Feature)
- Added client IP address collection for enhanced user context in error tracking and performance correlation
v1.0.7 (Bug Fixes)
- Increased default timeout from 3000ms to 15000ms for API requests to handle slow networks
v1.0.4 (Rollback)
- Reverted to Meteor 2.16+ compatibility due to Node.js version issues with older Meteor versions (only Meteor 3.x supports Node 20+)
v1.0.3 (Bug Fixes)
- Polyfill for `AbortSignal.timeout()` to support older Node.js versions
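A fallback along these lines covers runtimes without `AbortSignal.timeout()`. This is a sketch, not the agent's polyfill: `timeoutSignalFallback` is a hypothetical name, it assumes a global `AbortController` (Node 15+), and it approximates the native `TimeoutError` with a plain `Error` whose `name` is set, since `DOMException` is not a global in older Node versions.

```javascript
// Sketch of an AbortSignal.timeout() fallback (hypothetical; not the agent's actual polyfill).
function timeoutSignalFallback(ms) {
  const controller = new AbortController();
  const timer = setTimeout(() => {
    const reason = new Error("The operation was aborted due to timeout");
    reason.name = "TimeoutError"; // same name the native implementation uses
    controller.abort(reason); // older runtimes ignore the reason argument
  }, ms);
  // Don't let a pending timeout keep the process alive.
  if (timer && typeof timer.unref === "function") timer.unref();
  return controller.signal;
}

// Install only when the runtime lacks a native implementation.
if (typeof AbortSignal.timeout !== "function") {
  AbortSignal.timeout = timeoutSignalFallback;
}
```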
v1.0.2 (Bug Fixes)
- Updated Meteor version compatibility to 2.16
v1.0.1 (Bug Fixes)
- Fixed incorrect default endpoint URL
v1.0.0 (Initial Release)
- Complete Method Tracing - Automatic instrumentation with operation-level profiling
- MongoDB Query Analysis - explain() support, N+1 detection, slow query analysis
- `this.unblock()` Analysis - Optimization recommendations for blocking methods
- DDP Connection Monitoring - Real-time WebSocket tracking with latency metrics
- MongoDB Pool Monitoring - Connection pool health, checkout times, queue tracking
- Live Query Monitoring - Oplog vs polling efficiency tracking
- Background Job Monitoring - Support for msavin:sjobs with extensible adapter system
- HTTP Request Monitoring - Automatic tracking of server HTTP requests
- Collection Stats - MongoDB collection size and index statistics
- App Version Tracking - Auto-detection from package.json with manual override
- Build Hash Tracking - Source map correlation via BUILD_HASH/GIT_SHA env vars
- Performance Safeguards - Memory limits, request timeouts, batch retries
- Real User Monitoring (RUM) - Client-side Core Web Vitals collection (LCP, FID, CLS, TTFB, FCP, TTI)
- PageSpeed-Style Warnings - Automatic performance threshold warnings in console
- SPA Route Tracking - Automatic performance collection on every route change
- Session Management - 30-minute sessions with localStorage persistence
- Browser Context Collection - Automatic device, browser, OS, network information
- User Correlation - Uses Meteor.userId() to correlate with server-side traces
- Fire-and-Forget HTTP - Reliable transmission with keepalive during page unload
- Configurable Sampling - Auto-detects environment (100% dev, 50% prod) or manual configuration
- web-vitals Integration - Uses Google's official Core Web Vitals library
- System metrics monitoring (CPU, memory, load average)
- HTTP client with batching and auto-flush
- Configurable collection intervals
- Basic error handling and retry logic
- Multi-tenant ready architecture
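N+1 detection can be illustrated over the operations recorded in a single method trace. In this sketch, queries are grouped by a "shape" (operation, collection, and selector keys with values dropped), and any shape repeated at least `threshold` times is reported. `queryShape`, `detectNPlusOne`, the shape rule, and the default threshold of 5 are all assumptions, not the agent's detector.

```javascript
// Hypothetical shape: operation + collection + sorted selector keys (values ignored).
function queryShape({ type, collection, selector = {} }) {
  const keys = Object.keys(selector).sort().join(",");
  return `${type}:${collection}:{${keys}}`;
}

// Report shapes repeated at least `threshold` times within one trace, most frequent first.
function detectNPlusOne(operations, threshold = 5) {
  const counts = new Map();
  for (const op of operations) {
    const shape = queryShape(op);
    counts.set(shape, (counts.get(shape) || 0) + 1);
  }
  return [...counts]
    .filter(([, count]) => count >= threshold)
    .map(([shape, count]) => ({ shape, count }))
    .sort((a, b) => b.count - a.count);
}
```

A typical hit is a loop that calls `findOneAsync` once per document; the usual remedy is a single query with `$in` over the collected ids.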