High CPU Usage

High CPU usage in Mango can degrade system responsiveness, delay data collection, and cause missed polling intervals. This page covers how to identify the root cause and resolve CPU-related performance issues.

Symptoms

  • The Mango server's CPU utilization is consistently above 80-90%.
  • The Mango UI becomes sluggish or unresponsive.
  • Data source polling falls behind schedule, with poll durations exceeding the configured update period.
  • The log shows warnings about overdue polls or thread pool exhaustion.
  • Point value updates are delayed, causing stale readings on watch lists and dashboards.
  • The server's load average is significantly higher than the number of CPU cores.

Common Causes

1. Polling Too Frequently

Data sources configured with very short update periods (e.g., 100ms or 500ms) across many points consume significant CPU time for each poll cycle. If the poll-process-store cycle cannot complete before the next poll begins, CPU usage escalates as polls queue up.
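As a rough sanity check, you can compare a data source's average poll duration to its configured update period; a ratio above 1.0 means the poll cycle cannot finish before the next one is due and polls will queue. A minimal sketch (the millisecond values below are illustrative, not Mango defaults):

```python
def poll_utilization(poll_duration_ms: float, update_period_ms: float) -> float:
    """Fraction of each polling period spent polling; > 1.0 means polls queue up."""
    return poll_duration_ms / update_period_ms

# A 150 ms poll cycle on a 100 ms update period can never keep up:
assert poll_utilization(150, 100) > 1.0
# The same poll on a 1-second update period leaves 85% of each cycle idle:
assert poll_utilization(150, 1000) == 0.15
```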

2. Too Many Data Sources Polling Simultaneously

When many polling data sources share the same update period and are not quantized, they may all fire at the same instant, creating CPU spikes that overwhelm the high-priority thread pool.

3. Meta Data Source Script Execution

Meta data source points execute JavaScript or Groovy scripts on every update of their source points. Complex scripts that perform extensive calculations, string manipulation, or point queries can consume significant CPU, especially when triggered at high frequency.

4. Garbage Collection Pressure

When the JVM heap is nearly full, the garbage collector runs frequently and aggressively, consuming CPU cycles. This manifests as high CPU with corresponding memory pressure symptoms. See Out of Memory Errors for memory-specific guidance.

5. Thread Pool Exhaustion

Mango uses thread pools (High, Medium, Low priority) to execute tasks. If all threads in a pool are occupied, new tasks queue up, and the system spends CPU time managing the queue rather than processing data.

6. Large Dashboard or Watch List Rendering

Dashboards with many live-updating components, large watch lists, or complex chart configurations can generate continuous REST API and WebSocket traffic that consumes CPU on both the server and client.

7. Event Detector Evaluation Storms

A large number of event detectors re-evaluating simultaneously (e.g., after a data source restart) can create transient CPU spikes.

8. Database Query Load

Inefficient or frequent database queries (e.g., from custom SQL data sources, REST API calls requesting large datasets, or the purge process running on a very large database) can contribute to high CPU.

Diagnosis

Identify the Process

First, confirm that the Mango Java process is the source of high CPU:

# Show CPU usage by process
top -b -n 1 | head -20

# Or filter for Java/Mango specifically
ps aux --sort=-%cpu | grep java

Identify Hot Threads

Use jstack to capture a thread dump of the running Mango JVM:

# Find the Mango Java PID
jps -l | grep mango

# Capture thread dump
jstack <PID> > /tmp/mango_threads_1.txt

# Wait 10 seconds, capture another
sleep 10
jstack <PID> > /tmp/mango_threads_2.txt

Compare the two thread dumps to identify threads that are consistently in RUNNABLE state executing the same code paths. Common patterns:

  • Threads stuck in script execution (Nashorn, Graal.js, Groovy) indicate expensive meta data source or scripting data source scripts.
  • Threads in Modbus/BACnet/OPC UA communication classes indicate slow device responses causing poll timeouts.
  • Threads in GC operations indicate memory pressure.
  • Many threads in BLOCKED state waiting for locks indicate contention.
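The manual comparison can also be scripted. The sketch below flags threads that appear as RUNNABLE with the same top stack frame in both dumps; it assumes the standard HotSpot jstack output format (thread blocks separated by blank lines) and the capture file paths used above:

```python
import re

def hot_threads(dump_text: str) -> dict:
    """Map thread name -> top stack frame for threads in RUNNABLE state."""
    hot = {}
    for block in dump_text.split("\n\n"):
        name_match = re.match(r'"([^"]+)"', block)
        if not name_match or "java.lang.Thread.State: RUNNABLE" not in block:
            continue
        frame_match = re.search(r'\n\s*at (\S+)', block)
        if frame_match:
            hot[name_match.group(1)] = frame_match.group(1)
    return hot

def persistent_hot(dump1: str, dump2: str) -> dict:
    """Threads RUNNABLE in both dumps on the same code path -- likely CPU hogs."""
    first, second = hot_threads(dump1), hot_threads(dump2)
    return {name: frame for name, frame in first.items()
            if second.get(name) == frame}

# Usage with the dumps captured above:
# print(persistent_hot(open("/tmp/mango_threads_1.txt").read(),
#                      open("/tmp/mango_threads_2.txt").read()))
```

Threads reported by this comparison are the first candidates to investigate against the patterns listed above.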

Use JVisualVM for Live Profiling

For a graphical analysis with more detail (jvisualvm ships with JDK 8 and earlier; on newer JDKs, use the standalone VisualVM download, which accepts the same option):

jvisualvm --openpid <PID>

In JVisualVM:

  • CPU Sampler: Identifies which methods are consuming the most CPU time.
  • Thread view: Shows thread states over time, making it easy to spot thread pool saturation.
  • Monitor tab: Shows overall CPU usage, GC activity, and thread count.

Check Internal Metrics

If the Internal Metrics data source is enabled, review these data points:

  • High Priority Thread Pool Active Count: Number of currently active high-priority threads. If this consistently equals the pool maximum, the pool is saturated.
  • Medium Priority Thread Pool Active Count: Same for medium-priority tasks.
  • Poll Duration points for individual data sources: If poll duration exceeds the update period, the data source is not keeping up.
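A simple saturation check against these metrics (the 80% threshold matches the alarm guidance in the Prevention section; the pool sizes shown are illustrative):

```python
def pool_saturation(active_count: int, pool_max: int) -> float:
    """Fraction of the thread pool currently in use; sustained values near 1.0
    mean the pool is saturated and new tasks are queuing."""
    return active_count / pool_max

# With a pool maximum of 100 threads, 85 active threads breaches an 80% alarm threshold:
assert pool_saturation(85, 100) > 0.80
```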

Check Thread Pool Settings

Review the current thread pool configuration in mango.properties:

# High priority thread pool (used for data collection)
high.prio.pool.size.core=
high.prio.pool.size.max=

# Medium priority thread pool
med.prio.pool.size.core=
med.prio.pool.size.max=

Solutions

Solution 1: Increase Polling Intervals

Reduce polling frequency for data sources that do not require sub-second updates:

  • Increase the Update period from milliseconds to seconds or minutes as appropriate.
  • Use Quantize to distribute polls evenly within the period rather than clustering at startup.
  • Consider using Interval logging to decouple data collection frequency from storage frequency.

Solution 2: Stagger Data Source Polling

If multiple data sources poll at the same interval, stagger their start times:

  • Enable Quantize on polling data sources with different period offsets.
  • Use different prime-number polling intervals (e.g., 7 seconds, 11 seconds, 13 seconds) to naturally distribute polls.
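The effect of prime intervals is easy to simulate: sources sharing a 10-second period all fire together on every poll, while 7-, 11-, and 13-second periods only coincide every 1,001 seconds (their least common multiple). A sketch:

```python
def simultaneous_polls(periods_s: list, horizon_s: int) -> int:
    """Count the seconds within the horizon at which every data source polls at once."""
    return sum(1 for t in range(1, horizon_s + 1)
               if all(t % p == 0 for p in periods_s))

# Three sources on a shared 10 s period fire together 360 times per hour:
assert simultaneous_polls([10, 10, 10], 3600) == 360
# On prime periods of 7/11/13 s they coincide only at multiples of 1001 s -- 3 times per hour:
assert simultaneous_polls([7, 11, 13], 3600) == 3
```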

Solution 3: Optimize Meta Data Source Scripts

For expensive scripts:

  • Simplify calculations: Replace complex script logic with simpler algorithms.
  • Reduce context points: Only include the data points that the script actually needs.
  • Increase the update event: Change from updating on every source point change to a scheduled interval.
  • Profile the script: Add timing to identify which operations are slow.

Solution 4: Increase Thread Pool Sizes

If thread pools are saturated but the server has available CPU cores:

# In mango.properties, increase pool sizes
# High priority (data collection) - default varies by CPU count
high.prio.pool.size.core=50
high.prio.pool.size.max=100

# Medium priority
med.prio.pool.size.core=30
med.prio.pool.size.max=60

Caution: Increasing thread pool sizes without addressing the underlying cause can shift the bottleneck to another resource (memory, I/O, or network). Only increase pool sizes when you have confirmed that threads are the limiting factor and CPU cores are available.

Solution 5: Reduce Dashboard and Watch List Load

  • Limit the number of live-updating components on a single dashboard page.
  • Use rollups for chart data rather than displaying raw values over long time ranges.
  • Reduce the refresh rate for watch lists that do not require real-time updates.
  • Close browser tabs with active Mango dashboards that are not being viewed.

Solution 6: Address GC Pressure

If high CPU correlates with frequent garbage collection:

  • Increase the JVM heap size (see Out of Memory Errors).
  • Switch to a more efficient garbage collector:
    # G1GC is recommended for Mango
    -XX:+UseG1GC -XX:MaxGCPauseMillis=200
  • Reduce point value cache sizes to lower heap pressure.

Solution 7: Optimize Database Queries

  • Ensure MySQL or MariaDB has appropriate indexes for Mango's query patterns.
  • Increase the MySQL buffer pool size (innodb_buffer_pool_size) to reduce disk I/O.
  • Schedule purge operations during off-peak hours.
  • Use db.useMetrics=true in mango.properties to log slow queries and identify bottlenecks.

Prevention

  • Baseline your CPU usage when the system is first deployed and operating normally. This gives you a reference point for identifying abnormal behavior later.
  • Monitor thread pool utilization with Internal Metrics and set alarm thresholds at 80% of pool capacity.
  • Review meta data source scripts before deployment. Test them with representative data volumes and measure execution time.
  • Set polling intervals based on actual data change rates, not the fastest rate the protocol supports. A temperature sensor that changes once per minute does not need to be polled every second.
  • Use quantized polling to distribute load evenly across the polling period.
  • Size the server hardware appropriately: plan for at least 2 CPU cores per 1,000 fast-polling data points as a rough guideline, with additional cores for UI serving and scripting.
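The sizing guideline above can be expressed as a quick estimate. This is a back-of-the-envelope sketch of the 2-cores-per-1,000-points rule of thumb, not a formal capacity model; the overhead figure for UI serving and scripting is an illustrative assumption:

```python
import math

def estimated_cores(fast_polling_points: int, overhead_cores: int = 2) -> int:
    """Rough CPU core estimate: 2 cores per 1,000 fast-polling points, plus
    overhead cores for UI serving and scripting (an illustrative assumption)."""
    return math.ceil(fast_polling_points / 1000) * 2 + overhead_cores

# 5,000 fast-polling points -> 10 polling cores plus 2 overhead cores:
assert estimated_cores(5000) == 12
```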