Troubleshooting the JVM: jstack, Heap Dumps, and JFR

Overview

Java Virtual Machine (JVM) troubleshooting requires systematic data collection and analysis to identify performance bottlenecks, memory leaks, and threading issues. This page provides practical approaches for collecting diagnostic dumps and leveraging them for effective problem resolution in production environments.

Prerequisites and Setup

Before beginning troubleshooting activities, ensure your environment is properly configured for diagnostic data collection.
Enable core dump generation with ulimit -c unlimited and consider using the -XX:+HeapDumpOnOutOfMemoryError flag to automatically capture heap dumps during memory errors.
For continuous monitoring, set up Java Flight Recorder with minimal overhead using continuous recording mode.

Thread Dump Collection with jstack

Finding the Java Process

First, identify the target Java process using one of the following methods:

    # List all Java processes
    jps -l

    # Alternative method using ps
    ps -ef | grep java

Collecting Thread Dumps

Thread dumps provide snapshots of all active threads and their current state. The jstack utility is the primary tool used for collecting thread dumps.

    # Basic thread dump collection
    jstack <pid>

    # Thread dump with extended lock information
    jstack -l <pid>

    # Force thread dump for hung processes (Linux/Solaris only; removed from
    # newer JDKs, where jhsdb jstack --pid <pid> replaces it)
    jstack -F <pid>

For modern JDK versions, jcmd is the recommended alternative to jstack due to enhanced diagnostics and reduced performance overhead.

    # Using jcmd for thread dumps
    jcmd <pid> Thread.print

Alternative Collection Methods

Thread dumps can also be generated through:

Java VisualVM graphical interface
Java Mission Control (JMC)
Programmatically using Thread.getAllStackTraces()
Sending the QUIT signal (kill -3 <pid>, or Ctrl+\ in the foreground terminal on Unix systems)
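
The programmatic option can be sketched as follows. This is a minimal illustration, not a replacement for jstack: the class name ThreadDumpSketch and the output layout are assumptions made for the example, and the result lacks the lock and native-thread details that jstack prints.

```java
import java.util.Map;

public class ThreadDumpSketch {

    /** Builds a simple text dump of every live thread in the current JVM. */
    public static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" daemon=").append(thread.isDaemon())
               .append(" state=").append(thread.getState())
               .append(System.lineSeparator());
            // One line per stack frame, innermost frame first.
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append(System.lineSeparator());
            }
            out.append(System.lineSeparator());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Because the dump is taken from inside the application, it only works while the JVM is still responsive enough to run the calling thread.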

Heap Dump Collection

Using jcmd (Recommended over jmap)

The jcmd utility provides the most reliable method for heap dump generation.

    # Generate heap dump with jcmd (only live objects by default, which triggers a full GC)
    jcmd <pid> GC.heap_dump /path/to/heapdump.hprof

    # Include unreachable objects as well
    jcmd <pid> GC.heap_dump -all /path/to/heapdump.hprof

Using jmap

While jmap is available, it's considered experimental and unsupported in newer JDK versions.

    # Basic heap dump with jmap
    jmap -dump:format=b,file=/tmp/heapdump.hprof <pid>

    # Heap dump with only reachable objects
    jmap -dump:live,format=b,file=/tmp/heapdump.hprof <pid>

Using Java VisualVM

Java VisualVM provides a graphical interface for heap dump collection. Connect to the target process and use the "Heap Dump" button in the Monitor tab. The dump can be analyzed immediately or saved for later analysis.
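
A heap dump can also be requested from inside the application through the HotSpot-specific HotSpotDiagnosticMXBean. The sketch below assumes a HotSpot-based JDK; the class name HeapDumpSketch and the output location are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {

    /**
     * Writes an HPROF heap dump of the current JVM to the given file.
     * liveOnly=true keeps only reachable objects, which forces a full GC first.
     */
    public static Path dumpHeap(Path target, boolean liveOnly) {
        HotSpotDiagnosticMXBean diagnostics =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file, so remove a stale dump first.
            Files.deleteIfExists(target);
            diagnostics.dumpHeap(target.toAbsolutePath().toString(), liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical location; recent JDKs require the name to end in .hprof.
        Path dump = dumpHeap(Path.of(System.getProperty("java.io.tmpdir"), "sketch.hprof"), true);
        System.out.println("Wrote " + Files.size(dump) + " bytes to " + dump);
    }
}
```

The same stop-the-world pause applies as with the command-line tools, so this is best reserved for an administrative endpoint or a failure handler rather than a regular code path.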

Java Flight Recorder (JFR) Collection (Highly Recommended)

Starting JFR at Application Startup

Enable JFR from application launch for comprehensive profiling.

    # Basic JFR recording at startup
    java -XX:StartFlightRecording=filename=recording.jfr,duration=60s MyApp

    # JFR with profiling settings
    java -XX:StartFlightRecording=filename=recording.jfr,duration=60s,settings=profile MyApp

    # Continuous recording with size and time limits
    java -XX:StartFlightRecording=filename=recording.jfr,maxage=4h,maxsize=400MB MyApp

Runtime JFR Collection

For running applications, use jcmd to control JFR recordings.

    # Start a JFR recording
    jcmd <pid> JFR.start name=myrecording duration=120s filename=recording.jfr

    # Start with profiling settings for detailed analysis
    jcmd <pid> JFR.start name=detailed duration=300s filename=detailed.jfr settings=profile

    # Stop a specific recording
    jcmd <pid> JFR.stop name=myrecording

    # Dump current recording data
    jcmd <pid> JFR.dump name=myrecording filename=dump.jfr

JFR Configuration Options

JFR supports various configuration parameters for customized data collection.

duration: Maximum recording duration
maxage: Maximum age of recorded data to keep
maxsize: Maximum size of recording data
settings: Event configuration to use (the predefined default or profile, or the path to a custom .jfc file)
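
The same options are available from code through the jdk.jfr API (JDK 11 and later). The following is a sketch under stated assumptions: the class name JfrOptionsSketch, the recording name, and the output file are hypothetical, and the limits simply mirror the command-line example values.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class JfrOptionsSketch {

    /** Creates, but does not start, a recording configured like the options above. */
    public static Recording configure(String settings, Duration maxAge, long maxSizeBytes) {
        try {
            // "default" and "profile" are the predefined configurations shipped with the JDK.
            Recording recording = new Recording(Configuration.getConfiguration(settings));
            recording.setName("sketch");        // hypothetical recording name
            recording.setMaxAge(maxAge);        // maxage: discard data older than this
            recording.setMaxSize(maxSizeBytes); // maxsize: cap on retained recording data
            return recording;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Unreadable JFR settings: " + settings, e);
        }
    }

    public static void main(String[] args) throws Exception {
        try (Recording recording = configure("default", Duration.ofHours(4), 400L * 1024 * 1024)) {
            recording.setDuration(Duration.ofSeconds(5)); // duration: stop automatically
            recording.start();
            Thread.sleep(1_000);
            recording.dump(Path.of("sketch.jfr"));        // comparable to jcmd <pid> JFR.dump
        }
    }
}
```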

Troubleshooting Strategies Using Dump Analysis

Thread Dump Analysis Strategies

Identifying Deadlocks

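jstack and jcmd Thread.print detect Java-level deadlocks automatically and report them at the end of the dump (look for "Found one Java-level deadlock"). To trace a deadlock manually, look for threads in the BLOCKED state and follow the lock chain to identify circular dependencies.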

    "Thread-1" #10 prio=5 os_prio=0 tid=0x... nid=0x... waiting for monitor entry
       java.lang.Thread.State: BLOCKED (on object monitor)
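
The same deadlock check can be run in-process with ThreadMXBean.findDeadlockedThreads(). The sketch below is illustrative only: the class name DeadlockSketch, the thread names, and the report format are assumptions, and provokeDeadlock() exists purely to give the detector something to find.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class DeadlockSketch {

    /** Returns one line per deadlocked thread; the list is empty when no deadlock exists. */
    public static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] ids = threads.findDeadlockedThreads(); // null means no deadlock was found
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        for (ThreadInfo info : threads.getThreadInfo(ids)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            report.add(info.getThreadName() + " is " + info.getThreadState()
                    + " on " + info.getLockName()
                    + " held by " + info.getLockOwnerName());
        }
        return report;
    }

    /** Demo only: two daemon threads take the same two monitors in opposite order. */
    public static void provokeDeadlock() {
        Object first = new Object();
        Object second = new Object();
        CountDownLatch bothHoldOuter = new CountDownLatch(2);
        startLocker("sketch-A", first, second, bothHoldOuter);
        startLocker("sketch-B", second, first, bothHoldOuter);
    }

    private static void startLocker(String name, Object outer, Object inner, CountDownLatch ready) {
        Thread thread = new Thread(() -> {
            synchronized (outer) {
                ready.countDown();
                try {
                    ready.await(); // wait until the other thread holds its outer lock
                } catch (InterruptedException e) {
                    return;
                }
                synchronized (inner) {
                    // never reached: the other thread owns this monitor
                }
            }
        }, name);
        thread.setDaemon(true); // lets the JVM exit despite the deadlock
        thread.start();
    }

    public static void main(String[] args) throws InterruptedException {
        provokeDeadlock();
        Thread.sleep(500); // give both threads time to block
        describeDeadlocks().forEach(System.out::println);
    }
}
```

Each reported line names the blocked thread, the monitor it waits for, and the thread that owns it, which is the circular chain to look for in a full thread dump.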

Analyzing Thread States

Focus on thread state distribution to identify performance issues:

RUNNABLE: Threads actively executing or ready to execute
BLOCKED: Threads waiting for monitor locks
WAITING: Threads waiting indefinitely for another thread
TIMED_WAITING: Threads waiting for a specified period
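
A quick way to see the state distribution without reading a full dump is to count threads per state. This is a minimal sketch; the class name ThreadStateSketch is an assumption, and the counts are a point-in-time snapshot that changes between calls.

```java
import java.util.EnumMap;
import java.util.Map;

public class ThreadStateSketch {

    /** Counts the live threads of the current JVM by Thread.State. */
    public static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            counts.merge(thread.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, count) -> System.out.println(state + ": " + count));
    }
}
```

A sudden growth in BLOCKED threads between two snapshots is usually a stronger signal than any single reading.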

Thread Contention Analysis

Examine threads waiting on the same monitors to identify contention hotspots. Multiple threads blocked on identical resources indicate synchronization bottlenecks.
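
The grouping described above can be sketched with ThreadMXBean, which exposes the monitor each BLOCKED thread is waiting for. The class name ContentionSketch is hypothetical, and the lock name uses the class@identityHash form that ThreadInfo.getLockName() returns.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.Map;
import java.util.TreeMap;

public class ContentionSketch {

    /** Maps each contended lock to the number of threads currently BLOCKED on it. */
    public static Map<String, Integer> blockedThreadsPerLock() {
        Map<String, Integer> waiters = new TreeMap<>();
        // Locked monitors and synchronizers are not needed here, so both flags are false.
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info.getThreadState() == Thread.State.BLOCKED && info.getLockName() != null) {
                waiters.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return waiters;
    }

    public static void main(String[] args) {
        blockedThreadsPerLock().forEach(
                (lock, count) -> System.out.println(count + " thread(s) blocked on " + lock));
    }
}
```

Locks that show up with several waiters across repeated samples are the contention hotspots worth investigating first.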

Heap Dump Analysis Strategies

Memory Leak Detection

Use a heap analyzer such as Eclipse Memory Analyzer (MAT) or VisualVM for automated leak detection. MAT's Leak Suspects report is generated with minimal user intervention:

Load the heap dump (.hprof file) in the analyzer
Run the "Leak Suspects" report for automated analysis
Review suspect objects and their reference chains
Identify objects that should have been garbage collected

Object Retention Analysis

Analyze object retention patterns by examining:

Largest objects by size
Object count by class
Reference chains preventing garbage collection
Duplicate strings and arrays

Memory Usage Optimization

Focus on:

Classes consuming the most memory
Objects with excessive duplication
Large collections that may need optimization
String interning or deduplication opportunities

JFR Analysis Strategies

Performance Bottleneck Identification

JFR provides comprehensive performance insights through Java Mission Control:

Method Profiling: Identify CPU-intensive methods through stack trace sampling. Look for methods consuming disproportionate CPU time.
Memory Allocation Analysis: Track object allocation patterns to identify memory pressure sources. Focus on allocation rate and object types.
Garbage Collection Analysis: Monitor GC frequency, duration, and impact on application performance. Excessive GC activity indicates memory tuning opportunities.
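
Recordings can also be inspected without Java Mission Control by using the jdk.jfr.consumer API. The sketch below approximates method profiling by counting how often each method is the top frame of a jdk.ExecutionSample event; the class name HotMethodSketch and the command-line handling are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;

public class HotMethodSketch {

    /** Counts how often each method is the top frame of a jdk.ExecutionSample event. */
    public static Map<String, Long> topFrameCounts(Path recording) {
        Map<String, Long> counts = new HashMap<>();
        // Events are streamed one at a time so large recordings need not fit in memory.
        try (RecordingFile file = new RecordingFile(recording)) {
            while (file.hasMoreEvents()) {
                RecordedEvent event = file.readEvent();
                if (!"jdk.ExecutionSample".equals(event.getEventType().getName())) {
                    continue;
                }
                RecordedStackTrace trace = event.getStackTrace();
                if (trace == null || trace.getFrames().isEmpty()) {
                    continue;
                }
                RecordedFrame top = trace.getFrames().get(0);
                if (top.getMethod() == null) {
                    continue;
                }
                String method = top.getMethod().getType().getName() + "." + top.getMethod().getName();
                counts.merge(method, 1L, Long::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Expects the path of a .jfr file as the first argument.
        topFrameCounts(Path.of(args[0])).entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}
```

Sample counts are a statistical estimate of CPU time, so short recordings with few samples should be read with caution.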

Threading and Concurrency Issues

JFR captures detailed threading information:

Thread contention events showing lock competition
Thread parking and blocking events
Context switching frequency and overhead
Thread pool utilization patterns

I/O and Network Performance

Monitor I/O operations and network activity:

File I/O latency and throughput
Network connection patterns
Database query performance
Resource utilization trends

JVM Diagnostics: Performance Impact and Pauses

Collecting JVM diagnostic dumps can significantly affect performance and pause the application. Each method has different overhead characteristics and pause behaviors that you need to understand before using it in production environments.

Thread Dump Collection Impact

jstack Performance Impact

Thread dump collection using jstack causes stop-the-world pauses where all application threads are suspended. This occurs because jstack requires all threads to reach a safepoint before the dump can be generated. The pause duration is typically brief (milliseconds to seconds) but can vary depending on application complexity and thread count.

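In forced mode (jstack -F, or jhsdb jstack on newer JDKs), the tool operates through the HotSpot Serviceability Agent, which suspends the entire target process during execution. This means not only are application threads stopped, but the whole process becomes unresponsive during dump collection. Without -F, jstack uses the Dynamic Attach Mechanism and the dump is produced in-process at a safepoint.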

jcmd Thread Dump Performance

The jcmd utility is recommended over jstack for modern JDK versions due to enhanced diagnostics and reduced performance overhead. While jcmd still requires safepoint synchronization, it generally has lower impact than legacy tools. The performance difference becomes more pronounced in high-throughput applications where even brief pauses can affect response times.

Heap Dump Collection Impact

Stop-the-World Behavior

Heap dump collection represents one of the most significant performance impacts among diagnostic methods. Heap dumps are stop-the-world operations that pause all application activity during collection. This pause can last from seconds to multiple minutes depending on heap size.

jmap vs jcmd Performance Comparison

Both jmap and jcmd cause application pauses during heap dump generation, but with different characteristics:

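jmap: By default attaches to the target JVM in the same way as jcmd. In forced mode (-F on older JDKs, jhsdb jmap on newer ones) it uses the Serviceability Agent approach, which suspends the entire target process. The -heap option (jhsdb jmap --heap on newer JDKs) also goes through the Serviceability Agent and pauses the process while it runs
jcmd: Performs heap dumps in-process through the Dynamic Attach Mechanism: an Attach Listener thread receives the request, and the dump is written while application threads are paused at a safepoint

Production Environment Risks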

In production environments, heap dump collection can cause health check failures and service termination. Large heap sizes (100GB+) can result in multi-minute pauses that trigger monitoring systems to restart applications. Additionally, a heap dump requires disk space roughly equal to the used heap, so it can fill a disk partition if insufficient storage is available.
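
A pre-flight check along these lines can reduce the storage risk. It is a rough sketch: the class name DumpSpaceSketch and the 25 percent safety margin are assumptions, and used heap is only an estimate of the final dump size.

```java
import java.io.File;
import java.lang.management.ManagementFactory;

public class DumpSpaceSketch {

    /** True when usableBytes covers heapBytes plus a safety margin (0.25 means 25 percent). */
    public static boolean fits(long heapBytes, long usableBytes, double margin) {
        return usableBytes >= (long) Math.ceil(heapBytes * (1.0 + margin));
    }

    /** Compares the currently used heap with the usable space of the target directory. */
    public static boolean hasRoomForHeapDump(File targetDir, double margin) {
        long usedHeap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        return fits(usedHeap, targetDir.getUsableSpace(), margin);
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("Room for a heap dump in " + dir + ": " + hasRoomForHeapDump(dir, 0.25));
    }
}
```

For a more conservative bound, the maximum heap size (getHeapMemoryUsage().getMax()) could be used instead of the used heap.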

Java Flight Recorder (JFR) Overhead

Minimal Production Impact

JFR represents the lowest overhead option among all diagnostic methods. The overhead for standard profiling recordings is less than 2 percent for most applications. Running with continuous recording generally has no measurable performance impact, making it suitable for production environments.

Configuration Considerations

The primary performance consideration with JFR involves Heap Statistics events, which are disabled by default. When enabled, these events trigger old generation garbage collections at recording start and end, adding pause times that may impact latency-sensitive applications.

Safepoint Pause Characteristics

Time-to-Safepoint (TTSP) Issues

Application pauses during diagnostic collection depend heavily on time-to-safepoint behavior. Some threads may take seconds or even minutes to reach a safepoint, especially in applications with long-running counted loops, where the JIT compiler may omit safepoint polls. Threads executing native code do not delay a safepoint, but they are held when they try to return to Java code. This can result in diagnostic operations taking much longer than expected.

Safepoint Operation Types

Different diagnostic operations trigger various safepoint types:

Thread dumps: Require global safepoints for consistent thread state capture
Heap dumps: Need safepoints for heap consistency during memory snapshot
JFR events: Most events require minimal safepoint coordination

Production Environment Recommendations

Risk Mitigation Strategies

To minimize production impact when collecting diagnostic data:

Pre-allocate sufficient disk space for heap dumps to prevent storage exhaustion
Use cloud resources with adequate memory and storage for large heap analysis
Schedule collection during maintenance windows when possible
Implement automated collection with -XX:+HeapDumpOnOutOfMemoryError for critical failures
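
On HotSpot, HeapDumpOnOutOfMemoryError and HeapDumpPath are manageable flags, so they can be inspected and switched on in a running JVM as well. The sketch below assumes a HotSpot-based JDK; the class name OomDumpFlagSketch and the dump directory are hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class OomDumpFlagSketch {

    private static final String FLAG = "HeapDumpOnOutOfMemoryError";

    private static HotSpotDiagnosticMXBean diagnostics() {
        return ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    /** Reports whether the JVM will write a heap dump on OutOfMemoryError. */
    public static boolean isEnabled() {
        return Boolean.parseBoolean(diagnostics().getVMOption(FLAG).getValue());
    }

    /** Enables the flag at runtime and sets where the dump is written. */
    public static void enable(String dumpPath) {
        diagnostics().setVMOption(FLAG, "true");
        diagnostics().setVMOption("HeapDumpPath", dumpPath);
    }

    public static void main(String[] args) {
        System.out.println("Enabled at startup: " + isEnabled());
        enable(System.getProperty("java.io.tmpdir")); // hypothetical dump directory
        System.out.println("Enabled now: " + isEnabled());
    }
}
```

Setting the flag on the command line remains the more predictable choice, since it also covers failures that occur before any application code runs.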

Tool Selection Guidelines

Based on performance impact considerations:

JFR: Preferred for continuous production monitoring due to minimal overhead
jcmd: Recommended over legacy tools for better performance and enhanced features
jstack/jmap: Use sparingly in production due to stop-the-world behavior

Best Practices and Recommendations

Production Environment Considerations

When collecting diagnostic data in production:

Use continuous JFR recordings with circular buffers to minimize storage impact
Enable automatic heap dump generation on OutOfMemoryError
Maintain GC logging (-verbose:gc, or -Xlog:gc* on JDK 9 and later) for historical memory analysis
Archive diagnostic data regularly during maintenance windows

Performance Impact Considerations

Minimize diagnostic overhead:

JFR overhead is typically less than 2% for standard profiling
Heap dump collection pauses the application for seconds to minutes, depending on heap size
Thread dump collection causes a brief safepoint pause, typically milliseconds to seconds
Avoid enabling heap statistics in latency-sensitive environments

Last updated on Jun 17, 2025
