Finding the Bad Apple in JVM

If you run into situation where you have to troubleshoot a JVM that consumes high CPU and you have absolutely no idea whats causing it, here are some very useful tricks that saved my life countless time.

top:
Most of us use Top often but there are some handy params that can be even more helpful.

First Do a top to get the Cpu Hungry process id.

#top
top - 14:33:22 up 42 days, 7:11, 2 users, load average: 1.60, 1.62, 1.56
Tasks: 154 total, 1 running, 153 sleeping, 0 stopped, 0 zombie
Cpu(s): 77.8%us, 0.8%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8061328k total, 7545548k used, 515780k free, 134776k buffers
Swap: 2097148k total, 162800k used, 1934348k free, 798096k cached</code>

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12058 root 20 0 13.3g 4.7g 21m S 302.0 61.5 41219:31 java
18205 root 20 0 4273m 948m 6244 S 11.3 12.0 1704:03 java
12056 root 20 0 22032 1080 920 S 0.7 0.0 62:33.48 wrapper-linux-x
1 root 20 0 19356 1244 1052 S 0.0 0.0 0:03.22 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd

Ok, clearly, the process with PID 12058 isn’t behaving properly.

Lets ask ‘top’ to list the child-process (one to one binding with each thread in JVM) for PID 12058

-H Flags turns on the Threads Toggle
-H : Threads toggle
Starts top with the last remembered ’H’ state reversed. When this toggle is On, all individual threads will be displayed. Other-
wise, top displays a summation of all threads in a process.

## top -p 12058 -H (-p only lists child process for 12058 )

&nbsp;

[mule@FSASYDESBPROD02 ~]# top -p $PID -H
top - 15:09:49 up 42 days, 7:48, 2 users, load average: 1.34, 1.50, 1.54
Tasks: 5359 total, 3 running, 5356 sleeping, 0 stopped, 0 zombie
Cpu(s): 77.8%us, 1.2%sy, 0.0%ni, 20.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 8061328k total, 7740576k used, 320752k free, 135436k buffers
Swap: 2097148k total, 162652k used, 1934496k free, 833864k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28255 root 20 0 13.3g 4.9g 21m R 98.3 63.3 10163:33 java
14064 root 20 0 13.3g 4.9g 21m R 97.7 63.3 14089:00 java
14062 root 20 0 13.3g 4.9g 21m R 97.4 63.3 14073:55 java
925 root 20 0 13.3g 4.9g 21m S 1.9 63.3 279:22.62 java
12060 root 20 0 13.3g 4.9g 21m S 0.6 63.3 108:39.51 java

Ok, so Thread $28255 & $14064 is doing a lot of stuff. How do we know what they are doing??

jstack in rescue !

"AsyncLoggerConfig-1" daemon prio=10 tid=0x00007f45ec2ab000 nid=0x2f27 waiting on condition [0x00007f45e1e22000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45)
at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55)
at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

In the jstack trace, the nid=0x2f27 is the hex of the pid we received before. So lets awk it!

bash# PID=12058
bash# NID=28255
bash# jstack $PID | awk '/ nid='"$(printf '%#x' $NID)"' /,/^$/'</code>

"pool-1140-thread-1" prio=10 tid=0x00007f45706b4000 nid=0x6e5f runnable [0x00007f453247a000]
java.lang.Thread.State: RUNNABLE
at sun.reflect.Reflection.getCallerClass0(Native Method)
at sun.reflect.Reflection.getCallerClass(Reflection.java:69)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.logging.log4j.util.ReflectionUtil.getCallerClass(ReflectionUtil.java:123)
at org.apache.logging.log4j.util.ReflectionUtil.getCallerClass(ReflectionUtil.java:230)
at org.apache.logging.log4j.jul.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:34)
at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42)
at org.apache.logging.log4j.jul.LogManager.getLogger(LogManager.java:89)
at java.util.logging.LogManager.demandLogger(LogManager.java:430)
at java.util.logging.Logger.demandLogger(Logger.java:346)
at java.util.logging.Logger.getLogger(Logger.java:393)
at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:103)
at com.orientechnologies.common.log.OLogManager.debug(OLogManager.java:128)
at com.orientechnologies.orient.client.remote.OStorageRemote.useNewServerURL(OStorageRemote.java:1703)
- locked (a java.util.ArrayList)
at com.orientechnologies.orient.client.remote.OStorageRemote.getAvailableNetwork(OStorageRemote.java:1866)
at com.orientechnologies.orient.client.remote.OStorageRemote.openRemoteDatabase(OStorageRemote.java:1608)
at com.orientechnologies.orient.client.remote.OStorageRemote.open(OStorageRemote.java:216)
at com.orientechnologies.orient.client.remote.OStorageRemoteThread.open(OStorageRemoteThread.java:87)
at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:244)
at com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPolled.internalOpen(OPartitionedDatabasePool.java:137)
at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.openDatabase(OPartitionedDatabasePool.java:331)
at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.acquire(OPartitionedDatabasePool.java:304)
at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.(OrientBaseGraph.java:166)
at com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx.(OrientGraphNoTx.java:54)
at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getNoTx(OrientGraphFactory.java:93)
at au.com.foxsports.subscription.service.SubscriptionServiceImpl.getSubscribersWithRetry(SubscriptionServiceImpl.java:760)
at au.com.foxsports.subscription.service.SubscriptionServiceImpl.getSubscribers(SubscriptionServiceImpl.java:739)
at au.com.foxsports.subscription.service.SubscriptionServiceImpl$1.call(SubscriptionServiceImpl.java:945)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

There you go! You can see the 3rd party system eating up the CPU ! In my case its Orientdb. Yes, we faced a lot of issue trying to implement a graph model in orientdb, but thats a story for another day.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s