If you run into situation where you have to troubleshoot a JVM that consumes high CPU and you have absolutely no idea whats causing it, here are some very useful tricks that saved my life countless time.
top:
Most of us use Top often but there are some handy params that can be even more helpful.
First Do a top to get the Cpu Hungry process id.
#top top - 14:33:22 up 42 days, 7:11, 2 users, load average: 1.60, 1.62, 1.56 Tasks: 154 total, 1 running, 153 sleeping, 0 stopped, 0 zombie Cpu(s): 77.8%us, 0.8%sy, 0.0%ni, 21.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 8061328k total, 7545548k used, 515780k free, 134776k buffers Swap: 2097148k total, 162800k used, 1934348k free, 798096k cached</code> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12058 root 20 0 13.3g 4.7g 21m S 302.0 61.5 41219:31 java 18205 root 20 0 4273m 948m 6244 S 11.3 12.0 1704:03 java 12056 root 20 0 22032 1080 920 S 0.7 0.0 62:33.48 wrapper-linux-x 1 root 20 0 19356 1244 1052 S 0.0 0.0 0:03.22 init 2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
Ok, clearly, the process with PID 12058 isn’t behaving properly.
Lets ask ‘top’ to list the child-process (one to one binding with each thread in JVM) for PID 12058
-H Flags turns on the Threads Toggle
-H : Threads toggle
Starts top with the last remembered ’H’ state reversed. When this toggle is On, all individual threads will be displayed. Other-
wise, top displays a summation of all threads in a process.
## top -p 12058 -H (-p only lists child process for 12058 ) [mule@FSASYDESBPROD02 ~]# top -p $PID -H top - 15:09:49 up 42 days, 7:48, 2 users, load average: 1.34, 1.50, 1.54 Tasks: 5359 total, 3 running, 5356 sleeping, 0 stopped, 0 zombie Cpu(s): 77.8%us, 1.2%sy, 0.0%ni, 20.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 8061328k total, 7740576k used, 320752k free, 135436k buffers Swap: 2097148k total, 162652k used, 1934496k free, 833864k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 28255 root 20 0 13.3g 4.9g 21m R 98.3 63.3 10163:33 java 14064 root 20 0 13.3g 4.9g 21m R 97.7 63.3 14089:00 java 14062 root 20 0 13.3g 4.9g 21m R 97.4 63.3 14073:55 java 925 root 20 0 13.3g 4.9g 21m S 1.9 63.3 279:22.62 java 12060 root 20 0 13.3g 4.9g 21m S 0.6 63.3 108:39.51 java
Ok, so Thread $28255 & $14064 is doing a lot of stuff. How do we know what they are doing??
jstack in rescue !
"AsyncLoggerConfig-1" daemon prio=10 tid=0x00007f45ec2ab000 nid=0x2f27 waiting on condition [0x00007f45e1e22000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043) at com.lmax.disruptor.BlockingWaitStrategy.waitFor(BlockingWaitStrategy.java:45) at com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:55) at com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:123) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
In the jstack trace, the nid=0x2f27 is the hex of the pid we received before. So lets awk it!
bash# PID=12058 bash# NID=28255 bash# jstack $PID | awk '/ nid='"$(printf '%#x' $NID)"' /,/^$/'</code> "pool-1140-thread-1" prio=10 tid=0x00007f45706b4000 nid=0x6e5f runnable [0x00007f453247a000] java.lang.Thread.State: RUNNABLE at sun.reflect.Reflection.getCallerClass0(Native Method) at sun.reflect.Reflection.getCallerClass(Reflection.java:69) at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.logging.log4j.util.ReflectionUtil.getCallerClass(ReflectionUtil.java:123) at org.apache.logging.log4j.util.ReflectionUtil.getCallerClass(ReflectionUtil.java:230) at org.apache.logging.log4j.jul.AbstractLoggerAdapter.getContext(AbstractLoggerAdapter.java:34) at org.apache.logging.log4j.spi.AbstractLoggerAdapter.getLogger(AbstractLoggerAdapter.java:42) at org.apache.logging.log4j.jul.LogManager.getLogger(LogManager.java:89) at java.util.logging.LogManager.demandLogger(LogManager.java:430) at java.util.logging.Logger.demandLogger(Logger.java:346) at java.util.logging.Logger.getLogger(Logger.java:393) at com.orientechnologies.common.log.OLogManager.log(OLogManager.java:103) at com.orientechnologies.common.log.OLogManager.debug(OLogManager.java:128) at com.orientechnologies.orient.client.remote.OStorageRemote.useNewServerURL(OStorageRemote.java:1703) - locked (a java.util.ArrayList) at com.orientechnologies.orient.client.remote.OStorageRemote.getAvailableNetwork(OStorageRemote.java:1866) at com.orientechnologies.orient.client.remote.OStorageRemote.openRemoteDatabase(OStorageRemote.java:1608) at com.orientechnologies.orient.client.remote.OStorageRemote.open(OStorageRemote.java:216) at com.orientechnologies.orient.client.remote.OStorageRemoteThread.open(OStorageRemoteThread.java:87) at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:244) at com.orientechnologies.orient.core.db.OPartitionedDatabasePool$DatabaseDocumentTxPolled.internalOpen(OPartitionedDatabasePool.java:137) at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.openDatabase(OPartitionedDatabasePool.java:331) at com.orientechnologies.orient.core.db.OPartitionedDatabasePool.acquire(OPartitionedDatabasePool.java:304) at com.tinkerpop.blueprints.impls.orient.OrientBaseGraph.(OrientBaseGraph.java:166) at com.tinkerpop.blueprints.impls.orient.OrientGraphNoTx.(OrientGraphNoTx.java:54) at com.tinkerpop.blueprints.impls.orient.OrientGraphFactory.getNoTx(OrientGraphFactory.java:93) at au.com.foxsports.subscription.service.SubscriptionServiceImpl.getSubscribersWithRetry(SubscriptionServiceImpl.java:760) at au.com.foxsports.subscription.service.SubscriptionServiceImpl.getSubscribers(SubscriptionServiceImpl.java:739) at au.com.foxsports.subscription.service.SubscriptionServiceImpl$1.call(SubscriptionServiceImpl.java:945) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745)
There you go! You can see the 3rd party system eating up the CPU ! In my case its Orientdb. Yes, we faced a lot of issue trying to implement a graph model in orientdb, but thats a story for another day.