Sunday, February 8, 2015

WCS - Performance Tuning and Troubleshooting

This post includes tips and recommendations for tuning and troubleshooting performance on WebSphere Application Server based products. IBM WebSphere Application Server (WAS) is a market-leading application server for running Java applications. It is the flagship product within IBM's WebSphere product line and the base runtime for several IBM products, ranging from mobile servers (IBM Worklight Server) and integration servers (WebSphere Enterprise Service Bus) to Operational Decision Management servers (WebSphere ILOG JRules) and Business Process Management servers (IBM BPM Advanced), among many others. Because these products share the same runtime, the tuning tips and recommendations in this post apply to all of them.
When to Tune for Performance?
Performance tuning needs to occur early in your project, and time should be allocated for load and stress testing. Load testing exercises the application at its expected load, in terms of the number of concurrent requests and the types of requests. Stress testing goes beyond the expected load level (e.g. 125% to 150% of expected load) and also covers extended periods of time (e.g. 4 to 8 hours). Throughout these tests the goal is to monitor and measure how application performance and server vitals behave. During a normal load test there should not be long sustained periods of high CPU usage. What constitutes acceptably high usage is relative: a conservative target is typically under 50% CPU usage, whereas an aggressive target may be as high as 80%, but it ultimately comes down to how critical the application's performance is and how much risk can be tolerated.
Java Virtual Machine (JVM) Tuning
During load tests, and specifically for Java applications, it is extremely important to monitor Java heap usage. Every WebSphere Application Server instance runs in its own JVM. Default JVM settings are usually good enough for small-volume applications. JVM settings will likely need to be tuned to support a combination of the following: a large number of deployed applications, a high volume of concurrent transactions, and/or large requests. There are two main areas to watch when it comes to JVM heap size: how quickly the heap grows and how long garbage collections take.
Tune JVM Minimum and Maximum heap size
As the number of deployed applications grows, heap usage will increase and may exceed the maximum heap size. This can lead to potential problems: garbage collections will occur more frequently and take longer, or out-of-memory errors can occur if there is not enough memory to allocate in the heap. Heap growth is also driven by the expected number of concurrent requests and by the number and size of objects allocated during processing. Before increasing the maximum heap size, it is important to determine whether the increase is needed due to legitimate application growth or caused by a potential memory leak. If it is due to legitimate growth, the maximum heap size can be increased, but as a rule of thumb it should not exceed 50% of the physical memory on the server. This guideline may vary depending on what other processes run on the server (e.g. other JVMs) and how much memory the OS reserves for system processes. The main goal is to avoid paging to disk as much as possible: paging memory to disk translates into longer garbage collection times and consequently slower application response times.
The default initial heap size for WAS is 50MB and the default maximum is 256MB. In most cases the initial heap size should be set lower than the maximum heap size; however, when optimal performance is a priority, specifying the same value for the initial and maximum heap size is recommended. The JVM heap size settings can be changed from the administrative console: Servers > Server Types > WebSphere application servers > server_name > Java and process management > Process definition > Java Virtual Machine.
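Under the covers, these settings map to the standard -Xms (initial) and -Xmx (maximum) JVM options. A minimal sketch with illustrative values only; actual sizes should come from your own load test measurements:

  # Let the heap grow from a smaller initial size
  -Xms256m -Xmx1024m
  # Or, when favoring steady-state performance, fix initial and maximum to the same value
  -Xms1024m -Xmx1024m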
Review Garbage collection policy
The IBM JVM in WAS supports four garbage collection policies. Starting with version 8.0, gencon is the default policy. From personal experience, gencon is the policy that yields the best throughput and the smallest overall collection pause times. Of course, this may vary depending on the specific needs of your application, but I normally recommend gencon as a starting point.
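If you need to set the policy explicitly, for example on an older version where gencon is not the default, it can be passed as a generic JVM argument. A minimal sketch:

  -Xgcpolicy:gencon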
Beyond JVM Tuning
JVM tuning is only one area of tuning in WAS. Depending on the nature of the application, other settings may need tuning as well.
Monitor and tune Thread pool sizes
Thread pool settings can be changed from the administration console at: Servers > Server Types > WebSphere application servers > server_name > Thread Pools. A thread pool's maximum size can be increased to improve concurrent processing. Depending on the nature of the application, some thread pools are more relevant than others: for instance, the WebContainer thread pool is the most relevant one for web applications, and the Default thread pool is another relevant pool used by most applications. The actual number of threads allocated in each pool should be monitored to confirm that there is a legitimate need to increase its size.
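Thread pool settings can also be inspected from the command line with wsadmin scripting. A rough sketch, run from the profile's bin directory; the configuration ID in the second command is a placeholder taken from the first command's output:

  # List the configuration IDs of all thread pools
  ./wsadmin.sh -lang jython -c "print AdminConfig.list('ThreadPool')"
  # Show the attributes (including maximumSize) of one thread pool from the list above
  ./wsadmin.sh -lang jython -c "print AdminConfig.show('<thread_pool_config_id>')"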
Monitor and tune JDBC Connection pool sizes
If your application connects to a JDBC data source, connection pool sizes come into play. These can be changed from the administration console: Resources > JDBC > Data sources > data_source_name > Connection pool properties.
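Connection pool settings can be inspected in a similar way with wsadmin; another rough sketch, again using a placeholder configuration ID taken from the first command's output:

  # List all data sources and their configuration IDs
  ./wsadmin.sh -lang jython -c "print AdminConfig.list('DataSource')"
  # Show the connection pool settings (maxConnections, minConnections, etc.) of one data source
  ./wsadmin.sh -lang jython -c "print AdminConfig.show(AdminConfig.showAttribute('<data_source_config_id>', 'connectionPool'))"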
Monitoring Tools
Tivoli Performance Viewer
Ideally your organization should use a robust monitoring solution that tracks server and JVM health indicators and proactively alerts when certain thresholds are reached. If your organization does not provide such tools, developers can use the Tivoli Performance Viewer included in WAS. The Performance Viewer allows monitoring of CPU usage, Java heap usage, thread pool sizes, and JDBC connection pool sizes, among many other indicators. It is accessible from the administration console at: Monitoring and tuning > Performance Viewer > Current activity > server_name. You can then expand the different sections of interest and select the indicators to be monitored. In the screenshot below we are monitoring heap size, process CPU usage, the WebContainer thread pool size, and the WPSDB JDBC data source's pool size.

[Screenshot: Tivoli Performance Viewer]

WAS Performance Management Tuning Toolkit
The Performance Viewer can be helpful for monitoring a small set of indicators within a single server (e.g. during development). However, when you need to monitor several indicators across multiple servers in a cluster, the Performance Viewer becomes difficult to navigate. A better tool for monitoring multiple servers at the same time is the WAS Performance Management Tuning Toolkit.
This is a very useful tool that connects to the deployment manager of a cluster. Once connected, you have access to all of the performance indicators available in the Performance Viewer, and it is much easier to navigate and switch back and forth between different servers and indicators.
[Screenshot: WAS Performance Management Tuning Toolkit]
Troubleshooting Application Performance Problems
Here are a few tips and artifacts that can be used for troubleshooting application performance problems.
Enable verbose GC to identify frequency and time spent during garbage collection
The verbose output of the garbage collector can be used to analyze problems. To enable verbose GC output, log in to the administrative console, navigate to: Servers > Server Types > WebSphere application servers > server_name > Java and process management > Process definition > Java Virtual Machine, and check "Verbose garbage collection". Verbose GC output will then be captured in the native_stderr.log file in the server's logs directory. Verbose GC output can be analyzed with the "IBM Pattern Modeling and Analysis Tool for Java Garbage Collector".
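Equivalently, verbose GC can be enabled through generic JVM arguments. A minimal sketch, where the -Xverbosegclog file name and rotation values (5 files of 10,000 GC cycles each) are only illustrative:

  -verbose:gc
  # Optionally redirect the output to dedicated, rotating log files instead of native_stderr.log
  -Xverbosegclog:verbosegc.log,5,10000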
This tool can provide useful information such as whether the Java heap was exhausted, the number of garbage collections, and the pause time spent in garbage collections. The tool also recommends configuration changes. The key items to look for are garbage collections that take too long to run and collections that happen too frequently. This analysis can also help measure the effect of different heap size configurations during load testing.
Capture Javacore files to look for unexpected blocked threads
Javacore files are analysis files that are either captured manually or generated automatically when system problems (e.g. out-of-memory errors, deadlocks) occur. These javacore files include environment information, loaded libraries, snapshot information about all running threads, garbage collection history, and any deadlocks detected.
Javacore files are created by default in the <WAS_install-root>/profiles/<profile> directory and are named as follows: javacore.YYYYMMDD.HHMMSS.PID.txt
A javacore file can be captured on Linux using this command: kill -3 <pid of java process>
A useful tool to analyze these files is the "IBM Thread and Monitor Dump Analyzer for Java".
Typically you should capture multiple javacore files while the symptoms of a problem are happening or about to happen (e.g. during a load test). The tool lets you compare thread usage across the javacore files, which helps identify blocked threads and what is blocking them. In some cases the blocked threads are expected (e.g. waiting for an HTTP response), but in other cases the stack trace may reveal unexpectedly blocked threads.
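To collect a series of snapshots during a load test, a simple loop is enough. A minimal sketch, where the number of captures and the 30-second interval are arbitrary:

  # Capture 3 javacores, 30 seconds apart, while the symptoms are occurring
  PID=<pid of java process>
  for i in 1 2 3; do
      kill -3 $PID
      sleep 30
  done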
[Screenshot: IBM Thread and Monitor Dump Analyzer for Java]
Analyze heap dumps to look for potential memory growth
Heap dumps are snapshots of the memory of a Java process. Heap dumps are generated by default in WAS after an OutOfMemoryError occurs. In WAS they are saved as .phd (Portable Heap Dump) files. A heap dump can be useful to identify potential memory leaks.
The IBM HeapAnalyzer can be used to analyze .phd files.
Keep in mind that .phd files can be very large, depending on the heap size. You will likely need to increase the maximum heap size of the JVM running the HeapAnalyzer itself; its heap needs to be at least the size of the .phd file being analyzed.
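As an illustration, the HeapAnalyzer can be launched with a larger heap; the 4 GB value and the ha456.jar file name are only examples (the jar name depends on the HeapAnalyzer version you download), and the .phd file can then be opened from the tool's GUI:

  # Launch HeapAnalyzer with a 4 GB heap (example value)
  java -Xmx4g -jar ha456.jar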
The HeapAnalyzer shows the heap allocation for each object class and identifies potential memory leaks. The tool does not show variable values, which makes it hard to isolate culprits when multiple deployed applications use similar object types.

[Screenshot: IBM HeapAnalyzer]