This
post includes tips and recommendations for tuning and troubleshooting
performance on WebSphere Application Server based products. IBM WebSphere
Application Server (WAS) is a market-leading Application Server for running
Java applications. It is the flagship product within IBM's WebSphere product
line and is the base runtime for several other IBM products, ranging from
mobile servers (IBM Worklight Server) and integration servers (WebSphere
Enterprise Service Bus) to operational decision management servers (WebSphere
ILOG JRules) and business process management servers (IBM BPM Advanced), among
many others. Because these products share the same runtime,
the tuning tips and recommendations in this post apply to all of them.
When to Tune for Performance?
Performance tuning needs to occur early in your project, with time allocated
for load and stress testing. Load testing exercises the application at its
normal load, in terms of the expected number of concurrent requests and the
types of requests. Stress testing goes beyond the expected load level (e.g.
125% to 150% of expected load) and also covers extended periods of time (e.g.
4 to 8 hours). Throughout these tests, the goal is to monitor and measure how
application performance and server vitals behave. During a normal load test
there should not be long sustained periods of high CPU usage. What constitutes
acceptably high usage is relative: a conservative target keeps CPU usage under
50%, whereas an aggressive one may allow as much as 80%. It boils down to how
critical the application's performance is and how much risk can be tolerated.
Java Virtual Machine (JVM) Tuning
During load tests, and specifically for Java applications, it is extremely
important to monitor Java heap usage. Every WebSphere Application Server
instance runs in its own JVM. The default JVM settings are usually good enough
for small-volume applications, but they will likely need tuning to support
some combination of the following: a large number of deployed applications, a
high volume of concurrent transactions, and/or large requests. There are two
main things to watch when it comes to the JVM heap: how quickly the heap grows
and how long garbage collections take.
Tune JVM Minimum and Maximum heap size
As the number of deployed applications grows, heap usage will increase and may
approach the maximum heap size. This can lead to problems: garbage collections
will occur more frequently and take longer, and out-of-memory errors can occur
if there is not enough memory to allocate in the heap. Heap growth is also
driven by the expected number of concurrent requests and by the number and
size of objects allocated during processing. Before increasing the maximum
heap size, it is important to determine whether the growth is legitimate
application growth or a potential memory leak. If it is legitimate growth, the
maximum heap size should be increased, but generally not beyond 50% of the
physical memory on the server. This guideline may vary depending on what other
processes run on the server (e.g. other JVMs) and how much memory the OS needs
for system processes. The main goal is to avoid paging to disk as much as
possible: paging memory to disk translates into longer garbage collection
times and consequently slower application response times.
The default initial heap size for WAS is 50MB and the default maximum is
256MB. In most cases the initial heap size should be set lower than the
maximum heap size; however, when optimal performance is a priority, specifying
the same value for the initial and maximum heap size is recommended. The JVM
heap size settings can be changed from the administrative console: Servers >
Server Types > WebSphere application servers > server_name > Java and process
management > Process definition > Java Virtual machine.
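The same change can be scripted with the wsadmin tool. Here is a minimal
Jython sketch; the cell, node and server names and the 2048MB figure are
placeholders for your own topology and sizing:

    # wsadmin -lang jython: set initial and maximum heap size (values in MB)
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    jvm = AdminConfig.list('JavaVirtualMachine', server)
    AdminConfig.modify(jvm, [['initialHeapSize', '2048'],
                             ['maximumHeapSize', '2048']])
    AdminConfig.save()   # the server must be restarted to pick up the change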
Review Garbage collection policy
The IBM JVM in WAS supports four garbage collection policies. Starting with
version 8.0, gencon is the default policy. In my experience, gencon is the
policy that yields the best throughput and the smallest overall collection
pause times. Of course, this may vary depending on the specific needs of your
application, but I normally recommend gencon as a starting point.
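The policy is selected with the -Xgcpolicy JVM argument, which can also be set
through wsadmin. A sketch using the same placeholder names as above; note that
this overwrites any existing generic JVM arguments, so append to the current
value if you already have some:

    # wsadmin -lang jython: request the gencon collector via generic JVM arguments
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    jvm = AdminConfig.list('JavaVirtualMachine', server)
    AdminConfig.modify(jvm, [['genericJvmArguments', '-Xgcpolicy:gencon']])
    AdminConfig.save()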
Beyond JVM Tuning
JVM tuning is only one of the areas that need attention in WAS. Depending on
the nature of the application, here are other settings that may need tuning.
Monitor and tune Thread pool sizes
Thread
pool settings can be changed from the administration console at: Servers
> Server Types > WebSphere application servers > server_name >
Thread Pools. A thread pool's maximum size can be increased to improve
concurrent processing. Depending on the nature of the application, some thread
pools are more relevant than others. For instance, the Web Container thread
pool is the most relevant one for web applications, and the Default thread
pool is used by most applications. The actual number of threads allocated in
each pool should be monitored to confirm that there is a legitimate need to
increase its size.
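Thread pool sizes can also be adjusted from a script. A minimal wsadmin Jython
sketch, again with placeholder topology names and an illustrative maximum of
100 threads:

    # wsadmin -lang jython: raise the WebContainer thread pool maximum
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    for tp in AdminConfig.list('ThreadPool', server).splitlines():
        if AdminConfig.showAttribute(tp, 'name') == 'WebContainer':
            AdminConfig.modify(tp, [['maximumSize', '100']])
    AdminConfig.save()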
Monitor and tune JDBC Connection pool sizes
If
your application connects to a JDBC data source, connection pool sizes come
into play. These can be changed from the administration console: Resources
> JDBC > Data sources > data_source_name > Connection pool
properties.
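As with thread pools, these settings are scriptable. A wsadmin Jython sketch;
the data source name 'MyDataSource' and the pool sizes are placeholders:

    # wsadmin -lang jython: tune a data source's connection pool
    for ds in AdminConfig.list('DataSource').splitlines():
        if AdminConfig.showAttribute(ds, 'name') == 'MyDataSource':
            pool = AdminConfig.showAttribute(ds, 'connectionPool')
            AdminConfig.modify(pool, [['maxConnections', '50'],
                                      ['minConnections', '5']])
    AdminConfig.save()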
Monitoring Tools
Tivoli Performance Viewer
Ideally
your organization should use a robust monitoring solution to monitor for server
and JVM health indicators and proactively alert when certain thresholds are
reached. If your organization does not provide such tools, developers can use
the Tivoli Performance Viewer included in WAS. The Performance Viewer allows
monitoring CPU usage, Java heap usage, thread pool sizes and JDBC connection
pool sizes, among many other indicators. It is accessible from the
administration console at: Monitoring and tuning > Performance Viewer >
Current activity > server_name. You can then expand the different sections of
interest and check the indicators to be monitored. In the screenshot below we
are monitoring Heap Size, Process CPU Usage, the Web Container thread pool
size and the WPSDB JDBC data source's pool size.
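The same vitals can be spot-checked from a script instead of the console. A
wsadmin Jython sketch, assuming a server named server1 and the standard WAS
JVM MBean (verify the attribute names and units against your version):

    # wsadmin -lang jython: read live JVM heap statistics for one server
    jvmMBean = AdminControl.completeObjectName('type=JVM,process=server1,*')
    if jvmMBean:
        heapSize = AdminControl.getAttribute(jvmMBean, 'heapSize')
        freeMemory = AdminControl.getAttribute(jvmMBean, 'freeMemory')
        print 'heap size: %s, free memory: %s' % (heapSize, freeMemory)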
WAS Performance Management Tuning Toolkit
The Performance Viewer is helpful for monitoring a small set of indicators on
a single server (e.g. during development). However, when you need to monitor
several indicators across multiple servers in a cluster, the Performance
Viewer becomes difficult to navigate. A better tool for monitoring multiple
servers at the same time is the "WAS Performance Management Tuning Toolkit".
This is a very useful tool that connects to the deployment manager. Once
connected, you have access to all of the performance indicators available in
the Performance Viewer, and it is much easier to navigate and switch back and
forth between servers and indicators.
Troubleshooting Application Performance Problems
Here
are a few tips and artifacts that can be used for troubleshooting application
performance problems.
Enable verbose GC to identify frequency and time spent during garbage collection
The verbose output of the garbage collector can be used to analyze problems.
To enable verbose GC output, log in to the administrative console, navigate
to: Servers > Server Types > WebSphere application servers > server_name >
Java and process management > Process definition > Java Virtual machine, and
check "Verbose garbage collection". Verbose GC output will then be captured in
the native_stderr.log file in the server logs directory. Verbose GC output can
be analyzed with the "IBM Pattern Modeling and Analysis Tool for Java Garbage
Collector".
This tool can provide useful information such as whether the Java heap was
exhausted, the number of garbage collections, and the pause time of each
collection. The tool also recommends configuration changes. The key things to
look for are garbage collections that take too long to run and collections
that happen too frequently. This analysis can also help measure the effect of
different heap size configurations during load testing.
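Verbose GC can also be toggled from a wsadmin script instead of the console. A
sketch with the same placeholder topology names as earlier;
verboseModeGarbageCollection is the corresponding JVM configuration attribute:

    # wsadmin -lang jython: enable verbose garbage collection output
    server = AdminConfig.getid('/Cell:myCell/Node:myNode/Server:server1/')
    jvm = AdminConfig.list('JavaVirtualMachine', server)
    AdminConfig.modify(jvm, [['verboseModeGarbageCollection', 'true']])
    AdminConfig.save()   # takes effect after the server is restarted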
Capture Javacore files to look for unexpected blocked threads
Javacore files are diagnostic files that are either captured manually or
generated automatically when system problems occur (e.g. out-of-memory errors
or deadlocks). These javacore files include environment information, loaded
libraries, snapshot information about all running threads, garbage collection
history and any deadlocks detected.
Javacore files are created by default in the
<WAS_install-root>/profiles/<profile> directory and are named as follows:
javacore.YYYYMMDD.HHMMSS.PID.txt
A javacore file can be captured on Linux with this command: kill -3
<pid of java process>
A
useful tool to analyze these files is the "IBM Thread and Monitor Dump Analyzer for Java".
Typically you should capture multiple javacore files while the symptoms of a
problem are occurring or about to occur (e.g. during a load test). The tool
lets you compare thread usage across the javacore files, which helps identify
blocked threads and what was blocking them. In some cases blocked threads are
expected (e.g. threads waiting for an HTTP response), but in other cases the
stack traces may reveal unexpectedly blocked threads.
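Capturing a series of javacores at fixed intervals can be scripted on the
server host. A minimal plain-Python sketch (not wsadmin); the PID, count and
interval are placeholders:

    # Send SIGQUIT (signal 3) to the WAS java process several times,
    # producing one javacore per signal, spaced 30 seconds apart.
    import os, signal, time

    pid = 12345                        # placeholder: PID of the java process
    for i in range(3):                 # capture three javacores
        os.kill(pid, signal.SIGQUIT)   # equivalent to: kill -3 <pid>
        time.sleep(30)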
Analyze heap dumps to look for potential memory growth
Heap dumps are snapshots of the memory of a Java process. In WAS, heap dumps
are generated by default after an OutOfMemoryError occurs and are saved as
.phd (Portable Heap Dump) files. A heap dump can be useful for identifying
potential memory leaks.
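A heap dump can also be requested on demand, for instance while memory usage
is climbing during a load test. A wsadmin Jython sketch, assuming the
generateHeapDump operation exposed by the WAS JVM MBean on the IBM JDK and a
placeholder server name:

    # wsadmin -lang jython: trigger a heap dump without waiting for an OOM
    jvmMBean = AdminControl.completeObjectName('type=JVM,process=server1,*')
    if jvmMBean:
        print AdminControl.invoke(jvmMBean, 'generateHeapDump')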
The IBM HeapAnalyzer can be used to analyze .phd files.
Keep in mind that .phd files can be very large, depending on the heap size.
You will likely need to increase the maximum heap size parameter when running
the HeapAnalyzer itself: its heap needs to be at least the size of the .phd
file.
The HeapAnalyzer shows the heap allocation for each object class and
identifies potential memory leaks. The tool does not show variable values,
which makes it hard to isolate culprits when you have multiple deployed
applications that use similar object types.