Thursday, December 18, 2008

Simple way to get your Java application internal state

I think everyone who has written Java applications has at some point thought: "It would be great to periodically check the application's internal state from the command line or a script to see if everything is OK".

Indeed, a command line interface is a simple and native way to trace what is going on inside an application, for example by giving access to its internal variables.

Here is my answer, which may help: a kind of alternative to JMX ;) The main idea is to open a socket on a certain port and send commands to that port from the command line. The application catches those commands, parses them and writes back the result, which we then see on STDOUT.

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TcpTest {
    // List of supported commands
    enum Command {
        GET,
        SET,
        TEST
    }

    public static void main(String[] args) throws Exception {
        // Create server socket
        ServerSocket serverSocket = new ServerSocket(8000);

        while (true) {
            // Wait for connection
            Socket sc = serverSocket.accept();

            try {
                BufferedReader sr = new BufferedReader(
                        new InputStreamReader(sc.getInputStream()));
                OutputStream out = sc.getOutputStream();

                // Read and parse command; an empty connection (null line)
                // or an unknown name leaves cmd as null
                String line = sr.readLine();
                Command cmd = null;
                try {
                    if (line != null) {
                        cmd = Command.valueOf(line.trim().toUpperCase());
                    }
                }
                catch (IllegalArgumentException e) {
                    // Unknown command, answered with UNSUPPORTED below
                }

                // Write answer
                if (cmd == null) {
                    out.write("UNSUPPORTED\n".getBytes());
                }
                else {
                    switch (cmd) {
                        case GET:
                            out.write("ANSWER\n".getBytes());
                            break;
                        case SET:
                            out.write("SET\n".getBytes());
                            break;
                        case TEST:
                            out.write("PASSED\n".getBytes());
                            break;
                    }
                }
            }
            catch (IOException e) {
                // A broken client connection must not stop the server
            }
            finally {
                sc.close();
            }
        }
    }
}
This code is not exhaustive but it illustrates the approach. Once it is started you can communicate with the program by sending commands to port 8000.

On Linux, run echo "GET"|netcat localhost 8000 from the command line. You can also call it from a shell script, read the command result from standard output, then parse and handle it.
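To return real internal state instead of fixed strings, the command handling can be pulled out into a small method. This is only a sketch under my own assumptions: the StateCommands class, the requestCount variable and the reply texts are hypothetical and not part of the program above.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: turns one command line into a reply built from live state.
class StateCommands {
    enum Command { GET, SET, TEST }

    // Example of an internal variable we want to inspect from the shell.
    private final AtomicLong requestCount = new AtomicLong();

    // Accepts "GET", "SET <number>" or "TEST" and returns the reply text.
    String handle(String line) {
        if (line == null || line.trim().isEmpty()) {
            return "UNSUPPORTED";
        }
        String[] parts = line.trim().split("\\s+");
        Command cmd;
        try {
            cmd = Command.valueOf(parts[0].toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return "UNSUPPORTED";
        }
        switch (cmd) {
            case GET:
                return "requestCount=" + requestCount.get();
            case SET:
                if (parts.length < 2) {
                    return "UNSUPPORTED";
                }
                try {
                    requestCount.set(Long.parseLong(parts[1]));
                } catch (NumberFormatException e) {
                    return "UNSUPPORTED";
                }
                return "SET";
            case TEST:
                return "PASSED";
            default:
                return "UNSUPPORTED";
        }
    }
}
```

Inside the accept loop the server would then write handle(sr.readLine()) followed by a newline to the socket, so a shell script sees something like requestCount=42 on standard output.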

Monday, December 15, 2008

javax.sql.DataSource? Forget about it.

Well, how much time do you usually waste fighting with some stupid inconsistency in an implementation? Quite a lot, I think, if you are in charge of investigations and integrations. So do I, and here is a story to cheer you up.

We all know Tomcat and its way of configuring a data source for an application. Just to refresh your memory, here is a snippet of context.xml:

<Resource name="jdbc/mydb"
          auth="Container"
          type="javax.sql.DataSource"
          driverClassName="oracle.jdbc.driver.OracleDriver"
          url="bla-bla-bla"
          username="bla"
          password="bla-bla"/>

Basically this means that we are going to establish a connection with an Oracle database. But in this particular case Tomcat will use DBCP and hand us a wrapper around the Oracle data source, so some useful Oracle features won't work.

A simple and native way to get all the features is to ask Oracle to provide the connection by adding a "factory" attribute to the Resource tag. It should look like this:


<Resource name="jdbc/mydb"
          auth="Container"
          type="javax.sql.DataSource"
          factory="oracle.jdbc.pool.OracleDataSourceFactory"
          driverClassName="oracle.jdbc.driver.OracleDriver"
          url="bla-bla-bla"
          username="bla"
          password="bla-bla"/>

Looks good. Simple. Yeah. Wait man, it does not work! I got a strange message in the Tomcat console:
SEVERE: Null component Catalina:type=DataSource, path=/bla, host=localhost, class=javax.sql.DataSource, name="jdbc/mydb.
Guys I need help! Please!

That is what you find if you Google this issue. There is also a recommendation to replace "javax.sql.DataSource" with "oracle.jdbc.pool.OracleDataSource", like below.


<Resource name="jdbc/mydb"
          auth="Container"
          type="oracle.jdbc.pool.OracleDataSource"
          factory="oracle.jdbc.pool.OracleDataSourceFactory"
          driverClassName="oracle.jdbc.driver.OracleDriver"
          url="bla-bla-bla"
          username="bla"
          password="bla-bla"/>

Right. Cool stuff. It works like a charm, but you don't see this data source in the Tomcat console. Tomcat does not recognize it as a DataSource and does not publish it as a DataSource MBean, so you cannot monitor it or its connections.

What the f.... is wrong with it, you might ask, and here is the answer. The ObjectFactory interface has only one method:


public Object getObjectInstance(Object obj, Name name, Context nameCtx, Hashtable<?,?> environment) throws Exception

and in our particular case the very first parameter passed into it is ... what do you think ... right, a reference to the type. The guys from Oracle are smart enough to handle it and process cases like
  • oracle.jdbc.pool.OracleDataSource
  • oracle.jdbc.xa.client.OracleXADataSource
  • oracle.jdbc.pool.OracleConnectionPoolDataSource
  • oracle.jdbc.pool.OracleOCIConnectionPool
but they don't care about javax.sql.DataSource. In that case they just return NULL instead of a default implementation. Guys, have you ever read about the Null Object pattern? What the hell are you doing there?
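To make the behaviour concrete, here is a small sketch of a factory that acts the way I described. It is NOT Oracle's code: PickyFactory, its list of known types and the string it returns are all hypothetical, and the only assumption taken from the story above is that the first argument is a Reference carrying the "type" value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Hashtable;
import java.util.Set;
import javax.naming.Context;
import javax.naming.Name;
import javax.naming.Reference;
import javax.naming.spi.ObjectFactory;

// Hypothetical factory that only mimics the behaviour described in the post.
class PickyFactory implements ObjectFactory {
    // Types this factory is willing to build (illustrative subset).
    private static final Set<String> KNOWN = new HashSet<String>(Arrays.asList(
            "oracle.jdbc.pool.OracleDataSource",
            "oracle.jdbc.xa.client.OracleXADataSource"));

    @Override
    public Object getObjectInstance(Object obj, Name name, Context nameCtx,
                                    Hashtable<?, ?> environment) {
        // The Reference carries the class name taken from the "type" attribute.
        String type = ((Reference) obj).getClassName();
        if (KNOWN.contains(type)) {
            // Placeholder; a real factory would create the data source here.
            return "instance of " + type;
        }
        // Anything else, including javax.sql.DataSource, yields null.
        return null;
    }
}
```

With type="javax.sql.DataSource" such a factory hands back null, which matches the "Null component" complaint in the Tomcat console.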

That's basically it, and there is no way around it. Wasting time is our job.

Thursday, December 11, 2008

Java monitoring tools/suites

Recently I spent some time investigating what sort of monitoring tools can help in solving production issues. Here are some thoughts about it.


Issues I usually run into in production and that I'd like to solve

  • Memory footprint/leaks (caches, collections).
  • Threads/pools/executors. They should have a limited, preferably configurable, number of threads.
  • Database connections/throughput.
  • Number of different requests coming into the system.
  • The most CPU consuming tasks.
  • Application availability (whether or not it is started and reachable)
So these are the issues that happen in production and which are usually hard to solve.


Monitoring tools

Here is a list of the frameworks/tools/suites I have found and looked at so far:

  • JConsole
  • Visual VM
  • Lambda probe
  • GlassBox
  • JAMon
  • Spring AMS/Hyperic HQ
  • JXInsight

Let me describe them and point out some key features in the context of the issues listed above.


JConsole


This one provides access to a running VM through different kinds of MBeans/MXBeans. The following information can be extracted from the running VM:

System information
  • Threads (Peaks, current number and so on),
  • Memory (Heap/Non-Heap)
  • Loaded classes
  • OS state (operating system vendor, system properties, paths and so on)
  • Garbage collector info.

MBeans:
  • Custom application MBeans
  • VM MBeans (runtime, threading, memory pools, garbage collector)

In most cases this information is quite enough to identify the problem in general, i.e. whether the issue is a memory leak or maybe an over-threaded application. This is a baseline that one can get quickly and for free, but it won't give you any details about application bottlenecks.
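The same numbers can also be read from inside the application through the platform MXBeans in java.lang.management. A minimal sketch follows; the VmSnapshot class and the output format are my own invention.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Hypothetical helper that prints a one-line summary of the VM state.
class VmSnapshot {
    static String describe() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        int loadedClasses = ManagementFactory.getClassLoadingMXBean().getLoadedClassCount();

        return "threads=" + threads.getThreadCount()
                + " peakThreads=" + threads.getPeakThreadCount()
                + " heapUsed=" + heap.getUsed()
                + " nonHeapUsed=" + nonHeap.getUsed()
                + " loadedClasses=" + loadedClasses;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Memory values are reported in bytes. Logging such a line periodically gives a rough trend even when nobody has JConsole attached.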

JConsole supports pluggable modules that are simple to write and integrate into it.

Looking ahead, we could say that this tool currently gives enough data, and together with some of the applications provided by Sun it could be the best one for monitoring and finding all kinds of bottlenecks.


Visual VM


A heavyweight framework based on the NetBeans API, which it therefore requires in order to start. At the same time it is very flexible in how data is presented. Pluggable modules allow monitored data to be displayed the way you like most (charts/histograms/textual view). But this approach just gives you yet another presentation layer for the same information that you can get with JConsole.

To my understanding VisualVM won't give much over JConsole, except maybe the CPU/memory profiling features integrated into it, and it won't help you find database bottlenecks.


JAMon

JAMon is not a tool but rather a monitoring framework that wraps your code with proxy objects and logs execution time. Basically it can wrap almost all calls and objects, even the database ones, and thus provide a comprehensive view of what happens inside the application. It also has a very simple user interface based on a few servlets packaged in a WAR file.

One has to either change the code and wrap every monitored place with JAMon classes, or use aspect pointcuts to instrument the code at runtime. Both ways have pros and cons. It would be great not to change code (avoiding any dependency on JAMon) and at the same time not to lose performance (and memory) by instrumenting code at runtime.

This framework can be used instead of a profiler even in production, but only in very exceptional cases when we know exactly where the issue happens.

JAMon coding example:

import com.jamonapi.*;
...
Monitor mon=MonitorFactory.start("myMonitor");
...Code Being Timed...
mon.stop();
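The proxy idea mentioned above can be sketched with plain JDK dynamic proxies. To be clear, this is not JAMon's API or implementation: TimingProxy and its methods are hypothetical names that only show how an interface call can be timed without touching the calling code.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical timing wrapper: counts calls and sums elapsed time per method name.
class TimingProxy implements InvocationHandler {
    private final Object target;
    private final Map<String, AtomicLong> calls = new ConcurrentHashMap<>();
    private final Map<String, AtomicLong> nanos = new ConcurrentHashMap<>();

    TimingProxy(Object target) {
        this.target = target;
    }

    // Returns a proxy that implements iface and forwards every call to the target.
    <T> T as(Class<T> iface) {
        return iface.cast(Proxy.newProxyInstance(
                TimingProxy.class.getClassLoader(), new Class<?>[] {iface}, this));
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        long start = System.nanoTime();
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow what the wrapped method threw
        } finally {
            long elapsed = System.nanoTime() - start;
            calls.computeIfAbsent(method.getName(), k -> new AtomicLong()).incrementAndGet();
            nanos.computeIfAbsent(method.getName(), k -> new AtomicLong()).addAndGet(elapsed);
        }
    }

    long calls(String methodName) {
        AtomicLong c = calls.get(methodName);
        return c == null ? 0 : c.get();
    }

    long nanos(String methodName) {
        AtomicLong n = nanos.get(methodName);
        return n == null ? 0 : n.get();
    }
}
```

Wrapping a Runnable would look like new TimingProxy(task).as(Runnable.class). The reflective call is itself the kind of runtime overhead I complained about above.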


Lambda probe

This one is much better suited to (and even oriented towards) application servers. Lambda probe can be easily integrated into a web container or application server and shows additional container-specific data such as database connections, running servlets, thread pools and even a particular thread in one of them.
So it looks like it could help us with database issues, but in practice it just gives us the number of active connections the application has. It does not show MBeans and should be used together with JConsole.


GlassBox

A simple and probably useful monitoring application, but the lack of documentation makes it hard to dig into.

It runs as a wrapper around the web container/app server and, as far as I can see, instruments everything at runtime using AspectJ. The main idea behind it is to get access to all possible Java calls and then filter out those that are not really interesting from a performance point of view. But it makes execution slower (mostly at startup, but at runtime as well, when the application accesses a particular class for the first time). It also consumes a lot of memory to instrument all classes.

The tool's developers make performance assumptions based on internal criteria, and there is no way to configure them. The framework has only a few maintainers (3-5) and the last commit to their svn was about a month ago. So I wouldn't recommend using it.


Spring AMS

The most powerful suite (Application Management Suite), based on Hyperic HQ, a world-class leading monitoring framework.

Features:

  • Joins together all application information under the same roof.
  • Different applications can be grouped to provide useful views.
  • Physical box availability with a lot of operating system specific parameters.
  • Depicts comprehensive Spring based details (contexts, executor services, db connections – everything that can be declared in Spring configuration).
  • Has integration with almost all application servers (Tomcat, WebLogic, WebSphere, GlassFish, Spring DM)
  • Configurable alerts can inform you in time about failures, lack of memory or overloaded CPUs.

At the same time it is:
  • Heavyweight (AMS requires a server to be installed along with a Progress database, and agents set up on every monitored box)
  • Proprietary
  • A complicated web-based console overloaded with details.

Besides the basics provided by JConsole, Spring AMS gives additional useful details about a Spring-based application, such as database commits/rollbacks (overall, average). The better your integration with Spring, the more information can be shown on the console.

One can even get all application MBeans by changing the MBeans domain to “spring.application” (this is a Spring requirement and the way they define what should be shown in the console), thus adding application-specific metrics.
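Registering an application MBean under that domain is plain JMX. The sketch below is an illustration only: RequestStats, its attribute and the type=RequestStats key are hypothetical, and I have not checked whether AMS needs anything beyond the domain name.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

class DomainMBeanSketch {
    // Standard MBean contract: a public interface named <class name> + "MBean".
    public interface RequestStatsMBean {
        long getRequestCount();
    }

    // Hypothetical application metric exposed as the RequestCount attribute.
    public static class RequestStats implements RequestStatsMBean {
        private final AtomicLong requestCount = new AtomicLong();

        @Override
        public long getRequestCount() {
            return requestCount.get();
        }

        public void increment() {
            requestCount.incrementAndGet();
        }
    }

    // Registers the bean in the "spring.application" domain of the platform MBean server.
    static ObjectName register(RequestStats stats) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("spring.application:type=RequestStats");
            if (server.isRegistered(name)) {
                server.unregisterMBean(name); // replace an earlier registration
            }
            server.registerMBean(stats, name);
            return name;
        } catch (JMException e) {
            throw new IllegalStateException("Could not register MBean", e);
        }
    }

    // Reads an attribute back through the MBean server, the way a console would.
    static Object read(ObjectName name, String attribute) {
        try {
            return ManagementFactory.getPlatformMBeanServer().getAttribute(name, attribute);
        } catch (JMException e) {
            throw new IllegalStateException("Could not read " + attribute, e);
        }
    }
}
```

Once registered, the bean also shows up in JConsole under the spring.application node, which is a quick way to check the name before pointing a bigger suite at it.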

Another requirement is to use compile-time instrumented Spring files. Spring lets you download all the libraries from their site and use the instrumented versions. They also provide instrumented logging, Hibernate, collections and Ehcache. So it is oriented towards Spring applications, but IMHO we have a lot of them.


JXInsight

This product is mostly oriented towards the development phase but can be used in production as well.

Pros:

  • Integration with a lot of frameworks and products
  • Support for distributed environment
  • Probes to meter and traces to get paths (stack traces). These are common ways to detect CPU consumption and hotspots, but very feature-rich ones.
  • True JDBC monitoring at the transaction level with detection of long-running statements.
  • Allows offline analysis by taking snapshots at runtime.
  • High resolution clock.

Cons:
  • Proprietary.
  • JDBC monitoring is not recommended for production.
  • Adds overhead through its own Java agent and instrumentation.
  • Does not support alerts and thus needs to be watched all the time.

So it won't give database activity monitoring in production. The only useful things are the probes that meter resource consumption across different customizable groups (read: packages/classes).


Profiling


Memory footprints/leaks

Why is memory so important? Simply because system resources are limited, and even if you have enough hardware resources, the Java GC can take time and thus slow down your application.
Let's assume that we know the issue (whatever tool we used gave us enough information to decide) and it's a memory footprint/leak or application over-threading. The next step is to determine where exactly the problem occurs: which point in the code, or at least which class, causes it.
Starting with Java 5, Sun provides a set of very convenient tools to identify it. First of all there is a memory dumper. A tool called “jmap” can take a memory dump by process id (the only issue is that it fails on JDK versions up to 5.0_14). It works very fast, especially if it traces “live” objects only, and can take a 4G heap dump in approximately a couple of minutes. The dump is a memory snapshot with all objects and the references between them, and thus can be analyzed later to find an excessive number of threads or other objects.
Another case is an out of memory error (OOME). Here it's recommended to start the production application with the Java parameter -XX:+HeapDumpOnOutOfMemoryError, which takes a memory snapshot right when the error happens, so we still have a dump file to analyze for leaks.

CPU overloading


This is the most complicated issue because usually it lasts only a few seconds and is hard to catch.
But let's assume that the application consumes 100% of CPU and we see it in our monitoring tool. In this case we can go through the list of active threads and find the ones responsible. The thread name should point us to the place in the code that caused the problem.
In most cases this means that the application uses all hardware resources and needs to be either optimized or scaled. Talking about scalability, we should remember its two types. One can scale up the application by adding more power to the same box, or scale it out by moving some calculations off the original box, thus building grids (both computational and data ones).
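Going through the active threads can also be done from code with ThreadMXBean. This is a rough sketch with names of my own (BusyThreads, Row); it returns an empty list on VMs where thread CPU time measurement is not supported or not enabled.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists live threads ordered by the CPU time they have used.
class BusyThreads {
    static final class Row {
        final String name;
        final long cpuNanos;

        Row(String name, long cpuNanos) {
            this.name = name;
            this.cpuNanos = cpuNanos;
        }
    }

    static List<Row> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<Row> rows = new ArrayList<>();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return rows;
        }
        for (long id : mx.getAllThreadIds()) {
            long cpu = mx.getThreadCpuTime(id);     // -1 if the thread is gone
            ThreadInfo info = mx.getThreadInfo(id); // null if the thread is gone
            if (cpu >= 0 && info != null) {
                rows.add(new Row(info.getThreadName(), cpu));
            }
        }
        // Busiest thread first.
        rows.sort((a, b) -> Long.compare(b.cpuNanos, a.cpuNanos));
        return rows;
    }

    public static void main(String[] args) {
        for (Row row : snapshot()) {
            System.out.println(row.cpuNanos / 1000000 + " ms  " + row.name);
        }
    }
}
```

The values are totals since each thread started, so to see who is burning CPU right now you would take two snapshots a few seconds apart and compare them.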


Database monitoring

It happens very often that an application does almost nothing and has a small memory footprint, yet works very slowly. The likely bottleneck is a database that consumes a lot of resources on a remote box and cannot handle all of the application's requests.
Databases usually provide tools for their own monitoring and optimization, but it can be worthwhile to trace database activity on the application side as well.
One way is to use J2EE data sources registered in the application server that show all SQL statements, the slowest ones, all connections and their activity.


IO stat

The last major performance issue to take into consideration is input-output throughput. This means both network and hardware activity, which can be easily monitored with the Linux “iostat” and “netstat” utilities.
The solution in this case is to fix it at the hardware level or to change the application code to reduce the amount of data sent/saved, if possible.


Conclusion

So as I see it, all bottlenecks can be found with JConsole plus some useful tools like jmap, SAP MAT, operating system tools (iostat, netstat) and database-specific applications. The only thing left to solve is application availability, but this can be handled with Apache/shell scripts.
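If you prefer to keep the availability check in Java rather than in a shell script, a plain TCP connect attempt is enough for the "is it started and reachable" question. A minimal sketch, where AvailabilityProbe and its parameters are hypothetical:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe: true if something accepts TCP connections on host:port.
class AvailabilityProbe {
    static boolean isReachable(String host, int port, int timeoutMillis) {
        Socket socket = new Socket();
        try {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out or unresolved: treat all as "not available".
            return false;
        } finally {
            try {
                socket.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close fails.
            }
        }
    }
}
```

This only proves that the port is open. It says nothing about whether the application behind it is healthy.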

Sunday, December 7, 2008

JPPF and GridGain. Two Java computational grids.

It took me a month to get back to my blog, and today I'd like to talk about JPPF (Java Parallel Processing Framework). As usual I will compare it with GridGain, simply because GridGain is, to my understanding, the best one in some aspects.
Over the last couple of years JPPF has grown very intensively and gained new features that made it flexible and robust.

Differences in architecture.

In general we can say that GridGain has only one "layer". Every node ("master" or "worker") can execute tasks either from another node or from itself. Whenever you start GridGain, whether from your code, as a standalone application or as a service integrated into an application server, you start a new node, and to keep calculations off this node you have to make some configuration changes. Normally this is a node attribute that tells the topology SPI not to include the node in the calculation.
On the one hand this is very flexible, because you don't need to change your code or start additional services to involve this node in the calculation. On the other hand it confuses newcomers, who usually expect clients and servers (because of the common multi-tier approach).
Another consequence of this approach is a peer-to-peer architecture: all nodes are connected to each other. Obviously this limits the number of nodes (because of the network traffic). I know that GridGain has picked up this issue and is going to solve it quickly.


Unlike GridGain, JPPF divides the framework into client, driver and executor parts. The client layer provides an API and communication tools for submitting tasks to be executed in parallel. The service layer (driver) is responsible for communication with the clients and the nodes, along with management of the execution queue, load balancing and recovery features, and dynamic loading of both framework and application classes onto the appropriate nodes. The execution layer is the node: it executes individual tasks, returns the execution results, and dynamically requests from the JPPF driver the code it needs to execute the client tasks.
This approach makes the framework easier to understand at first and in some cases makes a "master-worker" implementation less complicated.
Another benefit of this division is that it overcomes the limit on the maximum number of nodes. Only servers are connected to each other, not nodes, and this segmentation allows a lot of nodes with a few connected servers. On the other hand, if you need a simple grid of 5-10 nodes, the server is potentially a single point of failure.


Features and features.

Both products are very featured and cover a lot of edge cases. Let me mention some of them:
  1. On demand class loading. To my understanding (just because I suggested this feature and was responsible for the implementation ;) ) GridGain was the first one to support transparent class loading between nodes. JPPF supports it as well.
  2. Load balancing. Both frameworks support it, but GridGain gives you about 5-7 strategies out-of-the-box.
  3. J2EE integrations. Both products can be integrated into the application servers, but the integration ways are different. GridGain starts either as a service (JBoss, WebLogic, WebSphere) and uses application server resources (executor service, logs, etc) or as a servlet (Tomcat, GlassFish). JPPF registers in JNDI tree and provides its functionality as a standard J2EE component (JCA). As far as I know GridGain has a ticket to support JNDI lookup as well.
  4. Both frameworks are task/job based.
  5. Both frameworks have annotation based execution (@Gridify in GridGain and @JPPFRunnable in JPPF).
  6. JPPF has a DataProvider to exchange data between tasks/nodes. GridGain offers a distributed TaskSession for that.

Despite some common features they have a lot of differences:

  1. Communication. JPPF is TCP/multicast based. GridGain supports various protocols (TCP, JGroups, Mule, JMS, Mail, JBoss and so on).
  2. Extension. JPPF is much more closed than GridGain. The latter gives you SPI interfaces, so one can extend existing functionality or write new functionality and integrate it into GridGain.
  3. Monitoring. While GridGain is still writing their cool monitoring console, JPPF already provides one, and as far as I can see it's pretty good.
  4. Node information. I did not find any node attributes in JPPF, which is a pity because they are very useful when you send your tasks/jobs into the grid. Very often you need to control which node executes a particular task. A simple example is executing a task only on Linux nodes because it loads native libraries or uses node-specific resources. GridGain supports node attributes, including custom (user-defined) ones.
  5. Task rescheduling in case of overload. JPPF keeps tasks on the server side (note that the execution layer is not the same as the server one), and nodes execute tasks as soon as they have free resources. This is great, but what if one server gets overloaded while another has nothing to do? In JPPF, nothing happens. GridGain will redistribute the work (taking user preferences into account) if job stealing is on.

Coding.

Coding approach is more or less similar with some differences.

In JPPF one should start a client and submit a set of tasks to the grid like this:

JPPF Code

JPPFClient client = new JPPFClient();

List<JPPFTask> tasks = new ArrayList<JPPFTask>();

tasks.add(new HelloTask());

try {
    // execute the tasks
    List<JPPFTask> results = client.submit(tasks, null);
} catch (Exception e) {
    e.printStackTrace();
}
The JPPFJob interface can be used as a task container and provides some additional functionality. But anyway it's quite far from the Map/Reduce way.

GridGain requires a kind of Map/Reduce implementation and forces you to understand this concept. Your task should implement a "map" method, where you split it into small jobs, and a "reduce" method to collect the results from all jobs and reduce them into one task result. But from the execution standpoint they look alike.

GridGain Code

Grid grid = GridFactory.start();

try {
    // Execute Hello World task.
    GridTaskFuture<Integer> future = grid.execute(GridHelloWorldTask.class,
            task_param);
}
finally {
    GridFactory.stop(true);
}
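For readers new to the concept, the map/reduce split described above can be shown with nothing but the JDK. None of the names below belong to the GridGain API: LetterCountTask is a hypothetical example and a local thread pool stands in for the grid nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Plain-JDK illustration of splitting a task into jobs and reducing their results.
class LetterCountTask {
    // "map": split the task argument into independent jobs, one per word.
    static List<Callable<Integer>> map(String phrase) {
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (final String word : phrase.trim().split("\\s+")) {
            jobs.add(() -> word.length());
        }
        return jobs;
    }

    // "reduce": fold all job results into a single task result.
    static int reduce(List<Integer> results) {
        int total = 0;
        for (int result : results) {
            total += result;
        }
        return total;
    }

    // Runs the jobs on a local pool (the stand-in for grid nodes) and reduces them.
    static int execute(String phrase) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(map(phrase))) {
                results.add(future.get());
            }
            return reduce(results);
        } catch (Exception e) {
            throw new IllegalStateException("Job execution failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling LetterCountTask.execute("Hello World") creates two jobs, one per word, and adds up their lengths. In a real grid the framework decides which node runs each job.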

Conclusion.

Both frameworks are very friendly and reliable. They are simple and flexible. But in this particular case I would say that GridGain wins, thanks to its simple, common Map/Reduce approach and its SPIs, which provide incredible flexibility.