Thursday, February 19, 2009

Web-services. Edge case performance issue.

Think twice before you choose web services as your only means of communication. People say they are slow, but in general that's not true.

Web services are fast enough and scale pretty well if you send a small, simple set of parameters and expect a small result back. By small I mean, say, 10-50 KB of data. Actually, even if you send back more than 50 KB it will work fine, except in the case I'm going to discuss here.

Task:
We need to transfer a huge (say 1 megabyte) XML document from a .NET client to a Java server. Web services look very attractive from both the performance and the scalability point of view. Why did we choose XML as the data format? It is native to web services and gives us a well-structured, human-readable format.

1st and simplest solution:
Spring-WS. Of course this one. Really simple (especially if you have a Spring-based server-side application). Yep. 20 minutes and you have a cool web service. On the .NET side, in Visual Studio, we create a proxy with the corresponding wizard and "voila!", it works.

Hmm, but not everything is so good. For some reason an empty web service takes 10-20 seconds just to receive the embedded XML that we sent from the client.

Explanation:
Right. You have just hit a performance issue. Let me explain what happened.

Web services use SOAP to transfer data from client to server and back. SOAP is XML with a certain structure: it has a so-called envelope, body, header and so on. When you send XML from the client to the server as a parameter, the web-services implementation has to process the embedded data and escape all the tags in the SOAP message body, replacing < and > with &lt; and &gt;, as if it were ordinary text.

So if you send
<myTag>value</myTag>
from the client, what actually arrives at the server is
&lt;myTag&gt;value&lt;/myTag&gt;
instead. Not a big deal in itself, but keep in mind that we have 1 MB of XML here.
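
To make the escaping step concrete, here is a minimal sketch in plain Java. The class and method names (EscapeDemo, escapeXml) are made up for illustration, and this is not the code that .NET or Spring-WS actually runs; it only shows what happens to the payload before it lands in the SOAP body.

```java
public class EscapeDemo {

    // Hypothetical helper: replaces the XML special characters with entity
    // references, roughly what a SOAP stack does to a string parameter.
    static String escapeXml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String payload = "<myTag>value</myTag>";
        String escaped = escapeXml(payload);
        System.out.println(escaped);
        // Every angle bracket becomes a 4-character entity reference,
        // so the escaped text is longer than the original.
        System.out.println(payload.length() + " -> " + escaped.length() + " characters");
    }
}
```

Every tag in the original document costs four entity references, so a tag-heavy 1 MB document turns into a body with a very large number of them.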

By default Java ships with the Xalan transformer and the Xerces parser for XML processing. When the server receives a SOAP message it tries to parse the ENTIRE message body, taking into account all those "tags". You may say, "But they were escaped!" Yes, they were. Do you know what the & character means in XML? Yep, it starts a reference, and Xerces has to resolve every one of them as well. That's why it takes tens of seconds simply to receive this text in your web service.

I personally did not find any way to configure this parser. The parsing happens somewhere inside Spring and Java, and I could not find a Java system property that would speed it up.
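
If you want to measure this on your own machine, the sketch below may help. It is only an approximation under my own assumptions: it uses the JDK's built-in JAXP parser directly rather than the Spring-WS stack, the <body> wrapper stands in for a real SOAP envelope, the repeat count is arbitrary, and the class and method names are hypothetical. Timings on a current JVM may look nothing like the ones described above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EscapedBodyParseDemo {

    // Builds a tag-heavy XML fragment by repeating one small element.
    static String buildPayload(int repeats) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < repeats; i++) {
            sb.append("<myTag>value</myTag>");
        }
        return sb.toString();
    }

    // Escapes the payload and wraps it in a single element, as a stand-in
    // for an XML string carried inside a SOAP body.
    static String wrapEscaped(String payload) {
        String escaped = payload.replace("&", "&amp;")
                                .replace("<", "&lt;")
                                .replace(">", "&gt;");
        return "<body>" + escaped + "</body>";
    }

    // Parses the message with the JDK default parser and returns the text
    // content, i.e. the original payload with all references resolved.
    static String parseBody(String message) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(message.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String payload = buildPayload(10_000);   // arbitrary size, adjust as needed
        String message = wrapEscaped(payload);

        long start = System.nanoTime();
        String received = parseBody(message);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("round trip intact: " + payload.equals(received));
        System.out.println("parse time: " + elapsedMs + " ms");
    }
}
```

The parser has to turn every &lt; and &gt; back into a character before the service sees the string, and that is the work being timed here.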

2nd, slightly more complicated solution:
Well, if it's just a parsing issue, we need to avoid putting the XML into the SOAP body and send it as an attachment instead.
This works great for Java-to-Java communication, but what killed me is that .NET does not have a WS-Attachments implementation in Web Services Enhancements (WSE) 3.0!!! Yep, they say to use MTOM instead.

I'm not sure whether that is possible in Visual Basic, but even if it is, we cannot change the communication protocol on the fly. Such a change requires some refactoring, planning and so on.

That's all, folks. Sadly, this performance bottleneck was not solved in time, so we postponed it. Good performance to you all, guys.

Tuesday, February 17, 2009

Fighting with Terracotta

Recently I wasted some time fighting with the Terracotta server and I'd like to share my thoughts about it.

First of all, nobody has solved the problem of clustering database connections, or any other network connections for that matter. To my understanding this is quite important and should be feasible. It is the one issue that keeps clustering from becoming a reality.

Now let's get back to Terracotta. The task was to cluster a pretty big application tightly integrated with Spring 2.5 (most of the database, web-services and executor functionality was written using Spring's WS and JDBC support). The application runs asynchronous tasks with Quartz. All in all, it looks more or less good from a software design point of view.

The initial idea was to provide server-side failover and thus:
1) Somehow share asynchronous tasks and guarantee re-execution in case one server fails.

2) Share the managed beans exposed to the outside, along with their state.

3) Since the server has complicated business logic that requires some synchronization, provide distributed locks (ideally these should be moved to the database level).

Terracotta is a really great product to deal with. I got excited and confused at the same time because:

1) Terracotta provides integration with Quartz and allows sharing tasks with re-execution, BUT no one had tested it with Quartz+Spring (as you probably know, Spring gives you some good wrappers for Quartz to simplify and unify the code). Thus after wasting 3-4 hours I had to use pure Quartz without Spring.

2) Shared locks, like any other shared (distributed) objects, work perfectly out of the box, BUT as soon as I started distributing more or less complicated data, with some inheritance and fields like Loggers in parent classes, I found myself in trouble. I ran into the fact that Terracotta cannot exclude a field from distribution if it is declared in a parent class (superclass). I had quite a lot of Spring code that I was not able to change, so I had to come up with a lot of workarounds.

3) Spring. Yep. I wasted days before understanding that it's just impossible to distribute most of the beans (especially the ones that had some JDBC stuff injected). All database connections have to be established at the moment you need them. The same goes for asynchronous tasks.
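
To illustrate the last point, here is a rough sketch of a task that keeps only plain data as its state and opens the database connection at the moment it runs. It is plain JDK code and does not use any Terracotta, Quartz or Spring API; the class name ReportTask, the JDBC URL and the SQL string are all hypothetical.

```java
import java.io.Serializable;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class ReportTask implements Serializable {
    private static final long serialVersionUID = 1L;

    // Only plain values live in the task, so it is safe to share:
    // no injected DataSource, no open Connection, no JDBC helper objects.
    private final String jdbcUrl;
    private final String sql;
    private int completedRuns;

    public ReportTask(String jdbcUrl, String sql) {
        this.jdbcUrl = jdbcUrl;
        this.sql = sql;
    }

    // The connection is opened here, on whichever node executes the task,
    // and closed again before the method returns.
    public void execute() throws SQLException {
        try (Connection connection = DriverManager.getConnection(jdbcUrl);
             Statement statement = connection.createStatement()) {
            statement.execute(sql);
            completedRuns++;
        }
    }

    public String getJdbcUrl() { return jdbcUrl; }

    public String getSql() { return sql; }

    public int getCompletedRuns() { return completedRuns; }
}
```

Constructing the task touches no database at all. Whichever server picks it up calls execute() and gets its own short-lived connection.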


To sum up this post:

1) Terracotta works fine, but be ready to refactor your code and keep in mind all the "product features" :)
2) Forget about injecting database stuff into distributed beans or asynchronous tasks.
3) Be ready to feel "magic inside and consequences outside". Terracotta is a "black box" with some magic.