Saturday, March 15, 2008

Solving a Memory Leak in ColdFusion MX

Executive summary: Complex queries can cause a memory leak in ColdFusion MX. It seems that when ColdFusion caches the prepared statement of a complex query, that statement never gets freed for garbage collection. The solution is to set the Max Pooled Statements value in the datasource settings to zero.

Read on if you want the gory details of how we came to this conclusion.

We recently took the giant leap from ColdFusion 5.0 to ColdFusion MX 6.1. There were several reasons why we hadn't made the migration until now; some were well-founded, but none of them would be of any interest to anyone, so I won't bother you with them. The transition was done as we upgraded to newer servers. We set up the new servers with MX 6.1 and gradually moved traffic over to them. As the MX servers took on more traffic, they became more unstable. The JRun service would just randomly die without any trace of a problem.

We were less than a week away from the scheduled release of a large MX-dependent project. To prepare, we moved the last remaining traffic to the MX servers, and the servers became extremely unstable. During peak traffic time, if one server happened to die, the load balancer would pull it out. Then all of the traffic would slam the other server, it would slow down, and get pulled by the load balancer, starting a vicious cycle that would go on until traffic died down.

After no small amount of work and research, we ended up adding physical RAM to the servers and increasing the heap size and the max perm size in the JVM arguments. We increased these parameters as high as they would go and still allow the service to start. It seemed to help. We had two solid days without any problem, so we decided to roll out the new project.
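For reference, these JVM settings live in ColdFusion's jvm.config file (typically under the runtime/bin directory of a default 6.1 install). The values below are purely illustrative, not the settings we ran with; how high you can go depends on your OS and physical RAM:

```
# jvm.config -- example only; tune heap and perm sizes for your hardware
java.args=-server -Xmx1024m -XX:MaxPermSize=192m
```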

It promptly fell on its face.

The servers could not stay up, under normal daytime load, for more than 30 minutes. We had to roll everything back.

At this point, it's a disastrous mystery. The new code is suddenly suspect, of course, but there are plenty of other variables to consider as well. A few days after the code was rolled back, we continued to see instability with the MX servers, so that pointed to a more general issue rather than anything in the new code. The MX servers are around 3-4 years newer than the CF 5 servers they are replacing, and they can barely keep up with the traffic. (Whoever said CF MX 6.1 would perform anywhere close to CF 5 must be in marketing, because anyone with any familiarity with Java knows it's not going to outperform a native application.)

The new code was the first to rely heavily on CFCs, one of which was particularly large. Our initial hypothesis was that this particularly large object was being instantiated but never freed for the garbage collector, leading to a rapid increase in memory usage that eventually choked the server. We looked extensively into the reported memory leaks in ColdFusion MX. We even went so far as to refactor a major section of the project to avoid using CFCs altogether.

Another major suspect was the JVM itself. ColdFusion MX 6.1 uses Java 1.4.2, which is old. I have no idea how old because I didn't see any version history after a very brief look at Wikipedia, but it's the first flavor of Java I ever knew. Of course, Macromedia/Adobe does not recommend using a newer JVM on 6.1, which could be an honest recommendation or a ploy to get people to upgrade. Either way, we started taking a serious look at ColdFusion MX 8.0.

When we started load-testing (this point in the software life cycle is a terrible time to start load-testing), it immediately became obvious that it was indeed a problem with memory. The old code, which was stable during the testing, would use 80-85% of the memory allocated to the JVM when run under simulated load. The new code, run under the same load, would climb to around 95% memory usage and induce a constant cycle of full garbage collections. These full garbage collections would not recover any memory, however. Since a full garbage collection has to pause all threads in order to run, the server would become unresponsive for all intents and purposes.
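If you want to see this kind of GC thrashing for yourself, the 1.4.2 JVM can log its collections. Adding flags like these to the java.args line in jvm.config makes the full-GC death spiral visible in a log file (this flag set is a suggestion, not what we actually ran):

```
-verbose:gc -XX:+PrintGCDetails -Xloggc:cfusion-gc.log
```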

We scrounged up another server that we could install ColdFusion MX 8.0 onto. Although the code failed in ColdFusion 8.0 just like it did in 6.1, this was really a godsend, because ColdFusion 8.0 uses Java 1.6 by default, which has much better tools for investigating these types of issues than 1.4.2 does. After a few heap dumps taken under load, and a lot of help analyzing them from our in-house Java experts, we figured out most of the memory was being used by database connections. We had over 2000 instances of database connections, and all of their supporting objects, instantiated. That's pretty impressive when you consider the load test only simulated ten concurrent users.
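For anyone wanting to reproduce this kind of analysis, the JDK 6 tools we leaned on are jmap (to capture a heap dump from a running process) and jhat (to browse it). The file name is arbitrary and the process ID is a placeholder:

```
jmap -dump:live,format=b,file=heap.hprof <jrun-pid>
jhat heap.hprof
```

jhat serves an object-by-object view of the dump over HTTP, which is how we spotted the pile of database connection objects.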

We eventually narrowed the problem down to one query. Replacing the real query with a simple "SELECT staticValue AS columnName" type query made everything run like a Swiss watch. The query in question was fairly complex, with several joins. The funny thing is that it existed in the old code, and was only made a little more complex in the new code.

Before we set off trying to optimize this query by using views, stored procedures, and every other trick we knew, we started playing with the datasource settings. We noticed that the default value of the Max Pooled Statements setting is 1000. We happen to have two datasources used in the application, and 2 x 1000 = 2000, the approximate number of database connection objects in our heap. We cut this back to 100, with negligible effect. We set it to zero, and, wow, did that ever help. Not only did that make the application stable (which is the most important metric): memory held steady at 20-25% usage (a dramatic change), CPU load was down slightly, average request time was slightly faster, total requests per second were slightly higher, and database performance was not affected at all. All good and no bad.
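For intuition, a statement pool is essentially a cache keyed by SQL text, and each cached statement keeps its connection-level objects reachable. This hypothetical Java sketch (not ColdFusion's or the JDBC driver's actual code) shows how a size-bounded cache behaves, and why a cap of zero retains nothing at all:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StatementPoolSketch {

    // Build an LRU cache that holds at most maxEntries items.
    // A cap of 0 means nothing is ever retained -- analogous to
    // setting Max Pooled Statements to zero in the datasource.
    static <K, V> Map<K, V> boundedCache(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                // Evict the least-recently-used entry once the cap is exceeded.
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> pool = boundedCache(2);
        pool.put("SELECT 1", "stmt1");
        pool.put("SELECT 2", "stmt2");
        pool.put("SELECT 3", "stmt3"); // evicts the oldest entry
        System.out.println(pool.size()); // prints 2
    }
}
```

With a cap of 1000 per datasource and no eviction pressure, every distinct complex statement the application ever prepares just accumulates, which matches the ~2000 connection objects we saw in the heap.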

We made the change to our production servers, and memory usage seems to be stable around 10%, rather than near 60-70%.

So there you go - if you happen to have a ColdFusion MX server that is operating at high memory usage, you may want to check your Max Pooled Statements setting. Of course, try it in a non-production environment first. There are so many possible causes of memory leaks in ColdFusion (and, more generally, in Java and other virtual machines) that I hate to add one more thing to the checklist, but maybe it will help someone out. I should mention we verified this behavior against Microsoft SQL Server 2000 and 2005; I can't say whether any other DBMS is affected.

In a piece of wonderful irony, we discovered that someone had posted this very same advice, under the username bert.d, to the 6.1 livedocs - after we had figured it out on our own, of course.

Wednesday, March 5, 2008

Real-Life Search Engine Optimization

An acquaintance of mine was launching a web site for his business and wanted some advice on how to publicize it. Ah, the classic Search Engine Optimization (SEO) problem. I have always wanted to formulate a proof that this problem is NP-complete, since the results never seem proportional to the time and effort invested.

This is a huge question for all sorts of people, as it affects every web site on the internet in one way or another. The reality of the problem only hits you when you stop to consider the position of the search engine, whose sole purpose (forget about selling ad space for a minute) is to completely categorize and index the world wide web to allow people to find the information they need with minimal effort. This is a hopeless situation the search engines are in, so they are forced to take shortcuts, cheat off each other, make assumptions, and, in the end, usually provide sub-optimal results.

The best advice I have: think like a search engine. This may be easier for programmers like myself, but probably not. I would venture a guess that very few people in Google would be able to tell you how to best present a site to the Google god for processing.

The next best advice: know your target audience. Try to imagine someone looking for your web site, or more appropriately, the information on your web site. What search term(s) would they use? Make sure those terms are on your web site, and draw as much attention to them as possible. It's no small irony that the tech community has it easy here. Programmers often search for specific terms (strings) that aren't even real dictionary words and can often only mean one specific thing. If I am looking for documentation on a certain ColdFusion tag, maybe <cfinvokeargument>, I can simply enter "cfinvokeargument" into any search engine and find what I need. If I were to type in "orange", however, the search engine wouldn't know if I was looking for the proper RGB values of the color orange, what varieties of oranges are grown in Texas, or the location of the county courthouse in Orange County, CA.

Enough with the abstract advice. Here are a few suggestions that I have used in real-life situations, in no particular order.

Domain Name is a Key
Get a good domain name, like AcmeRocketSales.com or something closely related to your company or web site name. This may seem like a no-brainer, but search engines will see this as a mark of legitimacy and permanency, just like a human would. Darn, those computers are smart.

The Power of the People
Get your site listed on the Open Directory Project. This is a project with the goal of indexing the entire web, with the twist that it doesn't rely on robot/spider software to "crawl" the web to find and categorize web sites. Instead, volunteer human editors must review and approve every submission. This results in a very accurate listing of legitimate sites. As a result, normal search engines, like Google and Yahoo, pay very close attention to web sites added to the ODP. Getting listed here will go a long way toward giving your site more value in the eyes of a search engine.

Meta Tags do no Harm
Use the <meta> tags for keywords and descriptions in the HTML code of your web site. Make them the same on every page of your site. These tags used to be the primary thing search engines relied on to automatically index web pages, but they were abused so heavily that the search engines started to just ignore them. So then people quit using them. Rumor is they are on the rebound, though. Whatever the case may be, they can't hurt.
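As a concrete example, the two tags sit in the <head> of each page; the keyword and description text below is invented for the hypothetical AcmeRocketSales.com site mentioned earlier:

```
<head>
  <meta name="keywords" content="rocket sales, acme rockets, model rocketry">
  <meta name="description" content="Acme Rocket Sales - quality rockets for every budget.">
</head>
```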

Self-Linking is like Self-Promotion
Make sure every page on your web site has links to the home page and the top level sections of your web site. This is pretty standard nowadays with site layouts where every page has a common navigation area.

Embrace the Power of the Web
Get other web sites to link to your site. Every web site that links to yours increases the importance of your site in the eyes of a search engine. If any of your employees have a blog, Facebook, MySpace, etc., have them add a link someplace. You could approach some of your customers and ask if they would consider putting up a link as a recommendation (business cards with your URL help a lot here). By the way, Google happens to run a small blogging site. You think it scans those blogs more frequently or more efficiently? I bet so...

Geocode Yourself
Your physical location is one of the most distinctive attributes you have, so you had better use it. Make sure your address is displayed, in full, on your web site. The more often, the better, so consider sticking it in the footer or somewhere. Register with Google Local; provide as much information as you can.

Feed the Beast, and the Beast May Not Eat You
You can always throw in the towel and advertise with the search engines themselves, like Google. I'm not really sure how this works, but I think you can set something up so your ad shows up if someone searches for certain search terms. I would consider this a last resort. It's a sure-fire solution, but it's not free.