Archive for category Uncategorized
The Open Proxy Saga
Posted by Martin in Performance, Uncategorized on October 26, 2011

A couple of weeks ago I was messing with a few Apache configs, trying a few things that could improve the server’s performance. Everything was fine until late last week, when I noticed that the page was really slow. Initially I thought it was a connectivity issue, but after a couple of hours I decided to troubleshoot it. The first thing to do is check the logs for any possible explanation. I found two interesting things. The first was this message:
[error] server reached MaxClients setting, consider raising the MaxClients setting
That is interesting, especially because I tweaked the MaxClients setting not too long ago and the traffic has not increased significantly since then. The second interesting piece of information was the number of GETs to external domains. That can’t be right. Why would users be requesting pages from other domains?
The first thing I thought was ‘Damn, I’m serving as an open proxy!’, and I was right! I went to check the Apache configs and found:
ProxyRequests On
ProxyRequests was set to On, meaning that Apache was serving as an open proxy.
The second thing I went to check was the server statistics. Interestingly, three days earlier both memory usage and bandwidth utilization had increased significantly. The extra memory was coming from additional Apache processes, which showed exactly when it all started. But how did I start getting so many requests so quickly? I Googled it. My IP was listed on several open proxy lists, complete with status, latency and even the last check time. That is awesome! They probably have bots port scanning all around. One of these bots found my IP and published it somewhere, and the list was replicated over and over, from here to Japan!
Obviously I don’t want to be serving as an open proxy, for several reasons, so I went and changed ProxyRequests back to Off. Right after I changed it, I saw the logs growing enormously. That’s when I noticed the extent of the problem. I had been serving hundreds of concurrent users, a pretty good burn test for the server. And guess what, after days like that, it was still rock solid!
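A minimal sketch of the fix, assuming Apache 2.2 syntax. Only the `ProxyRequests Off` line comes from the change described above; the `<Proxy *>` block is an additional, assumed safeguard and should be left out if the server uses `ProxyPass` as a reverse proxy.

```apache
# Disable forward proxying so Apache stops fetching arbitrary URLs for clients.
ProxyRequests Off

# Assumed extra safeguard (not part of the change described above):
# deny all access to proxied resources. Apache 2.2 syntax;
# on Apache 2.4 the equivalent is "Require all denied".
<Proxy *>
    Order deny,allow
    Deny from all
</Proxy>
```

Apache needs a reload or restart before the change takes effect.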
Now the second part of the saga. After turning ProxyRequests back to Off, besides the huge increase in logging (errors only), CPU spiked to a load average of 22 on a 4-processor server. For those not familiar with Linux, that’s a lot: it means roughly 22 processes were running or waiting to run, on a machine that can run only 4 at a time. An increase in logging was expected, since we’re having far more errors now that users’ requests for other domains are failing. An increase in CPU usage was also expected, since the number of requests to my main page increased significantly (failed proxy requests are served by the default Apache site), but not as much as 22.
Checking the logs again, I noticed a huge number of errors stating that the URL was too long. All of these ‘long’ URLs had the same format: an external domain followed by ‘http’ repeated over and over, like ‘http://www.google.com/httphttphttphttphttphttphttphttphttphttp…’. That was strange. Why would someone request a URL like this? I decided to try using my server as a proxy myself. The same thing happened: I tried google.com and was automatically sent into a redirect loop that kept appending ‘http’ to my requests until reaching a limit of 20 or so redirects. This means that every proxy request by a user was generating over 20 requests on my server. The next step was to find out why that was happening. Time to ‘telnet’ to my server on port 80:
GET http://www.google.com/ HTTP/1.1
Host:www.google.com
That returned an HTTP 301 (Moved Permanently) response, redirecting to the same domain but with ‘http’ appended to the address. Good, the same behavior we had in the browser. Now, why is this happening? Looking a little further into this, I found that when Apache gets a request for a domain that is not in your virtual host list, it responds with the default virtual host, which is the first virtual host loaded if you have not defined one explicitly. My default virtual host is my main website, a WordPress-based site.
Analyzing this further, I found that when WordPress receives a request for an unknown page, it redirects to a standard page instead of returning an HTTP 404 (Not Found) error. That is called canonical URL redirection and is used for a number of reasons, from enabling alternative URLs to ‘fancy’ permalinks.
That explains the loop. Apache hands the request to the default website, where WordPress automatically redirects to the same non-existent domain, just appending the requested page to the address. Since the ‘:’ character is not valid in an address, WordPress stops there, so only ‘http’ gets appended. Since the user still has my server set as a proxy, the process starts again, this time with an extra ‘http’, and so on, in an infinite loop. So how do we disable that?
I found a simple how-to at velvetblues.com. You simply have to add the following line to your theme’s ‘functions.php’:
remove_filter('template_redirect','redirect_canonical');
Tried that and it worked! Now when I try to use my Apache server as a proxy, every request returns a WordPress page saying that the page was not found. Problem solved!
…
Not so much. We still have part three of the saga. I waited a few minutes and checked the server statistics again. CPU usage had dropped significantly, to a load average of 5. Still a lot, but much better than the 22 we had before. The server was responding quickly, but I was still not satisfied. I don’t like the fact that a lot of leechers are consuming a lot of resources on my server. How can I improve that, assuming that leechers will keep trying to use my server as a proxy for a while before figuring out it is not working anymore?
There are plenty of options, from simple ones to more complex ones, like adding Apache modules or using ‘iptables’ to block users that request domains that are not on the virtual hosts list. I don’t want to waste too much time on this since it’s not critical, so I opted for a very simple solution. Dynamic pages are very resource intensive compared to static pages, and I don’t really care about serving a ‘nice’ page to users that are trying to use my server as a proxy. So why not show these users a simple HTML page instead of my WordPress website? Apache serves the default website for requests whose host does not match any virtual host on the list, so I decided to simply change the default website to the well-known Apache ‘It Works!’ page. To do so, I just had to enable the default Apache site, which was already there, just not enabled.
Guess what?? It worked. Requests to non-mapped domains were being served with a simple ‘It Works!’ page. Waited for a few minutes and checked the server statistics again, and wow, load average went to 0.1. Problem solved. Serving simple static pages reduced the CPU usage drastically. Now I just have to deal with the error log file.
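A catch-all default site can be sketched roughly like this. The server name and document root are hypothetical placeholders; what matters is that this virtual host is the first one Apache loads for the address and port, since that is the one used when no ServerName or ServerAlias matches the request. On Apache 2.2 it also relies on the `NameVirtualHost *:80` line that a name-based setup already has.

```apache
# Hypothetical catch-all site. It must be the first virtual host Apache loads
# for this address:port so that requests for unknown hosts land here.
<VirtualHost *:80>
    # Placeholder name; the real sites keep their own ServerName in their own vhosts.
    ServerName default.invalid
    # Assumed path containing only a small static index.html ("It works!").
    DocumentRoot /var/www/default
</VirtualHost>
```

On Debian-style installs the site files are loaded in filename order, which is why the stock default site is typically linked with a `000-` prefix.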
That was easy. Since all ‘undesired’ users were now being ‘redirected’ to the default Apache website, it was just a matter of changing the error log level. I just went and changed the following line in the default site’s configuration:
LogLevel crit
This will only log critical errors, which are not the errors we’re having now, solving the log file issue. Oh, and just remember to comment out the CustomLog line too, to avoid access logging, which is even worse.
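Put together, the logging part of the default site might look something like the fragment below. The log paths are assumptions, not taken from my actual setup.

```apache
# Inside the default (catch-all) virtual host; log paths are assumed.
# Only log messages at "crit" severity or worse (crit, alert, emerg).
LogLevel crit
ErrorLog /var/log/apache2/error.log

# Access logging for this site disabled by commenting out the directive.
# CustomLog /var/log/apache2/access.log combined
```

One caveat: if a CustomLog is also defined at the global server level, virtual hosts without their own CustomLog will log there instead, so it is worth checking the main configuration as well.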
Cheers,
Martin
Chrome or Firefox?
Posted by Martin in Uncategorized on October 25, 2011

I’ve been a Chrome user for quite a while, and before that I used Firefox for a long time. I decided to make the move almost 2 years ago, right after extensions were introduced. The main reasons were the start time (cold and warm), which back then was blazing fast on Chrome compared to the slow Firefox, and the cool new Chrome start page. So far I’ve been happy with Chrome, apart from a few problems with its extension capabilities, but the recent launch of Firefox 7 made me question my decision. The new Firefox is as fast as Chrome on start time, if not faster, and apparently consumes significantly fewer resources. That, combined with more powerful extensions, is a recipe for a great browser.
Now the requirements. What really matters to me, besides speed and footprint, is a few particular extensions: good support for Google Bookmarks, Evernote, Read it Later, a session manager, AdBlock, a bulk media downloader and a Twitter client.
So far I came to the following analysis:
| Feature | Chrome | Firefox |
| --- | --- | --- |
| Google Bookmarks | Partially. YAGBE works but it’s not great. | Yes, thru GMarks |
| Evernote | Yes, thru official extension | Yes, thru official extension |
| Read it Later | Yes | Yes |
| Session Manager | Yes | Yes |
| AdBlock | Yes | Yes |
| Media Downloader | No. Apparently this is a Chrome API limitation. | Yes, thru DownThemAll or DownloadHelper |
| Twitter Client | Yes, TweetDeck app | No. Could not find a web app at the same level as TweetDeck. |
| Nice Start Page | Yes. | Yes, thru myfav.es. |
So what is your opinion? Have any extension suggestions? Firefox or Chrome?
Updates on the Blog
Posted by Martin in Uncategorized on February 26, 2010
Some of you may not have noticed, but over the last couple of days I’ve made a few improvements to the blog: adding a few links, changing categories and creating new sections.
- Added two buttons to the sidebar to make it easier to follow me on Twitter and RSS.
- Changed the RSS feeds to FeedBurner, so I can analyze the subscriber data in a little more detail.
- Changed the Bookmark buttons on each post, making it easier to share each item.
- Split the categories into different subjects. This way it’s easier for readers to browse through the subjects that matter to them.
- Created different feeds for each category, so the readers can subscribe only to the content they care about.
- Created a new section (Page) for questions and answers. I noticed a while ago that several users land on the blog while searching for a specific problem (especially with LoadRunner). Some of them ask questions by commenting on a post, but my guess is that most of them just go back to Google. Now these users have a space to ask questions and make suggestions.
I’m still working on a few new improvements. Expect more soon!
WordPress supports RSS Cloud now!
Posted by Martin in Uncategorized on September 8, 2009
If you still don’t know, WordPress adopted RSS Cloud (rsscloud.org) for all its blogs today!
RSS, short for Really Simple Syndication, helps you stream all of your news and blog sources into an easy-to-manage RSS reader such as Google Reader. Millions of people use RSS to keep up with Mashable, The New York Times, and even LOLcats.
However, it does have its limitations. The big one is speed. It can take minutes to hours for a blog post to reach the reader through RSS. This has been a big reason why more and more people are turning to real-time services like Twitter and FriendFeed for their news. In the real-time web, delayed news and information just isn’t good enough.
Now WordPress has done something big that eliminates that RSS delay problem and brings WordPress.com’s 7.5 million blogs into real-time, along with any other self-hosted WordPress blog. It has implemented RSSCloud, an RSS element that makes instant syndication of blog posts possible. However, it does have a few obstacles to overcome before your RSS is just like Twitter.
Obviously I already updated the blog to support it too. Now everyone with a compatible RSS Reader can have instant syndication!
Via mashable.com
First result on a Google Search
Posted by Martin in Uncategorized on August 5, 2009
A colleague pinged me on IM yesterday with some great news. He was searching for some information on LoadRunner and came up with the following search query:
load runner response server size
Guess which page came up as the first result on Google? This very blog.
Good thing for me that my blog is being quoted by other blogs and growing in popularity. Good thing for him, too, that he didn’t have to go too far to get his answer.
ps.: Yes, I know that this is not a search query that you will come up with every day!



