Hi, I am Boby, a Frenchie who has been working for TheNetcircle for 3 years, and I now act as technical lead. This is my first article and, I hope, the beginning of a long series of technical ones. I won’t be the only one writing in this category; I will try to push my workmates to share their knowledge and experience. In this article I am going to talk about our main issue at the moment, which solution we chose and why. I hope you will enjoy it.
We are currently busy developing a new version of our main website, a very special dating community. It is quite a success: we have 2 million registered users and, at peak hours, around 20,000 concurrent users browsing the site. To serve this traffic we use 50 machines: databases, static servers, PHP servers… All of these have to be organized to achieve scalability and some degree of high availability.
For 2 years we have been using a very traditional setup. We wanted to separate static from dynamic traffic, as each is then easier to optimize. So we had 2 clusters: one for static content using Lighttpd and one for dynamic content using Apache. Each cluster has a load balancer in front, currently LVS. On top of that we have several Squids which forward requests to the appropriate cluster. Squid in this case was only used as a forwarding reverse proxy, with no caching, as the Lighttpds are fast enough to deliver the content (in fact, putting a Squid in front of them was more of a drawback than anything else). Some static content is served directly from the load balancer under another domain name; for AJAX requests, however, the static content has to be under the same domain name.
Here is the layout as described:
Every layer scales: the Squids by DNS round robin, the rest by LVS, and up to now this has worked fine. Still, it is a lot of physical servers, and it would be nice to reduce their number to ease maintenance and cut our expenses. With the new version of the site coming, it was a good occasion to spend some time experimenting.
We started with Squid. Looking at its load, we noticed that it was only using one CPU, leaving the other one to collect dust. Squid is not multi-threaded and, short of launching 2 instances of it, there is no benefit in running it on a modern multi-core CPU. We then looked for other solutions: first very old redirection daemons, then incomplete Squid wannabes and finally lightweight HTTP servers. That is where we found NginX.
NginX is, as its authors say, “a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server”. As we had already read while browsing some Ruby forums, “Nginx is known for its stability, rich feature set, simple configuration, and low resource consumption”. The best of Russian hacking, that is so badass; we had to try it. After playing with it for a while, we noticed that not only would it efficiently replace our forwarding Squids, but its features could also cover most of our requirements for the site.
- Request forwarding: That was our main focus at first, and there is a rewrite module for it. It uses a simple but expressive syntax which allows conditional statements, includes of rule sets and real regular expressions (PCRE). This was a big change compared to the pseudo-regex rules of Asqredir, which is how our Squid used to be configured. After rewriting our current rules, we had a more organized configuration and no duplicated rules. For sure there were some limitations, as the rewriting language is sometimes too simple (no more than 9 variables can be used), but nothing blocking or anything that would have killed our enthusiasm.
- Load balancing: Simple load balancing is natively included in NginX. For each domain you want to load balance, you specify the IPs of the backends and there you go. Bye bye LVS.
- Static server: Like Lighttpd, NginX is an asynchronous, lightweight HTTP server. So it will just deliver any static content as fast as it can, with no need for caching. Adios Lighttpd.
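To give an idea of how these three roles can live in a single NginX configuration, here is a minimal sketch. It is not our production setup: the domain name, backend IPs, paths, the `php_backend` pool name and the example rewrite rule are all placeholders made up for illustration.

```nginx
# Illustrative sketch only: every name, IP and path below is a placeholder.
worker_processes  2;               # one worker per CPU, so no core collects dust

events {
    worker_connections  1024;
}

http {
    # Load balancing: a pool of dynamic backends, round robin by default.
    upstream php_backend {
        server  10.0.0.11:80;      # hypothetical Apache/PHP server
        server  10.0.0.12:80;      # hypothetical Apache/PHP server
    }

    server {
        listen       80;
        server_name  www.example.com;

        # Request forwarding: a regex rewrite using a capture ($1).
        # Rule sets can also be split into files and pulled in with "include".
        rewrite  ^/profile/([0-9]+)$  /profile.php?id=$1  last;

        # Static server: files are delivered straight from disk.
        location /static/ {
            root  /var/www/site;   # serves /var/www/site/static/...
        }

        # Everything else is forwarded to the dynamic cluster.
        location / {
            proxy_pass        http://php_backend;
            proxy_set_header  Host       $host;
            proxy_set_header  X-Real-IP  $remote_addr;
        }
    }
}
```

The `upstream` block is what takes over the job LVS was doing for the dynamic cluster, while the `/static/` location covers what Lighttpd used to serve.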
Here is what the layout looks like then:
Much simpler and fewer machines, which means less maintenance and lower hosting fees. Several NginX instances with DNS round robin, plus IP failover to be safe, and it is ready for production. Well, we still need more testing and configuration tuning, but from what we have seen we are confident.
August 22nd, 2008 

August 21st, 2008