Category Archives: Development

Saving pandas with nginx/memcache

by Mauro

While building the server environment for a new version of one of our community pages, we tested many different server applications and architectures. One combination that brought us a big gain in speed and efficiency was Nginx, Memcached and PHP-FPM, which we are still using and which I'm going to outline here.

First I’ll need to introduce the three main components a little.

  • Nginx is an extremely efficient, high-performance webserver. Actually it's not only a webserver, but I call it that since its other functionalities are crap (subjective opinion). For serving HTTP it provides many things like very advanced URL rewriting and content filtering, including subrequests to HTTP, FastCGI, Memcache and other backends.
  • Memcached is a popular and well-known small, fast and simple key-value cache.
  • FPM is a patch for PHP and stands for FastCGI Process Manager. As the name says, it manages processes that serve FastCGI. We first tried the spawn-fcgi process manager that comes with the Lighttpd package, but FPM proved to be more reliable and has some nice features which spawn-fcgi doesn't provide.

In our setup Nginx sits in front of the environment as the webserver and is the first entry point for user requests. For each request to a non-static file it creates a FastCGI request and forwards it to the right PHP cluster.

Caching makes more sense on static content, so we make it static

Our site serves a community in which each user has a profile that can be seen by other users. Those profiles are probably the most accessed part of the page, so it makes sense to think twice about how to cache them. Big parts of a profile are more or less static, because users usually don't change their profile data every day. So first we had to extract all the more dynamic parts, like the guestbook or the user's recent forum posts, from the profile and load them via Ajax instead. Now that those things are loaded in a separate request, we can start treating the profiles as almost static pages.

First steps with the new environment

In the first few months, when we started serving the new version to users as a beta, all requests for profiles arrived at the PHP cluster as FastCGI requests. To serve a request, PHP loaded the whole framework, checked whether the profile was already in Memcache, and if not, generated the profile and stored it on the Memcache cluster. If the same profile was requested again, the FastCGI request went to PHP again, and PHP loaded the whole framework again only to check Memcache.

Improvement

If PHP has already stored a profile in Memcache, it doesn't make much sense to load the whole (in our case really heavy) PHP framework again just to decide that the profile doesn't need to be regenerated and can be read from the cache instead. Luckily Nginx provides a really nice feature for doing subrequests to Memcache directly.

Probably you can already more or less imagine what we did. And now in detail:

Every time a request arrives at Nginx, it first checks whether this is a request for a static file or for another subsystem, for example the forum. If not, Nginx itself checks on the Memcache cluster whether a key exists, which it generates from the URI. If it gets a hit, it bypasses PHP completely and serves the request directly from Memcache.

To implement this, we of course needed to make sure that nothing else could accidentally store a key in Memcache that might get hit by Nginx and make it serve nonsense. Maybe it sounds stupid to mention this because it's obvious, but honestly, it actually happened.

Enough theory

That's where the magic happens:

1  location / {
2     if ($request_method != GET)
3     {
4       return 405;    # a rewrite cannot target a named location; error_page on line 11 sends 405 to @fallback
5     }
6     default_type    text/html;
7     add_header      "Content" "text/html; charset=utf8";
8     charset         utf-8;
9     set             $memcached_key nginx_prefix$uri;
10    memcached_pass  profilememcache;
11    error_page      500 404 405 = @fallback;
12 }
13 location @fallback {
14    # pass to FastCGI
15 }

That snippet is part of our Nginx configuration. Above it, all static requests and requests to other subsystems have already been caught, so we can be sure that requests which reach location / are dynamic and belong to the main system, which is our PHP framework.
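To give an idea of what "already caught" can look like, here is a minimal sketch of two such locations. This is not our real configuration: the extension list, the root path, the forum script path and the forum_backend upstream name are all made up for illustration.

```nginx
# Hypothetical locations that catch requests before "location /" sees them.
# Extension list, paths and the forum_backend upstream are assumptions.

# Static assets are served straight from disk and never reach Memcache or PHP.
location ~* \.(css|js|png|jpg|jpeg|gif|ico)$ {
    root     /var/www/static;
    expires  7d;
}

# Another subsystem (for example the forum) goes to its own backend.
location /forum/ {
    include        fastcgi_params;
    fastcgi_param  SCRIPT_FILENAME /var/www/forum/index.php;
    fastcgi_pass   forum_backend;
}
```

Because Nginx prefers a matching regular-expression location and the longest matching prefix over the plain / prefix, requests like these never end up in location /.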

The location @fallback is where the requests have to go if they want to reach the PHP.

On line 2 every request that is not a GET is caught. Of course we don't want application logic in Nginx, so these requests are handed to the @fallback location and sent on to PHP.

Lines 6-8 are defining the encoding and content type. Since we don’t store any HTTP headers on the Memcache, but only the HTML output itself, the Nginx needs to know about those things on its own.

On line 9 the variable $memcached_key gets set. This variable name is defined by the Nginx Memcache module, and the variable's value is the key that gets retrieved from the Memcache backend.

Finally, on line 10 Nginx makes the request to the Memcache backend. If you are now thinking “How does it know how to reach this profilememcache?”: profilememcache is the name of an upstream which I defined further up in the config, and it contains the IP and port of the entry point of the Memcache cluster.
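Such an upstream definition could look roughly like the sketch below. It lives in the http block, outside of any server block. The address and port are placeholders, not our real values.

```nginx
# Sketch of the upstream that memcached_pass refers to.
# 10.0.0.10:11211 is a placeholder address, not the real entry point.
upstream profilememcache {
    server 10.0.0.10:11211;
}
```

Using a named upstream instead of a hard-coded address in memcached_pass means the entry point of the cluster only has to be changed in one place.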

Another really nifty thing happens on line 11. The error_page directive defines what Nginx has to do in case of a certain HTTP error. If the request on line 10 doesn't succeed and gets a miss back from Memcache, Nginx raises a 404 error. Thanks to line 11, the 404 error makes Nginx fall back to the location @fallback, where the request gets sent to FastCGI, and now you know where the second location got its name from.
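The body of the fallback location is left out of the snippet above. As a rough sketch only, it could look like the following, assuming a single front controller script and a PHP-FPM upstream. The phpcluster name, the address and the script path are invented for this example.

```nginx
# Hypothetical fallback to PHP-FPM. The upstream name, address and
# front controller path are assumptions, not the real setup.

# In the http block: where the PHP cluster listens for FastCGI.
upstream phpcluster {
    server 10.0.0.20:9000;
}

# In the server block: every cache miss and every non-GET request ends up here.
location @fallback {
    include        fastcgi_params;
    fastcgi_param  SCRIPT_FILENAME /var/www/app/web/index.php;
    fastcgi_pass   phpcluster;
}
```

For the cache to ever produce a hit, PHP has to store the generated page under exactly the key that Nginx builds on line 9, that is the prefix followed by the request URI.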

Does that work?

Yes, it works like hell. We saved our PHP framework cluster from handling hundreds of thousands of requests to which the Memcache already knew the answer. This significantly lowered the machine load, which made the cluster suck less power and saved the pandas.

Posted in Development 2

Waking up a server

by Thomas

For one of our web sites we programmed a server that provides us with information about the whole friendship network on this site: who is friends with whom, six degrees of separation information, etc. Back in 2007 we wrote that server in C, and ever since we started using it, it had been running with absolute reliability.

Suddenly it stopped working, without any prior warning. It did not crash; the process was still alive. It just did not respond to any requests that we sent to it. All requests timed out. And even worse, there was no helpful log information that pointed us in any specific direction to look for the error.

Check list

The good thing was that, since we had programmed the server ourselves, we already had several points to think about first. At the top of a C programmer's check list is the segmentation fault, which happens when a process tries to access a memory area that it does not own. But we could almost exclude this point, since the server would have logged any segmentation fault that occurred. This point got crossed out.

This left a second point on our check list: blocking. Our server creates worker threads that deal with each client's requests. To respond to the requests, all threads share the same information about the friendship network, or, in mathematical terms, the graph. Here we use mutexes to keep the information consistent (by preventing more than one write operation at a time, and by preventing reads while data is being written).

Mutual exclusion? Dining philosophers? Race condition? Deadlock?

Profiling symfony with CouchDB

by Alvaro

During the last two sprints we ran into the need to perform statistical analysis to profile a symfony application. As I explained here, symfony logs quite a lot of useful information about the request it is processing to the file system. We wanted to be able to easily parse those logs and then run queries to filter the data. The data was going to be collected from our production servers, which means that whatever tool we chose must not impact the performance of the website. Since symfony logs to the filesystem, using its default logging was not an option for our production servers.
Our first attempt was to research Facebook Hive and Facebook Scribe, but we dropped the idea. We then thought that we could try to build our own tool, probably writing some daemon in Erlang, but something appeared in our way…

While researching map reduce, Wikipedia led me to the mighty CouchDB, which in their words is:

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

I must say this: after seeing some examples of its functionality, my mind was blown. It was awesome, awesome like in CouchDB :-)

Then the idea popped into my mind:
Why not build a symfony logger that talks to CouchDB and create some CouchDB views that will produce the statistics? I was at home and it was late at night, but my inner geek told me: let's give it a try.

Propel Query Optimization Tips

by Alvaro

Below I would like to share some snippets for optimizing Propel queries.

Replace MyTablePeer::retrieveByPk() with a custom query.

If in our code we have something like:

if(MyTablePeer::retrieveByPk($id))
{
  //do something here
}

Here the result of the retrieveByPk() call is only used as a boolean, yet the call still fetches the row and hydrates a full object. In that case it is better to replace it with a custom query like this:

public static function recordExists($id)
{
  $c = new Criteria();
  $c->add(self::ID, $id);
  return self::doCount($c) > 0;
}

This method is added to the MyTablePeer class. Assuming that self::ID is the primary key of the table, it returns true if there is a record with that id in the table. You should adapt the Criteria to your needs.

IE6 The Best Javascript Debugger Ever. Period!

by Alvaro

How dare you say that? I hear you saying... Wait, wait, wait… Put down those torches… I may have a point here.

At the office I spent the last two days hunting a JavaScript bug on IE6. Imagine how painful that can be, with no Firebug to come to the rescue.

It happens that we are using the symfony sfCombine plugin to improve the performance of the website. Everything was going OK until someone dared to test the website with IE6. Imagine the picture: layout broken, JavaScript errors popping up all around, mayhem, etc.

The bug was in an image gallery made with a Prototype-based carousel library. After going to the line with the JavaScript error, I saw that some extensions that the library adds to Prototype using Element.addMethods() didn't work at all.

Then the hunt started, as almost every time, in the wrong direction. -Maybe a lost semicolon that breaks the combinations. -No no, I think that the nginx is not sending the correct mime types. Etc. You can imagine all the thoughts going here and there.

I spent hours in front of IE6, hitting refresh, cleaning the browser cache, adding breakpoints in the Microsoft (R) Script(R) Editor(R), inspecting the objects in memory, evaluating Javascript code, etc.

No results. Nothing. Just “null is null or not an object”.

