Author Archives: Mauro

New possibilities offered by smart deployment scripts

by Mauro

Over the past year we used several methods to deploy our PHP code to the production machines. We now finally have a deployment method which seems to be very stable and gives us many advantages.

Deployment methods that we’ve tried

The main problems during the deployments have always been the filesystem-related caches. We have two of those:

  • The higher-level filesystem cache gets generated by our PHP framework. It gathers a lot of configs from the whole filesystem, compiles them into PHP files and then stores them in some dedicated places.
  • On the lower level there is APC, a PHP module which does opcode caching. For performance reasons we’ve set its stat option (“apc.stat”) to 0. This means that once APC has cached a compiled PHP file, it ignores every later change to that file.

When we first built the deployment scripts for the new environment, we often didn’t know how to clear the filesystem cache safely.

Let’s assume we first clear the filesystem caches, then deploy the PHP code. While the PHP code is being deployed, the filesystem caches already start getting regenerated. We end up with an inconsistent cache, because it is partly generated by the old code and partly by the new one.
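
A toy Python simulation can make that race easier to see. It is only a sketch under simplifying assumptions: the file names are made up, one request arrives per deployed file, and a cache entry is just the label of the code version it was generated from.

```python
def deploy_after_clearing(files, request_order):
    """Clear the cache first, then deploy file by file while requests keep arriving."""
    on_disk = {name: "old" for name in files}
    cache = {}  # the cache was cleared right before the deployment started

    for deployed, requested in zip(files, request_order):
        # A request arrives and lazily regenerates its cache entry
        # from whatever version of the file is on disk at that moment.
        cache.setdefault(requested, on_disk[requested])
        # Only afterwards does the next file of the new release land.
        on_disk[deployed] = "new"

    return on_disk, cache


if __name__ == "__main__":
    disk, cache = deploy_after_clearing(
        ["a.php", "b.php", "c.php"],   # hypothetical files, deployed in this order
        ["c.php", "a.php", "b.php"],   # hypothetical request order during the deploy
    )
    print(disk)
    print(cache)
```

With this particular ordering all files on disk end up new, while the cache entry for c.php was still generated from the old code.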

On the other hand, let’s assume we first deploy the code and then clear the filesystem caches. After the deployment is finished we do have a clean site with consistent caches, but during the deployment nobody knows what exactly is going to happen, because new code is already running with the old caches.

We also do not want to rely on APC and just assume that it really caches every single file. If we could rely on that, we could simply deploy the new code and then clear APC. But if we deploy twice within a short time span, we can’t be sure that each of the PHP nodes has already cached each of the PHP files.

Furthermore, none of the deployment methods described above allows us to warm the caches before we throw the code into the wild. For nightly deployments that’s no problem. Unfortunately, we often ran into situations where we had to fix critical bugs and deploy during the daytime.

A first step in the right direction

After going through many problems caused by the issues described above, we decided that we can’t deploy to a machine while it is in production use. Fortunately we have quite a lot of machines in our PHP cluster, which allowed us to split them half and half. I should mention that we run Nginx in front of the environment, which then uses FastCGI to connect back to the PHP servers. In this setup it didn’t take much to make Nginx temporarily use only the first half of the backend servers and let the deployment script deploy to the second half. Then we swapped Nginx over to use the second half for production and deployed to the first half. That way we could deploy cleanly, and the problems described above were solved.
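
The two phases can be sketched in a few lines of Python. This is an illustration rather than the real deployment script: the host names and port are hypothetical, and it only computes which half serves traffic in each phase and renders a matching Nginx upstream block.

```python
def split_halves(backends):
    """Split the backend list into two halves (the first half gets any odd node)."""
    middle = (len(backends) + 1) // 2
    return backends[:middle], backends[middle:]


def rolling_plan(backends):
    """Return (serving, deploying) pairs for the two deployment phases."""
    first, second = split_halves(backends)
    return [(first, second), (second, first)]


def render_upstream(name, servers):
    """Render an Nginx upstream block that lists only the given servers."""
    lines = [f"upstream {name} {{"]
    lines += [f"    server {server};" for server in servers]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical PHP backends; a real list would come from the cluster inventory.
    nodes = ["php-01:9000", "php-02:9000", "php-03:9000", "php-04:9000"]
    for phase, (serving, deploying) in enumerate(rolling_plan(nodes), start=1):
        print(f"# phase {phase}: deploy to {', '.join(deploying)}")
        print(render_upstream("backend.cgi", serving))
```

In each phase the rendered block would replace the upstream definition, followed by an Nginx config reload.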

The only problem with this solution was that we have many caches to clean and rsyncing the new code can take up to a minute, so altogether we halved the power of the cluster for around 5 minutes. Additionally, the half which received the new code first had to run the whole site in the second phase and regenerate its caches at the same time. During peak times we couldn’t afford that loss of computing power, and we knew we had to find a better solution.

Final solution

I think we can say that we have now, finally, found a solution which solves all the problems described above without bringing back any of the disadvantages of the earlier approaches.

The trick is that we version the directories on the PHP servers in which our code is stored. When Nginx sends a FastCGI request to the PHP cluster, it always passes the absolute path of the PHP file it wants processed as a FastCGI parameter. On the Nginx side it’s very simple to change the prefix for all those PHP files and thereby send all FastCGI requests to another version of the code on the backend servers, while all the different versions of the code coexist on the PHP cluster in different directories. This solves the APC problem, because the same file has a different absolute path in each version of the code, since the versions reside in different document roots. The framework cache problem is solved as well, because the framework cache directories are specific to each version of the code and live together with the code in the versioned directory.
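
Why the versioned path sidesteps the stale-cache problem can be shown with a small Python model. It is a sketch under assumptions: the path layout follows the Nginx snippet in this post, while `PathKeyedCache` is a made-up stand-in for an opcode cache that keys entries by absolute path and never re-checks the file. It is not APC’s real interface.

```python
VHOSTS_ROOT = "/srv/www/vhosts"  # directory layout as used in this post


def script_filename(code_version, script):
    """Absolute path that would be sent as SCRIPT_FILENAME for a code version."""
    return f"{VHOSTS_ROOT}/{code_version}.code/web{script}"


class PathKeyedCache:
    """Toy cache keyed by absolute path that never looks at the file again."""

    def __init__(self):
        self._entries = {}

    def load(self, path, compile_file):
        # Compile only on the first request for this exact path.
        if path not in self._entries:
            self._entries[path] = compile_file(path)
        return self._entries[path]


if __name__ == "__main__":
    cache = PathKeyedCache()
    old_path = script_filename(13, "/index.php")
    new_path = script_filename(14, "/index.php")
    print(cache.load(old_path, lambda p: "compiled from release 13"))
    # The new release has a different path, so it can never hit the old entry.
    print(cache.load(new_path, lambda p: "compiled from release 14"))
```

Overwriting a file in place would keep serving the old entry in this model, whereas a new directory always starts with a cold but consistent cache.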

Here is an example snippet of our Nginx config to make this whole thing a little clearer:

  # The deployment script rewrites this number to switch releases
  set $code_version 13;

  location ~* ((.*)\.php(.*)) {
    fastcgi_intercept_errors on;
    error_page 404 = @404;

    # Split the URI into the script and the trailing path info
    set $script $uri;
    set $path_info "";

    if ($uri ~ "^(.+\.php)(/.+)") {
      set $script $1;
      set $path_info $2;
    }

    include /etc/nginx/fastcgi_params;
    # The versioned directory becomes part of the absolute script path
    fastcgi_param SCRIPT_FILENAME /srv/www/vhosts/$code_version.code/web$script;
    fastcgi_param SCRIPT_NAME $script;
    fastcgi_param REQUEST_URI $uri;

    fastcgi_pass backend.cgi;
  }

Now the deployment script simply deploys the new code into a new directory on all the PHP servers, and then we request certain PHP files on each of the backend nodes to warm the caches. As a final step it changes the line “set $code_version” to the new version and tells Nginx to reload its config, without any interruption for users and without crazy high load due to cache regeneration.
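
The last step, rewriting the version line, could look roughly like the following Python sketch. It is an assumption about how such a script might work, not the script itself: the function names are invented, and it only transforms the config text, leaving the file writing and the reload to the caller.

```python
import re

# Matches a line such as "  set $code_version 13;" and captures the number.
VERSION_LINE = re.compile(
    r"^([ \t]*set[ \t]+\$code_version[ \t]+)(\d+)([ \t]*;)", re.MULTILINE
)


def current_version(config_text):
    """Return the code version the config currently points to."""
    match = VERSION_LINE.search(config_text)
    if match is None:
        raise ValueError("no 'set $code_version' line found")
    return int(match.group(2))


def switch_version(config_text, new_version):
    """Return the config text with the version number replaced."""
    new_text, count = VERSION_LINE.subn(
        lambda m: f"{m.group(1)}{int(new_version)}{m.group(3)}", config_text
    )
    if count != 1:
        raise ValueError(f"expected exactly one version line, found {count}")
    return new_text


if __name__ == "__main__":
    config = "  set $code_version 13;\n\n  location / {\n  }\n"
    print(current_version(config))
    print(switch_version(config, 14))
```

After writing the new text back to the config file, the script would ask Nginx to reload, for example with `nginx -s reload`. Rolling back is the same call with the previous number.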

New possibilities

Since we now have multiple versions of the PHP code on each of the backend nodes, we can switch between them in seconds, simply by letting a script edit the version number in the Nginx config and then telling Nginx to reload the config. If we deploy something that causes problems, this allows us to roll back without any major service interruption.

The coolest new possibility is that we can compare the efficiency of different code versions live on the production systems. We deploy two versions of the code into different directories on each of the backend blades, without making Nginx use the new versions yet. Let’s say one version is 51 and the other one is 52. Then we simply create a filesystem symlink with the name 53; on half of the nodes it points to 51 and on the other half it points to 52. Then we make Nginx use version 53, which means that half of the nodes run one version and the other half run the other. Once we want to switch everything to one of those two versions, we simply make Nginx use version 51 or 52.

First half node:

rsid-a-20:/srv/www/vhosts # ls -lha
total 19M
drwxr-xr-x 71 user users 4.0K Feb 10 11:55 .
drwxr-xr-x  7 root   root  4.0K Dec  7 04:40 ..
drwxr-xr-x 14 wwwrun www   4.0K Feb  4 09:31 51.code
drwxr-xr-x 14 wwwrun www   4.0K Feb  4 09:31 52.code
lrwxrwxrwx  1 root   root    28 Feb  4 09:33 53.code -> /srv/www/vhosts/51.code

Second half node:

rsid-a-20:/srv/www/vhosts # ls -lha
total 19M
drwxr-xr-x 71 user users 4.0K Feb 10 11:55 .
drwxr-xr-x  7 root   root  4.0K Dec  7 04:40 ..
drwxr-xr-x 14 wwwrun www   4.0K Feb  4 09:31 51.code
drwxr-xr-x 14 wwwrun www   4.0K Feb  4 09:31 52.code
lrwxrwxrwx  1 root   root    28 Feb  4 09:33 53.code -> /srv/www/vhosts/52.code

That way we can compare different versions of the code while they are running on different backend nodes in production, and monitor live whether one of them has efficiency or load problems.
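
Creating that alias on each node is a small job. The Python sketch below is one possible way to do it, assuming a POSIX filesystem; the function names and the node names in the example are hypothetical. It builds the symlink under a temporary name and renames it into place, so the alias never disappears while it is being repointed.

```python
import os


def point_alias(vhosts_dir, alias_version, target_version):
    """Create or replace <alias>.code as a symlink to <target>.code."""
    target = os.path.join(vhosts_dir, f"{target_version}.code")
    alias = os.path.join(vhosts_dir, f"{alias_version}.code")
    if not os.path.isdir(target):
        raise FileNotFoundError(target)

    temp = alias + ".tmp"
    if os.path.lexists(temp):
        os.remove(temp)
    os.symlink(target, temp)
    # Renaming over an existing symlink replaces the link itself on POSIX.
    os.replace(temp, alias)


def assign_versions(nodes, version_a, version_b):
    """Map the first half of the nodes to version_a and the rest to version_b."""
    middle = (len(nodes) + 1) // 2
    return {
        node: (version_a if index < middle else version_b)
        for index, node in enumerate(nodes)
    }


if __name__ == "__main__":
    # Hypothetical node names; each node would run point_alias locally.
    for node, version in assign_versions(["node-a", "node-b"], 51, 52).items():
        print(f"{node}: 53.code -> {version}.code")
```

Ending the comparison does not need the alias to be touched at all, since Nginx is simply switched to 51 or 52.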


Saving pandas with nginx/memcache

by Mauro

While we were building the server environment for a new version of one of our community pages, we tested many different server applications and architectures. One thing which brought us a big speed and efficiency improvement was the combination of Nginx, Memcached and PHP-FPM, which we are using now and which I’m going to outline here.

First I’ll need to introduce the three main components a little.

  • Nginx is an extremely efficient, high-performance webserver. Actually it’s not only a webserver, but I call it that since its other functionalities are crap (subjective opinion). For serving HTTP it provides many things, like very advanced URL rewriting and content filtering, including subrequests to HTTP, FastCGI, Memcache and other backends.
  • Memcached is a popular and well-known small, fast and simple key-value cache.
  • FPM is a patch for PHP and stands for FastCGI Process Manager. As the name says, it manages processes to serve FastCGI. We first tried to use the spawn-fcgi process manager that comes with the Lighttpd package, but FPM proved to be more reliable and it has some nice features which spawn-fcgi doesn’t provide.

In our setup we use Nginx as webserver in front of the environment and first entry point for user requests. For each request to non-static files it creates a FastCGI request and forwards it back to the right PHP Cluster.

Caching makes more sense on static content, so we make it static

On our site we serve a community in which each user has a profile that can be seen by other users. Those profiles are probably the most accessed part of the page, so it makes sense to think twice about how to cache them. Big parts of the profiles are more or less static, since users usually don’t change their profile data every day. So first we had to extract all the more dynamic parts, like the guestbook or the user’s recent forum posts, from the profile by loading them via Ajax. Now that we load those things in a separate request, we can start treating the profiles as almost static pages.

First steps with the new environment

In the first few months, when we started serving the new version to users as a beta, all the requests to profiles arrived on the PHP cluster as FastCGI requests. To serve a request, PHP loaded the whole framework, checked whether the profile was already in Memcache, and if not, generated the profile and stored it on the Memcache cluster. If the same profile got requested again, the FastCGI request went back to PHP again, and PHP loaded the whole framework again only to check Memcache.

Improvement

If a profile got stored on the Memcache by PHP already, it doesn’t make much sense to load the whole, in our case really heavy, PHP framework again just to decide that it doesn’t need to regenerate the profile but read it from the cache instead. Luckily Nginx provides a really nice functionality to do subrequests to Memcache directly.

You can probably already imagine more or less what we did. Here it is in detail:

Every time a request arrives at Nginx, it first checks whether this is a request for a static file or for another subsystem, for example the forum. If not, Nginx itself checks on the Memcache cluster whether a key exists which it generates from the URI. If it gets a hit, it bypasses PHP completely and serves the request directly from Memcache.

To implement this, we of course needed to make sure that nothing else could accidentally store a key in Memcache that might get hit by Nginx and make it serve nonsense. Maybe it sounds stupid to mention this because it’s obvious, but honestly, it actually happened.
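
The whole flow can be modelled in a few lines of Python. This is a sketch, not our production code: a plain dict stands in for the Memcache cluster, `render_with_php` is a hypothetical placeholder for the FastCGI round trip, and the key is built as the prefix used in the Nginx config in this post followed by the URI. Sharing that one key function between the reader and the writer is also the simplest guard against the stray-key problem.

```python
KEY_PREFIX = "nginx_prefix"  # same prefix as in `set $memcached_key nginx_prefix$uri;`


def memcached_key(uri):
    """Build the cache key exactly the way the Nginx side does."""
    return KEY_PREFIX + uri


def handle_request(method, uri, cache, render_with_php):
    """Return (body, served_by), trying the cache before falling back to PHP."""
    if method == "GET":
        body = cache.get(memcached_key(uri))
        if body is not None:
            return body, "memcache"  # PHP is bypassed completely

    # Cache miss or non-GET request: this is the @fallback path.
    body = render_with_php(method, uri)
    if method == "GET":
        # In this sketch the PHP side stores every rendered GET page.
        cache[memcached_key(uri)] = body
    return body, "php"


if __name__ == "__main__":
    cache = {}
    render = lambda method, uri: f"<html>profile for {uri}</html>"
    print(handle_request("GET", "/profile/mauro", cache, render)[1])
    print(handle_request("GET", "/profile/mauro", cache, render)[1])
```

The first request for a profile pays the full framework cost, and every later GET for the same URI is answered without touching PHP.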

Enough theory

That’s where the magic happens:

1  location / {
2     if ($request_method != GET)
3     {
4       rewrite . @fallback last;
5     }
6     default_type    text/html;
7     add_header      "Content" "text/html; charset=utf8";
8     charset         utf-8;
9     set             $memcached_key nginx_prefix$uri;
10    memcached_pass  profilememcache;
11    error_page      500 404 405 = @fallback;
12 }
13 location @fallback {
14    # pass to FastCGI
15 }

That snippet is a part of our Nginx configuration. Above that snippet, all static requests and requests to other subsystems have already been caught. So we can be sure that requests which reach the location / are dynamic and belong to the main system, which is our PHP framework.

The location @fallback is where the requests have to go if they want to reach the PHP.

On line 2 all requests which are not of method GET are caught. Of course we don’t want to have application logic on Nginx, so they get rewritten to be handled by the @fallback location and sent to PHP.

Lines 6-8 define the encoding and content type. Since we don’t store any HTTP headers in Memcache, but only the HTML output itself, Nginx needs to know about those things on its own.

On line 9 the variable $memcached_key gets set. This variable name is defined by the Nginx Memcached module, and the variable’s value is the key which gets retrieved from the Memcache backend.

Finally, on line 10 Nginx sends the request to the Memcache backend. If you are now wondering how it knows how to reach this profilememcache: profilememcache is the name of an upstream which I defined somewhere above in the config, and it includes the IP and port of the entry point of the Memcache cluster.

Another really nifty thing happens on line 11. The error_page directive defines what Nginx has to do in case of a certain HTTP error. If the request on line 10 doesn’t succeed and gets a miss back from Memcache, Nginx raises a 404 error. Thanks to line 11, the 404 error makes Nginx fall back to the location @fallback, where the request gets sent to FastCGI. And now you know where the second location got its name from.

Does that work?

Yes, it works like hell. We saved our PHP framework cluster from handling hundreds of thousands of requests to which the Memcache already knew the answer. This significantly lowered the machine load, which made the cluster suck less power and saved the pandas.

