Profiling symfony with CouchDB

by Alvaro

During the last two sprints we ran into the need to perform statistical analysis to profile a symfony application. As I explained here, symfony logs quite a lot of useful information to the file system about the request it is processing. We wanted to be able to parse those logs easily and then run queries to filter the data. The data was going to be collected from our production servers, which meant that whatever tool we chose must not impact the performance of the website. Since symfony writes its logs to the filesystem by default, the stock logger was not an option for our production servers.
Our first attempt was to research Facebook Hive and Facebook Scribe, but we discarded the idea. We then thought that we could try to build our own tool, probably writing some daemon in Erlang, but something appeared in our way…

While I was researching MapReduce, Wikipedia led me to the mighty CouchDB, which in the project's own words is:

Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.

I must say this: after seeing some examples of its functionality, my mind was blown. It was awesome, awesome like in CouchDB :-)

Then the idea popped into my mind:
Why not build a symfony logger that talks to CouchDB, and create some CouchDB views that produce the statistics? I was at home and it was late at night, but my inner geek told me: let's give it a try.

After I got CouchDB installed on my Mac I fired up a symfony project and started coding. After a while I had a basic logger, pretty much similar to the example I have here, but instead of logging to the filesystem and later parsing with Hive, this time I was inserting the logs directly into a CouchDB database. The next step was to run an ApacheBench load test against the project to add a bunch of logs to the database. In less than 5 minutes I had more than 15,000 documents. I created some basic views and was then able to see which module and action in my application was using the most memory, running the most database queries, and so on.
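To give a feel for what such a basic view looks like, here is a minimal sketch in JavaScript, CouchDB's default view language. The document fields (`module`, `action`, `memory_bytes`, `query_count`) are assumptions made up for illustration, not the fields of the actual logger. In CouchDB itself `emit` is a global and the grouping is done by the server (for example with the built-in `_stats` reduce); the small `runView` helper below only imitates that locally so the example runs on its own.

```javascript
// Hypothetical log documents; the field names are assumptions for illustration.
const docs = [
  { module: "user", action: "show", memory_bytes: 12000000, query_count: 14 },
  { module: "user", action: "show", memory_bytes: 10000000, query_count: 10 },
  { module: "home", action: "index", memory_bytes: 6000000, query_count: 3 },
];

// Map function in the shape CouchDB expects: one emit(key, value) per log document.
// Inside CouchDB, emit is a global; here it is passed in so the sketch is standalone.
function mapMemoryByAction(doc, emit) {
  if (doc.module && doc.action && typeof doc.memory_bytes === "number") {
    emit([doc.module, doc.action], doc.memory_bytes);
  }
}

// Local stand-in for a grouped reduce: collects emitted values per key
// and returns count, max and average, sorted by average (highest first).
function runView(allDocs, mapFn) {
  const groups = new Map();
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => {
      const id = JSON.stringify(key);
      const group = groups.get(id) || { key, sum: 0, count: 0, max: -Infinity };
      group.sum += value;
      group.count += 1;
      group.max = Math.max(group.max, value);
      groups.set(id, group);
    });
  }
  return [...groups.values()]
    .map((g) => ({ key: g.key, count: g.count, max: g.max, avg: g.sum / g.count }))
    .sort((a, b) => b.avg - a.avg);
}

const memoryStats = runView(docs, mapMemoryByAction);
console.log(memoryStats);
```

Swapping `doc.memory_bytes` for `doc.query_count` in the map function would give the same ranking by number of queries.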

The next day I presented the idea to the team, who received it with excitement. I don't know how many times we used the word CouchDB over the next few hours, but it was everywhere.

We held a brainstorming session where we defined which information we wanted to log. We pair programmed with Boby to build a custom symfony logger to capture the following information after every request:

  • Memory Usage
  • Number of Queries
  • Request Time
  • Memcached usage, i.e. when a partial was found in the cache and when it had to be generated.
  • Table Column usage. Because we use Propel, and Propel by default performs a SELECT * FROM table, our goal was to know which of the retrieved fields were actually used in the request.
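As an illustration of the list above, a single log document could look roughly like the object below, and the column-usage question then reduces to a set difference between fetched and used columns. Every field name here is a hypothetical choice for this sketch; the structure the real logger stored is not shown in this post.

```javascript
// Hypothetical shape of one per-request log document (all names are assumptions).
const sampleLog = {
  module: "user",
  action: "show",
  memory_bytes: 12000000,
  query_count: 14,
  request_ms: 230,
  cache: { hits: ["user/_sidebar"], misses: ["user/_profile"] },
  columns: {
    user: { fetched: ["id", "name", "email", "bio"], used: ["id", "name"] },
  },
};

// For each table, list the columns that were fetched but never read.
function unusedColumns(columns) {
  const result = {};
  for (const [table, { fetched, used }] of Object.entries(columns)) {
    const usedSet = new Set(used);
    result[table] = fetched.filter((column) => !usedSet.has(column));
  }
  return result;
}

console.log(unusedColumns(sampleLog.columns));
```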

After the tool was finished, we enabled symfony logging and debug mode in the production environment and set the log level to debug. Then we made some tweaks so that the symfony file logger would write only errors to the filesystem and nothing else. A symfony filter at the end of each request took responsibility for sending the aggregated log data to CouchDB. We used PHP's fsockopen to send JSON-encoded data with a timeout of 1 second. (We didn't want our users to have to wait for the page to be delivered because there was a socket going wild somewhere.)
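The sending step can be sketched as follows. This is not the PHP filter described above: it swaps fsockopen for the `fetch` API (available globally in Node 18 and later) and uses `AbortSignal.timeout` to enforce the 1 second limit. The server URL, the database name `symfony_logs` and the function names are assumptions for illustration. The only CouchDB behaviour relied on is that a POST of a JSON body to `/{db}` creates a new document.

```javascript
// Build the HTTP request that stores one log document in CouchDB.
// couchUrl and dbName are placeholders chosen for this sketch.
function buildLogRequest(couchUrl, dbName, logDoc) {
  return {
    url: `${couchUrl.replace(/\/+$/, "")}/${encodeURIComponent(dbName)}`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(logDoc),
    },
  };
}

// Send the log, giving up after timeoutMs so a slow or unreachable
// CouchDB server cannot hold up the response to the user.
async function sendLog(couchUrl, dbName, logDoc, timeoutMs = 1000) {
  const { url, options } = buildLogRequest(couchUrl, dbName, logDoc);
  try {
    const response = await fetch(url, {
      ...options,
      signal: AbortSignal.timeout(timeoutMs),
    });
    return response.ok;
  } catch (err) {
    // Timeout or network failure: drop the log entry rather than fail the request.
    return false;
  }
}

// Example call (not executed here, since it needs a running CouchDB):
// sendLog("http://127.0.0.1:5984", "symfony_logs", { module: "home", action: "index" });
```

Swallowing the error is deliberate in this sketch: losing one profiling record is cheaper than delaying or breaking a page for a real user.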

We deployed our tool to production and then we sat back and waited… In fact we waited seven days (dear product owner, we did other user stories in the meantime ;-)

After one week it had collected more than 15,000,000 logs in a 2GB database. We were quite impressed. While we have 11 PHP servers in production serving those 15M requests, the CouchDB server was alone, handling the incoming logs from all of them with no complaints.

We thought that 15M logs were enough, so we disabled the logger. The next step was to create views to analyze the data.

It took CouchDB nearly two hours to index the data for our views (keep in mind that we never indexed the data while it was being inserted), but the subsequent queries seemed instantaneous.

With the data rolling in front of our eyes we spotted the bottlenecks in our website. We saw which actions were using the most memory, performing the most queries, etc. This gave us real-world usage data for our application on which to focus our performance improvements.

Conclusion
So far CouchDB has been really helpful for our project. It was easy to install and deploy, and because it uses JavaScript as its query language, it took no time to pull something useful from it. Also, the JSON format that it uses for its documents and its HTTP API position CouchDB as a universal database: if your language speaks JSON then it will make good friends with CouchDB. As for the logging tool, we plan to release it as a symfony plugin so that others can benefit from it.


3 Responses to Profiling symfony with CouchDB

  1. Rehan says:

    Have you released the CouchDB logging tool for symfony yet? I am very excited about this!

  2. Torrent says:

    For the most part, the author did a good job publishing this.

  3. BiosWoolf says:

    I think this is not bad work.
