Using Riak at The NetCircle

by Joseph

At The NetCircle, we build great software for our clients’ communities, and building great communities means that users want to talk to each other, a lot. For several of those communities, we have started to outgrow what our current messaging systems can comfortably handle. Database sharding, adding new machines and changing data structures have helped get our system to where it is today, but maintaining it comes at a cost (several system admins have gone nearly bald!).

At The NetCircle, we are always on the lookout for new technologies that could help improve our clients’ communities by expanding the user base or improving the user experience. In the case of our messaging system, we are looking not only for a way to help it grow stably in the future, but also for a way to build a central system that all our clients’ communities can use as a service, rather than keeping different systems and data storage for each community.


Enter The Riak

A while back Riak, a Dynamo-inspired, distributed key-value store built in Erlang, popped up on our radar. After evaluating several other possible data stores, we settled on Riak as the cornerstone of our new message system service.

Why Riak?

As sharding and scaling our current message system datastores became more and more difficult, we looked for a solution that could take some of the pain out of scaling for the future and provide durable storage without our having to worry too much about hardware failures and the like. Pumping the “NoSQL” goodness through Riak’s veins is Riak Core. At the heart of Riak is the “ring”, a 160-bit integer space that is divided into virtual partitions, which in turn are spread over the physical nodes in a Riak cluster. This data distribution means that no node acts as a master, and it improves the system’s ability to fail over without affecting the end user.
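To make the ring idea a little more concrete, here is a minimal Python sketch of how a key could be mapped onto a 160-bit space, then to a virtual partition, then to a physical node. It is an illustration only, not Riak's implementation: the partition count, the node names, the `bucket/key` string that gets hashed and the round-robin assignment are all assumptions chosen for the example.

```python
import hashlib

RING_SIZE = 2 ** 160      # SHA-1 digests are 160 bits wide
NUM_PARTITIONS = 64       # assumed number of virtual partitions for this sketch
NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical physical nodes


def ring_position(bucket: str, key: str) -> int:
    """Hash a bucket/key pair to an integer position on the 160-bit ring."""
    digest = hashlib.sha1(f"{bucket}/{key}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def partition_for(position: int) -> int:
    """Each virtual partition owns an equal, contiguous slice of the ring."""
    return position // (RING_SIZE // NUM_PARTITIONS)


def node_for(partition: int) -> str:
    """Assumption: partitions are dealt out to physical nodes round-robin."""
    return NODES[partition % len(NODES)]


def preference_list(bucket: str, key: str, n: int = 3) -> list:
    """Nodes for n replicas: the owning partition plus the next n - 1 on the ring."""
    first = partition_for(ring_position(bucket, key))
    return [node_for((first + i) % NUM_PARTITIONS) for i in range(n)]


if __name__ == "__main__":
    print(preference_list("messages", "user42:msg1001"))
```

Because every node can compute the same mapping from the key alone, any node can route a request, which is what makes a master-less cluster possible.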

Riak is highly tunable, and the Riak documentation pays much attention to the CAP theorem, originally proposed by Dr. Eric Brewer. The theorem states that a distributed system cannot simultaneously guarantee consistency, availability and partition tolerance. In Riak, the balance between these properties can be tuned, in most cases on the fly. Need higher levels of consistency? Change the consistency parameters to ensure that the data is consistent across more nodes before returning a successful write. Need performance? Turn down the consistency values to get quicker responses.

All of these come with different trade-offs, but the important point is that it can be tailored for different systems and needs.
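The trade-off can be sketched with a toy quorum model in Python. This is a simplification under stated assumptions, not Riak's code: the `Replica` class, the `write` and `read` helpers and the integer version numbers are hypothetical (Riak tracks versions differently), and the `w` and `r` arguments stand in for the number of replicas that must answer before a write or read counts as successful.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; `up` simulates whether the node is reachable."""
    up: bool = True
    data: dict = field(default_factory=dict)


def write(replicas, key, value, version, w):
    """Report success only if at least `w` replicas acknowledge the write.

    Reachable replicas store the value even when the quorum is missed.
    """
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.data[key] = (version, value)
            acks += 1
    return acks >= w


def read(replicas, key, r):
    """Return the newest value seen, or None if fewer than `r` replicas answer."""
    answers = [rep.data[key] for rep in replicas if rep.up and key in rep.data]
    if len(answers) < r:
        return None
    newest = max(answers, key=lambda answer: answer[0])  # highest version wins
    return newest[1]


if __name__ == "__main__":
    replicas = [Replica(), Replica(), Replica(up=False)]  # one of three nodes is down
    print(write(replicas, "msg:1", "hello", 1, w=2))  # quorum of two is reachable
    print(write(replicas, "msg:1", "hello", 2, w=3))  # all three are required
```

With one of three replicas down, a request that needs two answers still succeeds, while a request that needs all three fails: stricter settings buy consistency at the cost of availability and latency.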

Riak is also content-type agnostic, meaning we can store anything in Riak. For our system, we want to store structured message data (in our case JSON). Riak has built-in support for handling JSON and JavaScript, which is perfect. Not only can we store massive amounts of structured JSON data for our message system, we can also run map/reduce jobs for batch-processing tasks without having to create handlers for processing JSON data.
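To show the shape of such a batch job, here is a small local sketch in Python that counts messages per sender with a map phase and a reduce phase. It does not talk to Riak and is not Riak's map/reduce API; the message fields (`sender`, `recipient`, `body`) and the function names are hypothetical.

```python
import json
from collections import Counter

# Hypothetical message documents, stored as JSON strings.
STORED_OBJECTS = [
    json.dumps({"sender": "alice", "recipient": "bob", "body": "hi"}),
    json.dumps({"sender": "bob", "recipient": "alice", "body": "hello"}),
    json.dumps({"sender": "alice", "recipient": "carol", "body": "lunch?"}),
]


def map_phase(raw):
    """Runs once per stored object: parse the JSON and emit (sender, 1)."""
    message = json.loads(raw)
    return [(message["sender"], 1)]


def reduce_phase(pairs):
    """Combine the emitted pairs into a total per sender."""
    totals = Counter()
    for sender, count in pairs:
        totals[sender] += count
    return dict(totals)


def run_job(objects):
    """Apply the map phase to every object, then reduce the combined output."""
    mapped = [pair for raw in objects for pair in map_phase(raw)]
    return reduce_phase(mapped)


if __name__ == "__main__":
    print(run_job(STORED_OBJECTS))
```

In a distributed store the map phase would run next to the data on each node and only the small emitted pairs would travel to the reduce step.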

When we first began evaluating Riak for our message system, we needed to keep relationships between several different objects. To do this, we used Links, which worked well. However, we couldn’t get the performance we wanted for our message system using only Links. And so began an odyssey to create a system of secondary indexes for Riak. As we were nearing completion of the secondary index prototype, a killer feature for us was released: Riak Search.

Riak Search

Riak Search is a full-text search engine for Riak that uses Lucene syntax and can be queried from any of the APIs. Riak Search works by allowing users to define different indexing schemas per bucket. It uses the pre-commit hook functionality to specify a module or function that parses the data being inserted and returns indexable key/values based on the schema and a specified analyzer. Our JSON-formatted data fits nicely into Riak Search, because it can analyze and index JSON data out of the box.

Another important feature of Riak Search indexing is that the type of indexing analysis can be specified at a field level. So, one field could be indexed as an integer, another as text based on whitespace analysis, tokens or any of a myriad of different analyzers. And once again, these analyzers can be customized to fit the needs of the user so you aren’t restricted to just the provided analyzers.
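The idea of a schema that picks an analyzer per field can be sketched with a toy inverted index in Python. This is only an illustration of the concept, not Riak Search's schema format or API: the analyzers, the `SCHEMA` mapping, the `Index` class and the message fields are all hypothetical.

```python
import json
from collections import defaultdict


def whitespace_analyzer(value):
    """Split text into terms on whitespace."""
    return str(value).split()


def integer_analyzer(value):
    """Index the field as a single integer term."""
    return [int(value)]


def noop_analyzer(value):
    """Index the whole value as one exact term."""
    return [str(value)]


# Hypothetical schema: one analyzer per field of a message document.
SCHEMA = {
    "body": whitespace_analyzer,
    "sender_id": integer_analyzer,
    "folder": noop_analyzer,
}


class Index:
    def __init__(self, schema):
        self.schema = schema
        self.postings = defaultdict(set)  # (field, term) -> set of document keys

    def index(self, key, raw_json):
        """Parse a JSON document and index each field named in the schema."""
        doc = json.loads(raw_json)
        for field_name, analyzer in self.schema.items():
            if field_name in doc:
                for term in analyzer(doc[field_name]):
                    self.postings[(field_name, term)].add(key)

    def search(self, field_name, value):
        """Return keys of documents whose field contains every query term."""
        result = None
        for term in self.schema[field_name](value):
            matches = self.postings.get((field_name, term), set())
            result = matches if result is None else result & matches
        return result or set()


if __name__ == "__main__":
    idx = Index(SCHEMA)
    idx.index("m1", json.dumps({"body": "see you at lunch", "sender_id": 42, "folder": "inbox"}))
    print(idx.search("body", "lunch"))
```

Running the same analyzer at index time and at query time is what keeps the two sides comparable, for example so that the query string "42" and the stored number 42 end up as the same term.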

Now we have our distributed, fault-tolerant data store, our JSON-formatted message documents and a way to index and search them quickly. With Riak Search now available and getting more stable by the day, we are finally gearing up to start rolling out our new Riak Search-backed message system to our clients’ communities.

In Conclusion

With Riak Search, we have found a way to grow our messaging system quickly as the needs of our clients’ communities grow, and a standardized way to store and retrieve messages across all of those communities. We have cut out some clutter and inconsistencies in our clients’ communities’ code and reduced the cost of maintaining and growing our system.

On a side note, the Riak community as well as Basho have been amazing in providing support and ideas to help get our system to where it is. Basho is constantly improving Riak, bringing better performance and great new features with each update. We are excited to put our new system into production, and are looking forward to continued success using Riak for our clients’ communities.

Coming up:

Using Redis as a secondary index with Riak. Stay tuned!


2 Responses to Using Riak at The NetCircle

  1. Scott Likens says:

    Going to look at 1.0 with LevelDB to see if the secondary index’s work better for you?

  2. Joseph says:

    Absolutely. We are currently evaluating the 1.0 release candidates, but it would be some time before it would make it into our production environment.
