Posts Tagged scaling

Web Scale

Development at Streamy is always done with the mindset of “will it scale” in the back of our minds.

Generally speaking, scalability deals with the ability for a software system to handle increasing load when given additional resources.  Increased load could mean more concurrency, a larger data set, or increased complexity.  Additional resources refers to hardware and either scaling vertically (upgrading a single node) or scaling horizontally (adding additional nodes).  While vertical scale is important in terms of squeezing all the performance you can out of each node, and rapidly dropping hardware prices means even cheap nodes are powerful, the key to achieving true scalability is the ability to horizontally scale, or distribute.

Distributed systems are becoming more and more mainstream as the web has flourished.  Search engines index billions of web pages and social networks support millions of concurrent users.  Content continues to evolve from being static, to dynamic, to the current emphasis on personalization and customization.  The adoption of AJAX and COMET have further increased requirements for concurrency.  And all of this must be highly-available and low-latency (sub-second).

This is what I call Web Scale.

So what exactly is it?

It’s a new set of technical requirements borne out of Web 2.0, and the move towards distributed systems and cloud computing as the solution.  It encompasses all the different aspects of today’s web applications: the data, the storage, the caches, the web servers, the communications, the realtime queries, the batch queries, and everything in between.  It marks a departure from the vertical scaling of relational databases and web servers as the solution to scale towards a new world of horizontal scalability: distributed hash tables, consistent hashing, column-orientation, horizontal partitioning, eventual consistency, elastic computing, and every other buzzword you can think of.

Who has solved it?

Google. They deserve a great deal of credit for being the first to really achieve web scale.  Long before web sites really considered their architecture as part of their competitive advantage, Google embraced the notion and invented their own solutions.  They subsequently published a number of papers describing their efforts:  The Google File System, MapReduce, and BigTable.

Amazon.  The enormous amount of work that was done in order to achieve scalability for the world’s premier e-commerce site is obvious; one need only look at their extensive and fast-growing elastic storage and computing services like S3 and EC2.  An e-commerce site becoming a service provider?  There’s only so much cost-savings a good architecture can give an e-commerce site, so it only makes sense that they try to profit from their proprietary systems.  The CTO of Amazon, Werner Vogels, has an excellent blog AllThingsDistributed.  Read through some of his posts and you can quickly see that Amazon is a company that understands scaling, distribution, and the right way to go about design and engineering in today’s Web Scale world.

Who struggles with it?

Facebook.   Today’s prime example for what happens when you don’t build for scale early on: the only short-term solution to rapid growth is to throw money at the problem.  Utilizing both horizontal and vertical scale, Facebook has enormous clusters of MySQL and Memcached to deal with storing user data and serving user queries.  As reported nearly a year ago, they already had close to 2,000 MySQL boxes and 1,000 Memcached boxes in addition to their 10,000 web servers.

They have been playing catch-up ever since, slowly developing (and open-sourcing in some cases) distributed systems to deal with their enormous scale.  Though almost always plagued by the lack of community and direct support from Facebook engineers, they have some very interesting projects including Cassandra and Hive.  More recently, they seem to finally have solved their photo storage cost issues with Haystack, saving them from needing to buy an additional $2M+ server every month just to keep up.  The difference in architectures is well described in Niall Kennedy’s post.

Twitter.  Considering how well known the fail whale is, it’s clear to most that Twitter has been plagued by slow responses and downtime, even to this day.  Though extremely simple in its requirements, the personalized views, emphasis on search, and massive use of the API by developers creates huge amounts of load for the microblogging (or is it nanoblogging?) service.

FriendFeed.  A company led by ex-Googlers, and to my knowledge without major technical issues, seems to be going down a path of scaling that seems clunky and backwards.  Generally speaking, scaling a database means letting go of some of the traditional restrictions, first things like normalization and secondary indexes, and then more significantly by relaxing ACID-compliance or adding eventual consistency.  As outlined in a blog post by one of their founders, Bret Taylor, their attempt at a schema-less storage system atop MySQL seems to be a good idea and good effort gone wrong.

When what you’re after is schema-less storage, and the need for partitioning/distribution, why would you base it on a system completely tied to schemas and full-blown transactions?  You’re bringing with you all the things you don’t want, at the expense of performance and flexibility, because they “trust” MySQL and are already familiar with it.  Good reasons, no doubt, but the whole thing appears misguided.  In any case, I bet that it works and performance is acceptable.  But it’s not always about finding a solution that works.  Flexibility, simplicity, and of course, additional scalability, are also important and something so confusing to do something so simple just ask for you to not want to touch it once it works :)

Note:  This is not to say that these companies are “doing it wrong”.  I point out these examples because they are cases of costs gone out of control, continued performance and uptime issues, approaches I personally would not recommend, etc.  MySQL + Memcached is certainly part of the Web Scale tool box and in a great deal of use cases is satisfactory.  For more information on relational databases and how they compare to something like HBase, check out my presentation on Hadoop and HBase vs RDBMS.

So, how does Streamy solve it?

Stay tuned!  This will be the topic of an upcoming series of posts over the next few weeks.

, , , , , , , , , , , , ,

6 Comments