<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Cookies are for Closers: Oren Hurvitz's Blog &#187; Scalability</title>
	<atom:link href="http://hurvitz.org/blog/category/scalability/feed" rel="self" type="application/rss+xml" />
	<link>http://hurvitz.org/blog</link>
	<description>Not a baking blog, but possibly half-baked</description>
	<lastBuildDate>Sat, 20 Feb 2010 13:14:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>The Yahoo Effect</title>
		<link>http://hurvitz.org/blog/2008/06/the-yahoo-effect</link>
		<comments>http://hurvitz.org/blog/2008/06/the-yahoo-effect#comments</comments>
		<pubDate>Tue, 10 Jun 2008 22:20:50 +0000</pubDate>
		<dc:creator>Oren Hurvitz</dc:creator>
				<category><![CDATA[Scalability]]></category>

		<guid isPermaLink="false">http://hurvitz.org/blog/?p=27</guid>
		<description><![CDATA[Lukas Biewald and Chris Van Pelt of Dolores Labs wrote a fun application called FaceStat. This application lets its users evaluate each other based on their photos. Unlike its famous spiritual ancestor Hot or Not, in FaceStat each person can choose which criteria he or she wants to be evaluated on, e.g. &#8220;am I liberal [...]]]></description>
			<content:encoded><![CDATA[<p>Lukas Biewald and Chris Van Pelt of <a href="http://doloreslabs.com/">Dolores Labs</a> wrote a fun application called <a href="http://facestat.com/">FaceStat</a>. This application lets its users evaluate each other based on their photos. Unlike its famous spiritual ancestor <a href="http://www.hotornot.com/">Hot or Not</a>, in FaceStat each person can choose which criteria he or she wants to be evaluated on, e.g. &#8220;am I liberal or conservative&#8221;, &#8220;do I seem trustworthy&#8221;, etc.</p>
<p>Everything was sunshine and puppies until the day Yahoo decided to link to FaceStat from their front page, sending masses of new visitors to the site. The FaceStat server gave a small whimper, rolled on its back and played dead. Incensed Yahoos took the site&#8217;s downtime personally and resorted to stalking tactics: they found the email and phone number of the site&#8217;s registered owner (Chris Van Pelt), and left him angry emails and phone messages. It&#8217;s a tough racket, the web business.</p>
<p><a href="http://hurvitz.org/blog/wp-content/uploads/2008/06/tv_2412063362_bc2bb2a56c.jpg"><img class="size-medium wp-image-28 alignright" title="tv_2412063362_bc2bb2a56c" src="http://hurvitz.org/blog/wp-content/uploads/2008/06/tv_2412063362_bc2bb2a56c-300x225.jpg" alt="Defunct TV" width="300" height="225" /></a></p>
<p>After <a href="http://www.lukasbiewald.com/?p=153">some frantic work over the weekend</a> to add hardware and streamline the software, FaceStat was back online and able to handle the load. And what was that load? According to their amazing <a href="http://blog.doloreslabs.com/2008/06/facestat-scales/">Google Analytics chart</a>, they jumped from 10,000 pageviews per day to 800,000! That&#8217;s not a hockey stick, that&#8217;s a space elevator.</p>
<p>So what happened? FaceStat fell victim to one of the classic dangers of the web: a sudden flood of traffic from a popular site. The most famous example is the <a href="http://en.wikipedia.org/wiki/Slashdot_effect">Slashdot Effect</a>, which happens when a website is linked to from <a href="http://www.slashdot.org/">Slashdot</a>. Only slightly less well known (despite being more potent) is the Yahoo Effect. Although FaceStat recovered fairly quickly, it lost valuable visitors while the site was featured on Yahoo&#8217;s front page but inaccessible.</p>
<p>Unfortunately, building an immunity to this kind of problem is usually not cost-effective. There are two options, and both of them have drawbacks.</p>
<p>First, you can buy enough hardware in advance to survive the Yahoo Effect. But if you never get that link from the front page of Yahoo then you will have wasted a lot of money.</p>
<p>Second, you can use Cloud Computing to enable your application to use additional servers when needed. In Cloud Computing, your application runs on a variable number of servers that are owned by someone else; you can add or remove servers at a moment&#8217;s notice. The poster boy for this kind of service is <a href="http://www.amazon.com/gp/browse.html?node=201590011">Amazon&#8217;s Elastic Compute Cloud (EC2)</a>. Since you can add resources almost instantly, your application can handle vastly increased loads when needed, and you pay only for the resources you actually require at any given moment. This is a very attractive proposition, and indeed a representative of cloud computing management company <a href="http://www.rightscale.com/">RightScale</a> was quick to leave a comment on Lukas Biewald&#8217;s blog suggesting their services (thus demonstrating that ambulance chasing isn&#8217;t just for lawyers anymore).</p>
<p>Although cloud computing is cost-effective from a hardware point of view, it has a different cost: you must design your application in advance to use these resources. This requires additional development time, and that&#8217;s also an up-front cost. Given the relative costs of programmers and hardware, it might be cheaper to buy additional servers than to rearchitect the application.</p>
<p>So what&#8217;s an internet entrepreneur to do? If you&#8217;re starting a new application then definitely look into cloud computing to help your application withstand traffic spikes. Designing a new application to use cloud computing is easier than retrofitting it into an existing application. Another option is to use <a href="http://code.google.com/appengine/">Google App Engine</a>, which is Google&#8217;s entry in the scalable web applications space. But that requires a significant commitment to do things the Google Way &#8482;.</p>
<p>Or just do what most of us (including FaceStat) do: build your application as quickly as possible, and worry about the traffic when you get it. It&#8217;s the time-honored way: people won&#8217;t respect you unless you&#8217;ve got war stories about overcoming vast amounts of traffic with nothing but a screwdriver and a SCSI differential cable.</p>
<h4>Update &#8211; June 14, 2008</h4>
<p>Eran Hammer-Lahav spent two years <a href="http://www.hueniverse.com/hueniverse/2008/04/the-last-announ.html">building Nouncer</a>, a Twitter-like service, before deciding to shut down the project. One of his lessons from this experience is:</p>
<blockquote><p>Many people criticize the typical path Web 2.0 applications take in their development: putting together a poorly executed site, gauging the market, and only upon success building the service to actually scale and accommodate the market. However, the cost of building scalability ahead of time is extremely high, and for most startup is cost prohibitive.</p></blockquote>
<p>(Photo by <a href="http://www.flickr.com/photos/53317685@N00/">Robbt</a>)</p>
]]></content:encoded>
			<wfw:commentRss>http://hurvitz.org/blog/2008/06/the-yahoo-effect/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LinkedIn Architecture</title>
		<link>http://hurvitz.org/blog/2008/06/linkedin-architecture</link>
		<comments>http://hurvitz.org/blog/2008/06/linkedin-architecture#comments</comments>
		<pubDate>Wed, 04 Jun 2008 21:20:53 +0000</pubDate>
		<dc:creator>Oren Hurvitz</dc:creator>
				<category><![CDATA[Scalability]]></category>

		<guid isPermaLink="false">http://hurvitz.org/blog/?p=23</guid>
		<description><![CDATA[At JavaOne 2008, LinkedIn employees presented two sessions about the LinkedIn architecture. The slides are available online:

LinkedIn &#8211; A Professional Social Network Built with Java™ Technologies and Agile Practices
LinkedIn Communication Architecture

These slides are hosted at SlideShare. If you register, you can download them as PDFs.
This post summarizes the key parts of the LinkedIn architecture. [...]]]></description>
			<content:encoded><![CDATA[<p>At <a href="http://hurvitz.org/blog/2008/05/javaone-2008">JavaOne 2008</a>, LinkedIn employees presented two sessions about the LinkedIn architecture. The slides are available online:</p>
<ul>
<li><a href="http://www.slideshare.net/linkedin/linkedins-communication-architecture">LinkedIn &#8211; A Professional Social Network Built with Java™ Technologies and Agile Practices</a></li>
<li><a href="http://www.slideshare.net/linkedin/linked-in-javaone-2008-tech-session-comm">LinkedIn Communication Architecture</a></li>
</ul>
<p>These slides are hosted at SlideShare. If you register, you can download them as PDFs.</p>
<p>This post summarizes the key parts of the LinkedIn architecture. It&#8217;s based on the presentations above, and on additional comments made during the sessions at JavaOne.</p>
<h3>Site Statistics</h3>
<ul>
<li>22 million members</li>
<li>4+ million unique visitors/month</li>
<li>40 million page views/day</li>
<li>2 million searches/day</li>
<li>250K invitations sent/day</li>
<li>1 million answers posted</li>
<li>2 million email messages/day</li>
</ul>
<h3>Software</h3>
<ul>
<li>Solaris (running on Sun x86 platform and Sparc)</li>
<li>Tomcat and Jetty as application servers</li>
<li>Oracle and MySQL as DBs</li>
<li>No ORM (such as Hibernate); they use straight JDBC</li>
<li>ActiveMQ for JMS. (It&#8217;s partitioned by message type and backed by MySQL.)</li>
<li>Lucene as a foundation for search</li>
<li>Spring as glue</li>
</ul>
<h2>Server Architecture</h2>
<h3>2003-2005</h3>
<ul>
<li>One monolithic web application</li>
<li>One database: the <strong>Core Database</strong></li>
<li>The network graph is cached in memory in <strong>The Cloud</strong></li>
<li>Member <strong>Search</strong> is implemented using Lucene. It runs on the same server as The Cloud, because search results must be filtered according to the searching user&#8217;s network.</li>
<li>WebApp updates the Core Database directly. The Core Database updates The Cloud.</li>
</ul>
<h3>2006</h3>
<ul>
<li>Added <strong>Replica DBs</strong>, to reduce the load on the Core Database. They contain read-only data. A <strong>RepDB</strong> server manages updates of the Replica DBs.</li>
<li>Moved Search out of The Cloud and into its own server.</li>
<li>Changed the way updates are handled, by adding the <strong>Databus</strong>. This is a central component that distributes updates to any component that needs them. This is the new updates flow:
<ul>
<li>Changes originate in the WebApp</li>
<li>The WebApp updates the Core Database</li>
<li>The Core Database sends updates to the Databus</li>
<li>The Databus sends the updates to: the Replica DBs, The Cloud, and Search</li>
</ul>
</li>
</ul>
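<p>[The 2006 update flow is essentially publish/subscribe. Here is a minimal in-process sketch in Python. The names <code>Databus</code>, <code>CoreDatabase</code>, <code>subscribe</code> and <code>publish</code> are my own illustration, not LinkedIn&#8217;s actual API, and the real system is distributed rather than a single process.]</p>

```python
from typing import Callable

Update = dict  # hypothetical update record, e.g. {"member_id": 7, "field": "title"}


class Databus:
    """Toy stand-in for the Databus: fans each update out to every subscriber."""

    def __init__(self) -> None:
        self._subscribers: list[Callable[[Update], None]] = []

    def subscribe(self, handler: Callable[[Update], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, update: Update) -> None:
        for handler in self._subscribers:
            handler(update)


class CoreDatabase:
    """The WebApp writes here; the database then forwards the change to the bus."""

    def __init__(self, bus: Databus) -> None:
        self.rows: list[Update] = []
        self._bus = bus

    def write(self, update: Update) -> None:
        self.rows.append(update)
        self._bus.publish(update)


# In this sketch each downstream component simply records what it receives.
replica_db, cloud, search = [], [], []
bus = Databus()
for sink in (replica_db, cloud, search):
    bus.subscribe(sink.append)

core = CoreDatabase(bus)
core.write({"member_id": 7, "field": "title", "value": "Engineer"})
```

<p>[The point of the design is that the WebApp only knows about the Core Database; adding a new consumer of updates means adding a subscriber, not changing the writer.]</p>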
<h3>2008</h3>
<ul>
<li>The WebApp doesn&#8217;t do everything itself anymore: they split parts of its business logic into <strong>Services</strong>.<br />
The WebApp still presents the GUI to the user, but now it calls Services to manipulate the Profile, Groups, etc.</li>
<li>Each Service has its own domain-specific database (i.e., vertical partitioning).</li>
<li>This architecture allows <em>other</em> applications (besides the main WebApp) to access LinkedIn. They&#8217;ve added applications for Recruiters, Ads, etc.</li>
</ul>
<h2>The Cloud</h2>
<ul>
<li>The Cloud is a server that caches the entire LinkedIn network graph in memory.</li>
<li>Network size: 22M nodes, 120M edges.</li>
<li>Requires <strong>12 GB RAM</strong>.</li>
<li>There are <strong>40 instances</strong> in production.</li>
<li>Rebuilding an instance of The Cloud from disk takes <strong>8 hours</strong>.</li>
<li>The Cloud is updated in real-time using the Databus.</li>
<li>Persisted to disk on shutdown.</li>
<li>The cache is implemented in C++, accessed via JNI. They chose C++ instead of Java for two reasons:
<ul>
<li>To use as little RAM as possible.</li>
<li>Garbage Collection pauses were killing them. [LinkedIn said they were using advanced garbage collectors, but collectors have improved since 2003; is this still a problem today?]</li>
</ul>
</li>
<li>Having to keep everything in RAM is a limitation, but as LinkedIn have pointed out, partitioning graphs is hard.</li>
<li>[Sun offers servers with up to 2 TB of RAM (<a href="http://www.sun.com/servers/highend/m9000/">Sun SPARC Enterprise M9000 Server</a>), so LinkedIn could support up to 1.1 billion users before they run out of memory. (This calculation is based only on the number of nodes, not edges). Price is another matter: Sun say only "contact us for price", which is ominous considering that the prices they <em>do</em> list go up to $30,000.]</li>
</ul>
<p>The Cloud caches the entire LinkedIn network, but each user needs to see the network from their <em>own</em> point of view. It&#8217;s computationally expensive to calculate that, so they do it just once when a user session begins, and keep it cached. That takes up to 2 MB of RAM per user. This cached network is <strong>not updated</strong> during the session. (It <strong>is</strong> updated if the user adds or removes a link, but not if any of the user&#8217;s contacts make changes. LinkedIn says users won&#8217;t notice this.)</p>
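<p>[The presentation didn&#8217;t say how the per-user view is computed. One plausible way, sketched below in Python, is a breadth-first search limited to a few degrees of separation, with the result kept for the whole session. The three-degree limit, the function names and the <code>session_cache</code> dictionary are all my assumptions.]</p>

```python
from collections import deque


def network_view(graph: dict[int, set[int]], member: int, max_degree: int = 3) -> dict[int, int]:
    """Return {other_member: degree_of_separation} for everyone within max_degree hops."""
    distance = {member: 0}
    queue = deque([member])
    while queue:
        current = queue.popleft()
        if distance[current] == max_degree:
            continue  # do not expand beyond the horizon
        for neighbour in graph.get(current, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    del distance[member]
    return distance


# Computed once at login and kept for the whole session (hypothetical cache).
session_cache: dict[int, dict[int, int]] = {}


def view_for_session(graph, member):
    if member not in session_cache:
        session_cache[member] = network_view(graph, member)
    return session_cache[member]


graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
view = view_for_session(graph, 1)  # member 5 is four hops away, so it is left out
```

<p>[Because the view is cached, a later change made by one of the user&#8217;s contacts does not show up until the next session, which matches the behavior LinkedIn described.]</p>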
<p>As an aside, they use <a href="http://ehcache.sourceforge.net/">Ehcache</a> to cache members&#8217; profiles. They cache up to 2 million profiles (out of 22 million members). They tried caching using LFU algorithm (Least Frequently Used), but found that Ehcache would sometimes block for 30 seconds while recalculating LFU, so they switched to LRU (Least Recently Used).</p>
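<p>[For readers who haven&#8217;t met LRU before, here is a minimal sketch in Python. It is not Ehcache&#8217;s implementation; the <code>LRUCache</code> class is purely illustrative and uses the standard library&#8217;s <code>OrderedDict</code> to track recency.]</p>

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return self._entries[key]

    def put(self, key, value) -> None:
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry


profiles = LRUCache(capacity=2)
profiles.put("alice", {"title": "Engineer"})
profiles.put("bob", {"title": "Recruiter"})
profiles.get("alice")                      # alice is now the most recent
profiles.put("carol", {"title": "CTO"})    # evicts bob
```

<p>[Every access does a constant amount of bookkeeping, so there is no periodic recalculation step in this sketch.]</p>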
<h2>Communication Architecture</h2>
<h3>Communication Service</h3>
<p>The Communication Service is responsible for <strong>permanent messages</strong>, e.g. InBox messages and emails.</p>
<ul>
<li>The entire system is asynchronous and uses JMS heavily</li>
<li>Clients post messages via JMS</li>
<li>Messages are then routed via a routing service to the appropriate mailbox or directly for email processing</li>
<li>Message delivery: either Pull (clients request their messages), or Push (e.g., sending emails)</li>
<li>They use Spring with proprietary LinkedIn extensions, and HTTP-RPC.</li>
</ul>
<h4>Scaling Techniques</h4>
<ul>
<li>Functional partitioning: sent, received, archived, etc. [a.k.a. vertical partitioning]</li>
<li>Class partitioning: Member mailboxes, guest mailboxes, corporate mailboxes</li>
<li>Range partitioning: Member ID range; Email lexicographical range. [a.k.a. horizontal partitioning]</li>
<li>Everything is asynchronous</li>
</ul>
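<p>[To make the partitioning terms concrete, here is a small Python sketch that combines functional partitioning (by folder) with range partitioning (by member ID). The shard names and ID boundaries are invented for the example; LinkedIn didn&#8217;t publish theirs.]</p>

```python
from bisect import bisect_right

# Hypothetical shard layout: each boundary is the first member ID of the next shard.
BOUNDARIES = [1_000_000, 2_000_000, 3_000_000]
SHARDS = ["mailbox-db-0", "mailbox-db-1", "mailbox-db-2", "mailbox-db-3"]


def shard_for_member(member_id: int) -> str:
    """Range partitioning: pick the shard whose ID range contains member_id."""
    return SHARDS[bisect_right(BOUNDARIES, member_id)]


def table_for(folder: str, member_id: int) -> str:
    """Combine functional partitioning (by folder) with range partitioning (by ID)."""
    return f"{shard_for_member(member_id)}.{folder}"
```

<p>[For example, <code>table_for("sent", 2_500_000)</code> resolves to <code>mailbox-db-2.sent</code> under this made-up layout.]</p>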
<h3>Network Updates Service</h3>
<p>The Network Updates Service is responsible for <strong>short-lived notifications</strong>, e.g. status updates from your contacts.</p>
<h4>Initial Architecture (up to 2007)</h4>
<ul>
<li>There are many services that can contain updates.</li>
<li>Clients make separate requests to each service that can have updates: Questions, Profile Updates, etc.</li>
<li>It took a long time to gather all the data.</li>
</ul>
<p>In 2008 they created the Network Updates Service. The implementation went through several iterations:</p>
<h4>Iteration 1</h4>
<ul>
<li>Client makes just one request, to the NetworkUpdateService.</li>
<li>NetworkUpdateService makes multiple requests to gather the data from all the services. These requests are made in parallel.</li>
<li>The results are aggregated and returned to the client together.</li>
<li>Pull-based architecture.</li>
<li>They rolled out this new system to everyone at LinkedIn, which caused problems while the system was stabilizing. In hindsight, they should have tried it out on a small subset of users first.</li>
</ul>
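<p>[Iteration 1 is a classic scatter/gather. A rough Python sketch follows, assuming each service can be called as a plain function; the three <code>fetch_*</code> functions are hypothetical stand-ins for remote calls.]</p>

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for the remote services; real ones would be network calls.
def fetch_questions(member_id):
    return [f"question for {member_id}"]


def fetch_profile_updates(member_id):
    return [f"profile change seen by {member_id}"]


def fetch_group_updates(member_id):
    return []


SERVICES = [fetch_questions, fetch_profile_updates, fetch_group_updates]


def network_updates(member_id: int) -> list[str]:
    """Call every service in parallel, then merge the results into one response."""
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = [pool.submit(service, member_id) for service in SERVICES]
        merged = []
        for future in futures:  # iterate in submission order for a stable result
            merged.extend(future.result())
    return merged


updates = network_updates(42)
```

<p>[The client makes one call and waits roughly as long as the slowest service, rather than the sum of all of them.]</p>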
<h4>Iteration 2</h4>
<ul>
<li>Push-based architecture: whenever events occur in the system, add them to the user&#8217;s &quot;mailbox&quot;. When a client asks for updates, return the data that&#8217;s already waiting in the mailbox.</li>
<li>Pros: reads are much quicker since the data is already available.</li>
<li>Cons: might waste effort on moving around update data that will never be read. Requires more storage space.</li>
<li>There is still post-processing of updates before returning them to the user. E.g.: collapse 10 updates from a user to 1.</li>
<li>The updates are stored in CLOBs: 1 CLOB per update-type per user (for a total of 15 CLOBs per user).</li>
<li>Incoming updates must be added to the CLOB. They use optimistic locking to avoid lock contention.</li>
<li>They had set the CLOB size to 8 KB, which was too large and led to a lot of wasted space.</li>
<li>Design note: instead of CLOBs, LinkedIn could have created additional tables, one for each type of update. They said they didn&#8217;t do this because of update expiry: with separate tables, expiring updates would mean deleting rows, and that&#8217;s very expensive.</li>
<li>They used JMX to monitor and change the configuration in real-time. This was very helpful.</li>
</ul>
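<p>[Optimistic locking, mentioned above, means reading a row together with a version number and writing back only if the version hasn&#8217;t changed in the meantime. The Python sketch below keeps the row in memory; <code>MailboxRow</code> and <code>append_update</code> are hypothetical names, and in a real database the check would be a conditional <code>UPDATE</code>.]</p>

```python
class StaleVersion(Exception):
    """Raised when another writer changed the row since we read it."""


class MailboxRow:
    """Hypothetical row holding one update type for one user, plus a version counter."""

    def __init__(self) -> None:
        self.body = ""
        self.version = 0

    def read(self):
        return self.body, self.version

    def write_if_unchanged(self, new_body: str, expected_version: int) -> None:
        # Plays the role of: UPDATE ... SET body = ?, version = version + 1
        #                    WHERE id = ? AND version = ?
        if self.version != expected_version:
            raise StaleVersion()
        self.body = new_body
        self.version += 1


def append_update(row: MailboxRow, update: str, max_attempts: int = 5) -> None:
    """Read, modify, and write back; retry from scratch if someone else got in first."""
    for _ in range(max_attempts):
        body, version = row.read()
        try:
            row.write_if_unchanged(body + update + "\n", version)
            return
        except StaleVersion:
            continue
    raise RuntimeError("gave up after repeated conflicts")


row = MailboxRow()
append_update(row, "alice changed her title")
append_update(row, "bob joined a group")
```

<p>[No writer ever holds a lock while it works; a conflict just costs a retry, which is cheap when conflicts are rare.]</p>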
<h4>Iteration 3</h4>
<ul>
<li>Goal: improve speed by reducing the number of CLOB updates, because CLOB updates are expensive.</li>
<li>Added an overflow buffer: a VARCHAR(4000) column where data is added initially. When this column is full, dump it to the CLOB. This eliminated 90% of CLOB updates.</li>
<li>Reduced the size of the updates.</li>
</ul>
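<p>[The overflow buffer is a write-batching trick. Here is a rough Python sketch of the idea, with both columns modeled as strings. The <code>UpdateStore</code> class and its counters are my own illustration; only the 4000-character limit comes from the presentation.]</p>

```python
BUFFER_LIMIT = 4000  # matches the VARCHAR(4000) column described above


class UpdateStore:
    """Hypothetical per-user store: a small buffer in front of an expensive CLOB."""

    def __init__(self) -> None:
        self.buffer = ""
        self.clob = ""
        self.clob_writes = 0

    def add(self, update: str) -> None:
        # Assumes a single record never exceeds BUFFER_LIMIT on its own.
        record = update + "\n"
        if len(self.buffer) + len(record) > BUFFER_LIMIT:
            self._flush()
        self.buffer += record

    def _flush(self) -> None:
        # The only place the CLOB is touched.
        self.clob += self.buffer
        self.buffer = ""
        self.clob_writes += 1

    def read_all(self) -> str:
        return self.clob + self.buffer


store = UpdateStore()
for i in range(100):
    store.add("x" * 99)  # each record is 100 characters including the newline
```

<p>[In this toy run, 100 incoming updates cause only two CLOB writes; the rest land in the cheap buffer column.]</p>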
<p>[LinkedIn have had success in moving from a Pull architecture to a Push architecture. However, don't discount Pull architectures. Amazon, for example, use a Pull architecture. In <a href="http://www.acmqueue.com/modules.php?name=Content&#038;pa=showpage&#038;pid=388">A Conversation with Werner Vogels</a>, Amazon's CTO says that a visit to the Amazon front page typically triggers calls to more than 100 services in order to construct the page.]</p>
<p>The presentation ends with some tips about scaling. These are oldies but goodies:</p>
<ul>
<li>You can&#8217;t use just one database. Use many databases, partitioned horizontally and vertically.</li>
<li>Because of partitioning, forget about referential integrity or cross-domain JOINs.</li>
<li>Forget about 100% data integrity.</li>
<li>At large scale, cost is a problem: hardware, databases, licenses, storage, power.</li>
<li>Once you&#8217;re large, spammers and data-scrapers come a-knocking.</li>
<li>Cache!</li>
<li>Use asynchronous flows.</li>
<li>Reporting and analytics are challenging; consider them up-front when designing the system.</li>
<li>Expect the system to fail.</li>
<li>Don&#8217;t underestimate your growth trajectory.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://hurvitz.org/blog/2008/06/linkedin-architecture/feed</wfw:commentRss>
		<slash:comments>44</slash:comments>
		</item>
	</channel>
</rss>
