What's Wrong with Facebook

Early today Facebook was down or unreachable for many of you for approximately 2.5 hours. This is the worst outage we have had in over four years, and we wanted first of all to apologize for it. We also wanted to provide more technical detail on what happened and share one big lesson learned.

The key flaw that made this outage so severe was the unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.

The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
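As a rough illustration of that check-and-repair read path, here is a minimal sketch. It is not the production code: the names `read_config`, `is_valid`, `CountingStore`, and the key `max_conn` are hypothetical, and the validity rule (a positive integer) is an assumption made only so the example can run.

```python
class CountingStore:
    """Stand-in for the persistent store; counts the queries it receives."""

    def __init__(self, value):
        self.value = value
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.value


def is_valid(value):
    # Hypothetical rule: a configuration value must be a positive integer.
    return isinstance(value, int) and value > 0


def read_config(key, cache, store):
    """Return the cached value, repairing it from the store if it looks invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    # Cached value is missing or invalid: refresh it from the persistent store.
    fresh = store.fetch(key)
    cache[key] = fresh
    return fresh


def simulate(cached_value, stored_value, reads):
    """Run `reads` lookups and report how many reached the persistent store."""
    cache = {"max_conn": cached_value}
    store = CountingStore(stored_value)
    for _ in range(reads):
        read_config("max_conn", cache, store)
    return store.queries
```

With a bad cache entry and a good stored value, `simulate(-1, 10, 1000)` sends a single query to the store, because the first read repairs the cache. When the stored value is itself invalid, `simulate(-1, -1, 1000)` sends all 1000 reads to the store, since the repair never produces a value that passes validation.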

Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.

To make matters worse, every time a client got an error when querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
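The loop can be reproduced with a toy model. The sketch below is a simplification under stated assumptions, not a description of the real system: time advances in ticks, every client checks the cache at the same moment, the database serves at most `capacity` queries per tick and fails the rest, and the names `run_ticks` and `cfg` are made up for the example.

```python
def run_ticks(clients, capacity, ticks, delete_on_error):
    """Return the number of database queries issued in each tick.

    delete_on_error=True models treating a query error as an invalid value
    and deleting the cache key; False leaves the cache untouched on error.
    """
    cache = {}  # starts empty: the key was already deleted during the incident
    queries_per_tick = []
    for _ in range(ticks):
        # All clients look at the cache at the same moment.
        missing = "cfg" not in cache
        queries = clients if missing else 0
        queries_per_tick.append(queries)
        for i in range(queries):
            if i < capacity:
                # The database served this query; the client refills the cache.
                cache["cfg"] = "good value"
            elif delete_on_error:
                # The query failed, and the error is mistaken for a bad value.
                cache.pop("cfg", None)
            # Otherwise the failed query leaves the cache as it is.
    return queries_per_tick
```

With 5 clients and room for 2 queries per tick, `run_ticks(5, 2, 3, True)` returns `[5, 5, 5]`: the failed queries keep deleting the key, so the load never drops. `run_ticks(5, 2, 3, False)` returns `[5, 0, 0]`: once any query succeeds, the cache stays filled and the database is left alone.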

The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.

This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following the design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
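The post does not say which patterns are meant. One widely used technique for damping retry storms is exponential backoff with jitter, sketched below purely as an assumption about what such a design might include. `StoreError`, `backoff_ceiling`, and `fetch_with_backoff` are hypothetical names, and the default delays are arbitrary.

```python
import random
import time


class StoreError(Exception):
    """Hypothetical error raised when the persistent store cannot answer."""


def backoff_ceiling(attempt, base=0.1, cap=5.0):
    """Upper bound in seconds for the wait after a failed attempt (0-based)."""
    return min(cap, base * (2 ** attempt))


def fetch_with_backoff(fetch, max_attempts=5, sleep=time.sleep, rng=random.random):
    """Call fetch(), retrying on StoreError with jittered exponential backoff.

    Each wait is a random fraction of a ceiling that doubles per attempt,
    so clients that fail together do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return fetch()
        except StoreError:
            if attempt == max_attempts - 1:
                raise  # give up and surface the error to the caller
            sleep(rng() * backoff_ceiling(attempt))
```

The `sleep` and `rng` parameters exist only so the behavior can be checked without real waiting or randomness. Note that the retry path never touches the cache, so a failed query cannot remove a value that other clients are still able to use.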

We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.