Is Something Wrong with Facebook Right Now
By pusahma2008, Thursday, December 12, 2019
The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition. An automated system for verifying configuration values ended up causing much more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it doesn't work when the persistent store itself is invalid.
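To make the mechanism concrete, here is a minimal sketch of that kind of cache-repair logic. It is not the actual Facebook code: `CountingStore`, `is_valid`, `get_config`, and the "positive integer" validity rule are all hypothetical names and assumptions chosen for illustration.

```python
class CountingStore:
    """Hypothetical persistent store that counts how often it is queried."""

    def __init__(self, data):
        self.data = dict(data)
        self.queries = 0

    def fetch(self, key):
        self.queries += 1
        return self.data[key]


def is_valid(value):
    # Assumed validity rule for this sketch: a positive integer.
    return isinstance(value, int) and value > 0


def get_config(key, cache, store):
    """Return the cached value, repairing it from the store if it is invalid."""
    value = cache.get(key)
    if is_valid(value):
        return value
    # Cached copy is missing or invalid: refetch from the persistent store.
    fresh = store.fetch(key)
    cache[key] = fresh
    return fresh
```

With a valid store, one bad cache entry costs a single query and the cache heals. If the stored value is itself invalid, the cache never heals, so every read turns into a store query.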
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by many thousands of queries a second.
To make matters worse, every time a client got an error attempting to query one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that didn't allow the databases to recover.
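The loop can be reproduced in a toy simulation. This is a sketch under stated assumptions, not a model of the real system: the `Database` capacity, the client count, the single `"cfg"` key, and the `delete_on_error` switch are all hypothetical. Each tick, every client reads the shared cache before any repair lands, which stands in for concurrent requests.

```python
class OverloadedError(Exception):
    pass


class Database:
    """Hypothetical database that fails requests beyond a per-tick capacity."""

    def __init__(self, value, capacity):
        self.value = value
        self.capacity = capacity
        self.load = 0

    def fetch(self):
        self.load += 1
        if self.load > self.capacity:
            raise OverloadedError()
        return self.value


def is_valid(value):
    # Assumed validity rule for this sketch: a positive integer.
    return isinstance(value, int) and value > 0


def run_tick(cache, db, clients, delete_on_error):
    db.load = 0
    # Every client checks the cache at the start of the tick.
    needs_repair = [not is_valid(cache.get("cfg")) for _ in range(clients)]
    queries = 0
    for needs in needs_repair:
        if not needs:
            continue
        queries += 1
        try:
            cache["cfg"] = db.fetch()
        except OverloadedError:
            if delete_on_error:
                # The flaw: a database error is treated as an invalid value.
                cache.pop("cfg", None)
    return queries


def simulate(delete_on_error, ticks=5, clients=10, capacity=3):
    cache = {"cfg": -1}                         # invalid value left in the cache
    db = Database(value=30, capacity=capacity)  # root cause already fixed
    return [run_tick(cache, db, clients, delete_on_error) for _ in range(ticks)]
```

In this model, `simulate(True)` keeps issuing 10 queries on every tick even though the stored value is already valid, because failed clients erase the repairs made by successful ones. `simulate(False)`, where an error leaves the cache alone, drops to zero queries after the first tick.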
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
This got the site back up and running today, and for now we've turned off the system that attempts to correct configuration values. We're exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
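The post does not say which design patterns are meant. As one commonly used example (an assumption here, not something taken from the post), clients can wait a randomized, exponentially growing delay before retrying a failed query, so retries spread out instead of arriving in a synchronized burst. The function name and default values below are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.1, cap=5.0, rng=None):
    """Return one randomized delay (in seconds) per retry attempt.

    The upper bound doubles with each attempt and is capped at `cap`;
    the actual delay is drawn uniformly between 0 and that bound.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        delays.append(rng.uniform(0, ceiling))
    return delays
```

A client would sleep for `delays[i]` before its i-th retry. The randomization matters as much as the growth: without it, clients that failed together would retry together.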
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.