Facebook Sorry Something Went Wrong Error
By
Dany Firman Saputra
—
Tuesday, February 11, 2020
—
What's Wrong With Facebook
The key flaw that made this outage so severe was poor handling of an error condition. An automated system for verifying configuration values ended up causing far more damage than it fixed.
The intent of the automated system is to check for configuration values that are invalid in the cache and replace them with updated values from the persistent store. This works well for a transient problem with the cache, but it does not work when the value in the persistent store is itself invalid.
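The check-and-repair behavior described above can be sketched roughly as follows. This is only an illustration, not the actual system: `CountingStore`, `read_config`, the `is_valid` rule, and the `timeout` key are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class CountingStore:
    """Hypothetical persistent store that counts how often it is queried."""

    def __init__(self, data: Dict[str, str]) -> None:
        self.data = data
        self.queries = 0

    def get(self, key: str) -> str:
        self.queries += 1
        return self.data[key]


def read_config(key: str, cache: Dict[str, str], store: CountingStore,
                is_valid: Callable[[str], bool]) -> str:
    """Return a config value, repairing the cache entry if it looks invalid."""
    value = cache.get(key)
    if value is not None and is_valid(value):
        return value            # cache hit, nothing to repair
    fresh = store.get(key)      # repair path: one query to the store
    cache[key] = fresh
    return fresh


def is_valid(value: str) -> bool:
    # Assumed validity rule for this example: the value must be numeric.
    return value.isdigit()


# Transient cache problem: one repair query, then the cache serves all reads.
store = CountingStore({"timeout": "30"})
cache = {"timeout": "garbage"}
for _ in range(5):
    read_config("timeout", cache, store, is_valid)
transient_queries = store.queries

# Invalid persistent value: the "repair" never sticks, so every read
# falls through to the store.
bad_store = CountingStore({"timeout": "thirty"})
bad_cache: Dict[str, str] = {}
for _ in range(5):
    read_config("timeout", bad_cache, bad_store, is_valid)
persistent_queries = bad_store.queries
```

In the first case the store is queried once; in the second it is queried on every read, which is the load pattern that becomes a problem when every client does it at the same time.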
Today we made a change to the persistent copy of a configuration value that was interpreted as invalid. This meant that every single client saw the invalid value and attempted to fix it. Because the fix involves making a query to a cluster of databases, that cluster was quickly overwhelmed by hundreds of thousands of queries a second.
To make matters worse, every time a client got an error while querying one of the databases, it interpreted the error as an invalid value and deleted the corresponding cache key. This meant that even after the original problem had been fixed, the stream of queries continued. As long as the databases failed to service some of the requests, they were causing even more requests to themselves. We had entered a feedback loop that did not allow the databases to recover.
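A toy model can make the feedback loop concrete. It is a deliberate simplification under stated assumptions: all clients act in lockstep once per tick, the database serves at most `db_capacity` queries per tick, and the persistent value is already valid again. The function name, parameters, and numbers are invented for illustration.

```python
from typing import List


def simulate(clients: int, db_capacity: int, ticks: int,
             delete_on_error: bool) -> List[int]:
    """Return the number of database queries issued in each tick."""
    cache_ok = False            # the cache entry starts out missing
    history: List[int] = []
    for _ in range(ticks):
        # With no usable cache entry, every client queries the database.
        queries = 0 if cache_ok else clients
        served = min(queries, db_capacity)
        errors = queries - served
        if served > 0:
            cache_ok = True     # a successful query repopulates the cache
        if errors > 0 and delete_on_error:
            cache_ok = False    # flawed handling: an error deletes the key
        history.append(queries)
    return history


# Errors treated as invalid values: the load never drops.
looping = simulate(clients=1000, db_capacity=100, ticks=4, delete_on_error=True)

# Errors leave the cache alone: the load disappears after one tick.
recovering = simulate(clients=1000, db_capacity=100, ticks=4, delete_on_error=False)
```

In this model the only difference between the two runs is what a client does on an error, and that alone decides whether the databases ever get a chance to recover.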
The way to stop the feedback cycle was quite painful: we had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.
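The post does not say how people were let back in. One common way to ramp traffic gradually, shown here purely as an assumption, is to hash each user into a stable bucket from 0 to 99 and admit only buckets below the current percentage. The `admitted` function and the user ID format are hypothetical.

```python
import hashlib


def admitted(user_id: str, percent: int) -> bool:
    """Admit a user if their stable hash bucket falls below `percent`."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket 0..99
    return bucket < percent


# Raising the percentage step by step only ever adds users; nobody who
# was already admitted is turned away at a later step.
users = [f"user-{i}" for i in range(1000)]
admitted_counts = [sum(admitted(u, p) for u in users) for p in (0, 10, 50, 100)]
```

Because the bucket depends only on the user ID, an operator could raise `percent` while watching database load and pause the ramp if errors reappear.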
This got the site back up and running today, and for now we have turned off the system that attempts to correct configuration values. We are exploring new designs for this configuration system, following design patterns of other systems at Facebook that deal more gracefully with feedback loops and transient spikes.
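The post does not describe the new designs, so the following is not Facebook's approach. It is a minimal sketch of one well-known pattern for damping this kind of loop: treat a store error as "no new information", leave the cache untouched, and wait exponentially longer before the next attempt. `StoreError`, `BackoffConfigReader`, and the delay values are hypothetical, and a real client would usually add random jitter to the delay, which is left out here to keep the example deterministic.

```python
from typing import Callable, Dict, List, Optional


class StoreError(Exception):
    """Raised by the hypothetical store when a query fails."""


class BackoffConfigReader:
    """Reads config values without hammering a failing store."""

    def __init__(self, fetch: Callable[[str], str],
                 is_valid: Callable[[str], bool],
                 base_delay: float = 1.0, max_delay: float = 60.0) -> None:
        self.fetch = fetch
        self.is_valid = is_valid
        self.base_delay = base_delay
        self.max_delay = max_delay
        self.cache: Dict[str, str] = {}
        self.failures = 0
        self.retry_at = 0.0     # earliest time the next store query is allowed

    def read(self, key: str, now: float) -> Optional[str]:
        value = self.cache.get(key)
        if value is not None and self.is_valid(value):
            return value
        if now < self.retry_at:
            return value        # backing off: serve what we have, no query
        try:
            fresh: Optional[str] = self.fetch(key)
        except StoreError:
            fresh = None        # an error is not an invalid value
        if fresh is not None and self.is_valid(fresh):
            self.cache[key] = fresh
            self.failures = 0
            return fresh
        # Failed or invalid: keep the cache as it is and double the wait.
        self.failures += 1
        delay = min(self.max_delay, self.base_delay * 2 ** (self.failures - 1))
        self.retry_at = now + delay
        return value


calls: List[str] = []


def failing_fetch(key: str) -> str:
    calls.append(key)
    raise StoreError("database overloaded")


reader = BackoffConfigReader(failing_fetch, lambda v: v.isdigit())
for second in range(8):         # one read per second against a failing store
    reader.read("timeout", float(second))
store_queries = len(calls)
```

With these settings, eight reads against a failing store produce four queries (at seconds 0, 1, 3, and 7) instead of eight, and the gap keeps growing while the store stays unhealthy.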
We apologize again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.