Thursday, May 2, 2013
We know that developers around the world depend on our APIs for their apps, sites and businesses every day. Unfortunately, we experienced an outage of the Google API serving infrastructure yesterday, April 30. The outage affected most Google APIs, causing requests to fail with HTTP 500 errors. Additionally, users may have found features or capabilities missing from some Google services that rely on these APIs.
At 6:26 pm Pacific Time, we pushed a config change that inadvertently caused a widespread outage of our API infrastructure. Our normal rollback procedure failed, delaying the rollback until 7:22 pm, at which time APIs started to recover. The outage was completely resolved by 8:00 pm.
We are making several changes to help ensure this issue won’t happen again. We’ve identified some key improvements to our release and rollback process that we are implementing immediately. Reliability is a top priority at Google, and we are continuously making improvements to our systems. We apologize to everyone who was affected.
Louis Ryan is an engineer on the API platforms team in Mountain View. Louis is passionate about making APIs faster, more consistent, and more reliable.
Posted by Scott Knaster, Editor
Posted at 5/02/2013 12:31:00 AM