Tonight at 8 PM Eastern (1 AM UTC on Wednesday, November 16th) we will be performing emergency maintenance on our Elasticsearch cluster. This will cause performance to degrade once more as we work to fully restore our services. Once maintenance is complete, we will update the status blog. In the meantime, we continue to work on the stability of our On Demand services. Here's what we've identified so far, and what we've resolved. This is not a full post-mortem. Rather, it is a summary of the current state of the service as we work toward a full resolution.
Last week we had uncharacteristic network congestion, which led to slow performance across the board, particularly with Kiln changeset processing. On Sunday evening, we experienced a series of power events that caused cascading hardware failures and took our On Demand services offline. We restored service that same evening and continued to work with our vendors to fully resolve our hardware issue.
On Monday morning, we identified some configurations that were likely culprits for our network congestion issue. Updating those configurations had an immediate and lasting impact on our network, eliminating the congestion.
On Monday afternoon, we had another power event while we were still working to resolve Sunday's event. This took us down again while we replaced the affected hardware, which fully resolved our hardware issue. In addition, a reboot of our Redis server resolved a lingering side effect of Sunday's power issue.
Today, we see normal network traffic levels and no indication of further hardware issues. That being said, we're still seeing slowness and intermittent errors in our On Demand service. One of our Elasticsearch nodes is working extra hard, and continued monitoring seems to indicate that this extra load is not related to how FogBugz is indexing data. While performance has been improving throughout the day, the path to full resolution requires work on the Elasticsearch cluster, which will further reduce performance while it is under way. That is why we're announcing this emergency maintenance window for our On Demand services.
If you have any further questions, please contact us.