The emergency maintenance was completed without incident. The last of my monitoring alarms has cleared, and all our tests look good. Thanks again for your patience, and hopefully this is the last we see of this issue.
At 15:24 EST, we experienced a brief (30-second) network blip in our New York datacenter. This is related to Friday's notice.
We will be performing emergency maintenance tonight at 1:00 EST to replace this equipment altogether. I expect this maintenance to take less than half an hour, but I've scheduled the entire 1:00-2:00 hour to give us some leeway.
I apologize once again for any inconvenience. This blog will be updated when the repair is complete.
We had a few network blips in our New York datacenter today, which caused sporadic timeouts in both the Copilot and FogBugz On Demand services.
After examination, we determined the failure to be in our equipment and were able to reproduce the problem. We have taken the appropriate steps to prevent this specific error from recurring, and are engaging the vendor to find the most reasonable fix for the underlying behavior.
Our colo facility performed maintenance this morning, which caused a brief blip in network connectivity. While this lasted only a few seconds, the interface state change caused one of our routers to enter a bad state. While in this state, it was intermittently dropping packets, causing Copilot and On Demand services to perform poorly at times.
All services have been restored.
We have isolated the bad NIC, narrowed the problem down to an issue with its driver, and will be replacing this system entirely this week to avoid future problems of this nature.
These errors were isolated to account Administrators accessing FogBugz through a particular web server (unrelated to the "Periodic web errors" post below). The issues on that server have been resolved, though we're still working to determine their root cause.
One of our web servers is having a bad day and is periodically throwing ASP errors at visitors to shop.fogcreek.com and jobs.joelonsoftware.com. We've pulled it out of our load balancer so you won't have to look at them anymore, but as a result you may see a bit of a performance hit on those sites, discuss.joelonsoftware.com, and copilot.com. Updates coming as soon as this server is fixed.
Some customers were experiencing problems reaching FogBugz On Demand. One of the front-end servers was having difficulty reaching the database server. That server has now been taken offline, so everything should be working again.
Update: Things are in working order. Total lag was about 30 minutes.
Hi folks. We're in the midst of migrating an ailing backend service to new hardware. During this move, new On Demand accounts will be queued up for creation. We expect this to be wrapped up in an hour or so. Once all is well, the queued-up account creation requests will flow through as expected.
This does not impact existing On Demand users.
I'll update this post as soon as the work is complete.