Between 16:32 and 16:45 EST, one of our database servers suffered massive performance issues that customers on that server may have seen as slowness, timeouts, or FogBugz error messages.
Our monitors warned us this morning of an expected drive failure on this server. To get the drive replaced, Dell asked that I run their Dell System E-Support Tool (DSET) to gather the relevant log entries. We've run this in the past with no issues and were assured that it was designed to operate unobtrusively on production machines. Today, unfortunately, it obtruded. Within minutes of my requesting a report, memory owned by SQL Server began paging out to disk. This brought the machine to a near-standstill.
SQL Server typically eats up about 44GB of memory on this particular server. By the time I finally managed to kill the DSET process, SQL Server's memory use had fallen to just over 3GB. The root issue was resolved at this point, but slowness lingered until about 16:45 while SQL Server reclaimed enough memory to function properly.
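To put those numbers in perspective, here is a minimal sketch of the arithmetic, along with the kind of threshold check a monitor could apply. The function names and the 50% threshold are hypothetical, chosen only for illustration; this is not a description of our actual monitoring.

```python
def paged_out_fraction(baseline_gb: float, current_gb: float) -> float:
    """Fraction of the usual working set that is no longer resident in memory."""
    if baseline_gb <= 0:
        raise ValueError("baseline_gb must be positive")
    # Clamp at zero so a process using more than its baseline is not flagged.
    return max(0.0, (baseline_gb - current_gb) / baseline_gb)


def should_alert(baseline_gb: float, current_gb: float, threshold: float = 0.5) -> bool:
    """Hypothetical rule: alert once at least `threshold` of the baseline is gone."""
    return paged_out_fraction(baseline_gb, current_gb) >= threshold


# Figures from this incident: about 44GB normally, just over 3GB at the low point.
fraction = paged_out_fraction(44, 3)
print(f"Roughly {fraction:.0%} of SQL Server's usual memory had been paged out")
```

With the figures above, roughly 93% of SQL Server's usual memory had been paged out, which is why the slowness lingered after the DSET process was killed.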
I apologize for the outage. Naturally, we won't be running this reporting tool again outside a contained test environment.