The Kiln issues reported earlier today have been resolved. To clarify, the exact symptoms were:
- Delays in code pushes being reflected in the Kiln interface
- Delays in repository creation
- Delays in code search updates
No pushes or newly created repositories were lost; this was strictly a display issue.
When you push code to Kiln, your data streams through to our Mercurial backends. As soon as that push is complete, those backends queue an event to inform the frontends that there's new information to be indexed and displayed. These events are handled by a worker process that, in this case, had slowed to a crawl and was unable to keep up with the incoming messages.
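To illustrate why a slow worker delays what you see without losing anything, here is a minimal sketch of the flow described above. It is not Kiln's actual code: `PushEvent`, `EventQueue`, `simulate`, and the tick-based rates are all hypothetical names and numbers chosen for the example. The point is that a push is stored before its event is queued, so a backlog only postpones indexing and display.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PushEvent:
    """Hypothetical event a backend queues once a push has completed."""
    repo: str
    changeset: str


class EventQueue:
    """Hypothetical in-memory stand-in for the backend-to-frontend queue."""

    def __init__(self):
        self._events = deque()

    def publish(self, event):
        self._events.append(event)

    def take(self, max_events):
        """Remove and return up to max_events events, oldest first."""
        taken = []
        while self._events and len(taken) < max_events:
            taken.append(self._events.popleft())
        return taken

    def __len__(self):
        return len(self._events)


def simulate(ticks, pushes_per_tick, worker_rate):
    """Run a toy timeline and return (stored, displayed, backlog).

    stored    -- changesets safely written to the backend
    displayed -- changesets the worker has processed for the frontend
    backlog   -- events still waiting in the queue at the end
    """
    queue = EventQueue()
    stored, displayed = [], []
    counter = 0
    for _ in range(ticks):
        for _ in range(pushes_per_tick):
            changeset = f"cs{counter}"
            counter += 1
            stored.append(changeset)  # the push itself completes first
            queue.publish(PushEvent("demo", changeset))  # then the event is queued
        for event in queue.take(worker_rate):  # worker handles what it can
            displayed.append(event.changeset)
    return stored, displayed, len(queue)
```

In this toy model, a worker that handles fewer events per tick than arrive leaves `stored` complete while `displayed` falls behind and the backlog grows; once the worker's rate recovers, the queue drains and the two lists match again.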
This slowdown was fixed by restarting the worker process. Our developers are working to discover why it slowed down in the first place, but in the meantime we've added a stopgap script to ensure that events are processed promptly even if the slowdown happens again.
The incident also exposed a gap in our monitoring: as far as our checks could tell, this process was doing its job. In the future, we should catch this kind of problem before it becomes noticeable.
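As a rough sketch of the difference between a check that only confirms the worker is running and one that would have caught this slowdown, consider the following. It is an illustration under assumptions, not our actual monitoring: `WorkerStatus`, `liveness_check`, `lag_check`, and the 60-second threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WorkerStatus:
    """Hypothetical snapshot of the worker and its queue."""
    process_alive: bool
    # Age in seconds of the oldest unprocessed event; None if the queue is empty.
    oldest_pending_event_age_s: Optional[float]


def liveness_check(status: WorkerStatus) -> bool:
    """Passes whenever the process exists, even if it is barely making progress."""
    return status.process_alive


def lag_check(status: WorkerStatus, max_age_s: float = 60.0) -> bool:
    """Passes only if the worker is alive and no event has waited too long.

    The 60-second default is an arbitrary value for this example.
    """
    if not status.process_alive:
        return False
    if status.oldest_pending_event_age_s is None:
        return True  # nothing is waiting, so nothing is delayed
    return status.oldest_pending_event_age_s <= max_age_s
```

A worker that is alive but crawling, for example `WorkerStatus(True, 1800.0)`, passes `liveness_check` and fails `lag_check`, which is the kind of signal a check on the process alone cannot provide.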