[kjanshair.github.io] Monitoring with Prometheus

March 13, 2018

Monitoring applications and application servers is an important part of today’s DevOps culture and process. You want to continuously monitor your applications and servers for application exceptions, server’s CPU...


[xaprb.com] Schrodinger’s Outage

March 8, 2018

A couple of months ago we had an incident in which a legacy recovery mechanism proved inadequate for our current scale. In our internal post-incident review, we asked if we should improve this...
