The theme of this week's issue is complexity. For those not in the trenches, site reliability sounds simple - just add some monitoring and a load balancer, right?
Facebook, the 3rd most visited site on the internet, went down, proving that even with $1 million engineers, things can go wrong. AWS is struggling to figure out how to provide usable IaC tools. And finally, GitLab and eBay provide some great details on what it takes to observe complex, distributed systems.