SRE NEWSLETTER

Issue #17 // March 5, 2021

Increment: Reliability
// increment.com
This month's issue of Increment shares approaches to reliability and resiliency in our software, technologies, and teams, and offers perspectives on the realities of failure in the systems we build.
An Oral History of #hugops
// protocol.com
A little over 10 years ago, a group of operations-oriented engineers decided they were fed up with software developers who didn't care if their code actually worked. Those engineers created the Velocity Conference to band together as a community. That community sparked a revolution known as DevOps.
How We Minimized the Overhead of Kubernetes in our Job System
// datadoghq.com
Datadog moved their job system to Kubernetes. After the move, it used substantially more CPU time than before, yet completed jobs 40-50% more slowly. This post describes how they solved this performance regression. The solution involved some performance experiment design, light performance tuning, and timing analysis to get back to parity.
How Etsy Prepared for Historic Volumes of Holiday Traffic in 2020
// codeascraft.com
For Etsy, 2020 was a year of unprecedented and volatile growth. Their site traffic leapt up in the second quarter, when lockdowns went into widespread effect, by an amount it normally would have taken several years to achieve. If they over-scaled, there was a risk of wasting money. If they under-scaled, it could have been much worse for their sellers.
Jonah Edwards - Internet Archive Infrastructure
// archive.org
[Video] Jonah Edwards runs the core infrastructure team at the Internet Archive. In this video he goes through the scale of the infrastructure that runs the Internet Archive.
Why You Should Take a Look at Nomad before Jumping on Kubernetes
// atodorov.me
Many of the proposals for a “new Kubernetes” are design choices already made by HashiCorp's Nomad, a pretty underrated and drastically simpler orchestrator.
Principles of a Good Product Roadmap
// productcrunch.substack.com
Can the characteristics of an agile roadmap be distilled into a set of universal principles? Principles help you assess an existing roadmap or create a roadmap with them in mind. The result is an attempt at defining agile roadmapping principles.
Microsoft previews Windows Server 2022
// theregister.com
A new Windows Server is released every three years or so, and it continues to make progress on its goals of easier administration, removing reliance on the server desktop GUI, and stripping down the operating system so that most features are optional components.
Retry Pattern in Microservices
// engineering.mercari.com
This blog post explains the retry pattern used in microservices architecture: why to use it, what to consider when using it, and how to implement it in Python.
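For readers who want a feel for the pattern before clicking through, here is a minimal sketch of a retry with exponential backoff and jitter. It is not taken from the Mercari post; the names `TransientError` and `retry_call` and all default values are assumptions made up for illustration.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry_call(func, max_attempts=3, base_delay=0.1, max_delay=2.0, sleep=time.sleep):
    """Call func(), retrying on TransientError with exponential backoff.

    The wait before retry k is drawn uniformly from
    [0, min(max_delay, base_delay * 2**(k-1))], so many clients
    failing at once do not all retry at the same instant.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except TransientError:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts:
                raise
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

Only errors the caller has marked as transient are retried here; anything else propagates immediately, which is usually what you want for non-idempotent calls or client-side mistakes.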
SolarWinds, Enough with the Password Already!
// gru.gq
This is a discussion of the complexity of the SolarWinds hack. Could SolarWinds have been too difficult for the KGB to use in an enablement operation? Yes. Of course, SolarWinds wasn’t close to reaching that level.