SRE Newsletter

Planning & Estimating Large Scale Software Projects: // tomrussell.co.uk
Tom Russell describes how they plan eight-figure software projects that involve multiple teams across several quarters of effort while still remaining somewhat agile.
Managing the Risk of Cascading Failure: // infoq.com
Cascading failures are failures in which something causes a reduction in capacity, an increase in latency, or a spike in errors. The response of other components of your software system then turns this into a widespread failure: your load increases, and your backends get flattened.
Pragmatic Incident Response: 3 Lessons Learned from Failures: // firehydrant.io
Declare and run retros for the small incidents. Decrease the time it takes to analyze an incident. Alert on pain felt by people — not computers.
De-Siloing Incident Management: // rootly.io
De-siloing the organization is a crucial part of managing reliability. This post explains why breaking down the silos that separate SREs from other teams is so important, and offers practical strategies for doing so.
Automatic Remediation of Kubernetes Nodes: // blog.cloudflare.com
Cloudflare is open-sourcing Sciuro, their replacement for node-problem-detector that has one job: synchronize Kubernetes node conditions with currently firing alerts in Alertmanager.
If You Want To Transform IT, Start With Finance: // zwischenzugs.com
We should consider a deeper structural cause of cultural problems in change management: how money flows through the organisation.
No, We Don’t use Kubernetes: // ably.com
Ably explains the tradeoffs of running thousands of Docker instances without Kubernetes, and why they think Kubernetes doesn't make sense for many of the companies that have adopted it.
Runbooks for Better Incident Management: // ashpatel.substack.com
A short post describing why runbooks are useful and the different ways that teams have set up their runbooks.
Mitchell's New Role at HashiCorp: // hashicorp.com
HashiCorp's co-founder and CTO steps down to become… an individual contributor.
AWS Now Allows Customers To Pay in Advance: // aws.amazon.com
With Advance Pay, you can now pay for your AWS usage in advance, and pay your future invoices automatically.