Awesome Books in SRE
Oct 26, 2023
- Practical Linux Infrastructure
- Site Reliability Engineering: How Google Runs Production Systems
- The Site Reliability Workbook: Practical Ways to Implement SRE
- Observability Engineering: Achieving Production Excellence
- The Practice of Cloud System Administration: Designing and Operating Large Distributed Systems
- Web Operations: Keeping the Data On Time
- The Checklist Manifesto: How to Get Things Right
- Microservices in Production: Standard Principles and Requirements
- Production-Ready Microservices: Building Standardized Systems Across an Engineering Organization
- Systems Performance: Enterprise and the Cloud [Sample chapter titled CPUs]
- Monitoring Distributed Systems: Case Studies from Google’s SRE Teams
- The Human Side of Postmortems: Managing Stress and Cognitive Biases
- Chaos Engineering: Building Confidence in System Behavior through Experiments
- Post-Incident Reviews: Learning from Failure for Improved Incident Response
- Antifragile Systems and Teams
- How to Monitor the SRE Golden Signals (E-Book)
- Incident Management for…