Sources & Bibliography

References, deep dives, and further reading for Vol 1: Foundations of the Digital World.

00

Chapter Zero: The Anatomy of a Post-Mortem

Related Documents and Articles

de Havilland Comet Crashes

2 sources

Murphy's Law (Edward A. Murphy Jr.)

2 sources

The Black Box (Flight Data Recorder)

2 sources

The "5 Whys"

3 sources

Pre-Mortem

1 source

System Definition

1 source

Blameless Post-Mortem

2 sources
Part 1

The Physics of Failure

Fundamental forces that hate your uptime: Complexity and Entropy.

01

Chapter 1: When Numbers Lie

Case Studies

Mariner 1

3 sources

Ariane 5 (Flight 501)

4 sources

Intel Pentium FDIV Bug

4 sources

Vancouver Stock Exchange

2 sources
  • Journal of Economic Literature Kevin Quinn Accessed: 2025-12-14

    See Reference #25 in the PDF for the original WSJ citation.

    Reproduces the graph of the index drop.

  • Ever Had Problems Rounding Off Figures? This Stock Exchange Has
    The Wall Street Journal 1983

    Date: November 8, 1983. Page: 37.

    Original article is not available online without a paid subscription (ProQuest ID: 1983-11-08).

Explainers

Floating Point Math (IEEE 754)

2 sources

Integer Overflow (Ada/Ariane Context)

2 sources

Rounding vs. Truncation

2 sources
02

Chapter 2: Time Is Broken

Case Studies

Y2K (The Original Millennium Bug)

2 sources

Y2K10 (The 2010 Glitches)

4 sources

Y2K10 - The German Bank Card Glitch

1 source

PSN 2010 (PlayStation 3 Leap Year Bug)

2 sources

Y2K20 (The 2020 Glitches)

3 sources

Z2K9 (The Zune Bug)

2 sources

Linux Leap Second (2012)

2 sources
Explainers

Astronomical Time (Sidereal vs. Tropical)

2 sources

Calendar History & Reform

2 sources

Mars Time

2 sources

Leap Second Elimination

1 source

The Year 2038 Problem

1 source

Encodings (Octal/Hex)

3 sources
03

Chapter 3: The Memory Problem: Too Much Trash, Not Enough Walls

Case Studies

Chrome Memory Leak Saga

2 sources

Windows 98 GDI Leak

3 sources

Meltdown & Spectre

3 sources

Rowhammer

3 sources
Explainers

Core Concepts

2 sources
04

Chapter 4: The Race That Nobody Wins

Case Studies

Mars Pathfinder (1997)

3 sources

Spider-Man: No Way Home (2021) Ticket Frenzy

No single official report exists because of the distributed nature of the failure. The sources confirm that the incident happened and discuss its technical causes.

4 sources

Apple TCC Privacy Bug (2021)

2 sources

Ethereum DAO Hack (2016)

2 sources

Therac-25 (1985-1987)

1 source

Toyota Unintended Acceleration (2009-2010)

2 sources
Explainers

Multitasking & Context Switching

1 source
  • Operating System Concepts
    Abraham Silberschatz, Peter B. Galvin, Greg Gagne

    Standard textbook definition of Process Control Blocks (PCB) and Context Switching overhead.

Race Conditions

2 sources
05

Chapter 5: Scale & Exponential Pain

Case Studies

The Comair Christmas Meltdown (2004)

3 sources

The Buffett Overflow (2021)

3 sources

The Morris Worm (1988)

4 sources

SQL Slammer (2003)

3 sources

AT&T Long-Distance Outage (1990)

4 sources

Stack Overflow Regex Outage (2016)

3 sources
Explainers

Exponential Growth

1 source

Factorial Growth & Graph Theory

3 sources
Part 2

The Internet’s House of Cards

06

Chapter 6: BGP - How Internet Finds Itself

Case Studies

Verizon BGP Leak (2019)

3 sources

Facebook Outage (2021)

4 sources

Rogers Communications Outage (2022)

3 sources
Explainers

Official Documentation & Standards

The "Source Code" of the Internet

3 sources

Deep Dives & Academic Resources

4 sources
07

Chapter 7: DNS - How Names Replace Numbers

Case Studies

Dyn DNS Attack (2016)

3 sources

Swedish (.se) Outage (2009)

2 sources

Slack's "Rollback from Hell" (2021)

3 sources

Comcast Blocks NASA (2012)

2 sources

Akamai DNS Software Blunder (2021)

2 sources
Explainers

RFC 1034: The foundational definition of the DNS

2 sources

RFC 4033: DNSSEC and the "Chain of Trust"

1 source
08

Chapter 8: How the Internet learned to trust itself – SSL/TLS

Case Studies

Cloudbleed (2017)

3 sources

FREAK (2015)

3 sources

DigiNotar (2011)

3 sources

Symantec CA Distrust (2017-2018)

3 sources

AddTrust (2020)

3 sources
Explainers

Diffie-Hellman Key Exchange (The Paint Analogy)

2 sources

Trust Chain & Certificates

2 sources
09

Chapter 9: How We Learned to Share the Load

Case Studies

Cloudflare (July 2, 2019)

3 sources

Slack (January 4, 2021)

2 sources

Roblox (October 2021)

2 sources

Google Cloud (2019 & 2020)

3 sources

Azure Front Door (October 29, 2025)

The Azure status page is updated on a rolling basis, so no permanent link to the specific report is available. At the time of writing the report is still visible, but it will disappear over time.

2 sources
Explainers

OSI Model

3 sources

Load Balancers (L2 - L7)

7 sources
10

Chapter 10: When the Cloud Rains

Case Studies

AWS US-East-1 Outage (2017)

3 sources

AWS US-East-1 Outage (2025)

3 sources

Fastly Outage (2021)

3 sources

Google Global Outage (2020)

3 sources

Azure AD Outage (2021)

2 sources
Explainers

Cloud-Native

2 sources

SaaS (Software as a Service)

1 source

Content Delivery Network (CDN)

1 source
Part 3

When Data Fights Back — Why Storage Failures Hurt the Most

11

Chapter 11: Data – Death by Natural Causes

Case Studies

SSD Bit Rot & Flash Amnesia

4 sources

The Google Lightning Strike (2015)

3 sources

The Facebook Indoor Rainstorm (2011)

2 sources

The Wrocław University of Science and Technology Dust Bowl (2023)

2 sources

Voyager 1 Bit Rot in Deep Space (1977-Present)

3 sources

BBC Domesday Project (1986-2002)

4 sources

Adobe Flash (1996-2020)

3 sources
Explainers

Bit Rot vs Data Decay

4 sources
12

Chapter 12: Accountant’s Revenge

Case Studies

NASA — The Lost Moon Tapes

5 sources

HBO Max / Warner Bros. Discovery – The Accountant's Cut

5 sources

The Great Magnetic Purge (TV Archives)

11 sources

The Gaming Purge (1970s–Present)

6 sources

NASA and NOAA Climate Data Gaps

2 sources
Explainers

Why Storing Data Was (and Still Is) a Problem

10 sources
13

Chapter 13: DB Blunders

Case Studies

MongoDB Duplicate ObjectIds

2 sources

Salesforce Permageddon

4 sources

Instapaper Storage Limit Meltdown

3 sources

GitHub MySQL Outage

5 sources

CERN CASTOR Tape Catalog Corruption (2008)

5 sources

PostgreSQL VACUUM Bug

2 sources

Discord: The Memory Wall

3 sources

Target Canada Master Data Disaster

5 sources
Explainers

Metadata

2 sources

Indexes

2 sources

OLTP vs OLAP

2 sources
14

Chapter 14: Lost in Migration

Case Studies

TSB Bank Meltdown (UK, 2018)

5 sources

The Great British Police Purge (2021)

2 sources

Canada's Phoenix Pay System (2016)

3 sources

Japan’s My Number Meltdown (2023)

3 sources

MySpace Migration Meltdown (2019)

2 sources
15

Chapter 15: Schrödinger's Backup—Both There and Not There Until You Need It

Case Studies

NASA Curiosity Rover Storage Leak (2013)

3 sources

GitLab Database Incident (2017)

4 sources

Toy Story 2 (1998)

3 sources

King’s College London (2016)

2 sources

Kyoto University Supercomputer Wipe (2021)

4 sources

T-Mobile Sidekick Outage (2009)

5 sources
Explainers

RTO & RPO

1 source

Incremental vs. Differential Backups

2 sources

Storage Types (Block vs. Object vs. File)

5 sources
Part 4

The Illusion of Safety — When Clouds Bite Back

Part IV Introduction

The Shared Responsibility Model

5 sources
16

Chapter 16: When Storage Becomes Billboards

Case Studies

Case Set: When Data Is Left in Public

Disclaimer: Sources referencing tools like Google Dorks are for educational purposes only. Unauthorized testing is unethical and illegal.

3 sources

Verizon (2017)

3 sources

Accenture (2017)

3 sources

BlueBleed (Microsoft, 2022)

4 sources

Alice’s Table (2024)

3 sources

U.S. Defense Contractor (2017)

4 sources

Capital One (2019)

2 sources

Uber (2016)

3 sources

Dow Jones Watchlist Leak

4 sources
Explainers

SSRF (Server-Side Request Forgery)

2 sources

Git and GitHub

2 sources

ElasticSearch

2 sources
17

Chapter 17: When "Scale" Scales Your Bills

Case Studies

Due to the understandable reluctance of corporations to officially report purely financial losses caused by cloud billing errors, a significant portion of the sources in this chapter originates from community discussions on internet forums and social media. While these sources naturally lack the formal authority of official reports, the author has rigorously vetted each anecdote for technical plausibility and engineering credibility to ensure its reliability.

The $450,000 Bill in 45 Days

2 sources

$4,500 in Two Days

1 source

$10,000 Bill from CloudTrail-S3 Loop

1 source

$72,000 Overnight

1 source

$75,000 in 48 Hours

1 source

Cryptojacking — Mining Coins, Burning Cash

5 sources

NASA’s $30 Million Oversight

3 sources
Explainers

Recursion

4 sources

Serverless / Lambda

2 sources
18

Chapter 18: When Security Tools Become the Threat

Case Studies

CrowdStrike

5 sources

SentinelOne Outage

4 sources

Zscaler Reboot Loop Outage

Zscaler’s official Trust Portal maintains a limited incident history (typically one year) and primarily lists external ISP or cloud provider issues, often omitting significant internal service failures. Consequently, the analysis of the "Reboot Loop" and other listed incidents relies heavily on third-party monitoring logs, independent technical analysis, and community reports from impacted IT administrators. While the vendor’s official channels offer little data on these events, the external footprint, documented in the links below, provides a verified timeline of service disruptions that are otherwise absent from the company's permanent public record. This selective transparency necessitates a reliance on community-driven documentation to fully understand the scope of these failures.

Update (December 2025): Since the initial compilation of this research, Zscaler has expanded their Trust Portal to include the most recent 30 days of incident history. While this represents an improvement in short-term transparency, the incidents documented in this chapter (spanning 2022-2025) remain absent from the vendor's permanent public record. The analysis therefore continues to rely on third-party monitoring and community reports for historical context.

6 sources

Okta Breach

3 sources

When Windows Defender Fought Linux

2 sources
Explainers

Kernel, Antivirus, and EDR

7 sources

VPNs, Zero Trust, and Authentication

6 sources
Part 5

The Illusion of Resilience – Disaster Recovery and Other Optimistic Plans

19

Chapter 19: Redundancy Is Not Resilience

Case Studies

Armenian Internet (The Spade Incident)

4 sources

Knight Capital (2012)

3 sources

AWS US-EAST-1 Kinesis Outage (2020)

3 sources

OVHcloud Fire (2021)

4 sources

The 2003 Northeast Blackout

2 sources

The 2003 Italy Blackout

2 sources
Explainers

Redundancy & High Availability

5 sources

Reliability and the Magic Nines

4 sources
20

Chapter 20: Resilience Theater – Practicing Safety While Everything Burns

Case Studies

Delta Airlines (2016)

4 sources

T-Mobile (2020)

4 sources

Cloudflare (2025)

4 sources

Atlassian (2022)

3 sources
Explainers

Parachute Principle (Disaster Recovery)

3 sources
21

Chapter 21: Where Outages Become Fatal

Case Studies

September 11, 2001 (Financial Sector Resilience)

5 sources

August 12, 1985 (Japan Airlines Flight 123)

4 sources

June 1, 2009 (Air France Flight 447)

3 sources

July 6, 1988 (Piper Alpha)

4 sources

NHS Pandemic Continuity Plans (2020)

5 sources

March 11, 2011 (Fukushima Daiichi)

5 sources