Disaster Recovery (DR)

Disaster Recovery (DR) is the set of policies, tools, and procedures that enable the recovery or continuation of vital technology infrastructure and systems following a natural or human-induced disaster.

Strategies, ordered from slowest to fastest recovery:

1. Backup and Restore (slowest)
2. Pilot Light
3. Warm Standby
4. Multi-site Active-Active (fastest)

Standards: ISO/IEC 27031, ISO 22301.
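The ordering of the four strategies can be sketched in code. The example below is only an illustration: the recovery-time figures in `ASSUMED_RTO_HOURS` and the helper `slowest_strategy_meeting` are hypothetical placeholders, not values from any standard or vendor. It picks the slowest strategy (usually the cheapest to run) that still fits a target recovery time.

```python
from enum import IntEnum


class Strategy(IntEnum):
    """The four DR strategies, ranked from slowest (1) to fastest (4)."""
    BACKUP_AND_RESTORE = 1
    PILOT_LIGHT = 2
    WARM_STANDBY = 3
    MULTI_SITE_ACTIVE_ACTIVE = 4


# ASSUMPTION: illustrative recovery times in hours. Real figures depend
# entirely on the workload and must come from your own DR tests.
ASSUMED_RTO_HOURS = {
    Strategy.BACKUP_AND_RESTORE: 24.0,
    Strategy.PILOT_LIGHT: 4.0,
    Strategy.WARM_STANDBY: 1.0,
    Strategy.MULTI_SITE_ACTIVE_ACTIVE: 0.0,
}


def slowest_strategy_meeting(target_rto_hours: float) -> Strategy:
    """Return the slowest strategy whose assumed recovery time fits the target."""
    if target_rto_hours < 0:
        raise ValueError("target_rto_hours must be non-negative")
    # Walk from slowest to fastest and stop at the first one that is fast enough.
    for strategy in sorted(Strategy):
        if ASSUMED_RTO_HOURS[strategy] <= target_rto_hours:
            return strategy
    raise ValueError("no strategy meets the target")


choice = slowest_strategy_meeting(2.0)
print(choice.name)
```

With the placeholder numbers above, a two-hour target rules out Backup and Restore and Pilot Light, so the function settles on Warm Standby.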

Related terms: log-management (/terms/log-management)

🧒 Explain Like I'm 5

Imagine you are building a giant LEGO castle. [Disaster recovery](/en/terms/disaster-recovery) is like having a photo of your castle and a spare set of bricks kept at your grandma's house. If your cat knocks over your castle and breaks it, you can just go to your grandma's, look at the photo, and build it again exactly how it was.

🤓 Expert Deep Dive

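Technically, DR planning is defined by two critical metrics: 'Recovery Time Objective' (RTO) and 'Recovery Point Objective' (RPO). RTO is the maximum acceptable 'time' between an outage and the restoration of service. RPO is the maximum amount of 'data' you can afford to lose, expressed as a window of time (e.g., if you back up every hour, you can lose up to an hour of data, so the tightest RPO you can promise is 1 hour). To achieve low RTO/RPO, companies use 'Geographic Redundancy' (copies of the system in separate 'Availability Zones' or regions) together with data replication: 'Synchronous Replication' can bring RPO close to zero at the cost of added write latency, while 'Asynchronous Replication' works over long distances but can lose the most recent writes. We also categorize recovery sites: 'Cold Sites' (empty space that can take days to bring online), 'Warm Sites' (hardware ready, but data needs loading), and 'Hot Sites' (real-time mirror of the main system). Infrastructure-as-code tools like Terraform or CloudFormation are now used to 'spawn' replacement infrastructure in minutes during a disaster.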
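To make the two metrics concrete, here is a minimal sketch that checks a single incident against RTO and RPO targets. The class `RecoveryObjectives`, the function `evaluate_incident`, and all timestamps are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    rto: timedelta  # maximum acceptable downtime
    rpo: timedelta  # maximum acceptable window of lost data


def evaluate_incident(objectives: RecoveryObjectives,
                      last_good_copy: datetime,
                      failure_at: datetime,
                      restored_at: datetime) -> dict:
    """Compare one incident's actual data loss and downtime with the targets."""
    # Everything written after the last good copy is lost.
    data_loss = failure_at - last_good_copy
    # Downtime runs from the failure until service is restored.
    downtime = restored_at - failure_at
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= objectives.rpo,
        "rto_met": downtime <= objectives.rto,
    }


# ASSUMPTION: example targets and timestamps, chosen only for illustration.
objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
report = evaluate_incident(
    objectives,
    last_good_copy=datetime(2024, 1, 1, 9, 0),   # hourly backup finished
    failure_at=datetime(2024, 1, 1, 9, 45),      # outage begins
    restored_at=datetime(2024, 1, 1, 12, 15),    # service is back
)
print(report)
```

In this made-up incident 45 minutes of data are lost and the service is down for 2.5 hours, so both targets are met. Had the failure struck at 09:59 with the same hourly backups, the loss would approach the full one-hour RPO, which is why the backup interval sets a floor on the RPO you can promise.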

📚 Sources