I have a question I like to ask when I'm getting to know a company's technology posture: "When was the last time you tested your disaster recovery plan?" The most common answer is a long pause followed by a change of subject.
The second most common answer is "we tested it when we set it up," which is another way of saying "never, in any meaningful sense."
The Document Problem
Most disaster recovery plans are documents. Thick ones, usually produced during an audit or compliance cycle, that describe in careful detail what should happen when things go wrong. They sit in a SharePoint folder or a binder on someone's shelf, and they give everyone a warm feeling of preparedness.
But a document isn't a plan. A plan is a set of actions that people can actually execute under stress, with incomplete information, when the systems they normally rely on aren't available. That's a very different thing from a document.
What Goes Wrong
Disaster recovery plans fail for a few consistent reasons.
They assume too much. The plan says "restore from backup," but doesn't account for the fact that the backup server is in the same facility as the primary, or that the restore process takes forty-eight hours and the business can only survive twenty-four hours of downtime.
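Both halves of that failure are checkable long before anything catches fire. Here's a minimal sketch of the kind of audit I mean, assuming you record a few honest numbers per system; every field and system name in it is made up:

```python
# A plan-assumption audit. Every field and system name here is
# illustrative; adapt them to wherever you actually record this data.
from dataclasses import dataclass

@dataclass
class SystemPlan:
    name: str
    primary_site: str
    backup_site: str
    restore_hours: float       # measured in a real test, not estimated
    max_downtime_hours: float  # how long the business can actually survive

def audit(plans):
    findings = []
    for p in plans:
        if p.backup_site == p.primary_site:
            findings.append(f"{p.name}: backups live in the same facility as the primary")
        if p.restore_hours > p.max_downtime_hours:
            findings.append(
                f"{p.name}: restore takes {p.restore_hours:g}h, "
                f"but the business can only survive {p.max_downtime_hours:g}h"
            )
    return findings

if __name__ == "__main__":
    for finding in audit([
        SystemPlan("billing", "dc-east", "dc-east",
                   restore_hours=48, max_downtime_hours=24),
    ]):
        print(finding)
```

Ten minutes of writing down what you actually have will surface assumptions that a hundred pages of narrative never will.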
They're too complex. When everything is on fire, nobody is going to flip through a hundred-page document looking for step 47b. If your plan requires reading comprehension during a crisis, it's not a plan — it's a wish.
They depend on specific people. The person who wrote the plan left the company two years ago. The person who understands the backup system is on vacation. The vendor contact in the plan retired. Plans that depend on specific people are fragile by design.
They've never been tested. This is the big one. An untested plan is a hypothesis. You don't know if it works until you've tried it, and the middle of a disaster is the worst possible time to find out.
Building Something Real
A disaster recovery plan that works has a few characteristics.
It's short. One page of critical steps, one page of contact information, one page of system priorities. Everything else is reference material. The core plan should be executable by someone who's stressed, tired, and working from a personal laptop in a parking lot.
It's prioritized. Not everything can come back first. Your plan should have a clear recovery order based on business impact: what systems must be running in the first four hours, the first twenty-four hours, the first week. This isn't a technology decision; it's a business decision, and leadership should be making it.
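One way to keep that decision honest is to record the order as data instead of prose, so it can be read, reviewed, and argued about. A minimal sketch, with hypothetical tiers and system names:

```python
# Recovery order as explicit data. Tiers and system names are made up;
# what matters is that the order is written down and owned by the business.
RECOVERY_TIERS = {
    # recover-by window (hours) -> systems that must be running by then
    4:   ["authentication", "order-intake", "phones"],
    24:  ["billing", "inventory", "internal-email"],
    168: ["reporting", "data-warehouse", "dev-environments"],
}

def recovery_order():
    """Yield (deadline, system) pairs in the order they should be restored."""
    for deadline in sorted(RECOVERY_TIERS):
        for system in RECOVERY_TIERS[deadline]:
            yield deadline, system

for deadline, system in recovery_order():
    print(f"within {deadline:>3}h: {system}")
```

A spreadsheet works just as well; the format matters far less than the fact that the sequencing exists before the incident, not during it.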
It's tested regularly. At minimum, quarterly tabletop exercises where you walk through a scenario. At least annually, an actual technical test: restore from backup, fail over to secondary systems, verify that your recovery time objectives are realistic. Every test will reveal something you didn't know. That's the point.
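The annual technical test is also where you learn whether your recovery time objectives survive contact with reality. Here's a minimal sketch of the measurement step; restore_from_backup is a stand-in for whatever your tooling actually does, and the targets are invented:

```python
# The measurement step of a restore drill: time the real restore and
# compare it to the RTO you promised. restore_from_backup is a stand-in
# for your actual tooling; the targets below are illustrative.
import time

RTO_HOURS = {"billing": 24, "order-intake": 4}

def timed_restore(system, restore_from_backup):
    start = time.monotonic()
    restore_from_backup(system)  # restore into a scratch environment, never production
    elapsed = (time.monotonic() - start) / 3600
    target = RTO_HOURS[system]
    verdict = "OK" if elapsed <= target else "MISSED"
    print(f"{system}: restored in {elapsed:.1f}h against a {target}h RTO [{verdict}]")
```

The elapsed time you record here is the number that belongs in the plan, not the estimate someone wrote down during the compliance cycle.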
It's maintained. Assign ownership. Review after every test, every incident, and every significant infrastructure change. A plan that was accurate six months ago may not be accurate today.
The Investment
Building and maintaining a real disaster recovery capability takes time and money. Not as much as you might think — the basics are surprisingly accessible — but it does require ongoing attention. The question isn't whether you can afford it. The question is whether you can afford the alternative: discovering, in the worst moment, that your plan was just a document.