Today I’m going to answer a second question from Ronan, the Product Director of a growing startup in the transport industry. His question is…
What is a disaster recovery plan?
Ronan and his team recently developed a new solution. And as they are growing, they need to increase the standard of their infrastructure.
Building a high-standard infrastructure is a huge topic, and we already covered parts of it in the previous videos. If you want to check them out, watch “What type of technical team does a high-standard production require?” and “How can a startup build a high-standard SaaS software for large companies?”
However, today is all about what a disaster recovery plan is and how to write one.
So first off, a disaster recovery plan is a detailed piece of documentation to help managers and technicians deal with a major business-affecting disaster.
You might think that as a startup, you shouldn’t be concerned with such a heavy process… right? Think again.
What if your hosting provider disappears? What if you lose your app’s source code? Don’t you think you might need a properly documented technical procedure to recover it?
You definitely will. There are dozens of reasons to use a disaster recovery plan.
When to use it?
- Server crash
- Hijacking or virus
- Data center burglary
- Losing critical skills
- Terrorist act (like the World Trade Center attack in 2001)
The interesting fact is that if you face a major disaster, your business has a 90% chance of dying if you do not have a disaster recovery plan (DRP)!
To figure out a DRP, you need to establish a few things.
What is acceptable?
- How much time would it be acceptable for your business to be down?
- Is there a risk of losing relevant or confidential data?
- If the media reports on your losses, how would it affect your business?
- Will customers continue to trust you?
Once you’ve answered these questions, let’s establish the minimum a startup should be able to recover, no matter what.
What minimum data to recover?
1. Backups and well-tested restoration procedure
The most important parts of a DRP are backups and a well-tested restoration procedure.
Without backups, a recovery is not possible. So, double-check that you have a history of reliable backups, with several previous versions on a second hosting provider.
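As a minimal sketch of what that double-check could look like, the snippet below compares a primary backup with its copy from the second provider and counts how many versions are kept. The function names, the `*.dump` naming convention, the `min_versions` threshold, and the idea that both copies are reachable as local files are all assumptions for illustration, not a prescribed tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_matches(primary: Path, secondary: Path) -> bool:
    """True when the offsite copy is byte-identical to the primary backup."""
    return sha256_of(primary) == sha256_of(secondary)


def has_enough_versions(backup_dir: Path, min_versions: int = 3) -> bool:
    """True when the directory holds at least `min_versions` backup files."""
    # Hypothetical convention: one "*.dump" file per backup run.
    return len(list(backup_dir.glob("*.dump"))) >= min_versions
```

A matching checksum only shows that the copy is intact. It does not replace actually running the restoration procedure from time to time.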
2. External infrastructure ready to deploy
Speaking of a second provider, make sure that the second infrastructure is ready to deploy. It doesn’t have to run 24/7, but the DRP should contain the already-tested procedure to deploy it.
There is also a recent technique called “infrastructure as code,” which consists of programming the deployment of your entire infrastructure to rebuild it within a few minutes. That might be interesting to consider at some point.
3. Data synchronizations
Let’s go back to the backup side of things. Usually a full backup is performed during the night, often once a day or once a week.
But what if your system crashes at 8pm? Would you lose what was done during the day? You could. That’s why we set up what is called a standby synchronization or a read replica.
These are real-time synchronizations between two systems that keep an up-to-date copy of your database. Depending on how they are set up, these systems can also let you restore to a chosen point in time, which can be very useful in case of hijacking. So, check with your engineers.
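To make the 8pm example concrete, here is a small sketch that computes how much work would be lost if only a nightly full backup existed. The 2am backup time and the dates are assumed for illustration.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, crash: datetime) -> timedelta:
    """Return how much work is lost when restoring from the last full backup."""
    if crash < last_backup:
        raise ValueError("crash time must not be earlier than the last backup")
    return crash - last_backup


# Assumed schedule: full backup at 2am, crash at 8pm the same day.
last_backup = datetime(2024, 1, 15, 2, 0)
crash = datetime(2024, 1, 15, 20, 0)
lost = data_loss_window(last_backup, crash)
print(lost)  # 18:00:00
```

With a weekly backup, the same calculation can return several days, which is the gap a real-time synchronization is meant to close.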
4. Knowledge management
A DRP also entails knowledge management.
What would happen to your startup if, god forbid, your CTO or your lead developer got into a serious accident and became unavailable for a few weeks or months? Who will have enough knowledge to keep the project afloat?
So make sure you have the documentation required to recover production, and also to pick up the solutions that are still under development.
5. Password and encryption key management
The same goes for password and encryption key management. Do you know exactly where your passwords are? Who has access to them?
Double-check that at least two people in the organization always have access to your entire password database.
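One way to keep an eye on this is a small check over an access inventory, sketched below. The inventory format, the secret names, and the roles are hypothetical; in practice the data would come from whatever password manager you use.

```python
def under_shared_secrets(access: dict[str, set[str]], minimum: int = 2) -> list[str]:
    """Return the names of secrets readable by fewer than `minimum` people."""
    return sorted(name for name, people in access.items() if len(people) < minimum)


# Hypothetical inventory: secret name -> people who can read it.
access = {
    "password-database": {"cto", "ceo"},
    "backup-encryption-key": {"cto"},
}
print(under_shared_secrets(access))  # ['backup-encryption-key']
```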
6. Decision-making procedure
This next one really comes from experience.
You build a system and prepare a failover procedure, but when an issue comes up six months or a year later, it’s not always easy to remember in which cases you should migrate just one component and in which cases a big part of your infrastructure.
So you need to have a decision-making procedure that says:
If component X fails, then we do this
If component Y fails, then we do that
If components X and Y both fail, then we do this… etc.
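Such a procedure can be written down as a simple lookup table. The sketch below is only an illustration: the component names X and Y come from the list above, while the actions and the `decide` function are hypothetical placeholders for your own procedures.

```python
# Hypothetical runbook: set of failed components -> documented action.
RUNBOOK = {
    frozenset({"X"}): "restart X on the standby server",
    frozenset({"Y"}): "restore Y from the latest backup",
    frozenset({"X", "Y"}): "fail over the whole platform to the second provider",
}


def decide(failed: set[str]) -> str:
    """Return the documented action for exactly this set of failed components."""
    action = RUNBOOK.get(frozenset(failed))
    if action is None:
        # No entry means nobody decided in advance: escalate rather than guess.
        return "escalate: no documented procedure for this combination"
    return action


print(decide({"X", "Y"}))  # fail over the whole platform to the second provider
```

Matching on the exact set of failed components forces you to decide each combination ahead of time, instead of improvising during the incident.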
I sincerely hope you got my point. A disaster recovery plan is the most important piece of documentation your company will ever need. So regardless of what happens to your solution, or if a key member of your team leaves without providing vital information, having such a refined procedure will strengthen your startup’s reliability.
And now if, like Ronan, you have a specific question about your project, just go ahead and post it on myctofriend.co/ask.
I will do my best to answer your question in a video or redirect you to any existing content that will answer it.
Also, be sure to go through our other content here at myctofriend.co to learn more from real startup growth experiences and better manage your startup development.
I’ll be waiting for your questions, and I look forward to seeing you in other videos.