Hard Truths & Overlooked Realities About Business Disaster Recovery

The hackers were on the incident response call, but no one realized it until they spoke up, announcing their readiness to initiate a DDoS attack since the ransom had not been paid. Within minutes, everyone in the room and on the MS Teams call began losing connectivity, and notifications on their cell phones confirmed that the attack was underway.

“Once you pay the ransom in bitcoin, we will reverse the attack,” they declared.

The Cybersecurity team initiated a DDoS response without considering the impact on internal users. Everyone except those on the call lost connectivity. Company-wide network access was disrupted, and external customers were unable to access their accounts.

Thirty minutes later, the bitcoin transfer was initiated, services began to be restored, and the hackers ominously warned, “We will be back!”

The company had just become a victim of ransomware and another DDoS attack. With no real plans in place, they felt violated and at the mercy of the attackers.

 

Are you a statistic?

It is challenging to calculate the actual impact an event like this may have on your company. I have read multiple articles and studies trying to quantify the costs, but many fall short because they look at just the cost of the human resources required to respond, the cost of lost sales, or the cost of paying a ransom to clean the infected systems. From my extensive experience in actual event response, planning, and management, here are some key lessons learned that should explain the high costs related to such responses.

 

Ransomware hidden costs:

The truth, not well known, is that ransomware recoveries last much longer than just the immediate data recovery. Although a very well-prepared company may be able to restore the data in less than a day, the ransomware itself can remain in your systems and take months to eradicate. The larger the network, the greater the potential for hidden exploits. Because of the way the bad actors obtained access to the network and the length of time they have been wandering through your systems, it can take months and sometimes well over a year to identify all the compromises.  

Often, consulting teams or response teams are brought in to not only guide and manage a response but also oversee the long-term remediation and certification of the system's state.

 

What is the cost of downtime? 

The cost of downtime varies by company and system. Several articles I have read claim the cost averages about $300,000 per hour and can exceed $1 million per hour. Frequent outages lead to higher operating costs because more staff are needed to maintain productivity. Effective plans can reduce response costs and staffing needs.

These estimates reflect the cost at the time of impact, essentially the loss of business, and do not take into consideration the cost of response, after-action work, and long-term remediation. Whether the long-term actions are performed internally, by a contracted company, or through a hybrid of the two, the cost of these services can be substantial.
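To make the gap between "cost at the time of impact" and total cost concrete, here is a minimal sketch of the arithmetic. The $300,000 hourly figure is the average cited above; the outage length, response cost, and remediation cost are hypothetical placeholders, and the `OutageEstimate` class is an illustration rather than an established model. Substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass
class OutageEstimate:
    """Rough outage cost estimate. All inputs are assumptions you supply."""

    hourly_loss: float             # lost business per hour of downtime
    outage_hours: float            # how long the systems were down
    response_cost: float = 0.0     # response teams, overtime, contractors
    remediation_cost: float = 0.0  # long-term cleanup and certification

    @property
    def immediate_loss(self) -> float:
        # The "cost at the time of impact" that most studies report.
        return self.hourly_loss * self.outage_hours

    @property
    def total_cost(self) -> float:
        # Immediate loss plus the costs those studies tend to leave out.
        return self.immediate_loss + self.response_cost + self.remediation_cost


# Hypothetical example: a 4-hour outage at the cited $300,000/hour average.
estimate = OutageEstimate(
    hourly_loss=300_000,
    outage_hours=4,
    response_cost=250_000,       # hypothetical
    remediation_cost=1_500_000,  # hypothetical
)
print(f"Immediate loss: ${estimate.immediate_loss:,.0f}")
print(f"Total cost:     ${estimate.total_cost:,.0f}")
```

With these placeholder inputs, the immediate loss is well under half of the total, which is the point: the headline hourly figure is only the starting line.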

 

Risk Categories that matter:

IT system risks generally fall into two categories: hardware and cyber. Maintenance planning can reduce hardware failures, and good monitoring can identify trouble points. However, it's crucial to have system redundancy to avoid high impacts from hardware failures, though this can be costly.

Cyber threats are continuous; hackers only need to succeed once, while your cyber team must catch them every time. Because impacts can escalate quickly, planning and exercising are essential to mitigating these risks.

Additionally, uninformed internal users greatly increase the likelihood of experiencing an impact.

AI is becoming a handy tool for identifying patterns or potentially compromised items. Using AI rules to identify, isolate, and trigger appropriate responses makes a team more effective and frees it to focus on the issues that truly require human intervention.

 

Predictable Planning:

 Natural disasters are relatively predictable based on location, time of year, and environmental conditions. While planning for these events can be costly, a little forethought can significantly reduce their impact and associated costs, depending on the business and its footprint. For example, downtime expenses can translate into lost revenue, delayed projects, and disrupted supply chains, costing businesses thousands of dollars per hour. Physical damage to business premises, equipment, and inventory can also strain budgets, with repair and replacement costs running into the millions. Effective planning and preparedness can mitigate these financial impacts and facilitate quicker recovery.

Hardware outages are also predictable, and your architecture teams should be planning for them with appropriate redundancy. Cloud solutions make this planning much easier, whether through automated multi-region deployment or other levels of redundancy.

 

As a resiliency professional, let me share some further insight.

1. I have never worked or consulted with a company that had no plans, perfect plans, or initially proven plans. What I mean is that being prepared is a continual improvement process, requiring testing your systems, exercising your people, and identifying areas of improvement.

2. Every company I have worked for or with has had a costly impact. After responding to an outage, they may face fines, reputational damage, or a decline in stock price; these are reactive outcomes. Companies that excel at planning with realistic expectations can reduce the costs of these outages. I have seen companies plan to throw money at problems if they occur. Well, you could be throwing maybe $20,000 in overtime versus $20 million (or more) in response teams, contractors, fines, and other costs.

3. Every company thought either that it could not happen to them or that their plans were good enough. That is a foolish perspective that will become an expensive lesson. I once worked for a company that planned to have at least 90% of the staff go to another company location 300 miles away if a hurricane threatened their facility. However, neither the plan nor the challenges that might arise were ever discussed with the staff. At a town hall, I asked the staff to raise their hands if they would be available to meet this need. About 40% of the staff raised their hands. Then I asked, what if the company did not let you bring your family or provide housing while there? Not a single hand stayed up. Plans must be vetted, and they must not rest on expectations that have never been tested.

4. Every company was a statistic! That is right. Every company has had, or will have, a debilitating or impactful event. Plan on it.

Until they took the four pillars of disaster preparedness seriously, they reacted without much control.

 

Planning shifts your company from reacting to acting, from being out of control to having control. The level of control depends on the maturity of your planning and how well it is integrated into your company's culture, encompassing response, recovery, and mitigation.

I helped a company develop a mature program, evidenced by a successful DDoS attack response. Before my engagement, they had experienced several attacks that left customers and employees unable to access systems and websites or perform transactions for hours. Throughout our engagement, we focused on building plans, identifying issues, defining escalation paths, and developing adaptive response strategies.

After extensive planning and exercises, they faced another DDoS attack. This time, their network monitoring system detected abnormalities early. The service desk received complaints of slow web responses and timeouts, which were escalated to the response team. After triage, they enacted the partially automated DDoS plan. DNS issues were addressed and changes were implemented, and within minutes, systems began recovering and call volume diminished. The impact lasted only 15-20 minutes, without a complete system outage. The cost was minimal, and customer impact was limited to minor delays in service access.
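The detect, escalate, and triage flow described above can be sketched as a simple rule. This is an illustration only, not the company's actual tooling: the `triage` function, the `Action` names, and both thresholds are hypothetical, and a real plan would tune them through exercises and keep a human in the decision.

```python
from enum import Enum


class Action(Enum):
    MONITOR = "keep monitoring"
    ESCALATE = "escalate to response team"
    ENACT_PLAN = "recommend enacting the DDoS plan"


def triage(
    requests_per_sec: float,
    baseline_rps: float,
    complaints_last_10_min: int,
    *,
    traffic_multiplier: float = 3.0,  # hypothetical threshold
    complaint_threshold: int = 5,     # hypothetical threshold
) -> Action:
    """Combine a monitoring signal with service desk complaints."""
    # Signal 1: network monitoring sees traffic far above the normal baseline.
    traffic_abnormal = requests_per_sec > baseline_rps * traffic_multiplier
    # Signal 2: the service desk is hearing about slow responses and timeouts.
    users_impacted = complaints_last_10_min >= complaint_threshold

    if traffic_abnormal and users_impacted:
        return Action.ENACT_PLAN
    if traffic_abnormal or users_impacted:
        return Action.ESCALATE
    return Action.MONITOR


# Traffic at more than 3x baseline plus a burst of complaints.
print(triage(5000, 900, 8).value)
```

Requiring both signals before recommending the plan is one possible design choice; it avoids reacting to a single noisy source, which matters given how a poorly considered DDoS response can itself cut off internal users.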

The resulting impact cost them only a few thousand dollars in lost production time, and the reputational impact was negligible. The change in their planning, systems, and culture took a few years and a moderate investment. Still, they matured from reacting to actively implementing pre-planned activities and managing incidents and impacts. They chose not to be a statistic, and that choice is likely one of the reasons they have become a Fortune 500 company today.

 

 

James Knox is a resiliency expert with an innovative spirit who thrives when building meaningful solutions to various daily problems in the corporate world. He is an avid outdoorsman and loves extreme rock crawling, fishing, and hunting. As a survivalist, James has learned from necessity how to prepare for life’s bumps and thrive with practical and sensible solutions, supporting his family's self-sustaining lifestyle.