Ransomware IS a Disaster

Once upon a time, in a land not so far away and not so long ago, building Disaster Recovery was looked at like preparing for the apocalypse. After all, the only reason production would have to run on the end-of-lease equipment lounging in retirement at the secondary colo was if all of the expensive and well-designed fault-tolerant components failed. We had RAID, multiple power supplies and circuits, twin stacks of core and top-of-rack switches terminating dual drops to each server, and multiple circuits from WAN providers. The generator even had a backup generator. The more money we poured into the high availability design of the production environment, the less we’d have to think about actually trying to run workloads on the old stuff.

Those days are gone. Even the promise of high-availability, hyperscale, cloud-native, microservices, DevOps, containerized, CI/CD, well-architected designs is not immune to the new evil that has dawned on the horizon. But ransomware is a security problem, not a DR issue, right? This is a problem for the blue-robed, well-funded, executive-sponsored security heroes, not that DR team in the basement with their desks piled with dusty binders and bins of tapes. Right?

Well, what do you call it when your data center is offline, or worse, possessed by the malicious software holding your data hostage and exfiltrating it to be auctioned on the dark web? A DISASTER! And what is the security team going to do about that!? Well, their job, of course: a very hard job of coordinating with legal, law enforcement, insurance, marketing, oh, and the incident response team, the data forensics team, and the million other screaming people who want to know what happened, how it’s going to be fixed, when it’s going to be fixed, and what they’re going to do so it never happens again. But while they’re doing that, where is production running? Who is serving your customers? Certainly not your competitor, offering a special discount especially for your users.

The on-premises recovery approach of local immutable backups with instant recovery is well meaning, but still near-sighted. Most data centers don’t have the capacity in storage or compute to power off an infected environment, leave it in place for forensics, and fire up another copy of production. If they do, they likely don’t want to until they know for certain that prod has been thoroughly purged and cleaned, perhaps with a flamethrower. I’ve heard ghost stories of malware hiding in wireless access points, waiting to reinfect as soon as servers come back online.

Let’s face it, Ransomware IS a disaster. A data-center-stopping disaster of the first order that doesn’t care how much you’ve spent on high availability and fault-tolerant designs. If the data center is a crime scene, where do you run production?

You run it at the Disaster Recovery site.

But not your grandfather’s (or even yours from five years ago) recovery site. The single-pane-of-glass and single-sign-on tools that made managing multiple data centers much easier and more efficient have also made it that much easier for ransomware to infect the DR infrastructure. Point-to-point circuits allow infection to jump from prod to DR at the literal speed of light. Yesterday’s DR designs are obsolete.

In today’s landscape, a DR strategy needs several design components to harden it against cross-contamination: immutable data, air gaps, multi-factor authentication, and monitored security. Yes, the security team needs to be intimately involved with DR from both a design and a recovery perspective. More on that part later.

Immutable data, once called WORM (Write Once Read Many), because we are so good at naming things in IT, simply means the backup or replication data in the recovery environment cannot be altered or deleted short of physical destruction. It’s the buzzword in all new backup and replication tech today, though a few years ago it was barely heard of. Get some.
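Here’s a minimal sketch of what that can look like, assuming AWS S3 as the backup target and the boto3 library (the bucket name and 30-day window are hypothetical): enable Object Lock so backup objects can’t be altered or deleted until the retention period expires, no matter whose credentials get stolen.

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "example-dr-backups"  # hypothetical bucket name

# Object Lock must be turned on when the bucket is created
# (this sketch assumes the default us-east-1 region).
s3.create_bucket(Bucket=BUCKET, ObjectLockEnabledForBucket=True)

# Default retention: every new object version is locked in COMPLIANCE
# mode for 30 days and cannot be overwritten or deleted until then.
s3.put_object_lock_configuration(
    Bucket=BUCKET,
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 30}},
    },
)
```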

Air gap simply refers to there being air between the affected environment and the recovery environment. That’s a nice theory, but it can be challenging in practice. There are good and expensive ways to air gap environments, but for most people with real-world budgets, theoretical air gaps will have to suffice. Even data diodes (I said that just to impress you) are not truly air gapped. If there is copper between the two data centers, there’s no air.

A few ways to look at air gapping are segmenting credentials and segmenting the data replication path from the recovery environment. Segmenting credentials means that there are no credentials that your production environment or production team has access to that ransomware can compromise and use to infect, damage, or impact the DR site. This can be achieved through a separate, secure credential vault or by leveraging third-party providers to manage the DR site and recovery.
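For the vault route, here’s a minimal sketch assuming a separate HashiCorp Vault instance on the recovery side, reached through the hvac client library (the URL and secret path are hypothetical). The point is that nothing in production holds a token or policy that can reach this vault.

```python
import getpass
import hvac

# The DR vault lives on the recovery side with its own auth backend;
# production service accounts have no token or policy for it.
dr_vault = hvac.Client(url="https://vault.dr.example.internal:8200")

# The token comes from a human DR operator at declaration time
# (ideally behind MFA), not from any production system.
dr_vault.token = getpass.getpass("DR operator Vault token: ")

# Pull the recovery credentials only when a recovery is actually declared.
secret = dr_vault.secrets.kv.v2.read_secret_version(path="dr/replication-target")
dr_admin_password = secret["data"]["data"]["password"]
```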

Replication path segmentation means having the data movers on networks that are out of band and have no route to the DR recovery networks. This is typically achieved with separate replication subnets and circuits or VPNs.
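One way to keep yourself honest about that segmentation, as a minimal sketch assuming an AWS VPC and boto3 (the subnet ID and CIDR are hypothetical): confirm the replication subnet’s route table has no route that can reach the DR management network.

```python
import ipaddress
import boto3

ec2 = boto3.client("ec2")
REPLICATION_SUBNET_ID = "subnet-0123456789abcdef0"    # hypothetical subnet
DR_MGMT_CIDR = ipaddress.ip_network("10.200.0.0/16")  # hypothetical DR management range

# Route tables associated with the replication subnet.
tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [REPLICATION_SUBNET_ID]}]
)["RouteTables"]

# Strict check: any route (even a default route) that overlaps the DR
# management network means the data path is not truly out of band.
for table in tables:
    for route in table["Routes"]:
        dest = route.get("DestinationCidrBlock")
        if dest and ipaddress.ip_network(dest).overlaps(DR_MGMT_CIDR):
            raise RuntimeError(f"Replication subnet can reach DR management via {dest}")

print("Replication path has no route to the DR management network.")
```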

Multi-factor authentication. Speaks for itself. Get it, deploy it, love it. Yes, it can add an extra step here or there, but if your users can figure it out for their Instagram and Facebook accounts, they can use it at work.
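If you want to put some teeth behind that, here’s a minimal sketch assuming AWS IAM and boto3 (the group name is hypothetical): an inline policy that denies everything for the DR operations group unless the caller authenticated with MFA.

```python
import json
import boto3

iam = boto3.client("iam")

# Deny all actions when the request was not authenticated with MFA.
deny_without_mfa = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "DenyAllWithoutMFA",
        "Effect": "Deny",
        "Action": "*",
        "Resource": "*",
        "Condition": {"BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}},
    }],
}

iam.put_group_policy(
    GroupName="dr-operations",   # hypothetical group of DR operators
    PolicyName="require-mfa",
    PolicyDocument=json.dumps(deny_without_mfa),
)
```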

This brings us to monitored security. Yes, that means the security team deploying detectors, collectors, and eyeballs on the DR environment. Even with the hardening measures above, the DR site represents an attack vector and a target. If there are no eyes on DR, we can’t be sure it’s clean for recovery.
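As one small example of “eyes on DR,” here’s a minimal sketch assuming AWS GuardDuty and boto3 (the DR region is hypothetical): confirm there is actually an enabled detector watching the recovery region before you trust it as a clean landing zone.

```python
import boto3

DR_REGION = "us-west-2"  # hypothetical DR region
guardduty = boto3.client("guardduty", region_name=DR_REGION)

detector_ids = guardduty.list_detectors()["DetectorIds"]
if not detector_ids:
    raise RuntimeError(f"No GuardDuty detector found in DR region {DR_REGION}")

# Expect every detector to report ENABLED; anything else means
# nobody is actually watching the recovery environment.
for detector_id in detector_ids:
    status = guardduty.get_detector(DetectorId=detector_id)["Status"]
    print(f"Detector {detector_id}: {status}")
```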

Alright, since we’re talking about security and their cooperation with DR, this new buddy-cop friendship will be critical during recovery. With ransomware dwell time averaging in months, recovering from a point in time before the initial infection is not practical. The recovery team needs information from the security team about what was infected, with what, and through what means. The DR playbook needs a ransomware-specific scenario that describes how the environment is brought up in isolation, scanned for indicators of compromise, purged with the appropriate tools, validated as clean, remediated against repeat infection, and brought back online for end users. You can see how critical coordination with the security team and their sophisticated SIEM service is.
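To make that scenario concrete, here’s a minimal sketch of the playbook as ordered, gated steps. The step functions are placeholders (assumptions, not real tooling); in practice each would call your orchestration, EDR, and backup platforms and return True only after verification and security sign-off.

```python
from typing import Callable, List, Tuple

def placeholder(task: str) -> Callable[[], bool]:
    """Stand-in for a real integration with orchestration/EDR/backup tooling."""
    def run_step() -> bool:
        print(f"(placeholder) executing: {task}")
        return True  # a real step returns True only after verified completion
    return run_step

# The ransomware-specific recovery scenario, in order.
RANSOMWARE_SCENARIO: List[Tuple[str, Callable[[], bool]]] = [
    ("Bring the environment up in isolation", placeholder("isolate recovery networks")),
    ("Scan for indicators of compromise", placeholder("run IOC scans against SIEM intel")),
    ("Purge with the appropriate tools", placeholder("remove infected artifacts")),
    ("Validate as clean", placeholder("security team sign-off")),
    ("Remediate against repeat infection", placeholder("patch, rotate credentials, harden")),
    ("Bring back online for end users", placeholder("open user access")),
]

def run_playbook(playbook: List[Tuple[str, Callable[[], bool]]]) -> None:
    for name, step in playbook:
        if not step():
            raise RuntimeError(f"Stopped at '{name}'; escalate to the security team.")
        print(f"Completed: {name}")

if __name__ == "__main__":
    run_playbook(RANSOMWARE_SCENARIO)
```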

That’s a lot of work. Recovering from ransomware is a blend of DR teams and security teams working together in perfect harmony (with or without a Coke). In many organizations, this cooperation means an adjustment not only to the DR strategy, but also to the corporate org chart. You can see this is not just an IT initiative, but a whole-organization responsibility with top-down sponsorship.

Ransomware has reshaped the landscape of disaster recovery. It not only requires newer strategies and techniques than it did a few short years ago, but also the cooperation of teams that are typically segmented within an organization. But it’s not all hard news. Gartner’s DRaaS Market Guide indicated that leveraging a partner and doing DRaaS with a cloud provider can reduce the cost of DR by 40%-60% over doing it in-house.

The people, processes and technology exist today to tackle the challenges of a modern Disaster Recovery plan. If you want to investigate it more, reach out to InterVision and we can help do more than just refresh your DR Site. We can even buy your DR and Security teams a Coke.

Heading to AWS re:Invent Dec 2-6? We will be at Booth 1764!
