Limitations of Cloud Providers in Disaster Recovery

Apr 13, 2021
Updated: May 3, 2021

I found this article from one of Connexion Point's customers thought-provoking and worth sharing:

Confronting the skewed gospel of the “cloud evangelists”

In this article I want to address a very serious “elephant in the room” issue that grows more serious as time goes by. The issue is simply that cloud based recovery does not scale, is fundamentally insecure and does not work for a true “black swan” incident. A black swan incident in this context is a catastrophic loss of I.T. infrastructure caused by an unforeseen event, and it may embrace networks, servers, data and personnel. I also want to emphasize some areas and advantages of tape based DR systems that are not commonly alluded to.

I do this as a systems engineer whose main focus is the engineering of back up and recovery systems for mission critical servers. I do not sell tapes or tape equipment, or represent any vendor in these areas. I have no preference for any particular brand and am quite happy with server, networking & data storage equipment from HP, IBM, Dell, Cisco, EMC or any other respectable vendor.

Moving All or Part of Your Primary I.T. Infrastructure to the Cloud

Firstly, let us define the cloud, as to some non-technical people there is a connotation of some exotic technology operating in a heavenly “ether” type realm. The cloud is simply a remote data centre, owned by a 3rd party company like Equinix, NTT, Vocus or NextDC. It may be 1000 or 5000 km away. The cloud provider, like Amazon or Microsoft Azure, does not usually own the data centre; they simply lease part of the facilities. Amazon is usually quite secretive about the location of the data centres that it actually uses. You then use the interface/portal of the cloud provider to access your data or run your systems remotely.

When you move to the cloud, you simply exchange your local server room or data centre for a remote one. That is all you do, in essence. Despite all the talk of “elastic bean stalks”, glaciers, 5 nines architecture, availability zones etc, etc, that is all you do!
You have not made a quantum leap, gone through the stargate or been initiated into an exotic fraternity.

I believe migrating mission critical databases containing medical or any other type of personal, customer or confidential data to the cloud is wrong, whereas it is acceptable for certain email systems and servers, and for Netflix type media providers etc. However, I am not going to address that here or go through the counter arguments, as we do not have space. Rather, I am going to focus on one piece of negligence: after moving all or part of your mission critical infrastructure to the cloud, you are tempted to move your disaster recovery [DR] infrastructure to the cloud as well.

The Fallacy of Moving your Main Secondary System / Disaster Recovery System to the Cloud

The main issue, in summary, is that DR from the cloud does not work for a catastrophic loss of data and/or infrastructure. It will work for some DR scenarios, but it will not work for the main DR scenario of wholesale data corruption, infrastructure failure or climate related disaster. I do however see the viability and wisdom of keeping an encrypted copy of mission critical data sets in the cloud as a last line of defense, and do not consider myself anti-cloud, but more of a cloud realist. The main issues against cloud based disaster recovery are as follows.

1. It cannot be used for recovery of large datasets.

This means that for a significant DR incident involving more than, say, 40 GB of data it is just impractical. For example, a large mission critical core banking system, utilizing 3 Oracle database instances and associated binaries, may easily total 500 GB. In Australia, where the average broadband speed is 25 Mbps, that is a download of roughly 45 to 50 hours. And that is under ideal conditions. What would happen if the cloud provider’s primary data centre was based on the East coast of Australia and affected by recent floods?
You have to pull data from a different “region” or “availability zone”, to use the Amazon terminology, possibly based in the States or South/North East Asia, which will seriously affect your download speed and at least double the download time. Would 48 to 96 hrs to recover a core banking system and ATM access be acceptable to the majority of Westpac, NAB, ANZ & CBA customers? I think not. Maybe that is one reason why, to the best of my knowledge, no major bank in Australasia uses cloud for mission critical systems.

In comparison, look at the transfer speed to tape for an actual back up job for an international environmental organization tasked with protecting the Mekong delta sub-region. It wrote 536 GB in 105 minutes. This is from an on premise fibre channel tape system directly connected to a standard Dell ESXi server, connected in turn in standard fashion to a Synology Network Attached Storage appliance. That works out to a staggering 700 Mbps or so! As a practical observation, restores of data to disk are often faster than back ups. In real terms your restore speed could approach 1 Gbps.
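
The arithmetic behind these figures can be checked with a few lines of Python. This is only a back-of-envelope sketch: it assumes the sizes quoted above are decimal gigabytes (the numbers only make sense as bytes, not bits), that the link runs at its full rated speed with no protocol overhead, and the function names are made up for illustration.

```python
# Back-of-envelope transfer arithmetic.
# Assumptions: sizes are decimal gigabytes (1 GB = 8,000 megabits) and the
# link sustains its full rated speed with no protocol overhead.


def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link_mbps link."""
    megabits = size_gb * 8_000
    return megabits / link_mbps / 3600


def throughput_mbps(size_gb: float, minutes: float) -> float:
    """Average rate in Mbps when size_gb gigabytes move in the given minutes."""
    megabits = size_gb * 8_000
    return megabits / (minutes * 60)


if __name__ == "__main__":
    tape_rate = throughput_mbps(536, 105)
    print(f"500 GB over 25 Mbps broadband: {transfer_hours(500, 25):.1f} hours")
    print(f"536 GB in 105 minutes to tape: {tape_rate:.0f} Mbps")
    print(f"500 GB at that tape rate: {transfer_hours(500, tape_rate):.1f} hours")
```

Run as a script, this prints 44.4 hours for the broadband download, 681 Mbps for the tape job and 1.6 hours for the same 500 GB at the tape rate. Real transfers carry overhead, so the broadband figure is a floor rather than an estimate, and the tape rate is of the same order as the figure quoted above.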

The backup job described above did not run on cutting edge equipment but on an average set up employing LTO 5 technology, which is a tape standard over 10 years old. There are no special snapshots, de-duplication SANs, expensive cloud syncing storage devices or any wizardry. The network attached storage [NAS] feeding the ESXi server does not even have an iSCSI connection; it is vanilla CIFS via a CAT 6 cable, and just good standardized engineering. In plain language, it is a run of the mill system using slightly obsolete equipment.

Another advantage is the reliability of the restore. An average Oracle database may have 15 “*.dbf” files. A few corrupt blocks in 1 of those files [that the cloud download tool does not detect] and your restored database may refuse to start. There is far less probability of corrupt data blocks in a tape based restore.

To get similar performance from a cloud based system you need an expensive appliance that interfaces to your cloud provider; these typically cost $US 50,000-100,000. If a power surge or firmware bug takes it out you are in serious trouble, as it will often corrupt the disks/flash storage of that appliance. Ransomware can also easily invalidate and lock the data on one of these appliances. However, if a power surge takes out a tape system there is almost zero possibility of it corrupting the tapes, which sit in plastic magazines electrically isolated from the main circuit boards. To my knowledge, no ransomware strain has successfully attacked a tape system. Theoretically it would be possible, but it is an order of magnitude more difficult and the attacker would have to write specific code and overcome a host of other issues. Similarly, a firmware bug would not corrupt tapes already written.

2. The “Black Swan” that takes out your Primary Data Infrastructure May take out Supporting Inter-Continental or National Network Infrastructure.

One of the main risk factors of cloud based DR is that it increases and concentrates the risk and damage of a manageable incident.
This happens when your Internet infrastructure is lost, or the connecting infrastructure to the cloud provider goes down. Even if your cloud data is perfectly intact, if a country, state or city is isolated by a major infrastructure fault, you are dead in the water. You have absolutely zero options till the switching gear, cell towers or submarine cable is restored. Think of Hurricane Katrina on the Gulf Coast of the USA, where some networks were still down after 3 weeks. The point is, if you have a local copy of your data on tape, you have options. You can relocate that data to a different city or region, rebuild systems in hours and overcome that isolation.

Cloud based DR Systems have No Protection against Corruption / Malware Propagated into the Cloud

Put simply, this means that if your main dataset is hit with a ransomware attack, another virus or hardware failure induced data corruption, it will propagate up to your cloud based DR copy in a matter of minutes, possibly seconds, even if you have multiple redundant availability zones. This is especially true if you have some sort of expensive storage back end appliance, mentioned earlier, in your server room designed for snapshotting into the cloud, or snapshotting locally in the same server room [think of an event similar to the Australian Taxation Office storage systems failure in Dec 2016]. When corruption propagates up into your cloud DR data, you have lost the main pillar of your cloud based DR.

3. Many Cloud Providers use Structurally Weak Data Centres & are Accredited by Organizations with a Conflict of Interest

This is the other elephant in the room. There are many things to write here; I will stick to the bare essentials. Many cloud data centres are structurally weak. They are primarily constructed of concrete panels and have large amounts of glass and composite tile cladding. Go to the following link showing what happens when one of these centres catches fire: https://www.youtube.com/watch?v=gNHCzQ823sY

Samsung obviously had not planned for an external fire that breached their containment system. The following link shows a representation of a data centre based in Bangkok with what appears to be a corrugated sheet metal roof, probably less than a few mm thick, protecting vital infrastructure. This is the norm and not the exception: http://www.supernap.co.th/gallery/#7

Many of the larger ones that have opened in the last few years are basically just large warehouses. The providers focus on their “9 levels of electronic security” but ignore more basic, potent threats from storms, earthquakes, truck bombs and terrorist attack. Relatively thin concrete panels do not offer sufficient blast, storm or earthquake protection. A 5 tonne cement truck driven at 60 km/h would easily breach the walls and cause significant structural damage at 90% of the data centres operated by TrueIDC, Equinix, CS Loxley, NTT and many other providers.

Ask your cloud provider the following:

a/ How much reinforced concrete is in the walls and roof protecting the data cabinets?

b/ If a fire breaches the automatic gas discharge containment system, as at Samsung, what is your plan B?

c/ As climate change disasters are a reality, what is your protection against floods, earthquakes and Haiyan / Katrina level events? [Please do not accept “We are in a no earthquake / flood / storm zone”]

d/ Did the data centre provider pay money to get private accreditation from British Standards, TÜV or a similar organization? This introduces a huge conflict of interest. Just look at the recent building standards fiasco in Sydney or Grenfell Tower in London.

Your cloud provider cannot give assurances on these issues as, in most cases, they do not control the physical infrastructure of the data centre. They sub-contract it out to the data centre provider.

4. You Completely lose Control of your Data Privacy and you are Locked In.

Once your data is uploaded you have effectively lost control of it. This is especially true in the Asia Pacific region. There is significantly less data protection in restricted access countries than there is in Europe or the States. Please take this from someone who has had decades of experience dealing with security agencies in these countries. You are much more at risk from legal acquisition of your data than from hacking.

As a registered provider I am legally required to furnish any assistance required to the authorities. “Un-cooperative” attitudes can get your visa pulled and earn you a little visit from special branch. Your business will be effectively bankrupted. If the authorities “ask” for encryption keys, your cloud provider has to furnish them. [This is why I support independent encryption where only the customer knows the key]

If you have not encrypted your data, then they just have to run a cross-connect in the data centre and they have access to everything. You would be very naive to believe this does not happen in some countries. The recent debacle over Huawei is not paranoia, and their equipment proliferates in many data centres based in this region.

If your cloud provider houses your data in China, Hong Kong SAR, Vietnam, Lao PDR, Malaysia, Singapore etc, do you honestly believe these governments will respect your data privacy, or that you will even know about it when they want access to your data? Please put that question to your friendly neighbourhood cloud evangelist.

5. The Cloud Bankrupts Local Capability and Staff Development.

One of the main advantages touted by the cloud providers is that you can slash your IT budgets and head count. In the long run this contributes to the destruction of the local IT industry and its capability. Can you expect more from companies like Amazon, who seem almost proud to have bankrupted thousands of small businesses and booksellers? [see the “sickly gazelle” comment by Bezos]

Do any of their initiatives actually develop young engineers, developers or support staff? Apart from their window-dressing CSR programs, we must understand that their core business model is to debilitate the local IT industries and eco-systems in the markets they operate in, and to convince you it is somehow a good thing.

6. The “Shared Responsibility” Model == No Responsibility for Your Data or DR Incident

A black swan DR scenario is a stressful experience requiring clearly demarcated lines of responsibility. It is rare for data & systems to be restored without at least some unforeseen issues. Suppose you cannot recover your systems: for example, after downloading the data multiple times you find it is still not clean enough for your database, or there is an interface issue between the web server and the database that cannot be resolved. Please do not expect your cloud provider to take responsibility and help you manage your DR incident. It is all in the legal fine print, and it is termed the “Shared Responsibility” model. Remember, the vast majority of people you deal with when contemplating moving your DR system to the cloud [sales & marketing, data migration specialists, evangelists, account managers etc] have never handled a major DR incident or engineered a recovery of a mission critical system.
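
Because checking whether restored data is clean falls to you, it helps to be able to prove it quickly. The following is a minimal sketch, not a description of any vendor's tooling: it assumes you record SHA-256 checksums of the backed up files at backup time and keep that manifest somewhere outside the backed up directory. The function names and the manifest layout are hypothetical, and a checksum match only shows the files are byte-identical to what was backed up; database-level validation is a separate step.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Checksum a file in chunks so large database files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> None:
    """At backup time, record a checksum for every file under data_dir.

    The manifest should live outside data_dir so it is not listed in itself.
    """
    sums = {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(sums, indent=2))


def verify_restore(restore_dir: Path, manifest: Path) -> list[str]:
    """After a restore, return the files that are missing or do not match."""
    expected = json.loads(manifest.read_text())
    bad = []
    for name, checksum in expected.items():
        candidate = restore_dir / name
        if not candidate.is_file() or sha256_of(candidate) != checksum:
            bad.append(name)
    return bad
```

An empty list from `verify_restore` means every file in the manifest came back intact; any names it returns are the files to fetch again before attempting to start the database.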

Finally, Some Professional Advice from one Evangelist to Another & Just Who are the Greatest Users of Tape Based Systems?

Although I am not actually anti-cloud provider and do accept their premises in some instances [eg. I use cloud email myself], I would like to go into a more theological area, since they purloin the term evangelist to describe themselves.

Please understand this is coming from someone with decades of evangelical and missionary experience with the Lao Loum, Thai, Hmong, Khmu and Lao Soung people groups. My main area is the interpretation of prophetic scripture in the Old Testament in the Lao and Thai languages and, believe me, it takes years and years to get proficient in this area. Forget the television preacher caricature; real evangelism in restricted access countries is an extremely serious business, and we deal with people who have suffered and been imprisoned for their faith and some who operate under deep cover.

The major principle I would urge is therefore: get your doctrine correct or you cause serious damage. This is something I have to deal with regularly in my Christian work. You “cloud evangelists” have a serious issue with skewed doctrine that is definitely not “scriptural”. You are operating for the sole purpose of increasing your profit margins.

The Biggest Utilizers of Tape Based Systems are Cloud Providers

Finally, why are the biggest utilizers of tape based back up systems the cloud providers themselves? How is Amazon Glacier backed up? [by tape] Why don’t they just use each other to back up? For example, why doesn’t Amazon just use the Azure cloud and vice versa? The answer is that tape systems run cooler, have no bandwidth bottlenecks and are much cheaper, more efficient and more stable than cloud based back up systems, plus a myriad of other advantages. To paraphrase a well known Bangkok newspaper columnist, any comment would be superfluous.

Black swans exist. If ever you have the opportunity, go to the beautiful Bibra Lake nature reserve south of Perth, where you can see these graceful creatures for yourselves. [I believe this is the only place in the world where you can see them en masse] I grew up just 2 km from this reserve.

They also exist in the area of systems recovery and back up and, just like the shallow water torpedo at Pearl Harbor [which theoretically did not exist], you will encounter them if you do not plan for them, and the effects could be catastrophic, just like at Pearl.

Please do not move your Disaster Recovery / Primary back up system to the cloud. It is negligent. Take it from a real evangelist.

My Background:

Gregory Hayes is a Unix recovery & back up systems engineer, based in the Lao PDR. He is the owner of Eshcol Data Protection and the Pastor of Holy Seed Church, Vientiane Capital. He has no Facebook friends, Instagram / Twitter followers or blog posts, and is also a professional coffee drinker.
