The Deficiencies of Cloud Providers in Disaster Recovery

Posted on 28-Sep-2019

I thought this article from one of Connexion Point's customers was thought-provoking and worthy of sharing:

-Confronting the skewed gospel of the “cloud evangelists”

In this article I want to address a very serious “elephant in the room” issue that is becoming
more serious as time goes by. The issue is simply that cloud based recovery does not scale,
is fundamentally insecure and does not work for a true “black swan” incident. A black swan
incident in this context is a catastrophic loss of I.T. infrastructure caused by an unforeseen
event, and may take in networks, servers, data and personnel.

I also want to emphasize some advantages of tape based DR systems that are not commonly
mentioned. I do this as a systems engineer whose main focus is the engineering of back up
and recovery systems for mission critical servers. I do not sell tapes or tape equipment, nor
do I represent any vendor in these areas. I have no preference for any particular brand and am
quite happy with server, networking & data storage equipment from HP, IBM, Dell, Cisco, EMC or
any other respectable vendor.

Moving All or Part of Your Primary I.T. Infrastructure to the Cloud

Firstly, let us define the cloud, as to some non-technical people it carries a connotation of
some exotic technology, operating in a heavenly “ether” type realm. The cloud is simply a
remote data centre, owned by a 3rd party company like Equinix, NTT, Vocus or NextDC. It may be
1000 or 5000 km away. The cloud provider, like Amazon or Microsoft Azure, does not usually own
the data centre; it simply leases part of the facilities. Amazon is usually quite secretive
about the location of the data centres that it actually uses. You then use the interface/portal
of the cloud provider to access your data or run your systems remotely.

When you move to the cloud, you simply exchange your local server room or data centre for a
remote one. That is all you do, in essence. Despite all the talk of “elastic beanstalks”,
glaciers, 5 nines architecture, availability zones etc, etc - that is all you do! You have
not made a quantum leap, gone through the stargate or been initiated into an exotic
fraternity.

I believe migrating mission critical databases containing medical or any other type of
personal, customer or confidential data to the cloud to be wrong, whereas it is acceptable for
certain email servers, Netflix type media providers etc. However, I am not going to address
that here, or go through the counter arguments, as we do not have space.

Rather, I am just going to focus on the negligence that arises when, after moving all or part
of your mission critical infrastructure to the cloud, you are tempted to move your disaster
recovery [DR] infrastructure to the cloud as well.

The Fallacy of moving your Main Secondary System / Disaster Recovery System to the Cloud

The main issue, in summary, is that DR from the cloud does not work for a catastrophic loss
of data and/or infrastructure. It will work for some DR scenarios, but it will not work for
the main DR scenario of wholesale data corruption, infrastructure failure or climate related
disaster. I do however see the viability and
wisdom of keeping an encrypted copy of mission critical data sets in the cloud as a last line
of defense and do not consider myself anti-cloud, but more of a cloud realist. The main issues
against cloud based disaster recovery are as follows.

1. It cannot be used for recovery of large datasets. For a significant DR incident
involving more than, say, 40 GB of data it is just impractical. For example, a large mission
critical core banking system, utilizing 3 Oracle database instances and associated binaries,
may easily total 500 GB. In Australia, where the average broadband speed is 25 Mbps, that is
roughly a 50 hour download. And that is under ideal conditions. What would happen if the cloud
provider’s primary data centre was based on the East coast of Australia and affected by
recent floods? You would have to pull data from a different “availability zone”, to use the
Amazon terminology, possibly based in the States or South/North East Asia, which will seriously
affect your download speed and at least double the download time.

Would 48 to 96 hrs to recover a core banking system and ATM access be acceptable to the
majority of Westpac, NAB, ANZ & CBA customers? I think not. Maybe that is one reason that,
to the best of my knowledge, no major bank in Australasia uses cloud for mission critical
systems.
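The download estimates above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it treats the quoted dataset size as decimal gigabytes, assumes the link sustains its full rated speed with no protocol overhead, and uses a hypothetical helper name, `transfer_hours`.

```python
def transfer_hours(size_gb: float, link_mbps: float) -> float:
    """Hours needed to move size_gb gigabytes over a link_mbps link.

    Assumes decimal units (1 GB = 8000 megabits) and a link that runs at
    its full rated speed the whole time, which real links rarely do.
    """
    megabits = size_gb * 8 * 1000
    seconds = megabits / link_mbps
    return seconds / 3600


# 500 GB banking dataset over the 25 Mbps average broadband link
local_zone = transfer_hours(500, 25)
# Same dataset when a distant availability zone halves the effective speed
remote_zone = transfer_hours(500, 12.5)

print(f"Local zone:  {local_zone:.1f} hours")
print(f"Remote zone: {remote_zone:.1f} hours")
```

At line rate this comes to roughly 44 hours, and nearer 89 hours at half speed. Real-world overhead pushes both figures up, which is consistent with the 48 to 96 hour range discussed here.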

In comparison, look at the transfer speed to tape for an actual back up job for an
international environmental organization tasked with protecting the Mekong delta sub-region.
It wrote 536 GB in 105 minutes. This is from an on premise fibre channel tape system
directly connected to a standard Dell ESXi server, connected in turn in standard fashion to a
Synology Network Attached Storage appliance. That approximates to a staggering 700
Mbps! As a practical observation, restores of data to disk are often faster than back ups.
In real terms your restore speed could approach 1 Gbps.
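The tape throughput follows directly from the job size and duration quoted above. This is a rough sketch assuming decimal gigabytes; `throughput_mbps` and `restore_minutes` are hypothetical helper names, not part of any backup tool.

```python
def throughput_mbps(size_gb: float, minutes: float) -> float:
    """Average throughput in megabits per second, assuming decimal gigabytes."""
    megabits = size_gb * 8 * 1000
    return megabits / (minutes * 60)


def restore_minutes(size_gb: float, mbps: float) -> float:
    """Minutes to move size_gb gigabytes at a sustained rate of mbps."""
    return size_gb * 8 * 1000 / mbps / 60


# The back up job described above: 536 GB written in 105 minutes
tape_rate = throughput_mbps(536, 105)
print(f"Tape job average: {tape_rate:.0f} Mbps")
print(f"500 GB at that rate: {restore_minutes(500, tape_rate):.0f} minutes")
```

Under these assumptions the job averages roughly 680 Mbps (about 730 Mbps if the 536 figure is read as binary gibibytes), so a 500 GB dataset would move in about 98 minutes at the same rate.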

This is not on cutting edge equipment but an average set up, employing LTO-5 technology, which
is a tape standard about 10 years old. There are no special snapshots, de-duplication SANs,
expensive cloud syncing storage devices or any wizardry. The network attached storage [NAS]
feeding the ESXi server does not even have an iSCSI connection; it is vanilla CIFS via a CAT 6
cable - and just good standardized engineering. In plain language it is a run of the mill
normal system using slightly obsolete equipment.

Another advantage is the reliability of the restore. An average Oracle database may have 15
“*.dbf” files. A few corrupt blocks on 1 of those files [that the cloud download tool does
not detect] and your restored database may refuse to open. There is far less probability of
corrupt data blocks in a tape based restore.
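Whatever the restore path, one way to catch silent corruption before the database is started is to compare each restored file against a checksum recorded at back up time. The sketch below is a minimal illustration, not a description of any particular backup product: it assumes a manifest mapping file names to SHA-256 digests was saved alongside the back up, and the names `sha256_of` and `find_corrupt_files` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupt_files(restore_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Return the manifest entries that are missing or whose digest does not match.

    manifest maps a file name (assumed relative to restore_dir) to the
    SHA-256 hex digest recorded when the back up was taken.
    """
    bad = []
    for name, expected in sorted(manifest.items()):
        candidate = restore_dir / name
        if not candidate.is_file() or sha256_of(candidate) != expected:
            bad.append(name)
    return bad
```

Calling `find_corrupt_files(Path("/restore/oradata"), manifest)` (the path is a placeholder) returns an empty list when every file matches. A whole-file checksum only shows that the bytes are unchanged since the back up; it does not prove the database itself is logically consistent.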

To get similar performance from a cloud based system you need an expensive appliance that
interfaces to your cloud provider; these are typically $US 50,000-100,000. If a power surge or
firmware bug takes it out you're in serious trouble, as it will often corrupt the disks/flash
storage of that appliance. Ransomware can also easily invalidate and lock the data on one of
these appliances.

However, if a power surge takes out a tape system there is almost zero possibility of it
corrupting the tapes, which sit in plastic magazines electrically isolated from the main
circuit boards. There is no known ransomware strain that has successfully attacked a tape
system. Theoretically it would be possible, but it is an order of magnitude more difficult and
the attacker would have to write specific code to overcome a host of other issues. Similarly, a
firmware bug would not corrupt tapes already written.

2. The “Black Swan” that takes out your Primary Data Infrastructure May take out Supporting
Inter-Continental or National Network Infrastructure.

One of the main risk factors of cloud based DR is that it increases and concentrates the risk
and damage of an otherwise manageable incident when your Internet infrastructure is lost, or the
connecting infrastructure to the cloud provider goes down. Even if your cloud data is perfectly
intact, if a country, state or city is isolated by a major infrastructure fault, you are dead in
the water. You have absolutely zero options till the switching gear, cell towers or submarine
cable is restored. Think of Hurricane Katrina on the Gulf Coast of the USA, where some networks
were still down after 3 weeks. The point is, if you have a local copy of your data on tape,
you have options. You can relocate that data to a different city or region, rebuild
systems in hours and overcome that isolation.

Cloud based DR Systems have No Protection against Corruption / Malware Propagated into the
Cloud

Put simply, this means that if your main dataset is hit with a ransomware attack, another
virus or hardware failure induced data corruption, it will propagate up to your cloud based DR
copy in a matter of minutes - possibly seconds - even if you have multiple redundant
availability zones. This is especially true if you have some sort of expensive storage back
end appliance, mentioned earlier, in your server room designed for snapshotting into the
cloud, or snapshotting locally in the same server room [think of an event similar to the Aust.
Taxation Office storage systems failure in Dec 2016]. When corruption propagates up into your
cloud DR data, you have lost the main pillar of your cloud based recovery.

3. Many Cloud Providers use Structurally Weak Data Centres & are Accredited by Organizations
with a Conflict of Interest

This is the other elephant in the room. There are many things to write here - I will stick to
the bare essentials. Many cloud data centres are structurally weak. They are primarily
constructed of concrete panels, with large amounts of glass and composite tile cladding. The
following link shows what happens when one of these centres catches fire:
https://www.youtube.com/watch?v=gNHCzQ823sY

Samsung obviously had not planned for an external fire that breached their containment system.

The following link shows a representation of a data centre based in Bangkok with what appears
to be a corrugated sheet metal roof, probably less than a few mm thick, protecting vital
infrastructure - this is the norm and not the exception:
http://www.supernap.co.th/gallery/#7

Many of the larger ones that have opened in the last few years are basically just large
warehouses. The providers focus on their “9 levels of electronic security” but ignore more
basic, potent threats from storms, earthquakes, truck bombs and terrorist attacks.

Relatively thin concrete panels do not offer sufficient blast, storm or earthquake protection.
A 5 tonne cement truck driven at 60 km/h would easily breach, and cause significant structural
damage to, 90% of the data centres operated by TrueIDC, Equinix, CS Loxley, NTT
and many other providers.

Ask your cloud provider the following:

a/ How much reinforced concrete is in the walls and roof protecting the data cabinets ?

b/ If a fire breaches the automatic gas discharge containment system like at Samsung – what
is your plan B ?

c/ As climate change disasters are a reality, what is your protection against floods,
earthquakes and Haiyan / Katrina level events? [Please do not accept “We are in a no earthquake
/ flood / storm zone”]

d/ Did the data centre provider pay money to get private accreditation from British Standards,
TÜV or a similar organization? This introduces a huge conflict of interest. Just look at the
recent building standards fiasco in Sydney or Grenfell Tower in London.

Your cloud provider cannot give assurances on these issues as they do not control the physical
infrastructure of the data centre in most cases. They sub-contract it out to the data centre
provider.

4. You Completely lose Control of your Data Privacy and you are Locked In.

Once your data is uploaded you have effectively lost control of it. This is especially
true in the Asia Pacific region. There is significantly less data protection in restricted
access countries, than there is in Europe or the States. Please take this from someone who
has had decades of experience dealing with security agencies in these countries. You are much
more at risk from legal acquisition of your data than hacking.

As a registered provider I am legally required to furnish any assistance required to the
authorities. “Un-cooperative” attitudes can get your visa pulled and a little visit from
special branch. Your business will be effectively bankrupted. If the authorities “ask” for
encryption keys, your cloud provider has to furnish them. [This is why I support independent
encryption where only the customer knows the key]

If you have not encrypted your data, then they just have to run a cross-connect in the
data centre and they have access to everything. You would be very naive to believe this does
not happen in some countries. The recent debacle over Huawei is not paranoia, and their
equipment proliferates in many data centres based in this region.

If your cloud provider houses your data in China, Hong Kong SAR, Vietnam, Lao PDR, Malaysia,
Singapore etc, do you honestly believe these governments will respect your data privacy - or
that you will even know about it when they want access to your data? Please put that
question to your friendly neighbourhood cloud evangelist.

5. The Cloud Bankrupts Local Capability and Staff Development.

One of the main advantages touted by the cloud providers is that you can slash your IT budgets
and head count. In the long run this contributes to the destruction of the local IT industry and
its capability. Can you expect more from companies like Amazon, who seem almost proud
to have bankrupted thousands of small businesses and booksellers? [see the “sickly gazelle”
comment by Bezos]

Do any of their initiatives actually develop young engineers / developers or support staff?
Apart from their window-dressing CSR programs, we must understand that their core business
model is to debilitate the local IT industries and eco-systems, in the markets they operate in
- and convince you it is somehow a good thing.

6. The “Shared Responsibility” Model == No Responsibility for Your Data or DR Incident

A black swan DR scenario is a stressful experience requiring clearly demarcated lines of
responsibility. It is rare for data & systems to be restored without at least some unforeseen
issues. You may be unable to recover your systems: for example, after finally downloading the
data multiple times you may find it is not clean enough for your database, or there may be an
interface issue between the web server and the database that cannot be resolved. Please do not
expect your cloud provider to take responsibility and help you manage your DR incident - it is
all in the legal fine print and it’s termed the “Shared Responsibility” model. Remember that the
vast majority of people you deal with when contemplating moving your DR system to the
cloud [sales & marketing, data migration specialists, evangelists, account managers etc] have
never handled a major DR incident or engineered a recovery of a mission critical system.

Finally, Some Professional Advice from one Evangelist to Another & Just who are the Greatest
users of Tape Based Systems ?

Although I am not actually anti-cloud provider and do accept their premises in some instances
[eg. I use cloud email myself], I would like to go into a more theological area, since they
purloin the term evangelist to describe themselves.

Please understand this is coming from someone with decades of evangelical and missionary
experience with the Lao Lloum, Thai, Hmong, Khmu and Lao Soung people groups. My main areas
are interpretation of prophetic scripture in the Old Testament in the Lao and Thai languages
and believe me, it takes years and years to get proficient in these areas. Forget the
television preacher caricature, real evangelism in restricted access countries is an extremely
serious business and we deal with people who have suffered and been imprisoned for their faith
and some who operate under deep cover.

The major principle I would urge is therefore: get your doctrine correct or you will cause
serious damage. This is something I have to deal with in my Christian work regularly. You “cloud
evangelists” have a serious issue with skewed doctrine that is definitely not “scriptural”.
You are operating for the sole purpose of increasing your profit margins.

The Biggest Utilizers of Tape Based Systems are Cloud Providers

Finally, why are the biggest utilizers of tape based back up systems the cloud providers
themselves? How is Amazon Glacier backed up? [by tape] Why don’t they just use each other to
back up? For example, why doesn’t Amazon just use the Azure cloud and vice versa? The
answer is that tape systems run cooler, have no bandwidth bottlenecks, and are much cheaper,
more efficient and more stable than cloud based back up systems, plus a myriad of other
advantages. To paraphrase a well known Bangkok newspaper columnist, any comment would be
superfluous.

Black swans exist. If ever you have the opportunity to go to the beautiful Bibra Lake nature
reserve south of Perth, you can see these graceful creatures for yourselves. [I believe this is
the only place in the world where you can see them en masse] I grew up just 2 km from this
reserve.

They also exist in the area of systems recovery and back up and, just like the shallow water
torpedo at Pearl Harbor [which theoretically did not exist], you will encounter them if you
do not plan for them - and the effects could be catastrophic, just like at Pearl.

Please do not move your Disaster Recovery / Primary back up system to the cloud. It is
negligent. Take it from a real evangelist.

############################################

My Background:
Gregory Hayes is a Unix recovery & back up systems engineer, based in the Lao PDR. He is the
owner of Eshcol Data Protection and the Pastor of Holy Seed Church Vientiane Capital. He has
no Facebook friends, Instagram / Twitter followers or blog posts and is also a professional
coffee drinker.

 
