RISK INSIGHT
WAVESTONE’S CONSULTANT RISK MANAGEMENT AND CYBERSECURITY LETTER
CYBER-RESILIENCE: A NEW PILLAR OF CYBERSECURITY STRATEGY
EDITORIAL
Summer 2017 showed how global cyber attacks unfold in practice, especially with NotPetya. Although the full consequences of the “ransomworm” are still to be determined, the Merck group announced in late November 2017 that it expected the cyber attack to cost it more than 600 million dollars over the 2017 period. Adding together the latest announcements, the 2-billion-dollar threshold in lost revenues is clearly within reach. This is the first time such a high impact has been measured following a cyber incident. This unprecedented escalation is rallying C-level executives, who are looking for ways to limit the impact of such attacks and for the positions to adopt during an actual attack. We hope that the articles below will help you get a clearer view and plan the required actions.
FOLDER
CYBER-RESILIENCE: THE KEY ACTIONS
NOTPETYA: 6 MONTHS LATER, WHAT ARE THE IMPACTS?
CYBER-CRISIS, A FULLY-FLEDGED MEDIA TOPIC
THE CLOUD: THE END OF IT BACKUP – OR A NEW WAY OF DOING IT?
Gérôme BILLOIS Partner Cybersecurity & Digital Trust
NOTPETYA: 6 MONTHS LATER, WHAT ARE THE IMPACTS?
Infographic: the NotPetya malware, a major threat, 6 months later. On June 27th, with Ukraine as the epicenter and the world as collateral damage: more than 1 billion $ of losses.
Gérôme BILLOIS, Partner email@example.com Denis BLANDIN, Consultant firstname.lastname@example.org
Figure: Top 6 of the financial losses due to this cyberattack (publicly released). *Projection over Q4 2017 released by the company. **Loss of revenue.
Figure: Weeks of damage caused in only 1 hour of malware execution. Time for a restoration of a majority of the affected systems (publicly released): 14 days, 13 days, 23 days, 36 days, 84 days, on-going.
CYBER-RESILIENCE: THE KEY ACTIONS
Successive cyber attacks, WannaCry and NotPetya, have highlighted the limits of current resilience and business continuity plans, as well as the full capacity of cyber threats to cripple Information Systems. The affected organizations paid a high price. What can we learn? What actions can we take to prepare for major cyber attacks? How can we ensure cyber-resilience?
WHAT VULNERABILITIES IN BUSINESS CONTINUITY SYSTEMS?
When confronted with a major cyber attack, whether destructive or leading to a loss of trust in vital systems, the first reaction of most companies is to activate their business continuity plan (BCP). This strategic element of resilience is designed to ensure the organization’s survival against disasters whose magnitude may make computing resources, communication infrastructures, buildings, and possibly even users unavailable. Yet major cyber attacks have not been taken into account when developing most BCPs, even though they can be as destructive in scale as WannaCry or NotPetya or, more often, lead to a loss of trust in the basic components of the infrastructure (network, access control, inventory, etc.). By focusing on an availability agenda, organizations fail to address the simultaneous destruction of, or loss of confidence in, the Information System (IS) caused by cyber attacks.
Moreover, these IS continuity plans are frequently closely linked to the resources they protect and are equally affected by the attacks. For over a decade, continuity processes (either user fallback or IT recovery) have adopted principles of infrastructure pooling and “hot” recovery to cope with both rapid business recovery and the need for improved operation. In effect, this “proximity” between the regular IS and its recovery counterpart makes continuity plans vulnerable to cyber attacks. As an example, various dedicated and connected recovery stations in fallback sites were contaminated by NotPetya and were useless for the remediation.
Legacy “cold” recovery/emergency plans (often consisting of activating a recovery system in case of incident) concern fewer and fewer applications, and the remaining ones are often secondary. Unfortunately, when systems have been deeply compromised, backups taken during the compromise may also include malicious elements such as malware, base camps, or modifications meticulously made by attackers beforehand, because intrusions go undetected for a long time (detection often happens hundreds of days after the initial infection).
In addition, the continuity of the backup systems themselves is often neglected. During the management of the NotPetya crisis, the backup management servers were also destroyed. Restoring them took several days, due to their complexity and nested nature within the information system: an Active Directory was necessary to launch the restorations, while the Active Directory backup was a prerequisite to rebuild it.
The same findings hold for industrial IS. Industrial digital systems are resilient against technical breakdowns or anticipated mechanical incidents. However, they were rarely designed with deliberate attacks in mind and as a result often lack advanced security mechanisms. To compound this, industrial IS have lifecycles of several decades, which exposes them to old vulnerabilities. Finally, the independence of control channels from the digital systems they oversee is not always implemented.
Figure: Main issues experienced during cyber crisis management
Understanding and mobilisation of the board of directors and the business lines
Difficulty getting the teams to function (scarcity of necessary skills, vacations, “will” to help)
Adaptation of crisis management organizations that function well but are new to the “cyber topic”
Lack of tools for “trust” crisis management, independent of the main information system
Insufficient global view of the IS
Lack of logs and investigation capabilities
Inability to use backups because the attacks predate them
Inability to use secondary services as they could be compromised
Efficiency loss during the crisis
Major difficulties during the crisis
TWO ILLUSTRATED MAJOR ATTACK SCENARIOS
Logical destruction or the unavailability of a large chunk of an Information System
This scenario is illustrated by the pseudo-ransomware attacks WannaCry and NotPetya. This type of attack causes mass unavailability of services due to the encryption of data files and/or the operating system. The companies affected by such attacks (Merck, Maersk, Saint-Gobain, FedEx… as well as Sony Pictures and Saudi Aramco) lost up to 95% of their Information Systems (tens of thousands of computers and servers), often in less than an hour. At the start of such a crisis, the situation is highly difficult since there is no longer any means of communication or exchange within the affected company, including for the ISD. Victims have reported losses of several hundred million euros following these attacks.
A compromise and loss of confidence in Information Systems
This scenario concerns a targeted attack that does not impact the proper functioning of the system. Rather, it aims to give attackers access to all of the company’s information systems (email and messaging, files, business applications, etc.), allowing them to steal the identity of any employee and carry out actions in their name. The attackers may then extract any type of data or carry out business actions which require several successive validations. These attacks have affected a large number of companies across all sectors and resulted in massive fraud, including at the Bank of Bangladesh. They have also led to the theft of financial and payment data, as was the case for several retail groups in the United States, including Target and Home Depot. The situation at the start of the crisis is complex since there is no confidence in the Information System and there is considerable uncertainty about what the attacker could do and about their motives. The response involves quietly investigating until the attacker can be removed and a secure system rebuilt. Victims of these attacks have also reported financial impacts worth several hundred million euros.
Figure: Cyber crisis management method
Crisis unit mobilisation
Investigation: understand the attack, its scope, its target and identify how to stop it
Defense and recovery plan building: capability to trigger the plan in case of “emergency stop”
Triggering a defense and a recovery plan
Heightened surveillance on a 24/7 basis
STRENGTHENING CRISIS MANAGEMENT
Cyber crises are specific: they are often long (several weeks) and sometimes difficult to grasp (what has the attacker been able to do? for how long? what is the impact?). Often, the external parties involved, such as lawyers, authorities, suppliers, and sometimes even clients themselves, are not well prepared on the subject. It is therefore necessary to adjust existing plans that were not designed to cater to the cyber threat.
Even though the CIO is an operational player in cyber crisis management, they should not be over-utilized in either the investigation or the defense measures if this is detrimental to overall production and recovery. Anticipating these kinds of measures is vital to the recovery effort. It is necessary to clearly identify the teams which need to be mobilized to respond to the crisis in a timely manner, and to organize the parallel activities of investigation and construction of the defense plan.
Beyond the organizational point of view, the CIO will have to ensure that they also have the investigation tools (mapping, search for attack signatures, independent crisis management IS, capability to analyze unknown malware, etc.), remediation tools (capability to rapidly deploy technical corrections, fragmentation of the IS to save what can be saved, IS surveillance toolkit) and reconstruction tools (access to backups, access to minimal documentation, capability to deploy workstations) required to understand what the attacker did in the IS, to repel them and to ensure they do not return.
Writing a crisis management guide that defines the essential steps, the macro-level responsibilities, and the key decision points is a further recommended step. In addition, it is essential to conduct crisis exercises to ensure readiness for when a crisis actually occurs.
Figure: Functional integrity control chain. A mechanism to reconcile the application outputs in order to detect a possible compromise (1 + 2 + 3 = OK; 2 + 3 = NOK). Examples: raising the authorized overdraft level on an account without going through the bank back-office interface; creating an AD admin account without creating a ticket in the helpdesk ticketing tool.
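The reconciliation idea behind the functional integrity control chain can be illustrated with a minimal Python sketch. It assumes that the list of admin accounts in the directory and the list of accounts covered by a helpdesk creation ticket have already been exported; the function name, account names, and data are hypothetical and only serve the illustration.

```python
def unticketed_admin_accounts(directory_admins, ticketed_accounts):
    """Return admin accounts present in the directory with no creation ticket."""
    return sorted(set(directory_admins) - set(ticketed_accounts))


# Illustrative data only: account names and ticket contents are made up.
directory_admins = ["adm-alice", "adm-bob", "adm-x9"]
ticketed_accounts = ["adm-alice", "adm-bob"]

suspicious = unticketed_admin_accounts(directory_admins, ticketed_accounts)
if suspicious:
    # An account created outside the regular workflow is a signal to investigate.
    print("ALERT: admin accounts without a helpdesk ticket:", suspicious)
```

In practice such a check would run on a schedule from a system that is independent of the one being controlled, so that an attacker who compromises the directory cannot also silence the control.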
RETHINKING CONTINUITY PLANS
Continuity plans have to evolve to adapt to cyber threats. Sometimes, this means they may have to be completely rebuilt. There are many possible solutions that can cover all types of continuity plans.
The user recovery plan, for example, can evolve to integrate USB keys containing an alternative system which could be used in case of logical destruction of employee workstations. Some organizations have also decided to reserve a set number of workstations directly with their suppliers, to have them delivered quickly in case of physical destruction.
The IT continuity plan, on the other hand, can include new solutions which could be effective in the event of a cyber attack. The most publicized one aims to build “non-similar facilities” by duplicating an application without using the same software, operating system, or production teams. It is an extreme solution, very costly and difficult to maintain, but one that is considered for specific, critical applications in the financial industry, most notably payment system infrastructure.
Other less complex solutions, such as adding functional integrity control in the business process, have also been considered. The concept relies on the implementation of regular controls, at various levels and at different places within the application chain (“multi-level controls”). This enables quick detection of attacks. For example, an alert could be raised when a technical layer is acted upon directly, such as a value modified inside a database without passing through the regular business workflows (via graphical interfaces). These mechanisms can also be applied to infrastructure systems, by reconciling admin account creation request tickets with the accounts actually present in the system.
As a solution of intermediate complexity, it is possible to implement a “floodgate”, that is, a system and network isolation zone. This floodgate (around the industrial IS, for example) can be activated in the event of an attack to isolate the most sensitive systems from the rest of the IS.
These developments, which are often major, must be part of a review of the existing recovery strategy, so that one can assess its vulnerability and the value of deploying new cyber-resilience solutions, particularly on the most critical systems. Extending the Business Impact Analysis (BIA) to include this dimension can be a key first step.
Example of actions to be taken in a cyber-resilience strategy
ANTICIPATE SO AS NOT TO BREAK
Spread diversity and flexibility (workstations, infrastructures, applications, third parties...) Limit amplification effect (harden, partition...) Reshape alerts and continuity plans (prioritize, practice...)
ACT RAPIDLY AND EFFECTIVELY
Organize (Structure crisis units, communicate with authorities, mobilize expertise, have sufficient fallback telecommunication means...) Identify and prioritize what can be saved (Ensure audit trail, investigate, immunize...)
REBUILD FAST AND SAFELY
Test the strength (Carry out penetration tests...) Industrialize the reconstruction (Restart unaffected services quickly, parallelize, rely on users...)
WITHOUT CYBERSECURITY, CYBER-RESILIENCE IS NOTHING
Implementing these new cyber-resilience measures requires significant effort. Note that this effort can be wasted if both these recovery solutions and the regular systems are not already appropriately secured and under detailed surveillance. The CISO is the key player in ensuring that these often started but rarely finalized initiatives come to fruition. Help from the Risk Manager (RM), or the Business Continuity Manager (BCM) if such a position is in place, will be valuable. It is widely acknowledged today that it is impossible to secure a system 100%, which means that organizations have to accept that an attack will inevitably occur, at which point the RM or the BCM will come fully into their role.
Protect, detect, respond, remediate, and rebuild. These are the pillars of a strong cyber-resilience program, which can only be attained if the BCM and the CISO combine their full range of capabilities and work hard, hand-in-hand!
Gérôme BILLOIS, Partner email@example.com Frédéric CHOLLET, Senior Manager firstname.lastname@example.org
CYBER-CRISIS, A FULLY-FLEDGED MEDIA TOPIC
Although they are based on similar objectives, methods and tools, crisis management and crisis communication must take on the specifics of the issues they deal with to be relevant and therefore effective. A crisis of cyber origin, given its characteristics and its exposure to often large numbers of users, requires specific anticipation and preparation. The first step is understanding the expected scale of media exposure.
ADDRESSING THE NEED TO KNOW AND THE NEED FOR REASSURANCE
Supported by the increased number of incidents and attacks on information systems, the cyber crisis has moved into the public realm. The democratization of its vocabulary is a clear indicator of the place this subject occupies in the media. Data leakage, ransomware, hacktivist, DDoS, phishing, whistle-blower: these terms have left the server rooms and specialist blogs to make their way into national newspaper columns and most people’s vocabulary. The cyber crisis is no longer a mere quality incident discreetly handled in-house; it has become an event that arouses the interest of a broad audience. This interest transforms the cyber crisis into a communication crisis. However, while this theme’s new popularity logically translates into increased coverage, other elements explain a significant increase in requests, whether internal or external to the organization in crisis.
When the cyber crisis results in data leakage, for example, it is not only the subject of the crisis that is newsworthy, but its very object. When data leaks or is stolen, its nature arouses curiosity, whether it is personal data, a State secret or simply a private conversation. This mechanism logically creates, for many audiences, both the need to know the unknown and the need to make sure that they are not among the victims. These two primary needs, curiosity and reassurance, are the essential drivers of media coverage and, more generally, push the information consumer, the stakeholder and the client to seek out this information. By the same logic, the source of this information, in this case the legitimate data holder, is expected to address these requests and communicate on the incident.
Whether it is strategic events such as presidential elections or everyday private conversations on digital media that are compromised, the crisis’ media effect is magnified by the extraordinary nature of the event. This results from both the supposed impossibility of the event and the trust that the public had placed in what has been compromised. The sudden rupture of the trust placed in these “institutions” of major importance, which hold a prominent place in a 2.0 version of Maslow’s pyramid, itself generates interest and a need to know, which translate into an explosion in the number of requests for information addressed to the organization in crisis.
Figure 1: Maslow Pyramid Example
5 - Need for self-actualisation
4 - Need for esteem
3 - Need for belonging
2 - Need for safety
1 - Physiological needs
COMMUNICATION WAR BETWEEN THE ATTACKER AND THE COMMUNICATOR
Cyber crisis communication is thus a specific exercise, given the subject it deals with but also the nature of the actors involved. When immeasurable sums of money are stolen without warning or institutions fall to “citizen” hacktivist attacks, opinion tends to sympathise with the attacker, perceived as a modern hero, a romantic pirate or an anonymous vigilante. This public figure, aware of its image and of the codes of the communication world, will of course know how to play on this environment.
Thus, the very methods of the attackers reinforce the central place of communication in the management of cyber crises. Attacks on political, ideological and militant grounds are no longer confined to the compromise of a system but send a message whose publicity must be maximised. This obvious appropriation of the activists’ specific methods is illustrated in several ways: prior warning of a DDoS, defacing a website, publication over time of proofs of a theft on social networks, dissemination of information such as compromising private mail conversations, etc.
If the attackers have learned to maximize the reputational impact of their attacks, they also use this lever to disrupt their target’s crisis management and make noise that will buy them time once their attack is discovered. While one of crisis management’s key success factors is regaining control of the tempo and of the publication of new elements, the cyber crisis inevitably leaves this power to a malicious third party.
This third party can also, if the compromise runs deep, alter the company’s means of communication. While the company tries to respond to the need to express itself urgently and widely, this can severely hinder the fluidity of its communication. Without email, how can a message be spread to employees? Without social networks, how can the company stay close to its community and answer its questions?
RESTORING THE TRUST RELATIONSHIP THROUGH COMMUNICATION
Fascinated by the attackers and the magnitude of the attacks, the general public is nonetheless intransigent at a time when trust and data are the very value of a company. Intrinsically, preserving the first assumes the protection of the second. When the organization fails to achieve this goal, crisis communication is the only means of restoring this relationship of trust, on which depends the future of the relationship with customers and partners, who will or will not continue to entrust their data, the management of their tools, or their services to the organization. When this trust is broken, it also triggers the search for someone to blame. Although the reality is much more complex, the general public will easily assume that information system attacks are made possible by exploiting a vulnerability, and therefore a fault.
A data leak is thus not only perceived as an attack perpetrated by a malicious third party, but also as negligence in the defenses of the company that fell victim to the theft. The latter is automatically designated as responsible and its reputation is logically impacted. As attackers have become professional, attacks grow more complex and the absence of vulnerabilities is a myth, cyber attacks are now a subject of crisis management and communication in their own right. Because of its potential impact on the general public’s daily life, and therefore its newsworthy nature, a cyber attack forces the victim, considered to be co-responsible for its loss, to express itself.
TRY TO KEEP IT SIMPLE FOR BETTER CRISIS COMMUNICATION
Beyond defining a clear, shared and timely strategy, managing a cyber crisis, with its particular rhythm and the obstacles caused by the attackers, must be accompanied by a specific communication effort, which implies one final requirement: keeping it simple. In a cyber crisis, as in any type of crisis, communicating implies being able to translate the events and corrective actions into clear impacts and to address them in a coherent manner. Of course, the complexity of the terms and the mechanics of a cyber crisis makes this exercise tricky and is another particularity to take into account.
In this context, the role of the CISO and their team is central, through their ability to translate the technical cause into business consequences and, more generally, into layman’s terms. During business as usual as well as in times of crisis, the CISO’s mission is to translate the facts and technical components not only into business impacts but also into understandable and convincing impacts for diverse non-expert audiences. They may also have to devise, or even bear responsibility for, elements of crisis communication language, in the same way that a human resources representative is exposed during a social crisis.
Even if they do not appear on a major TV channel’s news program, information security experts will be expected to speak on social networks, on professional networks, in the specialized press or in-house. In crisis communication, everyone is responsible for everything and everyone has to be prepared for it.
Thus, the cyber topic carries a media power of its own, whose immediate consequence is a considerable increase in expectations and requests for information from different divisions of the organization as well as from the public. While the likelihood of an information security incident calls for specific defense and continuity-of-operations planning, it also requires anticipating these requests and actively preparing for this overall communication effort.
Swann LASSIVA, Consultant email@example.com
THE CLOUD: THE END OF IT BACKUP – OR A NEW WAY OF DOING IT?
Businesses are increasingly using cloud services (SaaS, PaaS, and IaaS) in their IT environments. They provide more flexibility on costs and can be more attractive than using conventional IT infrastructure. In 2016, in France, 48% of companies employing more than 250 people used them, an increase of 12 percentage points compared with 2014. The greater availability of cloud infrastructure is often identified as an opportunity. However, the risk of failure of a service provider’s data center is rarely addressed, even though its services rely on data centers that are decidedly physical and not in the cloud. Such data centers face the same threats as traditional data centers: natural disasters, human error, etc. How, therefore, can backup be provided for these cloud infrastructures?
SAAS COMPUTER BACKUP: THE SERVICE PROVIDER’S RESPONSIBILITY TO PUT IN PLACE
SaaS (Software as a Service) is software that is made available on, and consumed directly from, the internet. It is managed by one or more providers. The customer does not have the wherewithal to carry out the backup activities in case of disaster (no access to raw data, source codes, applications that could duplicate the infrastructure, etc.), so it has to rely on the provider’s goodwill.
Levels of disaster recovery are variable for SaaS, depending on the provider’s degree of maturity
Three major trends are emerging:
Providers who offer an inclusive disaster recovery plan. As part of their standard offering, the provider offers recovery at a remote data center, usually augmented with outsourced backup. However, they rarely offer commitments on recovery times. Examples are the big SaaS players (such as Office 365, Salesforce, and SAP), as well as some intermediate players (such as Evernote and Xero);
Suppliers who offer outsourced backup only. In their case, there is no clearly established disaster recovery plan as such. The customer then has to question the ability of the provider to restore backup files in the event of a disaster at the main site. Examples are intermediate suppliers (such as Zervant and Sellsy);
Suppliers who don’t mention the issue or do not have anything in place. The subject of backup doesn’t even get raised, so it’s better to assume that nothing is being done. Small players are usually in this situation.
Getting contracts right is key
In the vast majority of cases, SaaS providers have no provisions in their contracts on how they will manage disaster recovery, even though they might stress their ability to handle that risk. In fact, contracts usually include default Act of God clauses stipulating that the supplier is not liable for a breach of contractual obligations if this is caused by an event beyond their reasonable control. The legal risks must therefore be addressed when framing the agreement, and these types of clauses should be removed to ensure an appropriate level of cover.
Just as they do when framing conventional contracts, customers must ensure that clear service level agreements are in place, in particular for disaster recovery. These need to cover:
Recovery times (Recovery Time Objective, RTO) and data loss (Recovery Point Objective, RPO) in the event of a disaster;
The provider’s disaster recovery plan, including crisis management procedures, as well as the obligation to carry out conclusive tests every year with real-world scenarios, as part of the plan, with the customer having the option to review the test report;
Financial penalties and the right to terminate the contract (in particular, with a provision to recover usable data) if commitments are breached.
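To make the service level commitments above concrete, here is a minimal Python sketch of how a customer might compare the figures from a provider’s yearly disaster recovery test report with the contractual RTO and RPO. The class names, field names, and values are assumptions chosen for illustration; they do not come from any real contract or provider report format.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoverySla:
    """Contractual targets (hypothetical structure for illustration)."""
    rto: timedelta  # maximum tolerated time to restore the service
    rpo: timedelta  # maximum tolerated window of lost data


@dataclass(frozen=True)
class RecoveryTestResult:
    """Figures read from a disaster recovery test report (hypothetical)."""
    time_to_restore: timedelta
    data_loss_window: timedelta


def sla_breaches(sla: RecoverySla, result: RecoveryTestResult) -> list[str]:
    """Return the commitments the test failed to meet (empty list if none)."""
    breaches = []
    if result.time_to_restore > sla.rto:
        breaches.append("RTO")
    if result.data_loss_window > sla.rpo:
        breaches.append("RPO")
    return breaches


# Made-up values: a 4-hour RTO and a 15-minute RPO.
sla = RecoverySla(rto=timedelta(hours=4), rpo=timedelta(minutes=15))
report = RecoveryTestResult(time_to_restore=timedelta(hours=6),
                            data_loss_window=timedelta(minutes=10))
print(sla_breaches(sla, report))  # ['RTO']
```

A non-empty result is the kind of evidence that would support invoking the penalty or termination provisions negotiated in the contract.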
I AA S / PAA S D I S A S T E R R E COV E RY: T H E C U S TOME R ’ S R E S PON S I B I L I T Y TO PU T I N P L AC E Infrastructure as a Service (IaaS) is a stan- dardized, automated offering of computing, storage, and network resources owned and hosted by a provider, and made available to the customer on demand. A Platform as a Service (PaaS) offering is similar to an IaaS offer, but it is different in that it only applies to software development stack (database, EDI, business processmanagement…) accor- ding to Gartner’s definition. Unlike SaaS, disaster recovery remains the customer’s responsibility in both cases: IaaS/PaaS pro- vidersmake services available in various data centers, and the customer is responsible for their use and configuration. Two solutions are available to customers using these ser- vices: to entrust things to a provider, or manage it themselves. Cloud disaster recovery providers are refer- red to by the acronym DRaaS: Disaster Recovery as a Service. Initially, DRaaS pro- viders offered cloud-based IS disaster reco- very of an “on premise” datacenter. But, today, they also offer to provide recovery for infrastructure already in the cloud, such as AWS or Azure. Levels of maturity remain very variable, depending on the provider and which cloud is used. Some DRaaS providers require that their own cloud is used for reco- very, which means they cannot offer a PaaS recovery service. T h e ma r ke t fo r c l oud d i s a s te r re cove r y i s n o t a ma t u re on e
There are four main ones:
As with SaaS, there are no default contrac- tual provisions . Therefore, any guarantees required for data loss or recovery time will need to be negotiated. Suppliers generally promise to be able to tailor their offer to the customer’s requirements! To ensure that the recovery performs correctly, the customer must plan for disaster recovery tests to be carried out regularly (we recommend once a year). Op e ra t i ng yo u r own d i s a s te r re cove r y p l a n , u s i ng to o l s o f fe re d by t h e s upp l i e r For “on-premise” infrastructure, you will need to think about, and define, your DRP strategy right from the design phase. This strategy must include the option of per- forming tests to ensure a sufficient level of confidence in your plan. Implementation can be simplified by the tools offered by cloud providers, and the high levels of standardization in cloud envi- ronments. The major players have set out, in white papers, the key guidelines to follow in pursuing such a project (for example, AWS and Azure). Conceptually, these DRP strategies remain close to those used in “on-premise” data centers.
/ / backup and restore: simple backups of data and images of machines on a remote site, which are restored if an incident occurs; pilot light: replication of databases and the provision of machines, in the form of images, ready to be used if an incident occurs; warm standby: full replication of the main site (data and machines); the recovery site is undersized in perfor- mance terms but ready to scale up if an incident occurs; multi-site (or active-active): the two sites are identical and share the load from users. If an incident occurs, the remaining site can scale up to cover all users. Hybrid solutions that are better designed to take account of recovery time requirements, and cost and complexity considerations, can also be considered. / / / / / /
The real contribution that the cloud can make to DRP lies in the numerous tools it offers to simplify implementation and activation. Data replication, for instance, can be simplified through asynchronous geo-replication options (where multiple copies are replicated to other regions). The RPO (recovery point objective) varies, depending on the types of data and tools involved. Aside from this option, local data redundancy is almost always included.

The high degree of standardization also makes it possible to automate recovery: the scripts or APIs made available by providers can be used to automate the deployment of infrastructure, resize instances (according to a previously defined configuration), distribute loads and traffic, carry out IP addressing, etc., considerably shortening a backup site's activation time.

The monitoring and alert tools, which are also on offer, are intended to facilitate in-service support and can be used to detect an incident in the shortest possible time or, in some cases, to partially automate the activation of a backup site.

Lastly, the ability to provision new resources within a few minutes enables the associated OPEX to be minimized. By using such a strategy, it is possible to save 40 to 70% on the cost of DRP infrastructure.

Toward greater support by providers?

During 2017, Azure is planning to offer an option to provide recovery for virtual machines hosted on its platform by enhancing its "Site Recovery" service. In its current form, "Site Recovery" supports traditional site backup by using the Azure cloud to host the secondary site, but Microsoft wants to extend this service to provide a Recovery as a Service option. This tool would allow the automatic deployment of the secondary site (of the active-passive type), automatic data replication, and easier testing. This option became available as a "public preview" at the end of May 2017. There is no equivalent project under way at the other main IaaS/PaaS providers.
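The automated activation described above (monitoring detects an incident, then scripts or APIs resize the backup site and redirect traffic) can be sketched as follows. The `CloudProvider` interface, the `FakeProvider` stand-in and the site names are hypothetical and exist only for this illustration; real provider SDKs expose different calls, so this is a minimal sketch of the logic rather than an implementation against any actual API.

```python
from typing import Dict, List, Protocol


class CloudProvider(Protocol):
    """Hypothetical interface (assumption); real provider SDKs differ."""

    def is_healthy(self, site: str) -> bool: ...
    def resize(self, site: str, instance_count: int) -> None: ...
    def route_traffic(self, site: str) -> None: ...


class FakeProvider:
    """In-memory stand-in so the sketch runs without any cloud account."""

    def __init__(self, health: Dict[str, bool]) -> None:
        self.health = health
        self.calls: List[str] = []

    def is_healthy(self, site: str) -> bool:
        return self.health[site]

    def resize(self, site: str, instance_count: int) -> None:
        self.calls.append(f"resize {site} to {instance_count}")

    def route_traffic(self, site: str) -> None:
        self.calls.append(f"route traffic to {site}")


def activate_backup_if_needed(
    provider: CloudProvider, primary: str, backup: str, full_capacity: int
) -> bool:
    """Scale up the backup site and redirect users when the primary is down."""
    if provider.is_healthy(primary):
        return False  # primary is fine: the backup site stays undersized
    provider.resize(backup, full_capacity)  # scale up first...
    provider.route_traffic(backup)          # ...then redirect traffic
    return True


if __name__ == "__main__":
    provider = FakeProvider({"primary-region": False, "backup-region": True})
    activated = activate_backup_if_needed(
        provider, "primary-region", "backup-region", full_capacity=10
    )
    print(activated, provider.calls)
```

Keeping the backup site small until the health check fails is what limits the running cost; the order of operations (resize, then reroute) avoids sending users to a site that cannot yet carry the load.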
THE CLOUD AND PROVIDER SYSTEMIC RISK
Backup of cloud-based services is dealt with differently, depending on the type of service used. SaaS recovery must be managed through contracts and is the responsibility of the provider, while IaaS/PaaS recovery, simplified by the tools available, remains the responsibility of the customer.

There is a risk of widespread failure of a provider's hosting region, as recent incidents have shown. Even though these incidents have been short-lived, or have had minor impacts, the possibility of widespread failure cannot be ignored. The issue of cyber-resilience, then, must still be dealt with. Using a second cloud provider can cover the risk of destruction, or of a major outage, of the first provider's infrastructure. This solution is very complex, because portability between providers is a difficult issue. For now, few companies have taken the plunge, although Snapchat is an example: it uses Google's cloud for its production, and plans to use Amazon's for its DRP within five years.

Etienne LAFORE, Manager firstname.lastname@example.org
Lesly MERINE, Senior Consultant email@example.com
Valentin LEMENUT, Consultant firstname.lastname@example.org
Discover our expertise on Risk Insight: Riskinsight-wavestone.com
Director of the publication: Frédéric GOUX Editor-in-chief: Gérôme BILLOIS Contributors: Denis BLANDIN, Frédéric CHOLLET, Swann LASSIVA, Etienne LAFORE, Lesly MERINE, Valentin LEMENUT ISSN 1995-1975
Wavestone is a consulting firm, created from the merger of Solucom and Kurt Salmon's European business (excluding retail and consumer goods outside of France). The firm is counted among the leading players in European independent consulting. Wavestone's mission is to enlighten and guide its clients in their most critical decisions, drawing on functional, sectoral and technological expertise.
2018 | © WAVESTONE