The cloud that not even a tornado can blow away
The story below describes an experience that few in Central Europe have ever been through. Is your IT operation ready for a natural disaster or a major accident?
As business processes become increasingly digitalised, many managers are grappling with how to ensure the continuity of their business or organization. Organizations depend more and more on IT, which brings many risks – technology failures, human error, cyberattacks, or natural disasters. Placing critical services and processes in the cloud is a great help in ensuring high IT availability.
But what if you have reasons not to use international public cloud services and instead prefer a highly available private cloud located in the Czech Republic? In June 2021, our data centre, which offers exactly these services, passed a unique live test with no precedent in Europe: it maintained the operation of customer services even after a direct hit from an F3-F4 tornado. That is a story worth reading.
High availability with Tier III certification
For uninterrupted operation, all of the data centre's critical components are fully redundant. Power is supplied over two independent routes, and in the event of a grid failure there are two backup motor generators with enough fuel for 48 hours of operation, which can be extended as needed. This enables continuous operation even without an external power supply. To bridge the switchover to the motor generators, both supply branches are also backed up by modular UPS units that can carry the load for at least one hour. The same level of protection applies to cooling, which uses three turbocharger units in an N+1 configuration.
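How much this duplication helps can be seen with a simple illustrative calculation (the per-path availability below is an assumed example figure, not a measured property of this data centre): if each independent supply path is available 99.9% of the time and the two paths fail independently, the probability of both being down at the same time is only one in a million.

\[
A_{\text{dual}} = 1 - (1 - A_{\text{path}})^2 = 1 - (1 - 0.999)^2 = 0.999999
\]

The same reasoning lies behind the duplicated generators, the UPS on both branches and the N+1 cooling: no single failure should be able to take the whole chain down.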
Uninterrupted operation also requires data connectivity, so the data connections to national and international networks are fully backed up as well. As a result, the data centre meets the Tier III certification level, with a guaranteed availability of 99.982%, i.e. at most about 1.6 hours of downtime per year.
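For readers who want to check the arithmetic behind that figure: the unavailable fraction of the year, multiplied by the number of hours in a year, gives the maximum expected outage time.

\[
(1 - 0.99982) \times 8760\ \text{h} \approx 1.6\ \text{h of downtime per year}
\]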
To minimize risk further, there is a second, geographically separated data centre in Prague. Not only can traffic and services be transferred there, but both data centres also have redundant interconnection and external connectivity. In addition, all critical infrastructure systems that keep the entire data centre running safely and without interruption have their own autonomous monitoring, directly connected to the monitoring centre, and qualified operators of these critical technologies are present in the data centre around the clock.
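Purely for illustration, the sketch below shows in principle what such autonomous monitoring can look like: a periodic heartbeat that reports the state of critical systems to the monitoring centre, where a missing heartbeat itself raises an alarm. The probe functions and the endpoint URL are hypothetical placeholders; this is a minimal example, not the data centre's actual implementation.

import json
import time
import urllib.request

# Hypothetical endpoint of the central monitoring centre (illustrative only).
MONITORING_URL = "https://monitoring.example.com/heartbeat"

def check_power() -> bool:
    # Placeholder: a real probe would query the UPS and generator controllers.
    return True

def check_cooling() -> bool:
    # Placeholder: a real probe would query the cooling units.
    return True

def send_heartbeat(status: dict) -> None:
    # Report the current state of critical systems to the monitoring centre.
    payload = json.dumps(status).encode("utf-8")
    request = urllib.request.Request(
        MONITORING_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(request, timeout=5)

while True:
    status = {"timestamp": time.time(), "power_ok": check_power(), "cooling_ok": check_cooling()}
    try:
        send_heartbeat(status)
    except OSError:
        # If the monitoring centre cannot be reached, keep probing locally;
        # the missing heartbeat itself acts as an alarm on the other side.
        pass
    time.sleep(60)  # report once a minute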
A professional arsenal is the foundation, but only real combat shows whether it is enough
An F3-F4 tornado struck Lužice without warning in the early evening of 24 June 2021. A data centre surveillance specialist saw a power failure alarm, but then everything happened so fast that he fled to the engine room to save himself. The media showed how devastated the area around the data centre building was – but what was going on inside at that moment? The data centre activated its crisis plan. Customers agreed that non-critical systems had to be shut down, and further specialists were relocated. Access to the site was difficult because of the huge amount of rubble. The direct hit appeared to have damaged the motor generators, so for the first hour the data centre had to rely on the UPS alone. The motor generators were brought online in time, but problems persisted with the turbocharger cooling units. The tornado strike was extremely strong.
So, what happened next? In theory, the limited resources would have allowed the data centre to keep itself cool only by stopping everything from running. In the event of a collapse, operation could have been transferred to the backup centre in Prague, but in the end everything stayed primarily in Lužice. Additional technicians and spare parts were sent to the site for repairs. At midnight, the police closed off the villages in the area, which brought another complication: on arrival, technicians had to prove their identity to the integrated rescue system units with documents confirmed by company management. During the night, however, one of the cooling turbochargers was put back into operation and the data centre came back to life. In the morning, customers were informed that they could start up their systems again.
Repair work on the turbochargers continued through the morning. The situation was complicated by the failure of one of the motor generators, and another incident was reported – a possible outage of one of the data lines. That line was running on UPS power, and its supplier was unsure whether, given the situation in the area, they would manage to get a diesel generator running. We could stay calm, though, because two lines were available for data connectivity. During the morning, another turbocharger was repaired and data centre cooling was restored. At noon, one more crucial piece of equipment was commissioned – a coffee machine. Later that day, the damaged motor generator was repaired and operation could be transferred to it. By then, all systems – even non-critical ones – were already running.
Early the next morning, one of the data connectivity lines failed. This was no problem, however, because it had been expected: traffic ran on the second line and the fault was fixed within two hours. Sunday was calm – routine monitoring continued, operation was switched back to the other motor generator, and a delivery of diesel fuel was planned, since the electricity supply in the area was still not very stable.
The most rewarding part is always the customers' reactions
The situation demanded a highly professional approach from many specialists, who had to cope with difficult access to the site due to the rubble as well as with their own psychological strain. The reactions of some customers were all the more encouraging.
"As a customer of the mentioned data centre, I can confirm that if we had not seen this disaster on the news, we would not have even noticed anything. Great job under unpredictable conditions."
"The backbone of our systems is in this datacentre. When everything failed on Thursday at 10 p.m. and we saw the chaos on TV, we didn't believe that on Friday it would work again. But at 4:00 a.m., everything began – all shipments were delivered and picked up on Friday, without any effect on our customers."
"Despite the monstrous disaster that the datacentre went through, everything worked and still works. The communication was really amazing."