It is one of the worst IT meltdowns to hit British Airways in recent memory – thousands of passengers had flights disrupted or cancelled at the weekend and there were chaotic scenes at London’s Heathrow and Gatwick airports.
Some travellers are still waiting to be reunited with their luggage, five days later.
But what really went wrong? No-one seems to have a firm answer.
BA has apologised and blamed a “power surge” affecting IT equipment – but many engineers have reacted sceptically to that, pointing out that major firms are meant to have redundancy plans in place to avoid disruption when primary systems fail.
What has BA said?
In a slightly more detailed statement on Wednesday, the airline said a loss of power to a UK data centre was “compounded” by a power surge that took out its IT systems.
The firm claimed that this did not constitute an IT failure, but rather “it was an electrical power supply which was interrupted”.
An investigation is being carried out and it has been reported that BA’s board is set to demand an external inquiry into what happened.
How has the explanation been received?
BA’s statement has not satisfied everyone.
Several IT workers expressed doubt and the explanation was labelled “too simplistic” by independent defence and aerospace analyst Howard Wheeldon.
One of the chief unanswered questions is why a back-up or secondary system did not come into play, even if a power surge affected the main one.
Could a power surge be the culprit?
According to one informed observer, yes – especially under specific circumstances.
Data centres generally rely on an uninterruptible power supply, or UPS, which is designed to keep providing power to a data centre even if the mains supply fails.
This secondary source of power could be based on batteries or a generator running on fuel.
As independent IT consultant Marcel van den Berg pointed out, a power surge might have occurred after this secondary power supply failed.
Since the UPS might also be designed to protect systems from power surges, its failure might have left servers vulnerable.
The Daily Mail has reported that the UPS system at Boadicea House, the home of one of BA’s data centres near Heathrow, failed on Saturday.
Where did the surge come from?
Practically any piece of equipment could cause a power surge, for example because of a fault.
But to stick with the UPS line of inquiry, one provider – UPS Systems – notes on its website: “Power surges could be caused by the shut-down of a generator or other industrial motor on the local supply circuit.
“Will cause systems to crash, can cause components to wear and degrade over extended periods.”
What about disaster recovery?
Mr van den Berg told the BBC that while a power surge was a valid explanation in principle, it was still unclear why such an event had the catastrophic impact that it did.
“This shouldn’t have happened because there should be enough resilience to allow another UPS to take over or a secondary data centre,” he said.
Many large businesses have “disaster recovery” plans in place – often these involve the capability to quickly switch operations to a back-up data centre in a completely different location.
BA has not revealed whether, for example, it was unable to activate such a facility.
Is outsourcing to blame?
BA recently outsourced some of its IT contracts to India’s Tata Consultancy Services (TCS) and some have questioned whether TCS is to blame.
Sunbird and AIT Partnership Group, two firms that have in the past provided software and services to BA’s data centres, released a statement on Wednesday saying they “had no involvement” with the recent incident.
British Airways denied that outsourcing jobs had anything to do with the power issues.