Govt IT system crash was a major incident

Tynwald buildings, Douglas

A software bug in the government’s IT network which caused 57 systems to freeze for some 18 hours was treated as a major incident, a Tynwald committee heard.

Mark Lewin, director of the Information Systems Division, apologised for the fault and admitted it was still not known exactly how it was caused.

He told the Economic Committee that lessons had been learned and steps taken to reduce the chance of the incident happening again.

The committee heard that while the fault had not affected the police or hospital systems, it had led to results of blood tests being delayed. There were problems, too, for social care’s child register with staff unable to look at their appointments schedule.

And the incident has raised the prospect of elements of the centralised government system - consolidated at great cost to the taxpayer - having to be ‘decoupled’, separating some systems so they wouldn’t be affected if the main network went down.

Mr Lewin told the hearing: ‘We managed it as a major incident. We recognised it caused a significant impact across a range of government areas.

‘Clearly there are lessons to be learned. We have taken steps to ensure the probability of it happening in the future is minimised. We are disappointed it happened and we apologise. We agreed it was unacceptable and should not have happened.’

The IT fault started at 1.50am on November 11 this year. Some 57 systems out of a total of 1,000 were left in a suspended state until a full recovery was made that evening.

Mr Lewin told the committee there had been a total of three software errors that had resulted in the controller that sits above the discs having a ‘panic’.

It was designed to recover itself but didn’t do so as expected.

‘As to what caused that incident we may never know. It could be a number of things - a blip on the network or back-up processor,’ he said.

He said the system had been designed and installed to be resilient.

Committee chairman Leonard Singer MHK asked if lives or safety had been put at risk.

Mr Lewin replied: ‘We don’t believe it had that level of effect.’ He confirmed that the Department of Social Care’s children services had been unable to look at its appointments schedule.

Dudley Butt MLC said there had been complaints that the GP system was still ‘very slow and not functioning properly’. He claimed the results of blood tests had not been available for five days.

Mr Lewin said the performance problem with the GP system was unrelated to November’s incident. That system was five years old and needed upgrading, he told the committee.

He said that while significant savings had been made by consolidating the government’s IT system, there were ‘key systems that need to be decoupled’.

Mr Lewin said that since the incident, ISD had changed its policy on emergency ‘patches’. He said if such patches had been applied, the system would have recovered ‘more gracefully’.

Some patches are ‘intrusive’, leading to a reluctance to apply them all. But he said: ‘Our stance has now changed - every patch that comes out we will apply it.’

 
