Background
The new signalling system of the Tsuen Wan Line (TWL), developed by the contractor, is divided into two control zones. Each control zone comprises three signalling zone controller computers: the Primary ("A"), Hot-standby ("B") and Warm-standby ("C") computers.
Computers A, B and C use identical hardware and are loaded with common software.
Each is configured to perform the functions of Computer A, B or C through a hardware identity plug, which allows the common software to process dynamic data on each of the three computers accordingly.
Computer C only receives selected dynamic data from Computers A/B so as to avoid common mode failure.
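To make the identity-plug idea concrete, here is a minimal sketch of how common software could pick its role from a plug value and decide whether it receives the full dynamic data. It is only an illustration under stated assumptions: the names `Role`, `role_from_plug` and `receives_full_dynamic_data`, and the plug codes "A"/"B"/"C", are hypothetical and are not taken from the contractor's software.

```python
from enum import Enum


class Role(Enum):
    """Roles described in the report; the string codes are assumed."""
    PRIMARY = "A"
    HOT_STANDBY = "B"
    WARM_STANDBY = "C"


def role_from_plug(plug_code: str) -> Role:
    """Map a hypothetical identity-plug code to a role.

    Raises ValueError for an unknown code, so a mis-seated plug
    cannot silently select a role.
    """
    return Role(plug_code)


def receives_full_dynamic_data(role: Role) -> bool:
    """Only the warm standby gets a reduced (selected) data set."""
    return role is not Role.WARM_STANDBY
```

The same binary runs on all three machines; only the plug value changes behaviour, which is why a mistake in the common software can affect every role.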
This configuration aims to improve system availability and service recovery through high resilience.
Computer C is housed at a different station, which enhances system security through access control and a diverse power supply.
The Panel agrees that the warm-standby arrangement is a novel application in the contractor's signalling systems, intended to reduce recovery time during signalling failure incidents.
Cause
The Panel found that the contractor made three software implementation errors when performing a software change in 2017. The change was meant to achieve the design intention of avoiding common mode failure in Computer C, should there be a problem in Computers A and B.
To do that, the contractor needed to exclude selected data from being transferred from Computers A/B to Computer C; the excluded data should then be re-created by Computer C itself.
The three implementation errors:
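The exclude-then-re-create idea can be sketched roughly as follows. This is a simplified illustration, not the contractor's design: the key name `conflict_zone_data`, the dictionary-based data model and both function names are assumptions made for the example.

```python
from typing import Any, Callable, Dict

# Hypothetical key; the report only refers to "Conflict Zone Data".
EXCLUDED_KEYS = frozenset({"conflict_zone_data"})


def select_for_warm_standby(dynamic_data: Dict[str, Any]) -> Dict[str, Any]:
    """A/B side: drop the excluded items before sending to Computer C."""
    return {k: v for k, v in dynamic_data.items() if k not in EXCLUDED_KEYS}


def rebuild_on_warm_standby(
    received: Dict[str, Any],
    recreate: Callable[[str], Any],
) -> Dict[str, Any]:
    """C side: combine received data with locally re-created items.

    `recreate` stands in for whatever logic Computer C uses to derive
    the excluded data independently of Computers A/B.
    """
    state = dict(received)
    for key in sorted(EXCLUDED_KEYS):
        state[key] = recreate(key)
    return state
```

The point of the split is that a corrupted value on A/B never reaches C; the cost is that C's own re-creation logic becomes a separate thing that has to be specified and tested.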
1. The internal software development documents of the contractor's software team did not clearly state that "Conflict Zone Data" was excluded from transfer to Computer C. As a result, no specific test, risk assessment or safety analysis (including laboratory verification simulation and on-site testing) was carried out to verify the "Conflict Zone Data" when Computer C took over control of the signalling system.
2. The contractor made a software implementation error which resulted in Computer C not re-creating the "Conflict Zone Data" properly.
3. Although the "Conflict Zone Protection" was absent in Computer C, the software logic developed by the contractor did not stop the computer from taking over control of the system. The absence of the Conflict Zone Protection resulted in the incident.
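The third error suggests a pre-takeover check was missing. A rough sketch of such a guard is shown below; it is an assumption about what a check could look like, not a description of the actual fix. `TakeoverRefused`, `REQUIRED_FOR_TAKEOVER`, `check_takeover` and the key name `conflict_zone_data` are all hypothetical.

```python
from typing import Any, Dict


class TakeoverRefused(Exception):
    """Raised when the standby is not fit to take control."""


# Hypothetical list of data that must exist before taking control.
REQUIRED_FOR_TAKEOVER = ("conflict_zone_data",)


def check_takeover(state: Dict[str, Any]) -> None:
    """Refuse takeover if any safety-critical data is missing or empty.

    Absent keys, None and empty containers are all treated as missing,
    so the computer fails safe instead of running unprotected.
    """
    missing = [key for key in REQUIRED_FOR_TAKEOVER if not state.get(key)]
    if missing:
        raise TakeoverRefused("missing before takeover: " + ", ".join(missing))
```

With a check like this, a failed re-creation on Computer C would show up as a refused takeover rather than as a system running without conflict zone protection.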
The Panel also concluded that the software implementation errors reflected inadequacies in the software development process of the contractor (ATDJV) for this software change (the "Conflict Zone Data" re-creation), with respect to:
software quality assurance,
risk assessment, and
the extent of simulation.
Recommendations
To prevent recurrence:
(a) replace the software design and development team
(b) confirm after the software fix
(c) make the changes traceable
(d) engage an external independent software assessor (ISA) for quality assurance and audit
To assist ATDJV:
(a) expand the scope of the ISA
(b) upgrade the training simulator
Contractor: Alstom-Thales DUAT Joint Venture (ATDJV)
Btw, where is the KISS principle applied? Is MTR relying too much on the contractor's deliverables to carry out the drills? Did MTR understand, and cross-check before the March 18 drill, that the "Conflict Zone Protection" was working properly?
press release
wiki