The company did many things wrong, it’s an almost idealised example of total failure to take software seriously.
Most importantly they decided they didn’t need to test the software on their new machines because they’d already shipped previous machines running the software, so they “knew it worked”. The previous machines had hardware interlocks that made it impossible for the software to cause a massive dosing errors, the new machine was entirely software controlled.
Also they had exactly 1 “very smart” engineer build the software, who obviously wrote it for a hardware-safe machine. To be fair, I’m sure he was very smart, but safety critical and solo projects are not a great combo.
Also they had no mechanisms to ensure failures would be communicated to their engineers for investigation (failures were reported to them and then dropped into a black hole and forgotten about).
Also they didn’t even have any capability to test their machines after failures started popping up, because they knew the code worked perfectly so they didn’t need to waste any time or money on qa capability, massively slowing down their ability to fix things once people started dying
The company did many things wrong, it’s an almost idealised example of total failure to take software seriously.
Most importantly they decided they didn’t need to test the software on their new machines because they’d already shipped previous machines running the software, so they “knew it worked”. The previous machines had hardware interlocks that made it impossible for the software to cause a massive dosing errors, the new machine was entirely software controlled.
Also they had exactly 1 “very smart” engineer build the software, who obviously wrote it for a hardware-safe machine. To be fair, I’m sure he was very smart, but safety critical and solo projects are not a great combo.
Also they had no mechanisms to ensure failures would be communicated to their engineer
sfor investigation (failures were reported to them and then dropped into a black hole and forgotten about).Also they didn’t even have any capability to test their machines after failures started popping up, because they knew the code worked perfectly so they didn’t need to waste any time or money on qa capability, massively slowing down their ability to fix things once people started dying
The single engineer wasn’t mentioned on the podcast, episode but the rest of it was. It’s a really instructive story.
Really, the whole podcast is this kind of story.