It seems the debate over the benefits of better software verification is still alive and well. I just read a blog post by Frank Schirrmeister on Software Developer Attitude and the topic of hardware vs. software methodology. Part of the post brings up the argument that the cost of failure for hardware is very clear. The deadlines are fixed by tape out, and if the device doesn't work it means major schedule slip, lost revenue, etc.
The result of this high risk is that engineers take great care to avoid such negative consequences. The post reminds us that software is "soft" and can always be fixed by downloading patches, updating to the latest version, or, better yet, having the system automatically download new updates so it is always up to date.
Many of the software engineers I talk to understand that updating software is not so easy. I'm of the opinion that treating software more like hardware would be of great benefit, although I readily admit I don't always do it myself. I also think hardware verification is probably not as rigorous as we maintain and that the gap between hardware and software quality is really not that big. Hardware verification engineers tell me they just run out of time and cannot verify as much as they would like to.
When I was a hardware designer working in a server company, we were designing systems using Intel processors. It was at the time when the Pentium took over from the i486. Although my memory is not that good, I do recall getting a book of errata from Intel with each revision of the chip and often changing to the latest sample chips as we developed the processor boards. The book usually contained a list of various functional issues such as "don't run this sequence of instructions," "don't turn on this mode," and of course "make sure to have a big heat sink and plenty of fans." Obviously this is very old data, but I was able to find an article about the machine from 1996. I expect somebody out there can tell similar stories about early silicon and about design approaches that plan for multiple revisions of a chip.
The software world is not that much different. Coding probably takes about 20% of the overall time, and testing takes anywhere between 40% and 80% depending on the level of quality required for the application (also from The Mythical Man-Month by Brooks, see link below).
One of the key aspects of software is that the longer it takes to find a bug the more it costs to fix. Currently, I write software at Cadence that goes into Specman. I can tell you that if I find a problem when I'm first developing a feature or during the "development window" I can fix it very easily just by changing the code.
I don't have to tell anyone, enter anything in the bug tracking system, or fill out any forms of any kind. If I find a problem in the same code after the release reaches beta, the picture changes: I need to enter the bug in the tracking system, prepare a new ClearCase view with the fix, tell the release manager that I have a new fix to make, explain why it is truly needed, low risk, and won't disrupt the release by breaking a bunch of other stuff, rerun all 80,000 tests, schedule a day to merge my fix, and make the change only when I get the go-ahead.
This is a much bigger overhead compared to the first case when I just made the change by myself and was done. If an issue is found by a customer after the product ships the process is similar except there is some additional overhead to communicate with the user and maybe the customer support people about the issue and the solution. I can also tell you that the time from the initial code freeze until the product reaches downloads.cadence.com could be in the 4 to 8 week range, maybe not that much different from the time required to get first samples of a chip.
Sure, there is still some flexibility in the case of a major issue that may not exist for a chip, but you can see that life is not as easy as just finding a bug, fixing it, and telling somebody to download the fix.
Also remember, my story is EDA software that runs on an ordinary workstation. Embedded systems usually have more constraints on changes to software. For example, to change firmware in a disk drive may require an entire certification sequence to be run in a lab with a rack of hundreds of drives.
Such certification may take a couple of weeks to complete the required number of usage hours before the firmware can be declared certified for customer usage. My experience is that any serious software project, especially one that also involves custom hardware, is not as easy as "just download service pack 2".
I would be willing to bet that the number of man-hours that went into testing service pack 2 was just as much as went into testing the original release. With each subsequent update release, the improvements to the software shrink while the testing time remains constant.
The key to improving software verification is being able to put a metric or value on the result of the added verification, especially when it is added early in the process. Most software engineers have good intentions to test early and test often, but in reality nobody is really measuring much; the only question anybody asks is whether the engineer can get the features done before the code freeze. In this environment, it's difficult to justify extra testing that takes more time when the software might just work without it.
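To make the idea of such a metric concrete, here is a minimal sketch of the kind of bookkeeping a team could do: record where each bug was found and weight it by a stage-dependent fix cost. The stage multipliers below are purely illustrative assumptions I made up for the example, not measured data; the point is only that tracking bug-discovery stage makes the value of early verification visible.

```python
# Illustrative sketch: putting a rough value on finding bugs earlier.
# The multipliers are HYPOTHETICAL placeholders, not measured data.

# Assumed relative cost to fix a bug, by the stage where it is found.
STAGE_COST_MULTIPLIER = {
    "development": 1,   # fix it yourself, no process overhead
    "beta": 10,         # bug tracker, merge window, full regression rerun
    "shipped": 50,      # plus customer communication and support overhead
}

def verification_savings(bugs_by_stage, base_fix_cost=1.0):
    """Return (total fix cost, saving if every bug had instead been
    caught during development)."""
    actual = sum(base_fix_cost * STAGE_COST_MULTIPLIER[stage] * count
                 for stage, count in bugs_by_stage.items())
    ideal = sum(base_fix_cost * STAGE_COST_MULTIPLIER["development"] * count
                for count in bugs_by_stage.values())
    return actual, actual - ideal

actual, saved = verification_savings({"development": 40, "beta": 8, "shipped": 2})
print(actual, saved)  # 40*1 + 8*10 + 2*50 = 220; saving = 220 - 50 = 170
```

Even with made-up multipliers, a chart of this number over several releases would give a manager something to weigh against the schedule pressure before code freeze.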
The conclusion seems to link back to my previous post on the need to have dedicated engineers assigned to verification, more metrics on the costs of fixing bugs at various stages in the life cycle, and better ways to eliminate bugs as early as possible. Better scheduling tools based on actual past activity may also help by avoiding the usual rush just before code freeze.
There are people in Cadence working on these issues for our own software, and I believe there are many opportunities to develop products that will help embedded software engineers by providing increased automation to start down the road to better software verification. Happy coding and may all your bugs be found quickly and painlessly.