Verifying embedded software functionality: fault localization, metrics and directed testing

Editor’s Note: In the third of a four-part series, Abhik Roychoudhury, author of Embedded Systems and Software Validation, discusses the pros and cons of metric-based fault localization and directed testing for assessing software functionality.

So far, in Part 1 and Part 2 in this series, we have presented the dynamic slicing method, which is fully formal and requires examination of the control/data dependencies in an execution trace.

However, the difficulties in using it include (a) time and space overheads for storing and analyzing program traces and (b) potentially large slice sizes. Previously, we examined methods to deal with the second problem: comprehension of large slices.
However, we still have to grapple with the time and space overheads of dynamic slicing. As observed earlier, state-of-the-art dynamic slicing tools employ various tricks such as online compaction of the execution trace and program dependence analysis on the compact trace (without decompressing it).
Nevertheless, the time and space overheads for large real-life programs are still substantial, and the quest for lightweight methods remains. In this part of the series, we will discuss a class of such lightweight methods. In the following, we use the terms execution trace and execution run interchangeably. Indeed, the existing literature on software debugging also uses these two terms interchangeably. Before proceeding any further, let us first give an illustrative example.
Illustrating the problem with Siemens TCAS
Our example is a fragment of the TCAS program from the Siemens benchmark suite, which has been extensively used in the software engineering community for research in testing and debugging. The TCAS program is embedded software for altitude control.
In Figure 5.9 below, we show a fragment of the program. Note that Climb and Up are input variables of the program. There is a bug in the following program fragment, namely, lines 2 and 4 are reversed in order. In other words, line 2 should be separation = Up + 100 and line 4 should be separation = Up.

Now, consider an execution of this program fragment with the inputs Climb = 1 and Up = 100. The execution will lead to “Downward” being printed. Clearly, this is unexpected, because the developer would expect “Upward” to be printed for these inputs. Thus, the trace for the inputs Climb = 1, Up = 100 is a failing run that needs to be debugged.
We now have an example of a failing run, but what is a successful run? A successful run is simply one where the program output is as expected. So, if the programmer expects the output to be “Upward,” the program should print “Upward,” and if the programmer expects the output to be “Downward,” the program should print “Downward.”
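Since the figure itself is not reproduced here, the following Python sketch may help in following the failing run. It is a hypothetical reconstruction built only from the description above: the 12-line statement numbering, the variable names separation and upward, and the threshold of 150 are all assumptions, not the actual TCAS source. The function returns both the printed output and the list of statement numbers it executed.

```python
def tcas_fragment(climb, up):
    """Hypothetical reconstruction of the buggy fragment (not the real TCAS code).

    Returns (output, executed), where executed lists the assumed statement
    numbers visited during the run.
    """
    executed = []

    executed.append(1)            # 1: if (Climb)
    if climb:
        executed.append(2)        # 2:   separation = Up        (bug: should be Up + 100)
        separation = up
    else:
        executed.append(4)        # 4:   separation = Up + 100  (bug: should be Up)
        separation = up + 100

    executed.append(5)            # 5: if (separation > 150)    (threshold is an assumption)
    if separation > 150:
        executed.append(6)        # 6:   upward = 1
        upward = 1
    else:
        executed.append(8)        # 8:   upward = 0
        upward = 0

    executed.append(9)            # 9: if (upward > 0)
    if upward > 0:
        executed.append(10)       # 10:  print "Upward"
        output = "Upward"
    else:
        executed.append(12)       # 12:  print "Downward"
        output = "Downward"

    return output, executed


output, executed = tcas_fragment(climb=1, up=100)
print(output)    # Downward (the developer expected Upward)
print(executed)  # [1, 2, 5, 8, 9, 12]
```

Under these assumptions, the failing run passes through statement 2, one of the two swapped assignments, and computes a separation of 100 instead of 200.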
Consider the program execution with the inputs Climb = 0 and Up = 0. The output in this case is “Downward,” and this matches the developer’s expectations. Hence we deem this as a successful run. Usually, the process of determining whether a given run is failed or successful cannot be fully automated.
This involves matching the program output with the developer’s expectation, so the task of articulating the developer’s expectation remains manual. We have now explained what we mean by failing run and successful run. Our task is to debug a given “failed” run: to explain why it failed, that is, why the program output was not as expected.
We are trying to do so by comparing it with a successful run (where the program output was as expected) in order to gain insights about what went wrong in the failed run. The computed “difference” between the failed run and the chosen successful run is reported to the programmer as the bug report. The key questions now are:
1 – Given a failed run, how do we choose a successful run?
2 – Given a failed and a successful run, how do we compute their difference?
The answers to both questions lie in an evaluation metric for execution runs. A common (and very rough) metric is the set of statements executed in an execution run. If we have a successful run and a failed run, we can compute their difference by computing the difference of the sets of statements executed.
The question now is how to get a successful run. In other words, how do we choose a successful run corresponding to a given failed run αf? We will choose a successful run αs such that the set of statements executed in αs is “close” to the set of statements executed in αf.
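As a minimal sketch of this metric, the difference between two runs can be computed as a plain set difference over executed statement numbers. The two sets below are illustrative values chosen for this example (they assume a 12-statement numbering of the fragment); they are not taken from the original figure.

```python
def run_difference(failed_stmts, successful_stmts):
    """Statements executed in the failed run but not in the successful run."""
    return set(failed_stmts) - set(successful_stmts)


# Hypothetical statement sets for one failed and one successful run.
failed_run = {1, 2, 5, 8, 9, 12}
successful_run = {1, 4, 5, 8, 9, 12}

print(sorted(run_difference(failed_run, successful_run)))  # [2]
```

With these assumed sets, the only statement unique to the failed run is statement 2, which is what would be reported to the programmer.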
In fact, given a program P and failed execution run αf in P, we can do the following:
1 – Typically the program P will be endowed with a test suite (set of test cases) based on some coverage criteria (covering all statements or all branches in the program). We construct the execution runs for the test cases from the test suite. Let this set of execution runs be Runsall(P).
2 – From among the execution runs in Runsall(P), we choose those that are successful, that is, runs where the program output meets the programmer’s expectations. Let this set be SuccRunsall(P); clearly SuccRunsall(P) ⊆ Runsall(P).
3 – We choose an execution run αs ∈ SuccRunsall(P) such that the quantity | stmt(αf) – stmt(αs) | is minimized. Here stmt(α) is the set of statements executed in an execution run α, and |S| is the cardinality, or number of elements, of a set S. Note that for two sets S1 and S2, the quantity S1 – S2 denotes the set difference, that is, the elements appearing in S1 but not in S2.
Thus, we choose a successful execution run αs such that there are only a few statements appearing in the failed run αf but not in αs. The idea here is that if a statement appears only in the failed run but not in the successful run, it is a likely error cause.
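The three steps above can be sketched as follows. This is only an illustration under stated assumptions: the Run record, the test names, and the statement sets are hypothetical, and the pass/fail flag stands in for the manual judgment of whether a run met the programmer's expectations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    name: str          # hypothetical test-case label
    stmts: frozenset   # stmt(run): statement numbers executed in the run
    passed: bool       # True if the output met the programmer's expectation


def choose_successful_run(failed, runs_all):
    """Pick the successful run minimizing |stmt(failed) - stmt(run)|."""
    # Step 2: keep only the successful runs.
    succ_runs = [r for r in runs_all if r.passed]
    if not succ_runs:
        return None  # no successful run is available for comparison
    # Step 3: minimize the number of statements unique to the failed run.
    return min(succ_runs, key=lambda r: len(failed.stmts - r.stmts))


# Step 1: runs obtained from a (hypothetical) test suite.
failed = Run("failing", frozenset({1, 2, 5, 8, 9, 12}), passed=False)
runs_all = [
    failed,
    Run("t1", frozenset({1, 4, 5, 8, 9, 12}), passed=True),
    Run("t2", frozenset({1, 4, 5, 6, 9, 10}), passed=True),
]

best = choose_successful_run(failed, runs_all)
print(best.name)                         # t1
print(sorted(failed.stmts - best.stmts)) # [2]
```

Here t1 is chosen because only one statement of the failed run is missing from it, whereas three are missing from t2. When several successful runs tie, this sketch simply returns the first one encountered.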