Benchmarks: UMD2007b

Used in:
Strecker, J. and Memon, A. M. Relationships between test suites, faults, and fault detection in GUI testing. In ICST '08: Proceedings of the First International Conference on Software Testing, Verification, and Validation, Apr. 2008.

Artifacts

(Items marked with * are not available.)

Application under test

Compile with make all (CrosswordSage 0.3.5) or cd src; ant (FreeMind 0.8.0), and run with make run. The scripts package_crosswordsage-0.3.5, package_freemind-0.8.0, and replay_remote are not distribution-ready and are not recommended for use by others. They are provided only to document the commands and environment ("package") in which each application was run in the experiment.

Application under test, instrumented

Run with the oracle generator.

Application under test, faulty

Each collection (zip file) of faulty versions contains numbered directories, one per faulty version. Each version contains exactly one fault: one or a few changed lines in a single source file. To run a faulty version, substitute its faulty class file for the corresponding class file in the non-faulty version of the application.
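A minimal Python sketch of the substitution step follows. It assumes, hypothetically, that each numbered directory holds the faulty .class file(s) under the same package path as in the clean application; the function name build_faulty_version and all directory names are illustrative only.

```python
import shutil
from pathlib import Path


def build_faulty_version(clean_app: Path, faulty_dir: Path, out_dir: Path) -> list[Path]:
    """Copy the clean application to out_dir, then overwrite the copy with the
    class file(s) from one numbered faulty-version directory.

    Assumption (hypothetical layout): faulty_dir mirrors the package layout of
    clean_app, e.g. faulty_dir/pkg/Foo.class replaces clean_app/pkg/Foo.class.
    out_dir must not exist yet (shutil.copytree creates it).
    """
    shutil.copytree(clean_app, out_dir)
    replaced = []
    for cls in faulty_dir.rglob("*.class"):
        rel = cls.relative_to(faulty_dir)
        target = out_dir / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cls, target)  # substitute the faulty class file
        replaced.append(rel)
    return sorted(replaced)
```

Copying the clean tree for every version keeps the original build untouched, at the cost of disk space per faulty version.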

Each line of a fault summary describes one fault: its location (file and line), its mutant type (c = class-level, s = method-level), and the change it makes to the source code. The reports for the ICST paper were generated from the fault summaries marked ICST above. It was later found that the line numbers of some faults (40 for CrosswordSage, 2 for FreeMind) were slightly off, so corrected fault summaries are also provided here.

Supporting files

Test cases

Each test suite is a list of test-case names, and each name corresponds to a test-case file in the set of test cases.

Each test suite was constructed by randomly choosing a test-case length between 2 and 20 (inclusive) and then picking test cases of that length until all GUI events were covered. The test pool consists of length-20 test cases (not counting reaching events); a shorter test case (e.g., length 19) is the prefix of a length-20 test case.
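The construction procedure can be sketched in Python as below. The data layout (a dict from test-case name to its length-20 event list, reaching events excluded) and the choice to add every randomly picked test case, rather than only those that cover new events, are assumptions for illustration. The exclusion of test cases that GUITAR could not run fully is not modelled here.

```python
import random

MIN_LEN, MAX_LEN = 2, 20


def build_suite(pool: dict[str, list[str]], all_events: set[str],
                rng: random.Random) -> tuple[int, dict[str, list[str]]]:
    """Sketch of suite construction.

    pool maps a test-case name to its length-20 event sequence (assumed layout).
    Returns (length, suite), where suite maps each picked test-case name to its
    length-L prefix.
    """
    length = rng.randint(MIN_LEN, MAX_LEN)  # inclusive on both ends
    candidates = list(pool)
    rng.shuffle(candidates)  # random picking order, without replacement
    suite: dict[str, list[str]] = {}
    covered: set[str] = set()
    for name in candidates:
        if covered >= all_events:
            break
        # A shorter test case is the prefix of a length-20 test case.
        prefix = pool[name][:length]
        # Assumption: every picked test case is added, even if it covers nothing new.
        suite[name] = prefix
        covered.update(prefix)
    if not covered >= all_events:
        raise ValueError("pool cannot cover all events at this length")
    return length, suite
```

Passing an explicit random.Random makes a suite reproducible from its seed.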

For some test cases in the pool, GUITAR failed to run all 20 steps. If a test case tended to fail for this reason on the nth step, it was excluded from every test suite whose test cases have length n+1 or greater. The value of n for each test case is given in the tables of adjusted test-case lengths.
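The exclusion rule amounts to a simple filter. In this sketch the adjusted-length table is modelled, as an assumption, as a dict from test-case name to n, and test cases missing from it are treated as having run all 20 steps; the function name is hypothetical.

```python
FULL_LENGTH = 20


def eligible_test_cases(adjusted: dict[str, int], pool: list[str],
                        suite_length: int) -> list[str]:
    """Return the pool test cases allowed in a suite of length-`suite_length` test cases.

    adjusted maps a test-case name to n, the step on which it tended to fail.
    Assumption: names absent from `adjusted` ran all 20 steps.
    A test case with value n is excluded from suites of length n+1 or longer,
    so it is eligible only when suite_length <= n.
    """
    return [t for t in pool if suite_length <= adjusted.get(t, FULL_LENGTH)]
```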

Coverage information

In the coverage-by-step files, a listing of 4:3=2 means that the 3rd step/event of test case 4 covers the statement 2 times.
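One way to read a single entry is sketched below. Only the test:step=count shape comes from the description above; how entries are grouped per statement within a file is not described here, so the sketch parses entries individually, and first_covering_step is a hypothetical helper.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Assumed entry shape: "<test case>:<step>=<count>", e.g. "4:3=2".
ENTRY = re.compile(r"^\s*(\d+):(\d+)=(\d+)\s*$")


class StepCoverage(NamedTuple):
    test_case: int  # test-case number
    step: int       # step/event index within the test case
    count: int      # times the statement was covered during that step


def parse_entry(text: str) -> StepCoverage:
    m = ENTRY.match(text)
    if m is None:
        raise ValueError(f"not a coverage-by-step entry: {text!r}")
    return StepCoverage(*(int(g) for g in m.groups()))


def first_covering_step(entries: Iterable[StepCoverage], test_case: int) -> Optional[int]:
    """Earliest step of `test_case` that covers the statement, or None."""
    steps = [e.step for e in entries if e.test_case == test_case and e.count > 0]
    return min(steps, default=None)
```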

Report

In the test-suite fault (or coverage) matrix, each row corresponds to one test suite run on one mutant. Det (or Cov) is 1 when the test suite detects the fault (or covers the line containing the fault).
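As an illustration of reading the test-suite matrix, the sketch below computes, for each fault, the fraction of suite runs that detect it and the fraction that cover it. The file is assumed to be delimited text with a header row, and the column names FaultNum, Det and Cov are assumptions (FaultNum is borrowed from the description of the corrected matrices below).

```python
import csv
import io
from collections import defaultdict


def per_fault_rates(matrix_text: str, delimiter: str = ",") -> dict[int, tuple[float, float]]:
    """Return {fault: (fraction of suite runs detecting, fraction covering)}.

    Assumption: a delimited file with a header row containing the hypothetical
    columns FaultNum, Det and Cov, one row per (test suite, mutant) run.
    """
    det: dict[int, int] = defaultdict(int)
    cov: dict[int, int] = defaultdict(int)
    runs: dict[int, int] = defaultdict(int)
    for row in csv.DictReader(io.StringIO(matrix_text), delimiter=delimiter):
        fault = int(row["FaultNum"])
        runs[fault] += 1
        det[fault] += int(row["Det"]) == 1  # bool adds as 0 or 1
        cov[fault] += int(row["Cov"]) == 1
    return {f: (det[f] / runs[f], cov[f] / runs[f]) for f in runs}
```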

When a faulty line lies in a class-variable declaration (i.e., not inside a method), no statement-coverage information is collected for it. Therefore, the coverage matrix reports the line as covered whenever any statement in its Java file is covered. This overestimates how often the fault is covered, but it is a reasonable approximation for identifying when the fault is first covered.
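The coverage approximation for faulty lines can be expressed as a small predicate. The sketch assumes the covered statement lines of the fault's Java file are available as a set of line numbers; the function name is hypothetical.

```python
def fault_line_covered(fault_line: int, in_method: bool, covered_lines: set[int]) -> bool:
    """Approximate whether a faulty line counts as covered.

    covered_lines: line numbers of covered statements in the fault's Java file
    (assumed representation).
    Lines inside methods are tracked directly; class-variable declarations are
    not, so they count as covered whenever any statement in the file is covered.
    """
    if in_method:
        return fault_line in covered_lines
    return bool(covered_lines)
```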

In the test-case fault matrix, each row corresponds to one test case run on one mutant. Each SameOrc column corresponds to a step in the test case. SameOrc is 1 when the oracle information for the clean and mutant versions is the same, 0 when there is a non-trivial difference (i.e., the fault is detected), and -1 if the test step didn't run properly on the clean version.
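A sketch of interpreting one row of the test-case fault matrix follows: it returns the first step whose SameOrc value is 0. Treating a test case as detecting the fault whenever any step is 0, and skipping -1 steps, is an assumption based on the column description above; the names are hypothetical.

```python
from typing import Optional, Sequence

SAME, DETECTED, CLEAN_FAILED = 1, 0, -1  # SameOrc values


def first_detecting_step(same_orc: Sequence[int]) -> Optional[int]:
    """Return the 1-based step at which the fault is first detected, or None.

    same_orc holds the SameOrc values of one (test case, mutant) row, one per step.
    Assumption: steps with -1 (did not run properly on the clean version) are skipped.
    """
    for step, value in enumerate(same_orc, start=1):
        if value not in (SAME, DETECTED, CLEAN_FAILED):
            raise ValueError(f"unexpected SameOrc value {value} at step {step}")
        if value == DETECTED:
            return step
    return None
```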

When the ICST test-suite fault matrix was created, a heuristic was used to screen out false reports of fault detection (i.e., cases where GUITAR reported the fault as detected when it was not). A fault was considered undetected, regardless of what the test-case fault matrix said, if both (1) the coverage information did not show that the test case covered the statement containing the fault and (2) the fault was located in a statement directly tracked by the coverage reports (i.e., inside a method). Screening happened only at this stage, not earlier, so the test-case fault matrices given here were NOT screened for false reports.
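The screening heuristic reduces to the predicate below, a sketch with hypothetical names. suite_detects additionally assumes that a suite detects a fault when any of its test cases does after screening; that aggregation rule is not stated in the text above.

```python
from typing import Iterable, Tuple


def screened_detection(reported: bool, covered: bool, in_method: bool) -> bool:
    """Apply the false-report screen to one test case's result.

    reported: the test-case fault matrix says the fault was detected.
    covered: coverage information shows the test case covered the faulty statement.
    in_method: the faulty statement is directly tracked by coverage (inside a method).
    """
    if reported and not covered and in_method:
        return False  # treated as a false report of detection
    return reported


def suite_detects(results: Iterable[Tuple[bool, bool, bool]]) -> bool:
    """Assumption: a suite detects the fault if any screened test case does."""
    return any(screened_detection(*r) for r in results)
```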

The corrected fault matrices were derived from the ICST fault matrices. Each fault was manually inspected to determine whether it could ever be detected by GUI testing. For undetectable faults, all reports of fault detection were eliminated. For each detectable fault, the test cases reported to detect it, taken from the test suite paired with that fault for analysis (SuiteNum i was paired with FaultNum i+1), were re-run on a private cluster rather than on the publicly available distributed system where they were originally run, to reduce timing issues. The new reports of fault detection were then checked against the coverage-by-step information rather than against coverage at the granularity of whole test cases.
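The pairing rule, and one plausible form of the step-level check, are sketched below. The exact check applied against the coverage-by-step information is not specified here; accepting a detection only if the faulty statement was covered at or before the detecting step is an assumption, and both function names are hypothetical.

```python
from typing import Optional, Set


def paired_fault(suite_num: int) -> int:
    """SuiteNum i was paired with FaultNum i+1 for the corrected analysis."""
    return suite_num + 1


def confirm_detection(detect_step: Optional[int], covering_steps: Set[int]) -> bool:
    """Hypothetical step-level check on one re-run detection report.

    detect_step: first step at which the re-run reported the fault (None if none).
    covering_steps: steps whose coverage-by-step entries show the faulty statement covered.
    Assumption: the report is kept only if the statement was covered at or before detect_step.
    """
    if detect_step is None:
        return False
    return any(step <= detect_step for step in covering_steps)
```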

Executables

Instrumentor

instr

Oracle generator

JavaGUIReplayer2

Run replay_crosswordsage-0.3.5 or replay_freemind-0.8.0 with the -h flag for usage information.