Clustering has attracted considerable attention as a promising
strategy for parallel debugging in multi-fault scenarios. This heuristic
approach (i.e., failure indexing or fault isolation) enables developers to
perform multiple debugging tasks simultaneously by dividing failed test
cases into several disjoint groups. When using statement ranking representation
to model failures for better clustering, several factors influence clustering
effectiveness, including the risk evaluation formula (REF), the number of
faults (NOF), the fault type (FT), and the number of successful test cases
paired with one individual failed test case (NSP1F). In this paper, we present
the first comprehensive empirical study of how these four factors influence
clustering effectiveness. We conduct extensive controlled experiments on 1060
faulty versions of 228 simulated faults and 141 real faults, and the results
reveal that: 1) GP19 is highly competitive among all REFs, 2) clustering
effectiveness decreases as NOF increases, 3) higher clustering effectiveness is
easier to achieve when a program contains only predicate faults, and 4)
clustering effectiveness is maintained when the scale of NSP1F is reduced to 20%.