530 research outputs found

    Data-flow-based evolutionary fault localization

    Get PDF
    Fault localization is the activity of precisely indicating the faulty commands in a buggy program. It is known to be a highly costly and tedious process. Automating this process has been the goal of many studies, showing it to be a challenging problem. The coveragespectrum based approaches commonly apply heuristics grounded on the execution of control-flow components to calculate the odds of each program element to be the defective one. The present study aims to investigate another source of fault information by assessing how data-flow analysis are useful to compute suspiciousness scores; and how the combination of scores from different sources impacts fault localization. We present an approach to calculate the suspiciousness score for each program command by using the execution of data-flow components. Then we use an evolutionary algorithm to search sets of weights to combine heuristics from distinct sources of fault data (both control-flow and data-flow as well as a hybrid strategy). The approach was applied in programs with seeded faults and real faults and evaluated by using absolute metrics to asses its efficacy to locate faults. Furthermore, we introduce a new metric to investigate the dependence of tie-break strategies in building the ranking of suspicious commands. Data-flow based methods demonstrate high effectiveness but increase the need for tie-breaks, unlike the evolutionary hybrid method that keeps competitive the effectiveness and depends less on tie-break strategies

    Efficient Data Representation by Selecting Prototypes with Importance Weights

    Full text link
    Prototypical examples that best summarizes and compactly represents an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper we present algorithms with strong theoretical guarantees to mine these data sets and select prototypes a.k.a. representatives that optimally describes them. Our work notably generalizes the recent work by Kim et al. (2016) where in addition to selecting prototypes, we also associate non-negative weights which are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys a key property of that of weak submodularity, we present a fast ProtoDash algorithm and also derive approximation guarantees for the same. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST) and on publicly available 40 health questionnaires obtained from the Center for Disease Control (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively based on expert feedback and recently published scientific studies on public health, thus showcasing the power of our technique in providing actionability (for retail), utility (for MNIST) and insight (on CDC datasets) which arguably are the hallmarks of an effective data mining method.Comment: Accepted for publication in International Conference on Data Mining (ICDM) 201

    Transforming Graph Representations for Statistical Relational Learning

    Full text link
    Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of statistical relational learning (SRL) algorithms to these domains. In this article, we examine a range of representation issues for graph-based relational data. Since the choice of relational data representation for the nodes, links, and features can dramatically affect the capabilities of SRL algorithms, we survey approaches and opportunities for relational representation transformation designed to improve the performance of these algorithms. This leads us to introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. In particular, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey and compare competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed
    • …
    corecore