12,051 research outputs found

    Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach

    Full text link
    The definition of a concise and effective testbed for Genetic Programming (GP) is a recurrent matter in the research community. This paper takes a new step in this direction, proposing a different approach to measure the quality of the symbolic regression benchmarks quantitatively. The proposed approach is based on meta-learning and uses a set of dataset meta-features---such as the number of examples or output skewness---to describe the datasets. Our idea is to correlate these meta-features with the errors obtained by a GP method. These meta-features define a space of benchmarks that should, ideally, have datasets (points) covering different regions of the space. An initial analysis of 63 datasets showed that current benchmarks are concentrated in a small region of this benchmark space. We also found out that number of instances and output skewness are the most relevant meta-features to GP output error. Both conclusions can help define which datasets should compose an effective testbed for symbolic regression methods.Comment: 8 pages, 3 Figures, Proceedings of Genetic and Evolutionary Computation Conference Companion, Kyoto, Japa

    Constraints, Lazy Constraints, or Propagators in ASP Solving: An Empirical Analysis

    Full text link
    Answer Set Programming (ASP) is a well-established declarative paradigm. One of the successes of ASP is the availability of efficient systems. State-of-the-art systems are based on the ground+solve approach. In some applications this approach is infeasible because the grounding of one or few constraints is expensive. In this paper, we systematically compare alternative strategies to avoid the instantiation of problematic constraints, that are based on custom extensions of the solver. Results on real and synthetic benchmarks highlight some strengths and weaknesses of the different strategies. (Under consideration for acceptance in TPLP, ICLP 2017 Special Issue.)Comment: Paper presented at the 33nd International Conference on Logic Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1, 2017. 16 page

    Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset

    Full text link
    Defects4J is a large, peer-reviewed, structured dataset of real-world Java bugs. Each bug in Defects4J is provided with a test suite and at least one failing test case that triggers the bug. In this paper, we report on an experiment to explore the effectiveness of automatic repair on Defects4J. The result of our experiment shows that 47 bugs of the Defects4J dataset can be automatically repaired by state-of- the-art repair. This sets a baseline for future research on automatic repair for Java. We have manually analyzed 84 different patches to assess their real correctness. In total, 9 real Java bugs can be correctly fixed with test-suite based repair. This analysis shows that test-suite based repair suffers from under-specified bugs, for which trivial and incorrect patches still pass the test suite. With respect to practical applicability, it takes in average 14.8 minutes to find a patch. The experiment was done on a scientific grid, totaling 17.6 days of computation time. All their systems and experimental results are publicly available on Github in order to facilitate future research on automatic repair

    Performance analysis of a parallel, multi-node pipeline for DNA sequencing

    Get PDF
    Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that especially the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK) cause suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than using multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of similar to 12 days, this corresponds to an overall parallel efficiency of 53%

    TOR: modular search with hookable disjunction

    Get PDF
    Horn Clause Programs have a natural exhaustive depth-first procedural semantics. However, for many programs this semantics is ineffective. In order to compute useful solutions, one needs the ability to modify the search method that explores the alternative execution branches. Tor, a well-defined hook into Prolog disjunction, provides this ability. It is light-weight thanks to its library approach and efficient because it is based on program transformation. Tor is general enough to mimic search-modifying predicates like ECLiPSe's search/6. Moreover, Tor supports modular composition of search methods and other hooks. The Tor library is already provided and used as an add-on to SWI-Prolog.publisher: Elsevier articletitle: Tor: Modular search with hookable disjunction journaltitle: Science of Computer Programming articlelink: http://dx.doi.org/10.1016/j.scico.2013.05.008 content_type: article copyright: Copyright © 2013 Elsevier B.V. All rights reserved.status: publishe
    corecore