12,051 research outputs found
Analysing Symbolic Regression Benchmarks under a Meta-Learning Approach
The definition of a concise and effective testbed for Genetic Programming
(GP) is a recurrent matter in the research community. This paper takes a new
step in this direction, proposing a different approach to measure the quality
of the symbolic regression benchmarks quantitatively. The proposed approach is
based on meta-learning and uses a set of dataset meta-features---such as the
number of examples or output skewness---to describe the datasets. Our idea is
to correlate these meta-features with the errors obtained by a GP method. These
meta-features define a space of benchmarks that should, ideally, have datasets
(points) covering different regions of the space. An initial analysis of 63
datasets showed that current benchmarks are concentrated in a small region of
this benchmark space. We also found out that number of instances and output
skewness are the most relevant meta-features to GP output error. Both
conclusions can help define which datasets should compose an effective testbed
for symbolic regression methods.Comment: 8 pages, 3 Figures, Proceedings of Genetic and Evolutionary
Computation Conference Companion, Kyoto, Japa
Constraints, Lazy Constraints, or Propagators in ASP Solving: An Empirical Analysis
Answer Set Programming (ASP) is a well-established declarative paradigm. One
of the successes of ASP is the availability of efficient systems.
State-of-the-art systems are based on the ground+solve approach. In some
applications this approach is infeasible because the grounding of one or few
constraints is expensive. In this paper, we systematically compare alternative
strategies to avoid the instantiation of problematic constraints, that are
based on custom extensions of the solver. Results on real and synthetic
benchmarks highlight some strengths and weaknesses of the different strategies.
(Under consideration for acceptance in TPLP, ICLP 2017 Special Issue.)Comment: Paper presented at the 33nd International Conference on Logic
Programming (ICLP 2017), Melbourne, Australia, August 28 to September 1,
2017. 16 page
Automatic Repair of Real Bugs: An Experience Report on the Defects4J Dataset
Defects4J is a large, peer-reviewed, structured dataset of real-world Java
bugs. Each bug in Defects4J is provided with a test suite and at least one
failing test case that triggers the bug. In this paper, we report on an
experiment to explore the effectiveness of automatic repair on Defects4J. The
result of our experiment shows that 47 bugs of the Defects4J dataset can be
automatically repaired by state-of- the-art repair. This sets a baseline for
future research on automatic repair for Java. We have manually analyzed 84
different patches to assess their real correctness. In total, 9 real Java bugs
can be correctly fixed with test-suite based repair. This analysis shows that
test-suite based repair suffers from under-specified bugs, for which trivial
and incorrect patches still pass the test suite. With respect to practical
applicability, it takes in average 14.8 minutes to find a patch. The experiment
was done on a scientific grid, totaling 17.6 days of computation time. All
their systems and experimental results are publicly available on Github in
order to facilitate future research on automatic repair
Performance analysis of a parallel, multi-node pipeline for DNA sequencing
Post-sequencing DNA analysis typically consists of read mapping followed by variant calling and is very time-consuming, even on a multi-core machine. Recently, we proposed Halvade, a parallel, multi-node implementation of a DNA sequencing pipeline according to the GATK Best Practices recommendations. The MapReduce programming model is used to distribute the workload among different workers. In this paper, we study the impact of different hardware configurations on the performance of Halvade. Benchmarks indicate that especially the lack of good multithreading capabilities in the existing tools (BWA, SAMtools, Picard, GATK) cause suboptimal scaling behavior. We demonstrate that it is possible to circumvent this bottleneck by using multiprocessing on high-memory machines rather than using multithreading. Using a 15-node cluster with 360 CPU cores in total, this results in a runtime of 1 h 31 min. Compared to a single-threaded runtime of similar to 12 days, this corresponds to an overall parallel efficiency of 53%
TOR: modular search with hookable disjunction
Horn Clause Programs have a natural exhaustive depth-first procedural
semantics. However, for many programs this semantics is
ineffective. In order to compute useful solutions, one needs the
ability to modify the search method that explores the alternative
execution branches.
Tor, a well-defined hook into Prolog disjunction, provides this ability.
It is light-weight thanks to its library approach and efficient
because it is based on program transformation.
Tor is general enough to mimic search-modifying
predicates like ECLiPSe's search/6. Moreover, Tor supports
modular composition of search methods and other hooks.
The Tor library is already
provided and used as an add-on to SWI-Prolog.publisher: Elsevier
articletitle: Tor: Modular search with hookable disjunction
journaltitle: Science of Computer Programming
articlelink: http://dx.doi.org/10.1016/j.scico.2013.05.008
content_type: article
copyright: Copyright © 2013 Elsevier B.V. All rights reserved.status: publishe
- …