Minería de datos mediante programación automática con colonias de hormigas
This Doctoral Thesis is the first application of the ant programming metaheuristic
to data mining. This automatic programming technique has demonstrated
good results in optimization problems, but its application to data mining
has not been explored until the present moment.
Specifically, this Thesis deals with the classification and association rule mining
tasks of data mining. For the former, three models for the induction of rule-based
classifiers are presented. Two of them address the classification problem from the
point of view of single-objective and multi-objective evaluation, respectively, while
the third proposal tackles the particular problem of imbalanced classification from
a multi-objective perspective.
On the other hand, for the task of association rule mining, two algorithms for extracting
frequent patterns have been developed. The first one evaluates the quality
of individuals by using a novel fitness function, while the second algorithm performs
the evaluation from a Pareto dominance point of view.
All the algorithms proposed in this Thesis have been evaluated in a proper experimental
framework, using a large number of data sets and comparing their performance
against other published methods of proved quality. The results obtained
have been verified by applying non-parametric statistical tests, demonstrating the
benefits of using the ant programming metaheuristic to address these data mining
tasks.
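As a rough illustration of the Pareto dominance view mentioned in the abstract above, the following Python sketch filters a set of candidate rules down to its non-dominated front. It is not the thesis's algorithm: the two objectives (assumed here to be support and confidence, both maximized), the sample values, and the function names `dominates` and `pareto_front` are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (support, confidence) scores for four candidate rules.
rules = [(0.30, 0.90), (0.50, 0.60), (0.20, 0.80), (0.50, 0.55)]
front = pareto_front(rules)
print(front)
```

With these made-up scores, the third rule is dominated by the first and the fourth by the second, so only the first two remain on the front. The quadratic scan is adequate for small populations; larger ones usually call for a sorting-based method.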
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for developing large-scale,
data-intensive computation. Many frameworks that implement this paradigm have
recently been developed. To leverage these frameworks, however, developers must
become familiar with their APIs and rewrite existing code. Casper is a new tool
that automatically translates sequential Java programs into the MapReduce
paradigm. Casper identifies potential code fragments to rewrite and translates
them in two steps: (1) Casper uses program synthesis to search for a program
summary (i.e., a functional specification) of each code fragment. The summary
is expressed using a high-level intermediate language resembling the MapReduce
paradigm and verified to be semantically equivalent to the original using a
theorem prover. (2) Casper generates executable code from the summary, using
either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically
converting real-world, sequential Java benchmarks to MapReduce. The resulting
benchmarks perform up to 48.2x faster compared to the original.
Comment: 12 pages, additional 4 pages of references and appendi
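To make the idea of rewriting a sequential fragment into the MapReduce paradigm more concrete, here is a minimal sketch in plain Python. It is neither Casper's output nor its intermediate language (Casper works on Java and targets Hadoop, Spark, or Flink); the word-count task and the function names are assumptions chosen only for illustration. Comparing the two versions on one input is an ordinary test, not the theorem-prover verification the abstract describes.

```python
from collections import defaultdict
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def word_count_sequential(lines: Iterable[str]) -> Dict[str, int]:
    """Sequential loop of the kind a translator would take as input."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def word_count_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    """The same computation written as map, shuffle, and reduce stages."""
    # Map: emit one (key, value) pair per word.
    pairs: List[Tuple[str, int]] = [
        (word, 1) for line in lines for word in line.split()
    ]
    # Shuffle: group all values that share a key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: combine each key's values with an associative operator.
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


sample = ["to be or not to be", "be quick"]
sequential_result = word_count_sequential(sample)
mapreduce_result = word_count_mapreduce(sample)
print(sequential_result == mapreduce_result)
```

Because the reduce step uses an associative and commutative operator, each key's values can be combined in any order, which is what allows a real framework to distribute the work across machines.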
Search based software engineering: Trends, techniques and applications
© ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below.
In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives.
This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied and highlights gaps in the literature and avenues for further research.
EPSRC and E