
    Data mining using automatic programming with ant colonies (original title: Minería de datos mediante programación automática con colonias de hormigas)

    This doctoral thesis is the first to apply the ant programming metaheuristic to data mining. This automatic programming technique has produced good results in optimization problems, but its application to data mining had not previously been explored. Specifically, the thesis addresses two data mining tasks: classification and association rule mining. For classification, it presents three models that induce rule-based classifiers. Two of them approach the classification problem through single-objective and multi-objective evaluation, respectively, while the third tackles the particular problem of imbalanced classification from a multi-objective perspective. For association rule mining, two algorithms for extracting frequent patterns have been developed. The first evaluates the quality of individuals with a novel fitness function, while the second performs the evaluation from a Pareto dominance point of view. All the proposed algorithms have been evaluated in a proper experimental framework, using a large number of data sets and comparing their performance against previously published methods of proven quality. The results, verified with non-parametric statistical tests, demonstrate the benefits of using the ant programming metaheuristic for these data mining tasks.
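The abstract mentions evaluating individuals "from a Pareto dominance point of view" without giving details. As a minimal sketch of what such a comparison usually looks like, assuming every objective is maximized and that each rule is scored by a hypothetical (support, confidence) pair, the helper names `dominates` and `pareto_front` below are illustrative and not taken from the thesis:

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` dominates `b` when it is at least as good on every objective
    and strictly better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (support, confidence) scores for four candidate rules.
rules = [(0.6, 0.7), (0.4, 0.9), (0.3, 0.8), (0.6, 0.6)]
front = pareto_front(rules)
```

Here `(0.3, 0.8)` is dominated by `(0.4, 0.9)` and `(0.6, 0.6)` by `(0.6, 0.7)`, so only the first two rules remain on the front. The thesis may use different objectives or a different ranking scheme.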

    Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications

    MapReduce is a popular programming paradigm for developing large-scale, data-intensive computation. Many frameworks that implement this paradigm have recently been developed. To leverage these frameworks, however, developers must become familiar with their APIs and rewrite existing code. Casper is a new tool that automatically translates sequential Java programs into the MapReduce paradigm. Casper identifies potential code fragments to rewrite and translates them in two steps: (1) Casper uses program synthesis to search for a program summary (i.e., a functional specification) of each code fragment. The summary is expressed using a high-level intermediate language resembling the MapReduce paradigm and verified to be semantically equivalent to the original using a theorem prover. (2) Casper generates executable code from the summary, using either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically converting real-world, sequential Java benchmarks to MapReduce. The resulting benchmarks perform up to 48.2x faster than the originals. Comment: 12 pages, additional 4 pages of references and appendi
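To make the kind of rewrite described above concrete, here is a hand-written sketch of a sequential loop and a map/shuffle/reduce formulation of the same computation. It is written in Python for brevity rather than Java, it is not Casper output, and it uses no Hadoop, Spark, or Flink API; the function names are hypothetical.

```python
from functools import reduce
from typing import Dict, Iterable, List, Tuple


def word_count_sequential(lines: Iterable[str]) -> Dict[str, int]:
    """Sequential version: one nested loop updating a shared dictionary."""
    counts: Dict[str, int] = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def word_count_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    """Same computation expressed as map, shuffle, and reduce phases."""
    # Map: emit one (word, 1) pair per word occurrence.
    pairs: List[Tuple[str, int]] = [(w, 1) for line in lines for w in line.split()]
    # Shuffle: group the emitted values by key.
    groups: Dict[str, List[int]] = {}
    for key, value in pairs:
        groups.setdefault(key, []).append(value)
    # Reduce: combine each key's values with an associative operator.
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


sample = ["to be or not to be", "be quick"]
sequential_result = word_count_sequential(sample)
mapreduce_result = word_count_mapreduce(sample)
```

Both functions return the same counts for `sample`. In a real framework the map and reduce phases would run in parallel across partitions, which is where speedups of the kind the abstract reports would come from.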

    Search based software engineering: Trends, techniques and applications

    © ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below. In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large, complex problem spaces with multiple competing and conflicting objectives. This article provides a review and classification of the literature on SBSE. It identifies research trends and relationships between the techniques and the applications to which they have been applied, and it highlights gaps in the literature and avenues for further research. EPSRC and E