ASlib: A Benchmark Library for Algorithm Selection
The task of algorithm selection involves choosing an algorithm from a set of
algorithms on a per-instance basis in order to exploit the varying performance
of algorithms over a set of instances. The algorithm selection problem is
attracting increasing attention from researchers and practitioners in AI. Years
of fruitful applications in a number of domains have resulted in a large amount
of data, but the community lacks a standard format or repository for this data.
This situation makes it difficult to share and compare different approaches
effectively, as is done in other, more established fields. It also
unnecessarily hinders new researchers who want to work in this area. To address
this problem, we introduce a standardized format for representing algorithm
selection scenarios and a repository that contains a growing number of data
sets from the literature. Our format has been designed to be able to express a
wide variety of different scenarios. Demonstrating the breadth and power of our
platform, we describe a set of example experiments that build and evaluate
algorithm selection models through a common interface. The results display the
potential of algorithm selection to achieve significant performance
improvements across a broad range of problems and algorithms.
Comment: Accepted to be published in Artificial Intelligence Journal
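Per-instance algorithm selection as described above can be sketched in a few lines. This toy example (hypothetical data, not the ASlib format itself) uses a 1-nearest-neighbour model over instance features to pick the algorithm expected to run fastest on each instance:

```python
# Minimal per-instance algorithm selection sketch: for a new instance,
# find the most similar training instance and choose the algorithm
# that was fastest on it.

def select_algorithm(features, train_features, train_runtimes):
    """Return the index of the algorithm with the lowest runtime on the
    training instance closest to `features` (squared Euclidean distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(range(len(train_features)),
                  key=lambda i: dist(features, train_features[i]))
    runtimes = train_runtimes[nearest]
    return min(range(len(runtimes)), key=lambda j: runtimes[j])

# Two training instances, two algorithms (runtimes in seconds).
train_features = [[0.0, 0.0], [10.0, 10.0]]
train_runtimes = [[1.0, 5.0],   # algorithm 0 wins near the origin
                  [9.0, 2.0]]   # algorithm 1 wins far from it

choice_near = select_algorithm([1.0, 1.0], train_features, train_runtimes)
choice_far = select_algorithm([9.0, 9.0], train_features, train_runtimes)
```

Real selectors use richer models (regression, ranking, or cost-sensitive classification), but the interface — features in, chosen algorithm out — is the same one the ASlib scenarios standardize.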
On the role of pre and post-processing in environmental data mining
The quality of discovered knowledge is highly dependent on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies, or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of getting low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions taken upon KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details about how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.
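The pre-processing step the abstract emphasizes can be illustrated with a small sketch. The helper below is hypothetical (not from the paper): it fills missing environmental sensor readings with the median of the observed values and clips obvious outliers to a plausible range:

```python
# Toy pre-processing sketch: impute missing values, then clip outliers.

def clean_series(values, low, high):
    """Replace None with the (upper) median of observed values, then
    clip every value to the plausible range [low, high]."""
    observed = sorted(v for v in values if v is not None)
    mid = observed[len(observed) // 2]
    filled = [mid if v is None else v for v in values]
    return [min(max(v, low), high) for v in filled]

readings = [2.0, None, 3.0, 250.0, 2.5]   # 250.0 is a sensor glitch
cleaned = clean_series(readings, low=0.0, high=50.0)
```

Even this trivial example shows why domain knowledge matters: the plausible range [0, 50] is exactly the kind of information an environmental expert supplies before any mining begins.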
User Review-Based Change File Localization for Mobile Applications
In the current mobile app development, novel and emerging DevOps practices
(e.g., Continuous Delivery, Integration, and user feedback analysis) and tools
are becoming more widespread. For instance, the integration of user feedback
(provided in the form of user reviews) in the software release cycle represents
a valuable asset for the maintenance and evolution of mobile apps. To fully
make use of these assets, it is highly desirable for developers to establish
semantic links between the user reviews and the software artefacts to be
changed (e.g., source code and documentation), and thus to localize the
potential files to change for addressing the user feedback. In this paper, we
propose RISING (Review Integration via claSsification, clusterIng, and
linkiNG), an automated approach to support the continuous integration of user
feedback via classification, clustering, and linking of user reviews. RISING
leverages domain-specific constraint information and semi-supervised learning
to group user reviews into multiple fine-grained clusters concerning similar
users' requests. Then, by combining the textual information from both commit
messages and source code, it automatically localizes potential change files to
accommodate the users' requests. Our empirical studies demonstrate that the
proposed approach outperforms the state-of-the-art baseline work in terms of
clustering and localization accuracy, and thus produces more reliable results.
Comment: 15 pages, 3 figures, 8 tables
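The review-grouping idea can be sketched with a toy similarity-based clustering. RISING's actual approach uses domain-specific constraints and semi-supervised learning; this hypothetical sketch only illustrates the grouping step, using word-set Jaccard similarity:

```python
# Toy sketch: group user reviews whose word overlap exceeds a threshold.

def jaccard(a, b):
    """Jaccard similarity between the word sets of two reviews."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def cluster_reviews(reviews, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster whose
    representative (first member) is similar enough, else start a new one."""
    clusters = []
    for review in reviews:
        for cluster in clusters:
            if jaccard(review, cluster[0]) >= threshold:
                cluster.append(review)
                break
        else:
            clusters.append([review])
    return clusters

reviews = [
    "app crashes on login screen",
    "crashes on the login screen every time",
    "please add a dark mode theme",
]
clusters = cluster_reviews(reviews)
```

The two crash reports share enough vocabulary to land in one cluster, while the feature request starts its own; the localization step would then match each cluster's text against commit messages and source files.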
Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Data collection for scientific applications is increasing exponentially and
is forecasted to soon reach peta- and exabyte scales. Applications which
process and analyze scientific data must be scalable and focus on execution
performance to keep pace. In the field of radio astronomy, in addition to
increasingly large datasets, tasks such as the identification of transient
radio signals from extrasolar sources are computationally expensive. We present
a scalable approach to radio pulsar detection written in Scala that
parallelizes candidate identification to take advantage of in-memory task
processing using Apache Spark on a YARN distributed system. Furthermore, we
introduce a novel automated multiclass supervised machine learning technique
that we combine with feature selection to reduce the time required for
candidate classification. Experimental testing on a Beowulf cluster with 15
data nodes shows that the parallel implementation of the identification
algorithm offers a speedup of up to 5X that of a similar multithreaded
implementation. Further, we show that the combination of automated multiclass
classification and feature selection speeds up the execution performance of the
RandomForest machine learning algorithm by an average of 54% with less than a
2% average reduction in the algorithm's ability to correctly classify pulsars.
The generalizability of these results is demonstrated by using two real-world
radio astronomy data sets.
Comment: In Proceedings of the 47th International Conference on Parallel Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 pages
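The paper's pipeline parallelizes candidate identification with Apache Spark in Scala; as a language-agnostic stand-in, a thread pool can illustrate the same embarrassingly parallel filtering step. The SNR threshold and record layout here are hypothetical, not taken from the paper:

```python
# Sketch of parallel candidate identification: filter pulses whose
# signal-to-noise ratio clears a threshold, mapping the predicate over
# a worker pool (Spark's map/filter plays this role in the real system).
from concurrent.futures import ThreadPoolExecutor

SNR_THRESHOLD = 8.0  # hypothetical cutoff

def is_candidate(pulse):
    """A pulse is a candidate if its signal-to-noise ratio is high enough."""
    return pulse["snr"] >= SNR_THRESHOLD

def find_candidates(pulses, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        flags = list(pool.map(is_candidate, pulses))  # order-preserving
    return [p for p, keep in zip(pulses, flags) if keep]

pulses = [{"id": i, "snr": s} for i, s in enumerate([3.1, 9.4, 12.0, 7.9])]
candidates = find_candidates(pulses)
```

On a cluster, the win comes from partitioning the pulse records across nodes so each worker filters its shard in memory, which is what the Spark-on-YARN deployment in the paper provides.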
Search based software engineering: Trends, techniques and applications
© ACM, 2012. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version is available from the link below.
In the past five years there has been a dramatic increase in work on Search-Based Software Engineering (SBSE), an approach to Software Engineering (SE) in which Search-Based Optimization (SBO) algorithms are used to address problems in SE. SBSE has been applied to problems throughout the SE lifecycle, from requirements and project planning to maintenance and reengineering. The approach is attractive because it offers a suite of adaptive automated and semiautomated solutions in situations typified by large complex problem spaces with multiple competing and conflicting objectives.
This article provides a review and classification of literature on SBSE. The work identifies research trends and relationships between the techniques applied and the applications to which they have been applied, and highlights gaps in the literature and avenues for further research.
Spatially Aware Dictionary Learning and Coding for Fossil Pollen Identification
We propose a robust approach for performing automatic species-level
recognition of fossil pollen grains in microscopy images that exploits both
global shape and local texture characteristics in a patch-based matching
methodology. We introduce a novel criterion for selecting meaningful and
discriminative exemplar patches. We optimize this function during training
using a greedy submodular function optimization framework that gives a
near-optimal solution with bounded approximation error. We use these selected
exemplars as a dictionary basis and propose a spatially-aware sparse coding
method to match testing images for identification while maintaining global
shape correspondence. To accelerate the coding process for fast matching, we
introduce a relaxed form that uses spatially-aware soft-thresholding during
coding. Finally, we carry out an experimental study that demonstrates the
effectiveness and efficiency of our exemplar selection and classification
mechanisms, achieving accuracy on a difficult fine-grained species
classification task distinguishing three types of fossil spruce pollen.
Comment: CVMI 201
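The relaxed coding step mentioned above relies on soft-thresholding. The spatially-aware variant is specific to the paper, but the plain soft-thresholding operator it builds on is standard (the proximal operator of the L1 norm) and looks like this:

```python
# Soft-thresholding: shrink each coefficient toward zero by lambda,
# zeroing out anything within the threshold. This is what makes the
# resulting codes sparse.

def soft_threshold(x, lam):
    """Proximal operator of lam * |x|: shrink x toward zero by lam."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

codes = [soft_threshold(v, 0.5) for v in [2.0, 0.3, -1.2]]
```

Small coefficients (here 0.3) are set exactly to zero while large ones survive slightly shrunk, which is why replacing the full sparse-coding solve with one thresholding pass gives the speedup the abstract describes.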
Test Set Diameter: Quantifying the Diversity of Sets of Test Cases
A common and natural intuition among software testers is that test cases need
to differ if a software system is to be tested properly and its quality
ensured. Consequently, much research has gone into formulating distance
measures for how test cases, their inputs and/or their outputs differ. However,
common to these proposals is that they are data type specific and/or calculate
the diversity only between pairs of test inputs, traces or outputs.
We propose a new metric to measure the diversity of sets of tests: the test
set diameter (TSDm). It extends our earlier, pairwise test diversity metrics
based on recent advances in information theory regarding the calculation of the
normalized compression distance (NCD) for multisets. An advantage is that TSDm
can be applied regardless of data type and on any test-related information, not
only the test inputs. A downside is the increased computational time compared
to competing approaches.
Our experiments on four different systems show that the test set diameter can
help select test sets with higher structural and fault coverage than random
selection even when only applied to test inputs. This can enable early test
design and selection, prior to even having a software system to test, and
complement other types of test automation and analysis. We argue that this
quantification of test set diversity creates a number of opportunities to
better understand software quality and provides practical ways to increase it.
Comment: In submission
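The quantity underlying TSDm is the normalized compression distance. TSDm generalizes it to whole multisets of tests; the familiar pairwise form, approximated here with zlib as the compressor, is NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)):

```python
# Pairwise NCD sketch: similar inputs compress well together, so their
# concatenation adds little to the compressed length of either alone.
import zlib

def C(data: bytes) -> int:
    """Compressed length: a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

similar = ncd(b"abcabcabc" * 20, b"abcabcabc" * 20)
different = ncd(b"abcabcabc" * 20, b"xyzqrstuv" * 20)
```

Identical inputs yield a distance near zero while unrelated inputs score much higher; the multiset extension lets a single number summarize the diversity of an entire test set rather than one pair at a time, at the computational cost the abstract notes.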
Automated reliability assessment for spectroscopic redshift measurements
We present a new approach to automate the spectroscopic redshift reliability
assessment based on machine learning (ML) and characteristics of the redshift
probability density function (PDF).
We propose to rephrase the spectroscopic redshift estimation into a Bayesian
framework, in order to incorporate all sources of information and uncertainties
related to the redshift estimation process, and produce a redshift posterior
PDF that will be the starting-point for ML algorithms to provide an automated
assessment of a redshift reliability.
As a use case, public data from the VIMOS VLT Deep Survey is exploited to
present and test this new methodology. We first tried to reproduce the existing
reliability flags using supervised classification to describe different types
of redshift PDFs, but due to the subjective definition of these flags, soon
opted for a new homogeneous partitioning of the data into distinct clusters via
unsupervised classification. After assessing the accuracy of the new clusters
via resubstitution and test predictions, unlabelled data from preliminary mock
simulations for the Euclid space mission are projected into this mapping to
predict their redshift reliability labels.
Comment: Submitted on 02 June 2017 (v1). Revised on 08 September 2017 (v2). Latest version 28 September 2017 (this version v3).
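A classifier of redshift reliability needs descriptors of the posterior PDF's shape. The features below are hypothetical illustrations (not the paper's actual feature set): the number of significant modes and the dispersion of a discretized redshift PDF, both of which distinguish a clean unimodal solution from an ambiguous multimodal one:

```python
# Toy PDF descriptors: count significant local maxima and compute the
# standard deviation of a normalized, discretized redshift PDF.

def pdf_features(z, p):
    """Return (mode_count, std) for a PDF p sampled at redshifts z."""
    mean = sum(zi * pi for zi, pi in zip(z, p))
    var = sum(pi * (zi - mean) ** 2 for zi, pi in zip(z, p))
    peak = max(p)
    # A mode is a local maximum above 10% of the global peak.
    modes = sum(1 for i in range(1, len(p) - 1)
                if p[i] > p[i - 1] and p[i] >= p[i + 1] and p[i] > 0.1 * peak)
    return modes, var ** 0.5

z = [0.0, 0.5, 1.0, 1.5, 2.0]
unimodal = [0.0, 0.1, 0.8, 0.1, 0.0]   # one confident solution
bimodal = [0.1, 0.4, 0.0, 0.4, 0.1]    # two competing solutions
```

Feeding such descriptors to an unsupervised clusterer, as the paper does, groups PDFs of similar shape without relying on the subjective flags it set out to replace.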
Identifying smart design attributes for Industry 4.0 customization using a clustering Genetic Algorithm
Industry 4.0 aims at achieving mass customization at a
mass production cost. A key component to realizing this is accurate
prediction of customer needs and wants, which is however a
challenging issue due to the lack of smart analytics tools. This
paper investigates this issue in depth and then develops a predictive
analytic framework for integrating cloud computing, big data
analysis, business informatics, communication technologies, and
digital industrial production systems. Computational intelligence
in the form of a cluster k-means approach is used to manage
relevant big data for feeding potential customer needs and wants
to smart designs for targeted productivity and customized mass
production. The identification of patterns from big data is achieved
with cluster k-means and with the selection of optimal attributes
using genetic algorithms. A car customization case study shows
how it may be applied and where to assign new clusters with
growing knowledge of customer needs and wants. This approach
offers a number of features suitable for smart design in realizing
Industry 4.0.
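The clustering component described above can be illustrated with a minimal one-dimensional k-means on toy data (the paper works on high-dimensional customer data, and couples the clustering with genetic-algorithm attribute selection not shown here):

```python
# Minimal 1-D k-means sketch: alternate assigning points to the nearest
# center and recomputing each center as its group's mean.

def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: abs(p - centers[i]))
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers

# Hypothetical customer "sportiness" scores clustering around 1 and 9.
centers = kmeans_1d([0.9, 1.1, 1.0, 8.9, 9.1, 9.0], centers=[0.0, 5.0])
```

Each resulting center stands for a customer segment; in the car customization case study, new reviews or orders would be assigned to the nearest cluster, and clusters would be added as knowledge of customer needs and wants grows.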