Temporal Feature Selection with Symbolic Regression

Author: Fusting Christopher Winter
Publication venue: UVM ScholarWorks
Publication date: 01/01/2017

Building and discovering useful features when constructing machine learning models is the central task for the machine learning practitioner. Good features are useful not only in increasing the predictive power of a model but also in illuminating the underlying drivers of a target variable. In this research we propose a novel feature learning technique in which symbolic regression is endowed with a "Range Terminal" that allows it to explore functions of the aggregate of variables over time. We test the Range Terminal on a synthetic data set and on a real-world data set in which we predict seasonal greenness using satellite-derived temperature and snow data over a portion of the Arctic. On the synthetic data set we find that symbolic regression with the Range Terminal outperforms standard symbolic regression and Lasso regression. On the Arctic data set we find that it outperforms standard symbolic regression and fails to beat Lasso regression, but finds useful features describing the interaction between Land Surface Temperature, Snow, and seasonal vegetative growth in the Arctic.

ScholarWorks @ UVM

Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

Author: Bazlur Rashid A. N. M.
Choudhury Tonmoy
Publication venue: Edith Cowan University, Research Online, Perth, Western Australia
Publication date: 01/01/2019

The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, and complex correlations may exist between their features; many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.

Research Online @ ECU

Hybrid Algorithms Based on Integer Programming for the Search of Prioritized Test Data in Software Product Lines

Author: A Arcuri
BJ Garvin
C Blum
C Nie
D Benavides
E Engström
F Ensan
H Cichos
K Pohl
M Lochau
MB Cohen
MF Johansen
N Siegmund
RE Lopez-Herrejon
S Oster
Publication date: 02/05/2017

In Software Product Lines (SPLs) it is not possible, in general, to test all products of the family. The number of products denoted by an SPL is very high due to the combinatorial explosion of features. For this reason, some coverage criteria have been proposed which try to test at least all feature interactions without the necessity to test all products, e.g., all pairs of features (pairwise coverage). In addition, it is desirable to first test products composed of a set of priority features. This problem is known as the Prioritized Pairwise Test Data Generation Problem. In this work we propose two hybrid algorithms using Integer Programming (IP) to generate a prioritized test suite. The first one is based on an integer linear formulation and the second one is based on an integer quadratic (nonlinear) formulation. We compare these techniques with two state-of-the-art algorithms, the Parallel Prioritized Genetic Solver (PPGS) and a greedy algorithm called prioritized-ICPL. Our study reveals that our hybrid nonlinear approach is clearly the best in both solution quality and computation time. Moreover, the nonlinear variant (the fastest one) is 27 and 42 times faster than PPGS in the two groups of instances analyzed in this work. Funding: Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Partially funded by the Spanish Ministry of Economy and Competitiveness and FEDER under contract TIN2014-57341-R, the University of Málaga, Andalucía Tech and the Spanish Network TIN2015-71841-REDT (SEBASENet).
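As an illustration of the pairwise coverage criterion mentioned in this abstract, the following minimal sketch measures which fraction of feature-value pairs a test suite exercises. It is not the paper's integer programming formulation: it ignores feature-model constraints and priorities, and the function names and the three-feature example are hypothetical.

```python
from itertools import combinations, product as cartesian


def covered_pairs(prod, features):
    """Return the feature-value pairs exercised by one product.

    A product is a frozenset of selected feature names; every pair of
    features contributes one (selected / not selected) combination.
    """
    pairs = set()
    for f, g in combinations(sorted(features), 2):
        pairs.add(((f, f in prod), (g, g in prod)))
    return pairs


def pairwise_coverage(suite, valid_products, features):
    """Fraction of achievable pairs that the suite covers."""
    required = set()
    for p in valid_products:
        required |= covered_pairs(p, features)
    covered = set()
    for p in suite:
        covered |= covered_pairs(p, features)
    return len(covered & required) / len(required)


# Hypothetical toy family: three optional features, no constraints,
# so all 8 feature subsets are valid products.
features = ["A", "B", "C"]
all_products = [
    frozenset(f for f, on in zip(features, bits) if on)
    for bits in cartesian([False, True], repeat=len(features))
]

# Four products are enough to cover every pair of feature values.
suite = [frozenset(), frozenset("AB"), frozenset("AC"), frozenset("BC")]
full = pairwise_coverage(suite, all_products, features)
single = pairwise_coverage([frozenset()], all_products, features)
```

In this toy case four of the eight products already reach full pairwise coverage, which is the kind of saving the coverage criterion is meant to give.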

Crossref

Repositorio Institucional Universidad de Málaga

On the evolutionary optimisation of many conflicting objectives

Author: Fleming P.J.
Purshouse R.C.
Publication venue: Automatic Control and Systems Engineering, University of Sheffield
Publication date: 01/10/2003

This inquiry explores the effectiveness of a class of modern evolutionary algorithms, represented by Non-dominated Sorting Genetic Algorithm (NSGA) components, for solving optimisation tasks with many conflicting objectives. Optimiser behaviour is assessed for a grid of mutation and recombination operator configurations. Performance maps are obtained for the dual aims of proximity to, and distribution across, the optimal trade-off surface. Performance sweet-spots for both variation operators are observed to contract as the number of objectives is increased. Classical settings for recombination are shown to be suitable for small numbers of objectives but correspond to very poor performance for higher numbers of objectives, even when large population sizes are used. Explanations for this behaviour are offered via the concepts of dominance resistance and active diversity promotion.
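The non-dominated sorting named in this abstract rests on Pareto dominance. The sketch below shows only that basic relation and a filter for the non-dominated set, assuming every objective is minimised. It is not the NSGA configuration studied in the paper, and the function names and sample points are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation).

    a must be no worse than b in every objective and strictly better
    in at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical two-objective population.
population = [(1, 4), (2, 2), (4, 1), (3, 3)]
front = non_dominated(population)
```

Here (3, 3) is dominated by (2, 2), while the other three points trade one objective against the other and so form the first front.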

White Rose Research Online

A novel FPGA-based evolvable hardware system based on multiple processing arrays

Author: Gallego Galán Ángel
Mora de Sambricio Javier
Otero Marnotes Andres
Riesgo Alcaide Teresa
Salvador Perea Rubén
Torre Arnanz Eduardo de la
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/05/2013

In this paper, an architecture based on a scalable and flexible set of Evolvable Processing arrays is presented. FPGA-native Dynamic Partial Reconfiguration (DPR) is used for evolution, which is done intrinsically, letting the system adapt autonomously to variable run-time conditions, including the presence of transient and permanent faults. The architecture supports different modes of operation, namely independent, parallel, cascaded, or bypass mode. These modes of operation can be used during evolution time or during normal operation. The evolvability of the architecture is combined with fault-tolerance techniques to enhance the platform with self-healing features, making it suitable for applications which require both high adaptability and reliability. Experimental results show that such a system may benefit from accelerated evolution times, increased performance, and improved dependability, mainly by increasing fault tolerance for transient and permanent faults, as well as providing some fault identification possibilities. The evolvable HW array shown is tailored for window-based image processing applications.

Archivo Digital UPM

Data Deluge in Astrophysics: Photometric Redshifts as a Template Use Case

Author: A. J. Connolly
AA Collister
Abdalla
B Hoyle
C. Laigle
CA Blake
CP Ahn
D Wittman
Daniel Masters
David W. Gerdes
H Hildebrandt
I Sadeh
JTA Jong de
K Carrasco
M Annunziatella
M Bolzonella
M Brescia
Masayuki Tanaka
N Benitez
O Ilbert
O Laurino
P Dubath
S Arnouts
S Cavuoti
T Gneiting
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

Astronomy has entered the big data era, and machine learning based methods have found widespread use in a large variety of astronomical applications. This is demonstrated by the recent huge increase in the number of publications making use of this new approach. The usage of machine learning methods, however, is still far from trivial, and many problems still need to be solved. Using the evaluation of photometric redshifts as a case study, we outline the main problems and some ongoing efforts to solve them. Comment: 13 pages, 3 figures, Springer's Communications in Computer and Information Science (CCIS), Vol. 82

arXiv.org e-Print Archive

Crossref

Archivio della ricerca - Università degli studi di Napoli Federico II

OA@INAF - Istituto Nazionale di Astrofisica

ML-k’sNN: Label Dependent k Values for Multi-Label k-Nearest Neighbor Rule

Author: Cuevas-Muñoz José Manuel
García Pedrajas Nicolás
Publication venue: MDPI AG
Publication date: 01/01/2023

Multi-label classification as a data mining task has recently attracted increasing interest from researchers. Many current data mining applications address problems with instances that belong to more than one category. These problems require the development of new, efficient methods. The multi-label k-nearest neighbors rule, ML-kNN, is among the best-performing methods for multi-label problems. Current methods use a single k value for all labels, as in the single-label method. However, the distributions of the labels are frequently very different. In such scenarios, a single k value for all labels might be suboptimal. In this paper, we propose a novel approach in which each label is predicted with a different value of k. Obtaining the best k for each label is stated as an optimization problem. Three different algorithms are proposed for this task, depending on which multi-label metric is the target of our optimization process. In a large set of 40 real-world multi-label problems, our approach improves the results of two different tested ML-kNN implementations.
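The idea of predicting each label with its own k can be sketched as follows. This is a deliberately simplified illustration that uses a plain majority vote among the k nearest neighbours for each label. It does not reproduce ML-kNN's actual decision rule or the optimisation algorithms the paper proposes for choosing k, and the function name, data, and k values are hypothetical.

```python
def predict_labels(train_X, train_Y, x, k_per_label):
    """Predict a binary label vector for x, using a separate k per label.

    train_X: list of feature tuples
    train_Y: list of binary label lists, one per training instance
    k_per_label: neighbourhood size to use for each label
    """
    # Rank training instances by squared Euclidean distance to x.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    prediction = []
    for label, k in enumerate(k_per_label):
        # Count how many of the k nearest neighbours carry this label.
        votes = sum(train_Y[i][label] for i in order[:k])
        # Assign the label only on a strict majority.
        prediction.append(1 if 2 * votes > k else 0)
    return prediction


# Hypothetical one-dimensional data set with two labels.
train_X = [(0.0,), (1.0,), (2.0,), (10.0,)]
train_Y = [[1, 0], [0, 0], [0, 1], [0, 1]]

small_k_first = predict_labels(train_X, train_Y, (0.1,), [1, 3])
large_k_first = predict_labels(train_X, train_Y, (0.1,), [3, 1])
```

The two calls differ only in which label gets the larger neighbourhood, and the first label's prediction changes as a result, which is the sensitivity that motivates tuning k separately for each label.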

Repositorio Institucional de la Universidad de Córdoba