TFS: a thermodynamical search algorithm for feature subset selection
This work tackles the problem of selecting a subset of features in an inductive learning setting by introducing a novel Thermodynamic Feature Selection algorithm (TFS). Given a suitable objective function, the algorithm makes use of a specially designed form of simulated annealing to find a subset of attributes that maximizes the objective function. The new algorithm is evaluated against one of the most widespread and reliable algorithms, the Sequential Forward Floating Search (SFFS). Our experimental results in classification tasks show that TFS achieves significant improvements over SFFS in the objective function, with a notable reduction in subset size.
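The annealing loop described in the abstract can be illustrated with a minimal sketch. This is a generic simulated-annealing subset search, not the TFS algorithm itself; the objective, parameters and names are placeholders:

```python
import math
import random

def anneal_features(n_features, score, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Generic simulated-annealing subset search (illustrative, not TFS):
    flip one feature per step and accept worsening moves with
    probability exp(delta / T)."""
    rng = random.Random(seed)
    subset = {i for i in range(n_features) if rng.random() < 0.5}
    best, best_score = set(subset), score(subset)
    t = t0
    for _ in range(steps):
        candidate = set(subset)
        candidate ^= {rng.randrange(n_features)}   # flip one feature in or out
        delta = score(candidate) - score(subset)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            subset = candidate
            if score(subset) > best_score:
                best, best_score = set(subset), score(subset)
        t *= cooling                               # geometric cooling schedule
    return best, best_score

# Toy objective: reward features 0 and 2, penalize subset size.
obj = lambda s: len(s & {0, 2}) - 0.1 * len(s)
subset, value = anneal_features(5, obj)
```

Accepting occasional worsening moves at high temperature is what lets the search escape local maxima of the objective before the temperature drops.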
Feature selection for microarray gene expression data using simulated annealing guided by the multivariate joint entropy
In this work a new way to calculate the multivariate joint entropy is presented. This measure is the basis for a fast information-theoretic evaluation of gene relevance in a Microarray Gene Expression data context. Its low complexity is based on the reuse of previous computations to calculate current feature relevance. The mu-TAFS algorithm --named as such to differentiate it from previous TAFS algorithms-- implements a simulated annealing technique specially designed for feature subset selection. The algorithm is applied to the maximization of gene subset relevance in several public-domain microarray data sets. The experimental results show notably high classification performance and small subsets formed by biologically meaningful genes.
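As a point of reference, the plain (non-incremental) empirical joint entropy of a feature subset can be computed directly from frequency counts; the paper's contribution is the reuse of previous computations, which this sketch does not attempt:

```python
import math
from collections import Counter

def joint_entropy(rows, features):
    """Empirical joint entropy H(X_S) in bits of the selected feature
    columns, estimated from observed joint-value frequencies."""
    counts = Counter(tuple(row[f] for f in features) for row in rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Two binary features taking all four joint values equally often: H = 2 bits.
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
h = joint_entropy(data, [0, 1])
```

Recomputing the full `Counter` for every candidate subset is what makes the naive approach slow on microarray data; an incremental scheme updates the counts as single features enter or leave the subset.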
Stream Learning in Energy IoT Systems: A Case Study in Combined Cycle Power Plants
The prediction of electrical power produced in combined cycle power plants is a key challenge in the electrical power and energy systems field. This power production can vary depending on environmental variables, such as temperature, pressure, and humidity. Thus, the business problem is how to predict the power production as a function of these environmental conditions in order to maximize profit. The research community has solved this problem by applying Machine Learning techniques, and has managed to reduce the computational and time costs in comparison with the traditional thermodynamical analysis. Until now, this challenge has been tackled from a batch learning perspective, in which data are assumed to be at rest and models do not continuously integrate new information into already constructed models. We present an approach closer to the Big Data and Internet of Things paradigms, in which data arrive continuously and models learn incrementally, achieving significant enhancements in terms of data processing (time, memory and computational costs) and obtaining competitive performance. This work compares and examines the hourly electrical power prediction of several streaming regressors, and discusses which technique is best, in terms of processing time and predictive performance, for this streaming scenario.

This work has been partially supported by the EU project iDev40, which has received funding from the ECSEL Joint Undertaking (JU) under grant agreement No 783163. The JU receives support from the European Union's Horizon 2020 research and innovation programme and Austria, Germany, Belgium, Italy, Spain, Romania. It has also been supported by the Basque Government (Spain) through the project VIRTUAL (KK-2018/00096), and by the Ministerio de Economía y Competitividad of Spain (Grant Ref. TIN2017-85887-C2-2-P).
New acceleration technique for the backpropagation algorithm
Artificial neural networks have been studied for many years in the hope of achieving human-like performance in areas such as pattern recognition, speech synthesis and higher-level cognitive processes. In the connectionist model there are several interconnected processing elements, called neurons, that have limited processing capability. Even though the rate of information transmitted between these elements is limited, the complex interconnection and the cooperative interaction between them result in vastly increased computing power. Neural network models are specified by an organized network topology of interconnected neurons. These networks have to be trained in order for them to be used for a specific purpose. Backpropagation is one of the most popular methods of training neural networks, and the speed of convergence of the standard backpropagation algorithm has seen much improvement in the recent past. Herein we present a new technique for accelerating the existing backpropagation algorithm without modifying it. We use a fourth-order interpolation method for the dominant eigenvalues, and use these to change the slope of the activation function, thereby increasing the speed of convergence of the backpropagation algorithm. Our experiments have shown significant improvement in convergence time for problems widely used in benchmarking: a three- to ten-fold decrease in convergence time is achieved, and the decrease grows as the complexity of the problem increases. The technique adjusts the energy state of the system so as to escape from local minima.
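The role of the activation-function slope can be sketched with a logistic activation carrying a gain parameter; the fourth-order eigenvalue interpolation that chooses the gain is the paper's contribution and is not reproduced here:

```python
import math

def sigmoid(x, gain=1.0):
    """Logistic activation with an adjustable slope (gain). In the described
    technique the gain would be set from estimated dominant eigenvalues;
    here it is simply a free parameter."""
    return 1.0 / (1.0 + math.exp(-gain * x))

def sigmoid_grad(x, gain=1.0):
    """Derivative w.r.t. x: a larger gain steepens the error gradients that
    backpropagation feeds to the weight updates."""
    s = sigmoid(x, gain)
    return gain * s * (1.0 - s)
```

Since the backpropagated error terms are proportional to this derivative, raising the gain scales the effective step size without touching the backpropagation algorithm itself.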
Deriving amino acid contact potentials from their frequencies of occurrence in proteins: a lattice model study
The possibility of deriving the contact potentials between amino acids from
their frequencies of occurrence in proteins is discussed in evolutionary terms.
This approach allows the use of traditional thermodynamics to describe such
frequencies and, consequently, to develop a strategy to include in the
calculations correlations due to the spatial proximity of the amino acids and
to their overall tendency of being conserved in proteins. Making use of a
lattice model to describe protein chains and defining a "true" potential, we
test these strategies by selecting a database of folding model sequences,
deriving the contact potentials from such sequences and comparing them with the
"true" potential. Taking into account correlations allows for a markedly better
prediction of the interaction potentials.
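A standard quasi-chemical-style baseline for deriving potentials from frequencies, e(a,b) = -kT ln(f_ab / (f_a f_b)), can be sketched as below. This is a textbook estimate, without the spatial-proximity and conservation corrections the abstract describes:

```python
import math
from collections import Counter

def contact_potentials(contacts, kT=1.0):
    """Quasi-chemical-style contact potentials from observed amino acid
    contact pairs: e(a,b) = -kT * ln(f_ab / (f_a * f_b)). Negative values
    mark pairs seen in contact more often than chance."""
    pair = Counter(tuple(sorted(c)) for c in contacts)
    single = Counter(a for c in contacts for a in c)
    n_pairs = sum(pair.values())
    n_single = sum(single.values())
    pots = {}
    for (a, b), c in pair.items():
        f_ab = c / n_pairs
        f_a = single[a] / n_single
        f_b = single[b] / n_single
        pots[(a, b)] = -kT * math.log(f_ab / (f_a * f_b))
    return pots

# Toy contact list over two residue types.
pots = contact_potentials([("A", "A"), ("A", "B"), ("B", "B"), ("A", "B")])
```

The lattice-model study goes beyond this independent-frequency estimate by correcting for correlations between spatially close and evolutionarily conserved positions.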
Contextual normalization applied to aircraft gas turbine engine diagnosis
Diagnosing faults in aircraft gas turbine engines is a complex problem. It involves several tasks,
including rapid and accurate interpretation of patterns in engine sensor data. We have investigated
contextual normalization for the development of a software tool to help engine repair technicians
with interpretation of sensor data. Contextual normalization is a new strategy for employing
machine learning. It handles variation in data that is due to contextual factors, rather than the
health of the engine. It does this by normalizing the data in a context-sensitive manner. This
learning strategy was developed and tested using 242 observations of an aircraft gas turbine
engine in a test cell, where each observation consists of roughly 12,000 numbers, gathered over a
12 second interval. There were eight classes of observations: seven deliberately implanted classes
of faults and a healthy class. We compared two approaches to implementing our learning strategy:
linear regression and instance-based learning. We have three main results. (1) For the given
problem, instance-based learning works better than linear regression. (2) For this problem,
contextual normalization works better than other common forms of normalization. (3) The
algorithms described here can be the basis for a useful software tool for assisting technicians with
the interpretation of sensor data.
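As a rough illustration of the idea (not the authors' implementation), contextual normalization with linear regression can be sketched as fitting the expected sensor value from a contextual variable and reporting the scaled residual; all names and data are placeholders:

```python
def contextual_normalize(samples, query):
    """Fit value = slope * context + intercept by least squares over
    (context, value) pairs, then return the query's residual divided by
    the residual spread: variation explained by context is removed."""
    n = len(samples)
    cs = [c for c, _ in samples]
    vs = [v for _, v in samples]
    c_mean = sum(cs) / n
    v_mean = sum(vs) / n
    var = sum((c - c_mean) ** 2 for c in cs)
    cov = sum((c - c_mean) * (v - v_mean) for c, v in zip(cs, vs))
    slope = cov / var
    intercept = v_mean - slope * c_mean
    residuals = [v - (slope * c + intercept) for c, v in samples]
    spread = (sum(r * r for r in residuals) / n) ** 0.5 or 1.0
    c, v = query
    return (v - (slope * c + intercept)) / spread

# Hypothetical (ambient temperature, sensor reading) pairs.
readings = [(0.0, 10.0), (1.0, 12.0), (2.0, 14.0)]
z = contextual_normalize(readings, (3.0, 16.0))
```

A score near zero means the reading is fully explained by the context; a large score flags variation that context cannot account for, which is what should reflect engine health.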
The cavity method for large deviations
A method is introduced for studying large deviations in the context of
statistical physics of disordered systems. The approach, based on an extension
of the cavity method to atypical realizations of the quenched disorder, allows
us to compute exponentially small probabilities (rate functions) over different
classes of random graphs. It is illustrated with two combinatorial optimization
problems, the vertex-cover and coloring problems, for which the presence of
replica symmetry breaking phases is taken into account. Applications include
the analysis of models on adaptive graph structures.
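The rate functions mentioned above follow the standard large-deviation scaling (a textbook definition, not specific to this paper):

```latex
P_N(x) \asymp e^{-N\, I(x)}, \qquad
I(x) = -\lim_{N \to \infty} \frac{1}{N} \ln P_N(x),
```

so atypical realizations of the quenched disorder occur with probability exponentially small in the system size N, and the extended cavity method gives access to the exponent I(x) itself.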
Combinatorial optimization and metaheuristics
Today, combinatorial optimization is one of the youngest and most active areas of discrete mathematics. It is a branch of optimization in applied mathematics and computer science, related to operational research, algorithm theory and computational complexity theory. It sits at the intersection of several fields, including artificial intelligence, mathematics and software engineering. Its increasing interest arises from the fact that a large number of scientific and industrial problems can be formulated as abstract combinatorial optimization problems, through graphs and/or (integer) linear programs. Some of these problems have polynomial-time (“efficient”) algorithms, while most of them are NP-hard, i.e. no polynomial-time algorithm is known for them. In practice, this means that an exact solution cannot be guaranteed within reasonable time, and one has to settle for an approximate solution with known performance guarantees. Indeed, the goal of approximate methods is to find, “quickly” (in reasonable run-times) and with “high” probability, provably “good” solutions (with low error relative to the true optimum). In the last 20 years, a new kind of algorithm, commonly called metaheuristics, has emerged in this class; metaheuristics combine heuristics in high-level frameworks aimed at efficiently and effectively exploring the search space. This report briefly outlines the components, concepts, advantages and disadvantages of different metaheuristic approaches from a conceptual point of view, in order to analyze their similarities and differences. The two very significant forces of intensification and diversification, which mainly determine the behavior of a metaheuristic, will be pointed out. The report concludes by exploring the importance of hybridization and integration methods.
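The interplay of intensification and diversification can be caricatured in a few lines: greedy local search intensifies around the current solution, while random restarts diversify across the search space. This is a schematic toy on a one-dimensional problem, not any particular metaheuristic:

```python
import random

def local_search(x, score, neighbors, iters=100):
    """Intensification: hill-climb to a local optimum of `score`."""
    for _ in range(iters):
        nxt = max(neighbors(x), key=score)
        if score(nxt) <= score(x):
            break                      # no improving neighbor: local optimum
        x = nxt
    return x

def multistart(score, neighbors, domain, starts=10, seed=0):
    """Diversification: restart the climb from random points, keep the best."""
    rng = random.Random(seed)
    return max((local_search(rng.choice(domain), score, neighbors)
                for _ in range(starts)), key=score)

score = lambda x: -(x - 63) ** 2                   # single peak at x = 63
neighbors = lambda x: [max(x - 1, 0), min(x + 1, 99)]
best = multistart(score, neighbors, list(range(100)))
```

Real metaheuristics (tabu search, simulated annealing, evolutionary algorithms) interleave these two forces far more subtly than a hard restart, but the trade-off they balance is the same.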