A New Search Algorithm for Feature Selection in Hyperspectral Remote Sensing Images
A new suboptimal search strategy suitable for feature selection in very high-dimensional remote-sensing images (e.g., those acquired by hyperspectral sensors) is proposed. Each solution of the feature selection problem is represented as a binary string that indicates which features are selected and which are disregarded. In turn, each binary string corresponds to a point of a multidimensional binary space. Given a criterion function to evaluate the effectiveness of a selected solution, the proposed strategy is based on the search for constrained local extremes of such a function in the above-defined binary space. In particular, two different algorithms are presented that explore the space of solutions in different ways. These algorithms are compared with the classical sequential forward selection and sequential forward floating selection suboptimal techniques, using hyperspectral remote-sensing images (acquired by the AVIRIS sensor) as a data set. Experimental results demonstrate the effectiveness of both algorithms, which can be regarded as valid alternatives to classical methods, as they allow interesting trade-offs between the quality of the selected feature subsets and computational cost.
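The binary-string formulation above can be illustrated with a minimal sketch. It assumes the constraint is a fixed number `m` of selected features and that neighbouring solutions are obtained by swapping one selected feature for one unselected feature; the abstract does not spell out either detail, so both are assumptions. The function name `steepest_ascent`, the toy `weights`, and the additive criterion are hypothetical and are not taken from the paper.

```python
from itertools import product

def steepest_ascent(criterion, n_features, m, start=None):
    """Local search over binary strings with exactly m ones (assumed constraint)."""
    if start is None:
        start = [1] * m + [0] * (n_features - m)
    current = tuple(start)
    best_val = criterion(current)
    while True:
        selected = [i for i, b in enumerate(current) if b]
        unselected = [i for i, b in enumerate(current) if not b]
        best_neighbor = None
        # Try every swap of one selected feature for one unselected feature.
        for i, j in product(selected, unselected):
            cand = list(current)
            cand[i], cand[j] = 0, 1
            val = criterion(tuple(cand))
            if val > best_val:
                best_val, best_neighbor = val, tuple(cand)
        if best_neighbor is None:
            # No swap improves the criterion: a constrained local maximum.
            return current, best_val
        current = best_neighbor

# Hypothetical toy criterion: sum of per-feature weights of the selected features.
weights = [0.1, 0.5, 0.2, 0.9, 0.4, 0.7]

def toy_criterion(mask):
    return sum(w for w, b in zip(weights, mask) if b)

subset, value = steepest_ascent(toy_criterion, n_features=6, m=3)
print(subset, value)
```

In practice the criterion would be a class-separability measure or a classifier accuracy estimate rather than an additive score; with an additive score the local maximum found is trivially the global one, which makes the toy case easy to check by hand.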
Facilitating meta-design techniques for multi-disciplinary conceptual design
The research reported in this paper was supported by the EU FP6 funded project, SimSAC (Simulating Aircraft Stability and Control Characteristics for Use in Conceptual Design)
Learning the structure of Bayesian Networks: A quantitative assessment of the effect of different algorithmic schemes
One of the most challenging tasks when adopting Bayesian Networks (BNs) is
learning their structure from data. This task is complicated by the
huge search space of possible solutions, and by the fact that the problem is
NP-hard. Hence, full enumeration of all the possible solutions is not always
feasible and approximations are often required. However, to the best of our
knowledge, a quantitative analysis of the performance and characteristics of
the different heuristics to solve this problem has never been done before.
For this reason, in this work, we provide a detailed comparison of many
different state-of-the-art methods for structural learning on simulated data
considering both BNs with discrete and continuous variables, and with different
rates of noise in the data. In particular, we investigate the performance of
different widespread scores and algorithmic approaches proposed for the
inference and the statistical pitfalls within them.
Reduction of the size of datasets by using evolutionary feature selection: the case of noise in a modern city
Smart city initiatives have emerged to mitigate the negative effects of the very fast growth of urban areas. Most of the population in our cities is exposed to high levels of noise that cause discomfort and various health problems. These issues may be mitigated by applying different smart city solutions, some of which require highly accurate noise information to provide the best possible quality of service. In this study, we have designed a machine learning approach based on genetic algorithms to analyze noise data captured on the university campus. This method reduces the amount of data required to classify the noise by addressing a feature selection optimization problem. The experimental results have shown that our approach improved the accuracy by 20% (achieving an accuracy of 87% with a reduction of up to 85% of the original dataset). Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
This research has been partially funded by the Spanish MINECO and FEDER projects TIN2016-81766-REDT (http://cirti.es), and TIN2017-88213-R (http://6city.lcc.uma.es)
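As a rough illustration of genetic-algorithm feature selection of the kind this abstract describes, the sketch below evolves binary feature masks with tournament selection, one-point crossover, bit-flip mutation, and elitism. It is not the authors' implementation: the operators, the parameter values, the function name `ga_feature_selection`, and the toy fitness (which rewards three hypothetical informative features and penalises subset size) are all assumptions chosen for brevity.

```python
import random

def ga_feature_selection(fitness, n_features, pop_size=20, generations=30,
                         mutation_rate=0.05, seed=0):
    """Evolve binary masks (1 = keep feature) that maximise `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)

    def tournament():
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_features - 1)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation with a small per-gene probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best

# Hypothetical toy fitness: features 0, 3 and 7 are "informative";
# every selected feature costs 0.1, which favours small subsets.
INFORMATIVE = {0, 3, 7}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask = ga_feature_selection(toy_fitness, n_features=10)
print(best_mask, toy_fitness(best_mask))
```

In a real setting the fitness would typically be the validation accuracy of a classifier trained on the selected features, possibly combined with a penalty on the number of features kept.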
Bayesian semiparametric analysis for two-phase studies of gene-environment interaction
The two-phase sampling design is a cost-efficient way of collecting expensive
covariate information on a judiciously selected subsample. It is natural to
apply such a strategy for collecting genetic data in a subsample enriched for
exposure to environmental factors for gene-environment interaction (G x E)
analysis. In this paper, we consider two-phase studies of G x E interaction
where phase I data are available on exposure, covariates and disease status.
Stratified sampling is done to prioritize individuals for genotyping at phase
II conditional on disease and exposure. We consider a Bayesian analysis based
on the joint retrospective likelihood of phases I and II data. We address
several important statistical issues: (i) we consider a model with multiple
genes, environmental factors and their pairwise interactions. We employ a
Bayesian variable selection algorithm to reduce the dimensionality of this
potentially high-dimensional model; (ii) we use the assumption of gene-gene and
gene-environment independence to trade off between bias and efficiency for
estimating the interaction parameters through use of hierarchical priors
reflecting this assumption; (iii) we posit a flexible model for the joint
distribution of the phase I categorical variables using the nonparametric Bayes
construction of Dunson and Xing [J. Amer. Statist. Assoc. 104 (2009)
1042-1051]. Comment: Published at http://dx.doi.org/10.1214/12-AOAS599 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)