
    Classification of chirp signals using hierarchical Bayesian learning and MCMC methods

    This paper addresses the problem of classifying chirp signals using hierarchical Bayesian learning together with Markov chain Monte Carlo (MCMC) methods. Bayesian learning consists of estimating the distribution of the observed data conditional on each class from a set of training samples. Unfortunately, this estimation requires evaluating intractable multidimensional integrals. This paper studies an original implementation of hierarchical Bayesian learning that estimates the class-conditional probability densities using MCMC methods. The performance of this implementation is first studied via an academic example for which the class-conditional densities are known. The problem of classifying chirp signals is then addressed using a similar hierarchical Bayesian learning implementation based on a Metropolis-within-Gibbs algorithm.
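    The Metropolis-within-Gibbs scheme named above can be illustrated with a minimal sketch. The toy Gaussian log-posterior, the step size and the iteration counts below are assumptions made purely for illustration; the paper's actual chirp-signal likelihood and hierarchical priors are not reproduced here.

```python
# Minimal Metropolis-within-Gibbs sketch: update one parameter at a time with a
# random-walk Metropolis accept/reject step.
# The Gaussian toy log-posterior is a stand-in for the real chirp-signal model.
import numpy as np

rng = np.random.default_rng(0)

def log_post(theta):
    # Toy log-posterior: independent Gaussians centred at (1, -2).
    return -0.5 * np.sum((theta - np.array([1.0, -2.0])) ** 2)

theta = np.zeros(2)
samples = []
for _ in range(5000):
    for i in range(theta.size):                   # Gibbs sweep over coordinates
        prop = theta.copy()
        prop[i] += 0.5 * rng.standard_normal()    # random-walk proposal
        if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
            theta = prop                          # Metropolis accept/reject
    samples.append(theta.copy())

samples = np.array(samples[1000:])                # drop burn-in
print(samples.mean(axis=0))                       # posterior mean estimate
```

    Each sweep updates one coordinate conditionally on the others, which is what makes the scheme usable when the full conditional distributions cannot be sampled directly.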

    Automated supervised classification of variable stars I. Methodology

    The fast classification of new variable stars is an important step in making them available for further research. Selection of science targets from large databases is much more efficient if they have been classified first. Defining the classes in terms of physical parameters is also important for obtaining an unbiased statistical view of the variability mechanisms and the borders of instability strips. Our goal is twofold: provide an overview of the stellar variability classes that are presently known, in terms of some relevant stellar parameters, and use the class descriptions obtained as the basis for an automated 'supervised classification' of large databases. Such automated classification will compare new objects with a set of pre-defined variability training classes and assign them accordingly. For every variability class, a literature search was performed to find as many well-known member stars as possible, or a considerable subset if too many were present. Next, we searched on-line and private databases for their light curves in the visible band and performed period analysis and harmonic fitting. The derived light-curve parameters are used to describe the classes and define the training classifiers. We compared the performance of different classifiers in terms of the percentage of correct identifications, the confusion among classes and the computation time. We describe how well the classes can be separated using the proposed set of parameters and how future improvements can be made, based on new large databases such as the light curves to be assembled by the CoRoT and Kepler space missions. (Accepted for publication in Astronomy and Astrophysics, reference AA/2007/7638; 27 pages, 1 figure.)
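    The attribute-extraction step described above (period analysis followed by harmonic fitting of the light curve) can be sketched as follows. The one-harmonic model, the brute-force frequency grid and the function name harmonic_features are simplifications assumed only for illustration; the actual pipeline fits several frequencies and harmonics per star.

```python
# Sketch: derive (frequency, amplitude, phase) features from a light curve by
# least-squares harmonic fitting over a grid of trial frequencies.
import numpy as np

def harmonic_features(t, mag, freqs):
    """Return (best_frequency, amplitude, phase) of a one-harmonic fit."""
    best = None
    for f in freqs:
        # Design matrix for mag ~ a*sin(2*pi*f*t) + b*cos(2*pi*f*t) + c
        X = np.column_stack([np.sin(2 * np.pi * f * t),
                             np.cos(2 * np.pi * f * t),
                             np.ones_like(t)])
        coef, *_ = np.linalg.lstsq(X, mag, rcond=None)
        sse = np.sum((mag - X @ coef) ** 2)
        if best is None or sse < best[0]:
            best = (sse, f, coef)
    _, f, (a, b, _) = best
    return f, np.hypot(a, b), np.arctan2(b, a)

# Toy usage: a noisy sinusoid with a 0.7 d period observed at random epochs.
rng = np.random.default_rng(1)
t = np.sort(rng.uniform(0, 30, 200))
mag = 12.0 + 0.3 * np.sin(2 * np.pi * t / 0.7) + 0.02 * rng.standard_normal(200)
print(harmonic_features(t, mag, np.linspace(0.1, 5.0, 2000)))
```

    Features of this kind (dominant frequency, harmonic amplitudes and phase differences) are the inputs on which the supervised classifiers are trained.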

    Ensemble classifiers for land cover mapping

    This study presents experimental investigations of supervised ensemble classification for land cover mapping. Despite the array of classifiers available in machine learning for building an ensemble, knowing which classifier to use for a particular dataset remains a major challenge. Ensemble methods increase classification accuracy by combining the decisions of several expert classifiers before a final decision is taken. This study generated various land cover maps by image classification in order to establish how many classifiers should be used when creating an ensemble, and it exploits feature selection techniques to create diversity in ensemble classification. Landsat imagery of Kampala (the capital of Uganda, East Africa), the AVIRIS hyperspectral dataset of Indian Pines, Indiana, and support vector machines were used to carry out the investigation. The research reveals that which classification approach performs best depends on the dataset used; in addition, the pre-processing stage and the strategy used when designing each classifier are essential. The experiments showed that there is no significant benefit in using many base classifiers for decision making in ensemble classification, and the outcome also reveals how to design better ensembles for land cover mapping using a feature selection approach.

    The study also reports an experimental comparison of generalized support vector machines, random forests, C4.5, neural networks and bagging classifiers for land cover classification of hyperspectral images, as sketched below. These classifiers are among the state-of-the-art supervised machine learning methods for solving complex pattern recognition problems. The pixel purity index was used to obtain the endmembers from the Indian Pines and Washington DC Mall hyperspectral image datasets. The generalized reduced gradient optimization algorithm was used to estimate fractional abundances in the image datasets, which provided the numeric values used for land cover classification; the fractional abundance of each pixel was obtained from the spectral signatures of the endmembers and the pixel values of the class labels. The classifiers showed promising results. On the Indian Pines and Washington DC Mall hyperspectral datasets, experimental comparison reveals that random forests outperform the other classifiers and are computationally efficient. The study makes a positive contribution to the problem of classifying land cover in hyperspectral images by exploring the use of the generalized reduced gradient method and five supervised classifiers, and the accuracy comparison helps decision makers weigh trade-offs between method accuracy and complexity.

    The research has produced nine publications, including six international and one local conference papers, one paper published in the Computing Research Repository (CoRR), one submitted journal paper and one Springer book chapter; Abe et al. (2012) received a merit award based on the reviewer reports and the scores of the conference committee members.
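    A minimal sketch of the classifier comparison referred to above is given here. The synthetic dataset stands in for per-pixel spectral (or fractional-abundance) features; the study's actual Landsat and AVIRIS inputs, pre-processing and accuracy-assessment protocol are not reproduced.

```python
# Sketch: compare several of the supervised classifiers named above on a
# synthetic multi-class dataset standing in for per-pixel spectral features.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                           n_classes=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM (RBF)": SVC(kernel="rbf", C=10.0, gamma="scale"),
    "decision tree": DecisionTreeClassifier(random_state=0),  # C4.5 analogue
    "neural network": MLPClassifier(hidden_layer_sizes=(64,), max_iter=500,
                                    random_state=0),
    "bagging": BaggingClassifier(n_estimators=50, random_state=0),
}
for name, clf in models.items():
    acc = clf.fit(X_tr, y_tr).score(X_te, y_te)   # overall test accuracy
    print(f"{name:15s} accuracy = {acc:.3f}")
```

    Diversity in a real ensemble would come from training each base classifier on a different selected feature subset rather than on the full feature set, as the study's feature selection approach suggests.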

    Machine learning approaches applied to GC-FID fatty acid profiles to discriminate wild from farmed salmon

    In the last decade, there has been an increasing demand for wild-captured fish, which attains higher prices than farmed fish and is therefore prone to mislabeling practices. In this work, fatty acid composition coupled with advanced chemometrics was used to discriminate wild from farmed salmon. The lipids extracted from salmon muscles of different production methods and origins (26 wild from Canada, 25 farmed from Canada, 24 farmed from Chile and 25 farmed from Norway) were analyzed by gas chromatography with flame ionization detection (GC-FID). All the tested chemometric approaches, namely principal component analysis (PCA), t-distributed stochastic neighbor embedding (t-SNE) and seven machine learning classifiers (k-nearest neighbors (kNN), decision tree, support vector machine (SVM), random forest, artificial neural networks (ANN), naïve Bayes and AdaBoost), allowed differentiation between farmed and wild salmon using the 17 features obtained from the chemical analysis. PCA did not clearly separate the samples by geographical origin, since farmed samples from Canada and Chile overlapped. Nevertheless, using the 17 features in the models, six of the seven machine learning classifiers reached a classification accuracy of ≥99%, with ANN, naïve Bayes, random forest, SVM and kNN presenting 100% accuracy on the test dataset. The classification models were also evaluated using only the best features selected by a reduction algorithm and the best input features mapped by t-SNE. The kNN classifier provided the best discrimination results because it correctly classified all samples according to production method and origin, ultimately using only the three most important features (16:0, 18:2n6c and 20:3n3 + 20:4n6). In general, the classifiers generalized well, and the proposed approach is simple and has the advantage of requiring only equipment commonly available in most labs.

    This work was supported by the European project FOODINTEGRITY (FP7-KBBE-2013-single-stage, grant agreement No 613688) and FCT (Fundação para a Ciência e Tecnologia, Portugal) under the Partnership Agreements UIDB/50006/2020, UIDB/00690/2020 (CIMO) and UIDB/5757/2020 (CeDRI). L. Grazina and M.A. Nunes acknowledge the FCT grants SFRH/BD/132462/2017 and SFRH/BD/130131/2017, financed by POPH-QREN (subsidised by FSE and MCTES).
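    The final model reported above, a kNN classifier on the three most informative fatty-acid features, can be sketched as follows. The numeric values in X are placeholders, not measured GC-FID percentages, and the standardization step is an assumption about pre-processing.

```python
# Hedged sketch: k-nearest-neighbours on three fatty-acid features
# (16:0, 18:2n6c, 20:3n3 + 20:4n6). All numbers below are illustrative only.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.array([[14.2, 10.5, 2.1],   # "farmed" profiles (dummy values)
              [13.8, 11.0, 2.3],
              [18.9,  1.9, 4.8],   # "wild" profiles (dummy values)
              [19.4,  2.2, 5.1]])
y = ["farmed", "farmed", "wild", "wild"]

# Standardize the features, then vote among the 3 nearest training samples.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)
print(model.predict([[19.0, 2.0, 5.0]]))   # expected: 'wild'
```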

    Algorithms for enhancing pattern separability, feature selection and incremental learning with applications to gas sensing electronic nose systems

    Three major issues in pattern recognition and data analysis are addressed in this study and applied to the problem of identifying volatile organic compounds (VOCs) for gas sensing applications. The approaches proposed and discussed are applicable not only to VOC identification, but also to a variety of pattern recognition and data analysis problems. In particular, (1) enhancing pattern separability for challenging classification problems, (2) the optimum feature selection problem, and (3) incremental learning for neural networks are investigated.

    Three different approaches are proposed for enhancing pattern separability when classifying closely spaced, or possibly overlapping, clusters. In the neurofuzzy approach, a fuzzy inference system that considers the dynamic ranges of individual features is developed. Feature range stretching (FRS) is introduced as an alternative approach for increasing intercluster distances by mapping the tight dynamic range of each feature to a wider range through a nonlinear function. Finally, a third approach, nonlinear cluster transformation (NCT), is proposed, which increases intercluster distances while preserving intracluster distances. It is shown that NCT achieves comparable or better performance than the other two methods at a fraction of the computational burden. The implementation issues and the relative advantages and disadvantages of these approaches are systematically investigated.

    Selection of optimum features is addressed using both a decision tree based approach and a wrapper approach. The hill-climbing search based wrapper approach is applied to select the optimum features for gas sensing problems.

    Finally, a new method, Learn++, is proposed that gives classification algorithms the capability of incrementally learning from new data, even when the original database is no longer available. The Learn++ algorithm is based on strategically combining an ensemble of classifiers, each of which is trained to learn only a small portion of the pattern space. Furthermore, Learn++ is capable of learning new data even when new classes are introduced, and it features a built-in mechanism for estimating the reliability of its classification decisions.

    All proposed methods are explained in detail, and simulation results are discussed along with directions for future work.
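    The incremental-learning idea behind Learn++ can be conveyed with a much-simplified sketch: each incoming batch of data trains new ensemble members, and predictions combine the whole ensemble. The real Learn++ algorithm updates sampling distributions AdaBoost-style and weights votes by training error; the plain majority vote and the decision-tree base learner below are assumptions for illustration only.

```python
# Simplified incremental-ensemble sketch in the spirit of Learn++:
# each data batch is seen once, trains a new weak learner, and all learners
# vote at prediction time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_informative=10,
                           n_classes=3, random_state=0)
batches = np.array_split(np.arange(len(y)), 5)      # data arrives in 5 chunks

ensemble = []
for idx in batches:                                  # each chunk seen only once
    clf = DecisionTreeClassifier(max_depth=3, random_state=0)
    ensemble.append(clf.fit(X[idx], y[idx]))

def predict(ensemble, X):
    votes = np.stack([clf.predict(X) for clf in ensemble])   # (n_members, n_samples)
    # Plain majority vote across ensemble members.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

print((predict(ensemble, X) == y).mean())            # training-set accuracy
```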

    Multistage classification of multispectral Earth observational data: The design approach

    An algorithm is proposed which predicts the optimal features at every node in a binary tree procedure. The algorithm estimates the probability of error by approximating the area under the likelihood ratio function for the two classes, taking into account the number of training samples used in estimating each class. Some results on feature selection techniques, particularly in the presence of a very limited set of training samples, are presented. Probabilities of error predicted by the proposed algorithm as a function of dimensionality are compared with experimental observations for aircraft and LANDSAT data. Results are obtained for both real and simulated data. Finally, two binary tree examples which use the algorithm are presented to illustrate the usefulness of the procedure.
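    The idea of ranking candidate features at a tree node by a predicted probability of error can be sketched as follows. The paper approximates the area under the likelihood ratio function and corrects for training-sample size; the univariate Bhattacharyya bound used here is a simpler stand-in, assumed only to illustrate the ranking step, and all data are synthetic.

```python
# Sketch: at a binary tree node separating two classes, rank each candidate
# feature by a predicted error bound and keep the feature with the lowest bound.
import numpy as np

def predicted_error(x1, x2):
    """Bhattacharyya bound on Bayes error for two 1-D Gaussian classes (equal priors)."""
    m1, m2 = x1.mean(), x2.mean()
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    b = 0.125 * (m1 - m2) ** 2 / ((v1 + v2) / 2) \
        + 0.5 * np.log(((v1 + v2) / 2) / np.sqrt(v1 * v2))
    return 0.5 * np.exp(-b)

rng = np.random.default_rng(0)
class1 = rng.normal([0.0, 0.0, 0.0], 1.0, size=(50, 3))   # 3 candidate features
class2 = rng.normal([2.0, 0.5, 0.1], 1.0, size=(50, 3))

errors = [predicted_error(class1[:, j], class2[:, j]) for j in range(3)]
best = int(np.argmin(errors))
print(f"selected feature {best}; predicted error bounds: {np.round(errors, 3)}")
```

    A finite-sample correction, as in the proposed algorithm, would inflate these predicted errors when few training samples are available, discouraging the tree from using high-dimensional feature sets at data-poor nodes.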