1,768 research outputs found

    Spectral Band Selection for Ensemble Classification of Hyperspectral Images with Applications to Agriculture and Food Safety

    Get PDF
    In this dissertation, an ensemble non-uniform spectral feature selection and a kernel density decision fusion framework are proposed for the classification of hyperspectral data using a support vector machine classifier. Hyperspectral data has more number of bands and they are always highly correlated. To utilize the complete potential, a feature selection step is necessary. In an ensemble situation, there are mainly two challenges: (1) Creating diverse set of classifiers in order to achieve a higher classification accuracy when compared to a single classifier. This can either be achieved by having different classifiers or by having different subsets of features for each classifier in the ensemble. (2) Designing a robust decision fusion stage to fully utilize the decision produced by individual classifiers. This dissertation tests the efficacy of the proposed approach to classify hyperspectral data from different applications. Since these datasets have a small number of training samples with larger number of highly correlated features, conventional feature selection approaches such as random feature selection cannot utilize the variability in the correlation level between bands to achieve diverse subsets for classification. In contrast, the approach proposed in this dissertation utilizes the variability in the correlation between bands by dividing the spectrum into groups and selecting bands from each group according to its size. The intelligent decision fusion proposed in this approach uses the probability density of training classes to produce a final class label. The experimental results demonstrate the validity of the proposed framework that results in improvements in the overall, user, and producer accuracies compared to other state-of-the-art techniques. The experiments demonstrate the ability of the proposed approach to produce more diverse feature selection over conventional approaches

    A review of domain adaptation without target labels

    Full text link
    Domain adaptation has become a prominent problem setting in machine learning and related fields. This review asks the question: how can a classifier learn from a source domain and generalize to a target domain? We present a categorization of approaches, divided into, what we refer to as, sample-based, feature-based and inference-based methods. Sample-based methods focus on weighting individual observations during training based on their importance to the target domain. Feature-based methods revolve around on mapping, projecting and representing features such that a source classifier performs well on the target domain and inference-based methods incorporate adaptation into the parameter estimation procedure, for instance through constraints on the optimization procedure. Additionally, we review a number of conditions that allow for formulating bounds on the cross-domain generalization error. Our categorization highlights recurring ideas and raises questions important to further research.Comment: 20 pages, 5 figure

    Automated Domain Discovery from Multiple Sources to Improve Zero-Shot Generalization

    Full text link
    Domain generalization (DG) methods aim to develop models that generalize to settings where the test distribution is different from the training data. In this paper, we focus on the challenging problem of multi-source zero shot DG (MDG), where labeled training data from multiple source domains is available but with no access to data from the target domain. A wide range of solutions have been proposed for this problem, including the state-of-the-art multi-domain ensembling approaches. Despite these advances, the na\"ive ERM solution of pooling all source data together and training a single classifier is surprisingly effective on standard benchmarks. In this paper, we hypothesize that, it is important to elucidate the link between pre-specified domain labels and MDG performance, in order to explain this behavior. More specifically, we consider two popular classes of MDG algorithms -- distributional robust optimization (DRO) and multi-domain ensembles, in order to demonstrate how inferring custom domain groups can lead to consistent improvements over the original domain labels that come with the dataset. To this end, we propose (i) Group-DRO++, which incorporates an explicit clustering step to identify custom domains in an existing DRO technique; and (ii) DReaME, which produces effective multi-domain ensembles through implicit domain re-labeling with a novel meta-optimization algorithm. Using empirical studies on multiple standard benchmarks, we show that our variants consistently outperform ERM by significant margins (1.5% - 9%), and produce state-of-the-art MDG performance. Our code can be found at https://github.com/kowshikthopalli/DREAM

    A survey of multiple classifier systems as hybrid systems

    Get PDF
    A current focus of intense research in pattern classification is the combination of several classifier systems, which can be built following either the same or different models and/or datasets building approaches. These systems perform information fusion of classification decisions at different levels overcoming limitations of traditional approaches based on single classifiers. This paper presents an up-to-date survey on multiple classifier system (MCS) from the point of view of Hybrid Intelligent Systems. The article discusses major issues, such as diversity and decision fusion methods, providing a vision of the spectrum of applications that are currently being developed

    The impact of training data characteristics on ensemble classification of land cover

    Get PDF
    Supervised classification of remote sensing imagery has long been recognised as an essential technology for large area land cover mapping. Remote sensing derived land cover and forest classification maps are important sources of information for understanding environmental processes and informing natural resource management decision making. In recent years, the supervised transformation of remote sensing data into thematic products has been advanced through the introduction and development of machine learning classification techniques. Applied to a variety of science and engineering problems over the past twenty years (Lary et al., 2016), machine learning provides greater accuracy and efficiency than traditional parametric classifiers, capable of dealing with large data volumes across complex measurement spaces. The Random forest (RF) classifier in particular, has become popular in the remote sensing community, with a range of commonly cited advantages, including its low parameterisation requirements, excellent classification results and ability to handle noisy observation data and outliers, in a complex measurement space and small training data relative to the study area size. In the context of large area land cover classification for forest cover, using multisource remote sensing and geospatial data, this research sets out to examine proposed advantages of the RF classifier - insensitivity to training data noise (mislabelling) and handling training data class imbalance. Through margin theory, the research also investigates the utility of ensemble learning – in which multiple base classifiers are combined to reduce generalisation error in classification – as a means of designing more efficient classifiers, improving classification performance, and reducing reference (training and test) data redundancy. The first part of the thesis (chapters 2 and 3) introduces the experimental setting and data used in the research, including a description (in chapter 2) of the sampling framework for the reference data used in classification experiments that follow. Chapter 3 evaluates the performance of the RF classifier applied across 7.2 million hectares of public land study area in Victoria, Australia. This chapter describes an open-source framework for deploying the RF classifier over large areas and processing significant volumes of multi-source remote sensing and ancillary spatial data. The second part of this thesis (research chapters 4 through 6) examines the effect of training data characteristics (class imbalance and mislabelling) on the performance of RF, and explores the application of the ensemble margin, as a means of both examining RF classification performance, and informing training data sampling to improve classification accuracy. Results of binary and multiclass experiments described in chapter 4, provide insights into the behaviour of RF, in which training data are not evenly distributed among classes and contain systematically mislabelled instances. Results show that while the error rate of the RF classifier is relatively insensitive to mislabelled training data (in the multiclass experiment, overall 78.3% Kappa with no mislabelled instances to 70.1% with 25% mislabelling in each class), the level of associated confidence falls at a faster rate than overall accuracy with increasing rates of mislabelled training data. This study section also demonstrates that imbalanced training data can be introduced to reduce error in classes that are most difficult to classify. The relationship between per-class and overall classification performance and the diversity of members in a RF ensemble classifier, is explored through experiments presented in chapter 5. This research examines ways of targeting particular training data samples to induce RF ensemble diversity and improve per-class and overall classification performance and efficiency. Through use of the ensemble margin, this study offers insights into the trade-off between ensemble classification accuracy and diversity. The research shows that boosting diversity among RF ensemble members, by emphasising the contribution of lower margin training instances used in the learning process, is an effective means of improving classification performance, particularly for more difficult or rarer classes, and is a way of reducing information redundancy and improving the efficiency of classification problems. Research chapter 6 looks at the application of the RF classifier for calculating Landscape Pattern Indices (LPIs) from classification prediction maps, and examines the sensitivity of these indices to training data characteristics and sampling based on the ensemble margin. This research reveals a range of commonly used LPIs to have significant sensitivity to training data mislabelling in RF classification, as well as margin-based training data sampling. In conclusion, this thesis examines proposed advantages of the popular machine learning classifier, Random forests - the relative insensitivity to training data noise (mislabelling) and its ability to handle class imbalance. This research also explores the utility of the ensemble margin for designing more efficient classifiers, measuring and improving classification performance, and designing ensemble classification systems which use reference data more efficiently and effectively, with less data redundancy. These findings have practical applications and implications for large area land cover classification, for which the generation of high quality reference data is often a time consuming, subjective and expensive exercise

    An information adaptive system study report and development plan

    Get PDF
    The purpose of the information adaptive system (IAS) study was to determine how some selected Earth resource applications may be processed onboard a spacecraft and to provide a detailed preliminary IAS design for these applications. Detailed investigations of a number of applications were conducted with regard to IAS and three were selected for further analysis. Areas of future research and development include algorithmic specifications, system design specifications, and IAS recommended time lines

    Artificial Intelligence Based Classification for Urban Surface Water Modelling

    Get PDF
    Estimations and predictions of surface water runoff can provide very useful insights, regarding flood risks in urban areas. To automatically predict the flow behaviour of the rainfall-runoff water, in real-world satellite images, it is important to precisely identify permeable and impermeable areas. This identification indicates and helps to calculate the amount of surface water, by taking into account the amount of water being absorbed in a permeable area and what remains on the impermeable area. In this research, a model of surface water has been established, to predict the behavioural flow of rainfall-runoff water. This study employs a combination of image processing, artificial intelligence and machine learning techniques, for automatic segmentation and classification of permeable and impermeable areas, in satellite images. These techniques investigate the image classification approaches for classifying three land-use categories (roofs, roads, and pervious areas), commonly found in satellite images of the earth’s surface. Three different classification scenarios are investigated, to select the best classification model. The first scenario involves pixel by pixel classification of images, using Classification Tree and Random Forest classification techniques, in 2 different settings of sequential and parallel execution of algorithms. In the second classification scenario, the image is divided into objects, by using Superpixels (SLIC) segmentation method, while three kinds of feature sets are extracted from the segmented objects. The performance of eight different supervised machine learning classifiers is probed, using 5-fold cross-validation, for multiple SLIC values, while detailed performance comparisons lead to conclusions about the classification into different classes, regarding Object-based and Pixel-based classification schemes. Pareto analysis and Knee point selection are used to select SLIC value and the suitable type of classification, among the aforementioned two. Furthermore, a new diversity and weighted sum-based ensemble classification model, called ParetoEnsemble, is proposed, in this classification scenario. The weights are applied to selected component classifiers of an ensemble, creating a strong classifier, where classification is done based on multiple votes from candidate classifiers of the ensemble, as opposed to individual classifiers, where classification is done based on a single vote, from only one classifier. Unbalanced and balanced data-based classification results are also evaluated, to determine the most suitable mode, for satellite image classifications, in this study. Convolutional Neural Networks, based on semantic segmentation, are also employed in the classification phase, as a third scenario, to evaluate the strength of deep learning model SegNet, in the classification of satellite imaging. The best results, from the three classification scenarios, are compared and the best classification method, among the three scenarios, is used in the next phase of water modelling, with the InfoWorks ICM software, to explore the potential of modelling process, regarding a partially automated surface water network. By using the parameter settings, with a specified amount of simulated rain falling, onto the imaged area, the amount of surface water flow is estimated, to get predictions about runoff situations in urban areas, since runoff, in such a situation, can be high enough to pose a dangerous flood risk. The area of Feock, in Cornwall, is used as a simulation area of study, in this research, where some promising results have been derived, regarding classification and modelling of runoff. The correlation coefficient estimation, between classification and runoff accuracy, provides useful insight, regarding the dependence of runoff performance on classification performance. The trained system was tested on some unknown area images as well, demonstrating a reasonable performance, considering the training and classification limitations and conditions. Furthermore, in these unknown area images, reasonable estimations were derived, regarding surface water runoff. An analysis of unbalanced and balanced data-based classification and runoff estimations, for multiple parameter configurations, provides aid to the selection of classification and modelling parameter values, to be used in future unknown data predictions. This research is founded on the incorporation of satellite imaging into water modelling, using selective images for analysis and assessment of results. This system can be further improved, and runoff predictions of high precision can be better achieved, by adding more high-resolution images to the classifiers training. The added variety, to the trained model, can lead to an even better classification of any unknown image, which could eventually provide better modelling and better insights into surface water modelling. Moreover, the modelling phase can be extended, in future research, to deal with real-time parameters, by calibrating the model, after the classification phase, in order to observe the impact of classification on the actual calibration

    Highly Accurate Fragment Library for Protein Fold Recognition

    Get PDF
    Proteins play a crucial role in living organisms as they perform many vital tasks in every living cell. Knowledge of protein folding has a deep impact on understanding the heterogeneity and molecular functions of proteins. Such information leads to crucial advances in drug design and disease understanding. Fold recognition is a key step in the protein structure discovery process, especially when traditional computational methods fail to yield convincing structural homologies. In this work, we present a new protein fold recognition approach using machine learning and data mining methodologies. First, we identify a protein structural fragment library (Frag-K) composed of a set of backbone fragments ranging from 4 to 20 residues as the structural “keywords” that can effectively distinguish between major protein folds. We firstly apply randomized spectral clustering and random forest algorithms to construct representative and sensitive protein fragment libraries from a large-scale of high-quality, non-homologous protein structures available in PDB. We analyze the impacts of clustering cut-offs on the performance of the fragment libraries. Then, the Frag-K fragments are employed as structural features to classify protein structures in major protein folds defined by SCOP (Structural Classification of Proteins). Our results show that a structural dictionary with ~400 4- to 20-residue Frag-K fragments is capable of classifying major SCOP folds with high accuracy. Then, based on Frag-k, we design a novel deep learning architecture, so-called DeepFrag-k, which identifies fold discriminative features to improve the accuracy of protein fold recognition. DeepFrag-k is composed of two stages: the first stage employs a multimodal Deep Belief Network (DBN) to predict the potential structural fragments given a sequence, represented as a fragment vector, and then the second stage uses a deep convolution neural network (CNN) to classify the fragment vectors into the corresponding folds. Our results show that DeepFrag-k yields 92.98% accuracy in predicting the top-100 most popular fragments, which can be used to generate discriminative fragment feature vectors to improve protein fold recognition

    Ensemble classifiers for land cover mapping

    Get PDF
    This study presents experimental investigations on supervised ensemble classification for land cover classification. Despite the arrays of classifiers available in machine learning to create an ensemble, knowing and understanding the correct classifier to use for a particular dataset remains a major challenge. The ensemble method increases classification accuracy by consulting experts taking final decision in the classification process. This study generated various land cover maps, using image classification. This is to authenticate the number of classifiers that should be used for creating an ensemble. The study exploits feature selection techniques to create diversity in ensemble classification. Landsat imagery of Kampala (the capital of Uganda, East Africa), AVIRIS hyperspectral dataset of Indian pine of Indiana and Support Vector Machines were used to carry out the investigation. The research reveals that the superiority of different classification approaches employed depends on the datasets used. In addition, the pre-processing stage and the strategy used during the designing phase of each classifier is very essential. The results obtained from the experiments conducted showed that, there is no significant benefit in using many base classifiers for decision making in ensemble classification. The research outcome also reveals how to design better ensemble using feature selection approach for land cover mapping. The study also reports the experimental comparison of generalized support vector machines, random forests, C4.5, neural network and bagging classifiers for land cover classification of hyperspectral images. These classifiers are among the state-of-the-art supervised machine learning methods for solving complex pattern recognition problems. The pixel purity index was used to obtain the endmembers from the Indiana pine and Washington DC mall hyperspectral image datasets. Generalized reduced gradient optimization algorithm was used to estimate fractional abundance in the image dataset thereafter obtained numeric values for land cover classification. The fractional abundance of each pixel was obtained using the spectral signature values of the endmembers and pixel values of class labels. As the results of the experiments, the classifiers show promising results. Using Indiana pine and Washington DC mall hyperspectral datasets, experimental comparison of all the classifiers’ performances reveals that random forests outperforms the other classifiers and it is computational effective. The study makes a positive contribution to the problem of classifying land cover hyperspectral images by exploring the use of generalized reduced gradient method and five supervised classifiers. The accuracy comparison of these classifiers is valuable for decision makers to consider tradeoffs in method accuracy versus complexity. The results from the research has attracted nine publications which include, six international and one local conference papers, one published in Computing Research Repository (CoRR), one Journal paper submitted and one Springer book chapter, Abe et al., 2012 obtained a merit award based on the reviewer reports and the score reports of the conference committee members during the conference period
    • …
    corecore