247 research outputs found

    Multiple Classifier System for Remote Sensing Image Classification: A Review

    Over the last two decades, multiple classifier systems (MCS), also known as classifier ensembles, have shown great potential to improve the accuracy and reliability of remote sensing image classification. Although an extensive literature covers MCS approaches, there is no comprehensive review that presents an overall architecture of the basic principles and trends behind the design of remote sensing classifier ensembles. Therefore, to give a reference point for MCS approaches, this paper explicitly reviews the remote sensing implementations of MCS and proposes some modified approaches. The effectiveness of existing and improved algorithms is analyzed and evaluated on multi-source remotely sensed images, including a high spatial resolution image (QuickBird), a hyperspectral image (OMISII) and a multi-spectral image (Landsat ETM+). Experimental results demonstrate that MCS can effectively improve the accuracy and stability of remote sensing image classification, and that diversity measures play an active role in the combination of multiple classifiers. Furthermore, this survey provides a roadmap to guide future research and algorithm enhancement and to facilitate knowledge accumulation on MCS in the remote sensing community.
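    The abstract above mentions diversity measures without naming one. As a minimal sketch, the pairwise disagreement measure (a standard diversity measure from the ensemble literature, not necessarily one used in the paper) is the fraction of samples on which exactly one of two classifiers is correct. The function name and the toy labels are assumptions for illustration only.

```python
def disagreement(y_true, preds_a, preds_b):
    """Fraction of samples on which exactly one of two classifiers is correct.

    0.0 means the two classifiers succeed and fail on the same samples;
    larger values indicate a more diverse pair.
    """
    if not (len(y_true) == len(preds_a) == len(preds_b)) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    one_right = sum(
        (a == t) != (b == t)  # True when exactly one prediction matches the label
        for t, a, b in zip(y_true, preds_a, preds_b)
    )
    return one_right / len(y_true)


# Toy labels (hypothetical land-cover classes encoded as 0, 1, 2).
y_true = [0, 1, 1, 0, 2]
preds_a = [0, 1, 0, 0, 2]
preds_b = [0, 0, 1, 0, 1]
print(disagreement(y_true, preds_a, preds_b))
```

    In this toy case the two classifiers make their errors on different samples, which is the situation in which combining them can help.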

    A probabilistic classifier ensemble weighting scheme based on cross-validated accuracy estimates

    Our hypothesis is that building ensembles of small sets of strong classifiers constructed with different learning algorithms is, on average, the best approach to classification for real-world problems. We propose a simple mechanism for building small heterogeneous ensembles based on exponentially weighting the probability estimates of the base classifiers with an estimate of the accuracy formed through cross-validation on the training data. We demonstrate through extensive experimentation that, given the same small set of base classifiers, this method has measurable benefits over commonly used alternative weighting, selection or meta-classifier approaches to heterogeneous ensembles. We also show how an ensemble of five well-known, fast classifiers can produce an ensemble that is not significantly worse than large homogeneous ensembles and tuned individual classifiers on datasets from the UCI archive. We provide evidence that the performance of the Cross-validation Accuracy Weighted Probabilistic Ensemble (CAWPE) generalises to a completely separate set of datasets, the UCR time series classification archive, and we also demonstrate that our ensemble technique can significantly improve the state-of-the-art classifier for this problem domain. We investigate the performance in more detail, and find that the improvement is most marked in problems with smaller training sets. We perform a sensitivity analysis and an ablation study to demonstrate the robustness of the ensemble and the significant contribution of each design element of the classifier. We conclude that it is, on average, better to ensemble strong classifiers with a weighting scheme rather than perform extensive tuning, and that CAWPE is a sensible starting point for combining classifiers.
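    The weighting idea in this abstract can be sketched as follows: each base classifier's probability vector is scaled by its cross-validated accuracy raised to a power, and the scaled vectors are summed and renormalised. This is a minimal sketch, not the authors' implementation; the function name, the default exponent `alpha=4.0` and the toy numbers are assumptions for illustration.

```python
def cawpe_combine(probas, cv_accuracies, alpha=4.0):
    """Combine per-classifier class-probability vectors for one test case.

    probas        : list of probability vectors, one per base classifier,
                    all over the same ordered set of classes
    cv_accuracies : cross-validated training accuracy of each base classifier
    alpha         : exponent applied to the accuracies (assumed value);
                    alpha = 0 reduces to an unweighted average
    """
    if len(probas) != len(cv_accuracies) or not probas:
        raise ValueError("need one accuracy estimate per classifier")
    weights = [acc ** alpha for acc in cv_accuracies]
    n_classes = len(probas[0])
    combined = [0.0] * n_classes
    for weight, proba in zip(weights, probas):
        for k in range(n_classes):
            combined[k] += weight * proba[k]
    total = sum(combined)
    return [value / total for value in combined]


# Three hypothetical base classifiers voting on a two-class problem.
probas = [[0.60, 0.40], [0.30, 0.70], [0.45, 0.55]]
cv_accuracies = [0.90, 0.60, 0.70]

print(cawpe_combine(probas, cv_accuracies))             # accuracy-weighted
print(cawpe_combine(probas, cv_accuracies, alpha=0.0))  # plain average
```

    With these toy numbers the plain average favours the second class, while the accuracy-weighted combination favours the first, because the classifier with the highest cross-validated accuracy dominates the sum.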

    Performance evaluation of multi-tier ensemble classifiers for phishing websites

    This article is devoted to large multi-tier ensemble classifiers generated as ensembles of ensembles and applied to phishing websites. Our new ensemble construction is a special case of the general and productive multi-tier approach well known in information security. Many efficient multi-tier classifiers have been considered in the literature. Our new contribution is in generating new large systems as ensembles of ensembles by linking a top-tier ensemble to another middle-tier ensemble instead of a base classifier, so that the top-tier ensemble can generate the whole system. This automatic generation capability includes many large ensemble classifiers in two tiers simultaneously and automatically combines them into one hierarchical unified system, so that one ensemble is an integral part of another one. This new construction makes it easy to set up and run such large systems. The present article concentrates on the investigation of the performance of these new multi-tier ensembles for the example of detection of phishing websites. We carried out systematic experiments evaluating several essential ensemble techniques as well as more recent approaches, and studying their performance as parts of multi-level ensembles with three tiers. The results presented here demonstrate that the new three-tier ensemble classifiers performed better than the base classifiers and standard ensembles included in the system. This example of application to the classification of phishing websites shows that the new method of combining diverse ensemble techniques into a unified hierarchical three-tier ensemble can be applied to increase the performance of classifiers in situations where data can be processed on a large computer.
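    The "ensemble of ensembles" construction can be illustrated with a toy three-tier structure in which a top-tier voter takes middle-tier voters, not base classifiers, as its members. This is only a sketch under simplifying assumptions: plain majority voting stands in for the ensemble techniques studied in the article, and the `Threshold` base classifier, the class names and the feature values are hypothetical.

```python
from collections import Counter


class Threshold:
    """Toy base classifier: predicts 1 if one feature exceeds a cutoff."""

    def __init__(self, index, cutoff):
        self.index = index
        self.cutoff = cutoff

    def predict(self, x):
        return 1 if x[self.index] > self.cutoff else 0


class MajorityVote:
    """Ensemble whose members may be base classifiers or other ensembles."""

    def __init__(self, members):
        self.members = members

    def predict(self, x):
        votes = [member.predict(x) for member in self.members]
        return Counter(votes).most_common(1)[0][0]


# Tier 2: three middle-tier ensembles, each built from three tier-3 base classifiers.
middle_tier = [
    MajorityVote([Threshold(0, 0.5), Threshold(1, 0.5), Threshold(2, 0.5)]),
    MajorityVote([Threshold(0, 0.3), Threshold(1, 0.7), Threshold(2, 0.5)]),
    MajorityVote([Threshold(0, 0.7), Threshold(1, 0.3), Threshold(2, 0.5)]),
]

# Tier 1: the top-tier ensemble is linked to ensembles instead of base classifiers.
top_tier = MajorityVote(middle_tier)

print(top_tier.predict((0.9, 0.8, 0.1)))
print(top_tier.predict((0.6, 0.6, 0.2)))
```

    Because every tier exposes the same `predict` interface, the top tier does not need to know whether a member is a single classifier or a whole ensemble, which is what makes the nesting straightforward to generate automatically.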

    Automatic generation of meta classifiers with large levels for distributed computing and networking

    This paper is devoted to a case study of a new construction of classifiers. These classifiers are called automatically generated multi-level meta classifiers (AGMLMC). The construction combines diverse meta classifiers in a new way to create a unified system. This original construction can be generated automatically, producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. It is intended for distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel. This makes it easy to adopt them in distributed applications. This paper introduces the new construction of classifiers and undertakes an experimental study of their performance. We look at a case study of their effectiveness in the special case of the detection and filtering of phishing emails. This is a potentially important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier in the case study of detection and filtering of phishing emails. The results show that the new classifiers with large levels achieved better performance than the base classifiers and simple meta classifiers. This demonstrates that the new technique can be applied to increase performance if diverse meta classifiers are included in the system.

    Enhancing navigation in biomedical databases by community voting and database-driven text classification

    Background: The breadth of biological databases and their information content continues to increase exponentially. Unfortunately, our ability to query such sources is still often suboptimal. Here, we introduce and apply community voting, database-driven text classification, and visual aids as a means to incorporate distributed expert knowledge, to automatically classify database entries and to efficiently retrieve them. Results: Using a previously developed peptide database as an example, we compared several machine learning algorithms in their ability to classify abstracts of published literature results into categories relevant to peptide research, such as related or not related to cancer, angiogenesis, molecular imaging, etc. Ensembles of bagged decision trees met the requirements of our application best. No other algorithm consistently performed better in comparative testing. Moreover, we show that the algorithm produces meaningful class probability estimates, which can be used to visualize the confidence of automatic classification during the retrieval process. To allow viewing long lists of search results enriched by automatic classifications, we added a dynamic heat map to the web interface. We take advantage of community knowledge by enabling users to cast votes in Web 2.0 style in order to correct automated classification errors, which triggers reclassification of all entries. We used a novel framework in which the database "drives" the entire vote aggregation and reclassification process to increase speed while conserving computational resources and keeping the method scalable. In our experiments, we simulate community voting by adding various levels of noise to nearly perfectly labelled instances, and show that, under such conditions, classification can be improved significantly. Conclusion: Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases. The system can be accessed at http://pepbank.mgh.harvard.edu.

    Nonlinear Boosting Projections for Ensemble Construction

    In this paper we propose a novel approach for ensemble construction based on the use of nonlinear projections to achieve both accuracy and diversity of individual classifiers. The proposed approach combines the philosophy of boosting, putting more effort into difficult instances, with the basis of the random subspace method. Our main contribution is that instead of using a random subspace, we construct a projection taking into account the instances which have posed the most difficulties to previous classifiers. In this way, consecutive nonlinear projections are created by a neural network trained using only incorrectly classified instances. The feature subspace induced by the hidden layer of this network is used as the input space to a new classifier. The method is compared with bagging and boosting techniques, showing improved performance on a large set of 44 problems from the UCI Machine Learning Repository. An additional study showed that the proposed approach is less sensitive to noise in the data than boosting.

    A penalized likelihood based pattern classification algorithm

    Penalized likelihood is a general approach whereby an objective function is defined, consisting of the log likelihood of the data minus some term penalizing non-smooth solutions. Subsequently, this objective function is maximized, yielding a solution that achieves some sort of trade-off between the faithfulness and the smoothness of the fit. Most work on that topic focused on the regression problem, and there has been little work on the classification problem. In this paper we propose a new classification method using the concept of penalized likelihood (for the two class case). By proposing a novel penalty term based on the K-nearest neighbors, simple analytical derivations have led to an algorithm that is proved to converge to the global optimum. Moreover, this algorithm is very simple to implement and converges typically in two or three iterations. We also introduced two variants of the method by distance-weighting the K-nearest neighbor contributions, and by tackling the unbalanced class patterns situation. We performed extensive experiments to compare the proposed method to several well-known classification methods. These simulations reveal that the proposed method achieves one of the top ranks in classification performance and with a fairly small computation time. © 2009 Elsevier Ltd. All rights reserved.
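    The objective described in this abstract can be written in a generic two-class form. The notation is an assumption for illustration, since the abstract gives none: $p_i$ is the estimated probability that pattern $i$ belongs to class 1, $y_i \in \{0, 1\}$ is its label, $R$ is the penalty on non-smooth solutions and $\lambda \ge 0$ is its weight. The paper's specific K-nearest-neighbor penalty is not stated in the abstract, so $R$ is left unspecified.

```latex
\[
J(p) = \sum_{i=1}^{N} \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr] - \lambda \, R(p),
\qquad
\hat{p} = \arg\max_{p} J(p).
\]
```

    Setting $\lambda = 0$ recovers plain maximum likelihood, while larger values of $\lambda$ trade faithfulness to the labels for smoothness of the fit.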