    Simultaneous Spectral-Spatial Feature Selection and Extraction for Hyperspectral Images

    In hyperspectral remote sensing data mining, it is important to take into account both spectral and spatial information, such as the spectral signature, texture features, and morphological properties, to improve performance, e.g., image classification accuracy. From a feature representation point of view, a natural approach is to concatenate the spectral and spatial features into a single high-dimensional vector and then apply a dimension reduction technique directly to that concatenated vector before feeding it into the subsequent classifier. However, features from different domains have different physical meanings and statistical properties, so such concatenation does not efficiently exploit the complementary properties among the features, which would help boost feature discriminability. Furthermore, the transformed results of the concatenated vector are difficult to interpret. Consequently, finding a physically meaningful, consensus low-dimensional representation of the original multiple features remains a challenging task. To address these issues, we propose a novel feature learning framework, the simultaneous spectral-spatial feature selection and extraction algorithm, for spectral-spatial feature representation and classification of hyperspectral images. Specifically, the proposed method learns a latent low-dimensional subspace by projecting the spectral-spatial features into a common feature space, where the complementary information is effectively exploited and, simultaneously, only the most significant original features are transformed. Encouraging experimental results on three publicly available hyperspectral remote sensing datasets confirm that the proposed method is effective and efficient.

    An Active Learning Approach to Hyperspectral Data Classification

    Histopathological image analysis: a review

    Over the past decade, dramatic increases in computational power and improvements in image analysis algorithms have allowed the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Consequently, digitized tissue histopathology has become amenable to computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging, which complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state-of-the-art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology-related problems being pursued in the United States and Europe.

    Relative-fuzzy: a novel approach for handling complex ambiguity for software engineering of data mining models

    There are two main defined classes of uncertainty, namely fuzziness and ambiguity, where ambiguity is a ‘one-to-many’ relationship between the syntax and semantics of a proposition. This definition appears to ignore the ‘many-to-many’ type of ambiguity. In this thesis, we use the term complex-uncertainty for the many-to-many type of ambiguity. This research proposes a new approach for handling the complex ambiguity that may exist in data, for the software engineering of predictive Data Mining (DM) classification models. The proposed approach is based on Relative-Fuzzy Logic (RFL), a novel type of fuzzy logic. RFL defines a new formulation of the ambiguity problem in terms of States Of Proposition (SOP). RFL describes its membership (semantic) value using a new definition of Domain of Proposition (DOP), which is based on the relativity principle as defined by possible-worlds logic. To propose RFL, one question must be answered: how can these two approaches, fuzzy logic and possible-worlds logic, be combined to produce a new membership value set (and later a logic) that is able to handle fuzziness and multiple viewpoints at the same time? This is achieved by giving possible-worlds logic the ability to quantify multiple viewpoints, to model fuzziness in each of these viewpoints, and to express both in a new set of membership values. Furthermore, a new Hierarchical Neural Network (HNN) architecture called ML/RFL-Based Net has been developed in this research, along with a new learning algorithm and a new recalling algorithm. The architecture, learning algorithm, and recalling algorithm of ML/RFL-Based Net follow the principles of RFL. This new type of HNN is considered to be an RFL computation machine. The ability of the Relative Fuzzy-based DM prediction model to tackle complex ambiguity has been tested. A special-purpose Integrated Development Environment (IDE), called RFL4ASR, which generates a DM prediction model for speech recognition, has also been developed in this research. This special-purpose IDE extends the definition of the traditional IDE. Using multiple sets of TIMIT speech data, the ML/RFL-Based Net prediction model achieves a classification accuracy of 69.2308%. This accuracy is higher than the best results of the WEKA data mining machines given the same speech data.

    Tree Genera Classification by Ensemble Classification of Small-Footprint Airborne LiDAR

    Tree genera information is useful in environmental applications such as forest management, forestry, urban planning, and the maintenance of utility transmission line infrastructure. The ability of small-footprint airborne LiDAR (Light Detection and Ranging) to acquire 3D information provides a promising way of studying vertical forest structures, offering an extra dimension of information compared to traditional 2D remote sensing data. However, the techniques for processing this type of data are relatively recent and have become an innovative research direction. The existing approach to processing LiDAR data for tree species classification involves calculating statistical attributes of the vertical point profile for individual trees. This method, however, does not explicitly use the geometric information of the tree form, such as the shape of the tree crown and geometric features that are derivable inside the tree crown. Therefore, the first aim of this dissertation research is to derive geometric features from individual tree crowns and use these features for genera classification. The second goal is to improve classification results by combining the newly developed features with the conventional vertical point profile features through an ensemble classification system. The final goal is to design a classification system that copes with the situation where the number of classes in the validation data exceeds the number of classes in the training data. Twenty-four geometric features were initially derived, and six of them were selected for the classification of pine, poplar, and maple. An average classification accuracy of 88.3% is achieved using this method. When the geometric features are combined with the vertical profile features by the ensemble classification system, the average classification accuracy increases to 91.2%, whereas the individual performance of the geometric classifier and the vertical classifier is 88.0% and 88.8%, respectively, for the classification of pine, poplar, and maple. Lastly, when samples that do not belong to pine, poplar, or maple are added to the validation data, the classification accuracy drops to 72.8% when randomly selected samples are used for training. However, with a diversified sampling technique, the classification accuracy increases to 93.8%.
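
    The abstract above does not say how the geometric and vertical-profile classifiers are combined. As a minimal sketch only, assuming each classifier outputs per-class probabilities and that the ensemble simply averages them (both assumptions, as are the function names and the example probabilities), the combination step could look like this:

```python
from typing import Dict, List


def average_probabilities(per_classifier: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the class-probability estimates of several classifiers.

    Hypothetical helper: the thesis may use a different combination rule.
    """
    classes = per_classifier[0].keys()
    n = len(per_classifier)
    return {c: sum(p[c] for p in per_classifier) / n for c in classes}


def ensemble_predict(per_classifier: List[Dict[str, float]]) -> str:
    """Return the class with the highest averaged probability."""
    avg = average_probabilities(per_classifier)
    return max(avg, key=avg.get)


# Made-up probabilities for one tree crown (illustrative, not from the thesis).
geometric = {"pine": 0.5, "poplar": 0.4, "maple": 0.1}
vertical = {"pine": 0.2, "poplar": 0.6, "maple": 0.2}

avg = average_probabilities([geometric, vertical])
label = ensemble_predict([geometric, vertical])
```

    In this made-up case the geometric classifier alone would pick pine, while the averaged estimate favours poplar, which illustrates how two classifiers built on different feature sets can correct each other.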

    Deep Vision in Optical Imagery: From Perception to Reasoning

    Deep learning has achieved extraordinary success in a wide range of computer vision tasks over the past years. Remote sensing data have different properties from natural images and videos because of their unique imaging techniques, shooting angles, etc. For instance, hyperspectral images usually have hundreds of spectral bands, offering additional information, and objects (e.g., vehicles) in remote sensing images are quite small, which makes detection and segmentation challenging. This thesis focuses on two kinds of remote sensing data, namely hyper/multi-spectral and high-resolution images, and explores several methods to answer the following questions:
    - In comparison with natural images or videos in computer vision, the unique asset of hyper/multi-spectral data is their rich spectral information. But what does this "additional" information bring to learning a network? And how do we take full advantage of these spectral bands?
    - Remote sensing images at high resolution have quite different characteristics, which poses challenges for several tasks, for example small object segmentation. Can we devise tailored networks for such tasks?
    - Deep networks have produced stunning results in a variety of perception tasks, e.g., image classification, object detection, and semantic segmentation, while the capacity to reason about relations over space is vital for intelligent species. Can a network or module with the capacity for reasoning benefit the parsing of remote sensing data?
    To this end, a couple of networks are devised to figure out what a network learns from hyperspectral images and how to use spectral bands efficiently. In addition, a multi-task learning network is investigated for the instance segmentation of vehicles from aerial images and videos. Finally, relational reasoning modules are designed to improve semantic segmentation of aerial images.

    New learning strategies for automatic text categorization.

    Lai Kwok-yin. Thesis (M.Phil.), Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 125-130). Abstracts in English and Chinese.

    Contents:
    Chapter 1. Introduction
        1.1 Automatic Textual Document Categorization
        1.2 Meta-Learning Approach For Text Categorization
        1.3 Contributions
        1.4 Organization of the Thesis
    Chapter 2. Related Work
        2.1 Existing Automatic Document Categorization Approaches
        2.2 Existing Meta-Learning Approaches For Information Retrieval
        2.3 Our Meta-Learning Approaches
    Chapter 3. Document Pre-Processing
        3.1 Document Representation
        3.2 Classification Scheme Learning Strategy
    Chapter 4. Linear Combination Approach
        4.1 Overview
        4.2 Linear Combination Approach - The Algorithm
            4.2.1 Equal Weighting Strategy
            4.2.2 Weighting Strategy Based On Utility Measure
            4.2.3 Weighting Strategy Based On Document Rank
        4.3 Comparisons of Linear Combination Approach and Existing Meta-Learning Methods
            4.3.1 LC versus Simple Majority Voting
            4.3.2 LC versus BORG
            4.3.3 LC versus Restricted Linear Combination Method
    Chapter 5. The New Meta-Learning Model - MUDOF
        5.1 Overview
        5.2 Document Feature Characteristics
        5.3 Classification Errors
        5.4 Linear Regression Model
        5.5 The MUDOF Algorithm
    Chapter 6. Incorporating MUDOF into Linear Combination approach
        6.1 Background
        6.2 Overview of MUDOF2
        6.3 Major Components of the MUDOF2
        6.4 The MUDOF2 Algorithm
    Chapter 7. Experimental Setup
        7.1 Document Collection
        7.2 Evaluation Metric
        7.3 Component Classification Algorithms
        7.4 Categorical Document Feature Characteristics for MUDOF and MUDOF2
    Chapter 8. Experimental Results and Analysis
        8.1 Performance of Linear Combination Approach
        8.2 Performance of the MUDOF Approach
        8.3 Performance of MUDOF2 Approach
    Chapter 9. Conclusions and Future Work
        9.1 Conclusions
        9.2 Future Work
    Appendix A. Details of Experimental Results for Reuters-21578 corpus
    Appendix B. Details of Experimental Results for OHSUMED corpus
    Bibliography

    Cooperative Training in Multiple Classifier Systems

    Multiple classifier systems have been shown to be an effective technique for classification. Their success does not depend entirely on the base classifiers or the aggregation technique. Other parameters, such as the training data, the feature attributes, and the correlation among the base classifiers, may also contribute. In addition, the interaction of these parameters with each other may affect the performance of multiple classifiers. In the present study, we examine some of these interactions and investigate their effects on the performance of classifier ensembles. The proposed research introduces a different direction in the field of multiple classifier systems: we attempt to understand and compare ensemble methods from the perspective of cooperation. In this thesis, we narrow our focus to cooperation at the training level. We first developed measures to estimate the degree and type of cooperation among training data partitions. These measures enabled us to evaluate the diversity and correlation among a set of disjoint and overlapped partitions. With the aid of properly selected measures and training information, we proposed two new data partitioning approaches: Cluster, De-cluster, and Selection (CDS) and Cooperative Cluster, De-cluster, and Selection (CO-CDS). Finally, a comprehensive comparative study was conducted in which we compared our proposed training approaches with several other approaches in terms of robustness of usage, resultant classification accuracy, and classification stability. Experimental assessment of the CDS and CO-CDS training approaches validates their robustness compared to other training approaches. In addition, this study suggests that 1) cooperation is generally beneficial and 2) classifier ensembles that cooperate by sharing information have higher generalization ability than those that do not share training information.
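
    To make the distinction between disjoint and overlapped training partitions concrete, here is a minimal sketch. It is not the CDS or CO-CDS procedure, and the mean pairwise Jaccard overlap used below is a hypothetical stand-in for the cooperation measures developed in the thesis, which the abstract does not define. All function names and parameters are illustrative assumptions.

```python
import random
from typing import List


def disjoint_partitions(indices: List[int], k: int) -> List[List[int]]:
    """Split sample indices into k partitions that share no samples."""
    return [indices[i::k] for i in range(k)]


def overlapped_partitions(indices: List[int], k: int, size: int, seed: int = 0) -> List[List[int]]:
    """Draw k partitions independently, so samples may appear in several of them."""
    rng = random.Random(seed)
    return [rng.sample(indices, size) for _ in range(k)]


def mean_jaccard(parts: List[List[int]]) -> float:
    """Mean pairwise Jaccard overlap: 0.0 means no shared training samples."""
    pairs = [(a, b) for i, a in enumerate(parts) for b in parts[i + 1:]]
    return sum(len(set(a) & set(b)) / len(set(a) | set(b)) for a, b in pairs) / len(pairs)


samples = list(range(12))           # stand-in for 12 training samples
disjoint = disjoint_partitions(samples, k=3)
overlapped = overlapped_partitions(samples, k=3, size=8)

disjoint_overlap = mean_jaccard(disjoint)      # no sharing between base classifiers
overlapped_overlap = mean_jaccard(overlapped)  # some training information is shared
```

    With 8 of 12 samples per partition, any two overlapped partitions must share at least four samples, so their overlap is strictly positive, while the disjoint split shares nothing.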