
    Continuous Iterative Guided Spectral Class Rejection Classification Algorithm: Part 1

    This paper outlines the changes necessary to convert the iterative guided spectral class rejection (IGSCR) classification algorithm to a soft classification algorithm. IGSCR uses a hypothesis test to select clusters to use in classification and iteratively refines clusters not yet selected for classification. Both steps assume that cluster and class memberships are crisp (either zero or one). In order to make soft cluster and class assignments (between zero and one), a new hypothesis test and iterative refinement technique are introduced that are suitable for soft clusters. The new hypothesis test, called the (class) association significance test, is based on the normal distribution, and a proof is supplied to show that the assumption of normality is reasonable. Soft clusters are iteratively refined by creating new clusters using information contained in a targeted soft cluster. Soft cluster evaluation and refinement can then be combined to form a soft classification algorithm, continuous iterative guided spectral class rejection (CIGSCR).
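
    As a minimal sketch of the idea (not the paper's derivation), the snippet below implements a hypothetical association significance test in Python: the membership-weighted class proportion of a soft cluster is compared against a null association threshold via a z-score, relying on the normal approximation the paper justifies. The statistic, the effective-sample-size correction, and the default threshold p0 are illustrative assumptions.

        # Hypothetical soft-cluster association test; a sketch, not CIGSCR itself.
        import numpy as np
        from scipy.stats import norm

        def association_significance(memberships, labels, target_class, p0=0.5):
            # membership-weighted proportion of the target class in one soft cluster
            w = np.asarray(memberships, dtype=float)
            y = (np.asarray(labels) == target_class).astype(float)
            p_hat = np.sum(w * y) / np.sum(w)
            n_eff = np.sum(w) ** 2 / np.sum(w ** 2)   # effective sample size (assumed)
            z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n_eff)
            return z, norm.sf(z)                      # z-score, one-sided p-value

        z, p = association_significance([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], target_class=1)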

    Enrichment Procedures for Soft Clusters: A Statistical Test and its Applications

    Clusters, typically mined by modeling locality of attribute spaces, are often evaluated for their ability to demonstrate ‘enrichment’ of categorical features. A cluster enrichment procedure evaluates the membership of a cluster for significant representation in pre-defined categories of interest. While classical enrichment procedures assume a hard clustering definition, in this paper we introduce a new statistical test that computes enrichments for soft clusters. We demonstrate an application of this test in refining and evaluating soft clusters for classification of remotely sensed images.
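
    For context, the sketch below shows the classical hard-cluster enrichment test that this work generalizes, using the one-sided hypergeometric tail; the soft-cluster statistic itself is not spelled out in the abstract, and the function name and example counts here are illustrative.

        # Classical (hard) enrichment test; the paper's soft variant replaces
        # crisp counts with membership weights.
        from scipy.stats import hypergeom

        def enrichment_pvalue(k, n, K, N):
            # P(X >= k): k category members in a cluster of size n, drawn from
            # N objects of which K belong to the category
            return hypergeom.sf(k - 1, N, K, n)

        # Example: 15 of 20 cluster members fall in a category covering
        # 100 of 1000 objects overall.
        p = enrichment_pvalue(15, 20, 100, 1000)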

    Predicting diabetes-related hospitalizations based on electronic health records

    OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety-net hospital in New England. Our new joint clustering/classification method achieves an accuracy of 89% (measured as area under the ROC curve) and yields informative clusters that help interpret the classification results, thus increasing the trust of physicians in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes at the cost of increased computation and a loss of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes, and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone about $5.8 billion are spent each year on diabetes-related hospitalizations that could be prevented.
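
    A hedged sketch of the joint clustering/classification idea, using scikit-learn: positives are split into hidden clusters, a sparse (L1-regularized) linear SVM is fit per cluster against all negatives, and positives are reassigned to the cluster whose classifier scores them highest. The initialization, hyperparameters, and fixed iteration count are toy assumptions and do not reproduce the paper's method or its convergence guarantees.

        # Toy alternating loop: cluster the positive class, fit one sparse
        # linear SVM per cluster, reassign positives, repeat.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.svm import LinearSVC

        def cluster_then_classify(X_pos, X_neg, k=3, iters=5):
            assign = KMeans(n_clusters=k, n_init=10).fit_predict(X_pos)
            for _ in range(iters):
                svms = []
                for j in range(k):
                    pos_j = X_pos[assign == j]
                    if len(pos_j) == 0:          # keep an emptied cluster fittable
                        pos_j = X_pos[:1]
                    X = np.vstack([pos_j, X_neg])
                    y = np.r_[np.ones(len(pos_j)), np.zeros(len(X_neg))]
                    svms.append(LinearSVC(penalty="l1", dual=False, C=0.1).fit(X, y))
                scores = np.column_stack([s.decision_function(X_pos) for s in svms])
                assign = scores.argmax(axis=1)   # reassign to best-scoring cluster
            return svms, assign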

    Real-Time RGB-D based Template Matching Pedestrian Detection

    Pedestrian detection is one of the most popular topics in computer vision and robotics. Considering the challenging issues in multiple pedestrian detection, we present a real-time depth-based template matching people detector. In this paper, we propose different approaches for training the depth-based template. We train multiple templates to handle issues due to various upper-body orientations of the pedestrians and different levels of detail in the depth map of pedestrians at various distances from the camera. We also take into account the degree of reliability of different regions of the sliding window by proposing a weighted template approach. Furthermore, we combine the depth detector with an appearance-based detector as a verifier, to take advantage of appearance cues in dealing with the limitations of depth data. We evaluate our method on the challenging ETH dataset sequence and show that it outperforms state-of-the-art approaches.
    Comment: published in ICRA 2016.
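
    A minimal sketch of weighted depth-template matching over a sliding window follows; the per-pixel reliability weights, stride, and detection threshold are illustrative assumptions rather than the paper's trained values.

        # Weighted template matching on a depth map; lower score = better match.
        import numpy as np

        def detect(depth, template, weights, stride=4, thresh=0.1):
            th, tw = template.shape
            hits = []
            for y in range(0, depth.shape[0] - th, stride):
                for x in range(0, depth.shape[1] - tw, stride):
                    win = depth[y:y + th, x:x + tw]
                    # reliability weights down-weight noisy template regions
                    score = np.sum(weights * (win - template) ** 2) / np.sum(weights)
                    if score < thresh:
                        hits.append((x, y, score))
            return hits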

    MACOC: a medoid-based ACO clustering algorithm

    The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, showing the great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique into a medoid-based technique, where the properties of the search space are not necessarily known; instead, it relies only on the distances amongst data points. The new algorithm, called MACOC, has been compared against well-known algorithms (k-means and Partitioning Around Medoids) and with ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
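
    The sketch below illustrates the medoid-based, distance-only flavor of the approach with a toy ACO loop: ants sample candidate medoid sets with probability proportional to pheromone, solutions are scored by total distance to the nearest medoid, and pheromone evaporates and is reinforced on the best set. The pheromone model and parameters are simplified assumptions, not MACOC itself.

        # Toy ant-based medoid selection over a precomputed distance matrix D.
        import numpy as np

        def aco_medoids(D, k=3, ants=20, iters=50, rho=0.1, seed=0):
            rng = np.random.default_rng(seed)
            n = D.shape[0]
            tau = np.ones(n)                                # pheromone per candidate medoid
            best, best_cost = None, np.inf
            for _ in range(iters):
                for _ in range(ants):
                    medoids = rng.choice(n, size=k, replace=False, p=tau / tau.sum())
                    cost = D[:, medoids].min(axis=1).sum()  # distances only, no centroids
                    if cost < best_cost:
                        best, best_cost = medoids, cost
                tau *= 1 - rho                              # evaporation
                tau[best] += 1.0                            # reinforce best medoid set
            return best, D[:, best].argmin(axis=1)          # medoids and assignments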

    Fuzzy Spatial Analysis Techniques in a Business GIS Environment

    The purpose of this paper is to explore the use of fuzzy logic technology in spatial analysis. Focus is placed on illustrating the value added within the context of Business GIS, using geomarketing for illustrative purposes. Geomarketing may be characterised as address-focused marketing. The objective of the case study is to identify spatial customer potentials for a specific product, using real-world customer data from an Austrian firm. Fuzzy logic is used to generate customer profiles and to model the spatial customer potential of the product in question. We illustrate the use of fuzzy logic in comparison to crisp classification techniques and to modelling with crisp operators, and, more generally, how fuzzy logic may be used to the advantage of businesses.
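
    To illustrate the contrast with crisp classification, a small sketch follows: a customer just outside a hard cut-off still receives a graded potential under trapezoidal fuzzy memberships combined with a fuzzy AND (minimum). The attributes and breakpoints are invented for the example, not taken from the Austrian case study.

        # Crisp threshold vs. fuzzy membership for a toy customer-potential score.
        import numpy as np

        def trapezoid(x, a, b, c, d):
            # membership rising on [a, b], flat on [b, c], falling on [c, d]
            return float(np.clip(min((x - a) / (b - a), (d - x) / (d - c)), 0, 1))

        income, distance_km = 42_000, 6.0
        crisp = income > 40_000 and distance_km < 5        # hard cut: excluded
        mu = min(trapezoid(income, 30_000, 45_000, 80_000, 100_000),
                 trapezoid(distance_km, -1, 0, 4, 10))     # fuzzy AND: potential ~0.67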

    Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification

    Detecting faults in electrical power grids is of paramount importance, from both the electricity operator and consumer viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that allow gathering real-time information about the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers, and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows from the heterogeneity of the information that actually determines a typical fault condition. Moreover, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful; a suitable recognition model should therefore be synthesized from the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by a combined approach of customizing multiple dissimilarity measures and applying one-class classification techniques. We provide an in-depth study of the available data and of the models synthesized by the proposed one-class classifier, and we also offer a comprehensive analysis of the fault recognition results by exploiting a fuzzy set based reliability decision rule.
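
    A hedged sketch of the one-class, dissimilarity-based setup: per-feature dissimilarities are combined with weights (which the paper would customize per feature), turned into a kernel, and used to train a one-class SVM on observed fault conditions only. The feature blocks, weights, and kernel construction are illustrative assumptions.

        # One-class SVM over a weighted combination of per-feature dissimilarities.
        import numpy as np
        from sklearn.svm import OneClassSVM

        def combined_dissimilarity(A, B, weights):
            # one elementary dissimilarity per feature, combined linearly
            return sum(w * np.abs(A[:, None, i] - B[None, :, i])
                       for i, w in enumerate(weights))

        faults = np.random.default_rng(0).random((50, 3))  # toy fault conditions
        w = np.array([0.5, 0.3, 0.2])                      # assumed feature weights
        K = np.exp(-combined_dissimilarity(faults, faults, w))  # dissimilarity -> kernel
        clf = OneClassSVM(kernel="precomputed", nu=0.1).fit(K)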

    How Sample Completeness Affects Gamma-Ray Burst Classification

    Unsupervised pattern recognition algorithms support the existence of three gamma-ray burst classes: Class I (long, large-fluence bursts of intermediate spectral hardness), Class II (short, small-fluence, hard bursts), and Class III (soft bursts of intermediate durations and fluences). The algorithms surprisingly assign larger membership to Class III than to either of the other two classes. A known systematic bias has previously been used to explain the existence of Class III in terms of Class I; this bias allows the fluences and durations of some bursts to be underestimated (Hakkila et al., ApJ 538, 165, 2000). We show that this bias primarily affects only the longest bursts and cannot explain the bulk of the Class III properties. We resolve the question of Class III existence by demonstrating how samples obtained using standard trigger mechanisms fail to preserve the duration characteristics of small peak flux bursts. Sample incompleteness is thus primarily responsible for the existence of Class III. To avoid this incompleteness, we show how a new dual timescale peak flux can be defined in terms of peak flux and fluence. The dual timescale peak flux preserves the duration distribution of faint bursts and correlates better with spectral hardness (and presumably redshift) than either peak flux or fluence. The techniques presented here are generic and applicable to the study of other transient events. The results also indicate that pattern recognition algorithms are sensitive to sample completeness; this can influence the study of large astronomical databases such as those found in a Virtual Observatory.
    Comment: 29 pages, 6 figures, 3 tables. Accepted for publication in The Astrophysical Journal.
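
    The abstract does not give the definition of the dual timescale peak flux, so the snippet below is purely illustrative: it takes the larger of a short-timescale peak flux and a long-window, fluence-like average computed from a binned light curve, as one way of combining the two quantities. It is not the authors' formula.

        # Illustrative two-timescale peak flux from a binned light curve.
        import numpy as np

        def dual_timescale_peak_flux(counts, dt, t_short=0.064, t_long=1.024):
            box = lambda n: np.convolve(counts, np.ones(n), "valid") / (n * dt)
            return max(box(int(t_short / dt)).max(),   # short-timescale peak flux
                       box(int(t_long / dt)).max())    # fluence-like long average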