Continuous Iterative Guided Spectral Class Rejection Classification Algorithm: Part 1
This paper outlines the changes necessary to convert the iterative guided spectral class rejection (IGSCR) classification algorithm to a soft classification algorithm. IGSCR uses a hypothesis test to select clusters to use in classification and iteratively refines clusters not yet selected for classification. Both steps assume that cluster and class memberships are crisp (either zero or one). In order to make soft cluster and class assignments (between zero and one), a new hypothesis test and iterative refinement technique are introduced that are suitable for soft clusters. The new hypothesis test, called the (class) association significance test, is based on the normal distribution, and a proof is supplied to show that the assumption of normality is reasonable. Soft clusters are iteratively refined by creating new clusters using information contained in a targeted soft cluster. Soft cluster evaluation and refinement can then be combined to form a soft classification algorithm, continuous iterative guided spectral class rejection (CIGSCR).
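The association significance test above rests on a normal approximation. As a hedged illustration only (the paper's exact statistic is not reproduced here; the function name and the baseline-comparison form are assumptions), a soft cluster's membership-weighted class fraction can be compared against the overall class prevalence with a z-style statistic:

```python
import math

def class_association_z(memberships, labels, target_class):
    """z-style statistic: does a soft cluster carry significantly more of
    the target class than the overall class prevalence would suggest?
    `memberships` are soft weights in [0, 1]; `labels` are hard class
    labels. Hypothetical form -- the paper's actual statistic may differ."""
    total = sum(memberships)
    # weighted fraction of the cluster's membership carried by the target class
    p_hat = sum(m for m, y in zip(memberships, labels) if y == target_class) / total
    p0 = labels.count(target_class) / len(labels)    # baseline class prevalence
    se = math.sqrt(p0 * (1 - p0) / total)            # normal-approximation std. error
    return (p_hat - p0) / se

# a cluster dominated by class 1 yields a positive z; larger z = stronger association
z = class_association_z([0.9, 0.8, 0.7, 0.1, 0.2], [1, 1, 1, 0, 0], 1)
```

A large positive value would lead the test to accept the cluster for classification, mirroring the crisp hypothesis test in IGSCR.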
Enrichment Procedures for Soft Clusters: A Statistical Test and its Applications
Clusters, typically mined by modeling locality of attribute spaces, are often evaluated for their ability to demonstrate "enrichment" of categorical features. A cluster enrichment procedure evaluates the membership of a cluster for significant representation in pre-defined categories of interest. While classical enrichment procedures assume a hard clustering definition, in this paper we introduce a new statistical test that computes enrichments for soft clusters. We demonstrate an application of this test in refining and evaluating soft clusters for classification of remotely sensed images.
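For context, the classical hard-cluster enrichment test that the paper generalizes is the hypergeometric upper tail of category hits inside a cluster. The sketch below shows only this classical baseline; the paper's contribution is a test that replaces the integer counts with fractional (soft) memberships.

```python
from math import comb

def enrichment_pvalue(cluster_size, hits_in_cluster, population, hits_total):
    """Hypergeometric upper-tail p-value: the probability of drawing at
    least `hits_in_cluster` category members into a cluster of
    `cluster_size`, from a population of `population` items of which
    `hits_total` belong to the category. This is the classical
    hard-cluster enrichment test, not the paper's soft-cluster test."""
    denom = comb(population, cluster_size)
    p = 0.0
    for k in range(hits_in_cluster, min(cluster_size, hits_total) + 1):
        p += comb(hits_total, k) * comb(population - hits_total, cluster_size - k) / denom
    return p

# 8 of 10 cluster members hit a category covering only 20% of the population:
# far more than expected by chance, so the p-value is tiny (enriched)
p = enrichment_pvalue(cluster_size=10, hits_in_cluster=8, population=100, hits_total=20)
```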
Predicting diabetes-related hospitalizations based on electronic health records
OBJECTIVE: To derive a predictive model to identify patients likely to be hospitalized during the following year due to complications attributed to Type II diabetes. METHODS: A variety of supervised machine learning classification methods were tested, and a new method was developed that discovers hidden patient clusters in the positive class (hospitalized) while, at the same time, deriving sparse linear support vector machine classifiers to separate positive samples from negative ones (non-hospitalized). The convergence of the new method was established, and theoretical guarantees were proved on how the classifiers it produces generalize to a test set not seen during training. RESULTS: The methods were tested on a large set of patients from the Boston Medical Center, the largest safety-net hospital in New England. Our new joint clustering/classification method achieves an accuracy of 89% (measured in terms of area under the ROC curve) and yields informative clusters that can help interpret the classification results, thus increasing physicians' trust in the algorithmic output and providing some guidance towards preventive measures. While it is possible to increase accuracy to 92% with other methods, this comes with increased computational cost and a lack of interpretability. The analysis shows that even a modest probability of preventive actions being effective (more than 19%) suffices to generate significant hospital care savings. CONCLUSIONS: Predictive models are proposed that can help avert hospitalizations, improve health outcomes, and drastically reduce hospital expenditures. The scope for savings is significant, as it has been estimated that in the USA alone, about $5.8 billion is spent each year on diabetes-related hospitalizations that could be prevented.
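The joint idea of discovering hidden clusters in the positive class and fitting one linear classifier per cluster can be sketched in miniature. This is an assumption-laden toy: k-means stands in for the paper's clustering step, and a nearest-mean decision rule per cluster stands in for its sparse linear SVMs.

```python
def train_joint(pos, neg, k=2, rounds=5):
    """Toy stand-in for joint clustering/classification: hidden clusters
    are sought in the positive class only, and each cluster then gets its
    own simple linear rule. k-means replaces the paper's clustering step;
    a nearest-mean hyperplane replaces its sparse linear SVMs."""
    d2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    protos = [list(p) for p in pos[:k]]               # crude initialisation
    for _ in range(rounds):                           # k-means on positives only
        groups = [[] for _ in range(k)]
        for p in pos:
            groups[min(range(k), key=lambda c: d2(p, protos[c]))].append(p)
        protos = [[sum(col) / len(g) for col in zip(*g)] if g else protos[c]
                  for c, g in enumerate(groups)]
    neg_mean = [sum(col) / len(neg) for col in zip(*neg)]
    # predict positive iff the sample is closer to some positive-cluster
    # prototype than to the negative-class mean
    return lambda x: min(d2(x, pr) for pr in protos) < d2(x, neg_mean)

# two hidden positive clusters that no single hyperplane separates from the negatives
pos = [[3.0, 0.0], [4.0, 1.0], [-3.0, 0.0], [-4.0, 1.0]]
neg = [[0.0, 0.0], [0.5, 0.0], [-0.5, 0.0]]
classify = train_joint(pos, neg, k=2)
```

The toy makes the interpretability point from the abstract concrete: each cluster's rule can be inspected separately, whereas a single global classifier could not even separate these positives from the negatives.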
Real-Time RGB-D based Template Matching Pedestrian Detection
Pedestrian detection is one of the most popular topics in computer vision and robotics. Considering the challenging issues in multiple pedestrian detection, we present a real-time depth-based template matching people detector. In this paper, we propose different approaches for training the depth-based template. We train multiple templates to handle issues arising from the various upper-body orientations of pedestrians and from the different levels of detail in the depth map of pedestrians at various distances from the camera. We also take into account the degree of reliability of different regions of the sliding window by proposing a weighted template approach. Furthermore, we combine the depth detector with an appearance-based detector acting as a verifier, to take advantage of appearance cues in dealing with the limitations of depth data. We evaluate our method on the challenging ETH dataset sequence and show that it outperforms state-of-the-art approaches.
Comment: published in ICRA 201
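The weighted-template idea, scoring each region of the sliding window according to its reliability, can be sketched as follows. The score form, names, and threshold here are illustrative assumptions, not the paper's exact formulation.

```python
def weighted_template_score(window, template, weights):
    """Similarity between a depth window and a depth template, with
    per-pixel weights encoding how reliable each template region is
    (the 'weighted template' idea). Inputs are equal-shape 2-D lists;
    the exact score form is an illustrative assumption."""
    num = den = 0.0
    for wrow, trow, grow in zip(window, template, weights):
        for w, t, g in zip(wrow, trow, grow):
            num += g * abs(w - t)        # weighted absolute depth difference
            den += g
    return 1.0 - num / den               # 1.0 = perfect match

def detect(depth_map, template, weights, thresh=0.9):
    """Slide the template over the depth map and return the top-left
    corners of windows whose weighted score clears the threshold."""
    th, tw = len(template), len(template[0])
    hits = []
    for i in range(len(depth_map) - th + 1):
        for j in range(len(depth_map[0]) - tw + 1):
            win = [row[j:j + tw] for row in depth_map[i:i + th]]
            if weighted_template_score(win, template, weights) >= thresh:
                hits.append((i, j))
    return hits

depth = [[1.0, 1.0, 5.0, 5.0],
         [1.0, 1.0, 5.0, 5.0],
         [5.0, 5.0, 5.0, 5.0]]
template = [[1.0, 1.0], [1.0, 1.0]]      # a 'person' at depth 1.0
weights  = [[1.0, 1.0], [1.0, 1.0]]      # here, all regions equally reliable
hits = detect(depth, template, weights)
```

Lowering the weights of unreliable template regions (e.g. around the body silhouette, where depth is noisy) reduces their influence on the match score, which is the motivation the abstract gives for the weighted template.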
MACOC: a medoid-based ACO clustering algorithm
The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, showing the great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique into a medoid-based technique, where the properties of the search space are not necessarily known; instead, it relies only on the pairwise distances between data instances. The new algorithm, called MACOC, has been compared against well-known algorithms (K-means and Partition Around Medoids) and against ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
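MACOC's defining property, operating on pairwise distances alone with medoids rather than centroids, can be illustrated with a greedy PAM-style medoid search; the ant-colony machinery of MACOC itself is not reproduced here.

```python
def medoid_cluster(dist, k):
    """Greedy PAM-style medoid search driven purely by a pairwise
    distance matrix -- the same information MACOC relies on (no
    centroids, so the attribute space need not be known). MACOC's
    ant-colony search is replaced by simple greedy swaps."""
    n = len(dist)
    medoids = list(range(k))                         # naive initialisation
    cost = lambda meds: sum(min(dist[i][m] for m in meds) for i in range(n))
    improved = True
    while improved:                                  # swap medoids while cost drops
        improved = False
        for mi in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:mi] + [cand] + medoids[mi + 1:]
                if cost(trial) < cost(medoids):
                    medoids, improved = trial, True
    assign = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, assign

points = [0.0, 1.0, 10.0, 11.0]                     # two obvious groups
dist = [[abs(a - b) for b in points] for a in points]
medoids, assign = medoid_cluster(dist, k=2)
```

Because only `dist` is consulted, the same code works for any dissimilarity (edit distance, graph distance, etc.), which is exactly the flexibility the medoid-based reformulation buys.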
Fuzzy Spatial Analysis Techniques in a Business GIS Environment
The purpose of this paper is to explore the use of fuzzy logic technology in spatial analysis, with a focus on illustrating the value it adds in the context of Business GIS. We consider the issue of geomarketing for illustrative purposes; geomarketing may be characterised as address-focused marketing. The objective of the case study is to identify spatial customer potentials for a specific product, using real-world customer data from an Austrian firm. Fuzzy logic is used to generate customer profiles and to model the spatial customer potential of the product in question. We illustrate the use of fuzzy logic, in comparison to crisp classification techniques and modelling with crisp operators, for solving the problem, and more generally how the use of fuzzy logic may work to the advantage of businesses. Univ.-Prof. Dr. Manfred M. Fischer (Department of Economic Geography and Geoinformatics, Vienna University of Economics and Business Administration) invited me to attend his special session.
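A hedged sketch of the kind of fuzzy customer profiling described above: linear membership ramps for two illustrative variables, combined with the fuzzy AND (min) operator. The actual variables and profiles used in the Austrian case study are not public; everything below is an assumption chosen only to contrast graded membership with crisp 0/1 classification.

```python
def ramp_up(x, a, b):
    """Membership rising linearly from 0 at `a` to 1 at `b`."""
    return min(1.0, max(0.0, (x - a) / (b - a)))

def ramp_down(x, a, b):
    """Membership falling linearly from 1 at `a` to 0 at `b`."""
    return 1.0 - ramp_up(x, a, b)

def customer_potential(income, age):
    # hypothetical profile: "high income AND young"; fuzzy AND = min
    high_income = ramp_up(income, 20_000, 40_000)
    young = ramp_down(age, 30, 45)
    return min(high_income, young)

# a crisp classifier would force this customer into one bucket;
# the fuzzy profile returns a graded potential instead
score = customer_potential(income=35_000, age=28)
```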
Modeling and Recognition of Smart Grid Faults by a Combined Approach of Dissimilarity Learning and One-Class Classification
Detecting faults in electrical power grids is of paramount importance, from both the electricity operator's and the consumer's viewpoints. Modern electric power grids (smart grids) are equipped with smart sensors that allow gathering real-time information about the physical status of all the component elements belonging to the whole infrastructure (e.g., cables and related insulation, transformers, breakers, and so on). In real-world smart grid systems, additional information related to the operational status of the grid itself, such as meteorological information, is usually collected as well. Designing a suitable recognition (discrimination) model of faults in a real-world smart grid system is hence a challenging task. This follows from the heterogeneity of the information that actually determines a typical fault condition. Moreover, for synthesizing a recognition model, in practice only the conditions of observed faults are usually meaningful; therefore, a suitable recognition model should be synthesized by making use of the observed fault conditions only. In this paper, we deal with the problem of modeling and recognizing faults in a real-world smart grid system, which supplies the entire city of Rome, Italy. Recognition of faults is addressed by following a combined approach of multiple dissimilarity measures customization and one-class classification techniques. We provide an in-depth study of the available data and of the models synthesized by the proposed one-class classifier, and we offer a comprehensive analysis of the fault recognition results obtained by exploiting a fuzzy-set-based reliability decision rule.
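A minimal sketch of a one-class, dissimilarity-based recogniser in the spirit described above: training uses observed faults only, heterogeneous features are compared through per-feature dissimilarity functions, and the acceptance threshold comes from the training faults themselves. All names, the fixed weights, and the threshold rule are illustrative assumptions, not the paper's model.

```python
def train_one_class(faults, dissims, weights, quantile=0.95):
    """One-class recogniser trained on observed fault records only.
    Heterogeneous features are compared via per-feature dissimilarity
    functions `dissims`, combined with `weights`; a new sample counts
    as a fault if its combined dissimilarity to the nearest training
    fault lies within a threshold taken from the training data itself."""
    def combined(a, b):
        return sum(w * d(x, y) for w, d, x, y in zip(weights, dissims, a, b))
    # threshold = quantile of each training fault's nearest-neighbour distance
    nn = sorted(
        min(combined(a, b) for j, b in enumerate(faults) if j != i)
        for i, a in enumerate(faults)
    )
    thresh = nn[min(int(quantile * len(nn)), len(nn) - 1)]
    return lambda x: min(combined(x, f) for f in faults) <= thresh

# toy fault records: (numeric sensor reading, categorical condition)
faults = [(1.0, "A"), (1.2, "A"), (0.9, "A"), (1.1, "A")]
dissims = [lambda x, y: abs(x - y),                 # numeric feature
           lambda x, y: 0.0 if x == y else 1.0]    # categorical feature
weights = [1.0, 2.0]
is_fault = train_one_class(faults, dissims, weights)
```

Customising the per-feature dissimilarities and their weights is the lever that handles heterogeneous data (physical readings alongside meteorological categories), mirroring the dissimilarity-learning component of the combined approach.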
How Sample Completeness Affects Gamma-Ray Burst Classification
Unsupervised pattern recognition algorithms support the existence of three gamma-ray burst classes: Class I (long, large-fluence bursts of intermediate spectral hardness), Class II (short, small-fluence, hard bursts), and Class III (soft bursts of intermediate durations and fluences). The algorithms surprisingly assign larger membership to Class III than to either of the other two classes. A known systematic bias has previously been used to explain the existence of Class III in terms of Class I; this bias allows the fluences and durations of some bursts to be underestimated (Hakkila et al., ApJ 538, 165, 2000). We show that this bias primarily affects only the longest bursts and cannot explain the bulk of the Class III properties. We resolve the question of Class III existence by demonstrating how samples obtained using standard trigger mechanisms fail to preserve the duration characteristics of small peak flux bursts. Sample incompleteness is thus primarily responsible for the existence of Class III. In order to avoid this incompleteness, we show how a new dual-timescale peak flux can be defined in terms of peak flux and fluence. The dual-timescale peak flux preserves the duration distribution of faint bursts and correlates better with spectral hardness (and presumably redshift) than either peak flux or fluence. The techniques presented here are generic and have applicability to studies of other transient events. The results also indicate that pattern recognition algorithms are sensitive to sample completeness; this can influence the study of large astronomical databases such as those found in a Virtual Observatory.
Comment: 29 pages, 6 figures, 3 tables, accepted for publication in The Astrophysical Journal