New approaches for boosting to uniformity
The use of multivariate classifiers has become commonplace in particle physics. To enhance the performance, a series of classifiers is typically trained; this is a technique known as boosting. This paper explores several novel boosting methods that have been designed to produce a uniform selection efficiency in a chosen multivariate space. Such algorithms have a wide range of applications in particle physics, from producing uniform signal selection efficiency across a Dalitz plot to avoiding the creation of false signal peaks in an invariant mass distribution when searching for new particles. National Science Foundation (U.S.) (Grant PHY-1306550)
Bag-of-Features Image Indexing and Classification in Microsoft SQL Server Relational Database
This paper presents a novel relational database architecture aimed at visual
object classification and retrieval. The framework is based on the
bag-of-features image representation model combined with Support Vector
Machine classification and is integrated in a Microsoft SQL Server database.
Comment: 2015 IEEE 2nd International Conference on Cybernetics (CYBCONF),
Gdynia, Poland, 24-26 June 2015
Browsing a digital library: A new approach for the New Zealand digital library
Browsing is part of the information seeking process, used when information needs are ill-defined or unspecific. Browsing and searching are often interleaved during information seeking to accommodate changing awareness of information needs. Digital Libraries often support full-text search, but are not so helpful in supporting browsing. Described here is a novel browsing system created for the Greenstone software used by the New Zealand Digital Library that supports users in a more natural approach to the information seeking process. © Springer-Verlag Berlin Heidelberg 2003
Scalable Solutions for Automated Single Pulse Identification and Classification in Radio Astronomy
Data collection for scientific applications is increasing exponentially and
is forecasted to soon reach peta- and exabyte scales. Applications which
process and analyze scientific data must be scalable and focus on execution
performance to keep pace. In the field of radio astronomy, in addition to
increasingly large datasets, tasks such as the identification of transient
radio signals from extrasolar sources are computationally expensive. We present
a scalable approach to radio pulsar detection written in Scala that
parallelizes candidate identification to take advantage of in-memory task
processing using Apache Spark on a YARN distributed system. Furthermore, we
introduce a novel automated multiclass supervised machine learning technique
that we combine with feature selection to reduce the time required for
candidate classification. Experimental testing on a Beowulf cluster with 15
data nodes shows that the parallel implementation of the identification
algorithm offers a speedup of up to 5x over a similar multithreaded
implementation. Further, we show that the combination of automated multiclass
classification and feature selection speeds up the execution performance of the
RandomForest machine learning algorithm by an average of 54% with less than a
2% average reduction in the algorithm's ability to correctly classify pulsars.
The generalizability of these results is demonstrated by using two real-world
radio astronomy data sets.
Comment: In Proceedings of the 47th International Conference on Parallel
Processing (ICPP 2018). ACM, New York, NY, USA, Article 11, 11 pages
Fusion of Heterogeneous Earth Observation Data for the Classification of Local Climate Zones
This paper proposes a novel framework for fusing multi-temporal,
multispectral satellite images and OpenStreetMap (OSM) data for the
classification of local climate zones (LCZs). Feature stacking is the most
commonly-used method of data fusion but does not consider the heterogeneity of
multimodal optical images and OSM data, which becomes its main drawback. The
proposed framework processes two data sources separately and then combines them
at the model level through two fusion models (the landuse fusion model and
building fusion model), which aim to fuse optical images with landuse and
buildings layers of OSM data, respectively. In addition, a new approach to
detecting building incompleteness of OSM data is proposed. The proposed
framework was trained and tested using data from the 2017 IEEE GRSS Data Fusion
Contest, and further validated on one additional test set containing test
samples manually labeled in Munich and New York. Experimental results
indicate that, compared to the feature stacking-based baseline framework,
the proposed framework is effective in fusing optical images with OSM data for
the classification of LCZs, with high generalization capability on a large
scale. The classification accuracy of the proposed framework exceeds that of
the baseline framework by more than 6% and 2% on the test set of the
2017 IEEE GRSS Data Fusion Contest and the additional test set, respectively.
In addition, the proposed framework is less sensitive to spectral diversities
of optical satellite images and thus achieves more stable classification
performance than state-of-the-art frameworks.
Comment: accepted by TGR
Simpler is better: a novel genetic algorithm to induce compact multi-label chain classifiers
Multi-label classification (MLC) is the task of assigning multiple class labels to an object based on the features that describe the object. One of the most effective MLC methods is known as Classifier Chains (CC). This approach consists of training q binary classifiers linked in a chain, y1 → y2 → ... → yq, with each responsible for classifying a specific label in {l1, l2, ..., lq}. The chaining mechanism allows each individual classifier to incorporate the predictions of the previous ones as additional information at classification time. Thus, possible correlations among labels can be automatically exploited. Nevertheless, CC suffers from two important drawbacks: (i) the label ordering is decided at random, although it usually has a strong effect on predictive accuracy; (ii) all labels are inserted into the chain, although some of them might carry information that is irrelevant for discriminating the others. In this paper we tackle both problems at once, by proposing a novel genetic algorithm capable of searching for a single optimized label ordering, while at the same time taking into consideration the utilization of partial chains. Experiments on benchmark datasets demonstrate that our approach is able to produce models that are both simpler and more accurate.
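The chaining mechanism described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and it omits the genetic search entirely. The class name ClassifierChain, the 1-nearest-neighbour base learner and the toy data are all assumptions made for illustration. The label order is passed in explicitly, and leaving a label out of the order gives a partial chain.

```python
from typing import Dict, List, Sequence


def nearest_neighbour(train_x: List[List[float]], train_y: List[int],
                      x: Sequence[float]) -> int:
    """1-nearest-neighbour base learner; a stand-in for any binary classifier."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)))
    return train_y[best]


class ClassifierChain:
    """Hypothetical minimal chain: one binary model per label in `order`."""

    def __init__(self, order: Sequence[int]):
        self.order = list(order)   # label indices in chain order (may be partial)
        self.links = []            # per-link training data for the base learner

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "ClassifierChain":
        self.links = []
        for pos, label in enumerate(self.order):
            earlier = self.order[:pos]
            # Each link sees the features plus the true values of earlier labels.
            augmented = [list(x) + [y[e] for e in earlier] for x, y in zip(X, Y)]
            targets = [y[label] for y in Y]
            self.links.append((augmented, targets))
        return self

    def predict(self, x: Sequence[float]) -> Dict[int, int]:
        predicted: Dict[int, int] = {}
        for pos, label in enumerate(self.order):
            augmented, targets = self.links[pos]
            # At prediction time, earlier labels are replaced by earlier predictions.
            features = list(x) + [predicted[e] for e in self.order[:pos]]
            predicted[label] = nearest_neighbour(augmented, targets, features)
        return predicted


# Toy data (made up): one feature, two labels.
X = [[0.0], [1.0], [2.0], [3.0]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

full_chain = ClassifierChain(order=[0, 1]).fit(X, Y)
partial_chain = ClassifierChain(order=[1]).fit(X, Y)  # label 0 left out of the chain
```

A search over orderings, such as the genetic algorithm the abstract proposes, would evaluate many candidate values of order and keep the best-scoring one.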
Detection of Dispersed Radio Pulses: A machine learning approach to candidate identification and classification
Searching for extraterrestrial, transient signals in astronomical data sets
is an active area of current research. However, machine learning techniques are
lacking in the literature concerning single-pulse detection. This paper
presents a new, two-stage approach for identifying and classifying dispersed
pulse groups (DPGs) in single-pulse search output. The first stage identified
DPGs and extracted features to characterize them using a new peak
identification algorithm which tracks sloping tendencies around local maxima in
plots of signal-to-noise ratio vs. dispersion measure. The second stage used
supervised machine learning to classify DPGs. We created four benchmark data
sets: one unbalanced and three balanced versions using three different
imbalance treatments. We empirically evaluated 48 classifiers by training and
testing binary and multiclass versions of six machine learning algorithms on
each of the four benchmark versions. While each classifier had advantages and
disadvantages, all classifiers with imbalance treatments had higher recall
values than those with unbalanced data, regardless of the machine learning
algorithm used. Based on the benchmarking results, we selected a subset of
classifiers to classify the full, unlabelled data set of over 1.5 million DPGs
identified in 42,405 observations made by the Green Bank Telescope. Overall,
the classifiers using a multiclass ensemble tree learner in combination with
two oversampling imbalance treatments were the most efficient; they identified
additional known pulsars not in the benchmark data set and provided six
potential discoveries, with significantly fewer false positives than the other
classifiers.
Comment: 13 pages, accepted for publication in MNRAS, ref. MN-15-1713-MJ.R
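The abstract does not say which oversampling treatments were used. As a generic illustration of the idea only, the Python sketch below shows plain random oversampling, which duplicates minority-class samples until every class matches the size of the largest one. The function name random_oversample, the seed parameter and the toy labels are assumptions, not details from the paper.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(samples: Sequence, labels: Sequence,
                      seed: int = 0) -> Tuple[List, List]:
    """Duplicate randomly chosen minority samples until all classes are equal in size."""
    rng = random.Random(seed)          # fixed seed keeps the sketch reproducible
    counts = Counter(labels)
    target = max(counts.values())      # size of the largest class
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):    # zero iterations for the majority class
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


# Toy data (made up): 2 "pulsar" candidates against 8 "noise" candidates.
features = [[float(i)] for i in range(10)]
classes = ["pulsar"] * 2 + ["noise"] * 8
balanced_x, balanced_y = random_oversample(features, classes)
```

Oversampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the test set.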