3,598 research outputs found
Classification under Streaming Emerging New Classes: A Solution using Completely Random Trees
This paper investigates an important problem in stream mining, i.e.,
classification under streaming emerging new classes or SENC. The common
approach is to treat it as a classification problem and solve it using either a
supervised learner or a semi-supervised learner. We propose an alternative
approach by using unsupervised learning as the basis to solve this problem. The
SENC problem can be decomposed into three sub problems: detecting emerging new
classes, classifying for known classes, and updating models to enable
classification of instances of the new class and detection of more emerging new
classes. The proposed method employs completely random trees which have been
shown to work well in unsupervised learning and supervised learning
independently in the literature. This is the first time, as far as we know,
that completely random trees are used as a single common core to solve all
three sub problems: unsupervised learning, supervised learning and model update
in data streams. We show that the proposed unsupervised-learning-focused method
often achieves significantly better outcomes than existing
classification-focused methods
Self-tuned Visual Subclass Learning with Shared Samples An Incremental Approach
Computer vision tasks are traditionally defined and evaluated using semantic
categories. However, it is known to the field that semantic classes do not
necessarily correspond to a unique visual class (e.g. inside and outside of a
car). Furthermore, many of the feasible learning techniques at hand cannot
model a visual class which appears consistent to the human eye. These problems
have motivated the use of 1) Unsupervised or supervised clustering as a
preprocessing step to identify the visual subclasses to be used in a
mixture-of-experts learning regime. 2) Felzenszwalb et al. part model and other
works model mixture assignment with latent variables which is optimized during
learning 3) Highly non-linear classifiers which are inherently capable of
modelling multi-modal input space but are inefficient at the test time. In this
work, we promote an incremental view over the recognition of semantic classes
with varied appearances. We propose an optimization technique which
incrementally finds maximal visual subclasses in a regularized risk
minimization framework. Our proposed approach unifies the clustering and
classification steps in a single algorithm. The importance of this approach is
its compliance with the classification via the fact that it does not need to
know about the number of clusters, the representation and similarity measures
used in pre-processing clustering methods a priori. Following this approach we
show both qualitatively and quantitatively significant results. We show that
the visual subclasses demonstrate a long tail distribution. Finally, we show
that state of the art object detection methods (e.g. DPM) are unable to use the
tails of this distribution comprising 50\% of the training samples. In fact we
show that DPM performance slightly increases on average by the removal of this
half of the data.Comment: Updated ICCV 2013 submissio
Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues
- …