Topic Models and Fusion Methods: a Union to Improve Text Clustering and Cluster Labeling
Topic modeling algorithms are statistical methods that aim to discover the topics running through text documents. Topic models are popular in machine learning and text mining because they can infer the latent topic structure of a corpus. In this paper, we present a document-enrichment approach that uses state-of-the-art topic models and data fusion methods to enrich the documents of a collection, with the aim of improving the quality of text clustering and cluster labeling. We propose a bi-vector space model in which every document of the corpus is represented by two vectors: one generated by the fusion-based topic modeling approach, and one that is simply the traditional vector model. Our experiments on various datasets show that combining topic modeling and fusion methods to create the document vectors can significantly improve the quality of the clustering results.
Fusion of Head and Full-Body Detectors for Multi-Object Tracking
In order to track all persons in a scene, the tracking-by-detection paradigm
has proven to be a very effective approach. Yet, relying solely on a single
detector is also a major limitation, as useful image information might be
ignored. Consequently, this work demonstrates how to fuse two detectors into a
tracking system. To obtain the trajectories, we propose to formulate tracking
as a weighted graph labeling problem, resulting in a binary quadratic program.
As such problems are NP-hard, the solution can only be approximated. Based on
the Frank-Wolfe algorithm, we present a new solver that is crucial to handle
such difficult problems. Evaluation on pedestrian tracking is provided for
multiple scenarios, showing superior results over single detector tracking and
standard QP-solvers. Finally, our tracker ranks 2nd on the MOT16 benchmark and
1st on the new MOT17 benchmark, outperforming over 90 trackers. Comment: 10 pages, 4 figures; Winner of the MOT17 challenge; CVPRW 201
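As a rough illustration of the Frank-Wolfe idea mentioned in the abstract above, the following Python sketch applies the generic Frank-Wolfe iteration to the box relaxation of a small binary quadratic program and then rounds the result. It is not the solver proposed in the paper: the function names `frank_wolfe_box` and `round_to_binary`, the objective form x^T Q x + c^T x, the step size 2/(k+2), and the rounding threshold are all assumptions made for illustration.

```python
def frank_wolfe_box(Q, c, n_iters=50):
    """Generic Frank-Wolfe on min x^T Q x + c^T x over the box [0, 1]^n.

    Illustrative only: the box is a relaxation of the binary constraint,
    and for non-convex Q the iteration is only a heuristic.
    """
    n = len(c)
    x = [0.5] * n  # start in the middle of the box
    for k in range(n_iters):
        # Gradient of x^T Q x + c^T x is (Q + Q^T) x + c.
        grad = [
            sum((Q[i][j] + Q[j][i]) * x[j] for j in range(n)) + c[i]
            for i in range(n)
        ]
        # Linear minimization over the box: pick the vertex that
        # minimizes grad . s, i.e. s_i = 1 where the gradient is negative.
        s = [1.0 if g < 0 else 0.0 for g in grad]
        gamma = 2.0 / (k + 2)  # standard diminishing step size
        x = [xi + gamma * (si - xi) for xi, si in zip(x, s)]
    return x


def round_to_binary(x, threshold=0.5):
    """Round a relaxed solution back to a 0/1 labeling (assumed heuristic)."""
    return [1 if xi >= threshold else 0 for xi in x]


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0]]
    c = [-3.0, 1.0]
    print(round_to_binary(frank_wolfe_box(Q, c)))  # prints [1, 0]
```

Each iteration only needs a gradient and a sign check, which is the usual reason Frank-Wolfe is attractive when the feasible set has a cheap linear minimization step.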
GOGGLES: Automatic Image Labeling with Affinity Coding
Generating large labeled training data is becoming the biggest bottleneck in
building and deploying supervised machine learning models. Recently, the data
programming paradigm has been proposed to reduce the human cost in labeling
training data. However, data programming relies on designing labeling functions,
which still requires significant domain expertise. Also, it is prohibitively
difficult to write labeling functions for image datasets as it is hard to
express domain knowledge using raw features for images (pixels).
We propose affinity coding, a new domain-agnostic paradigm for automated
training data labeling. The core premise of affinity coding is that the
affinity scores of instance pairs belonging to the same class on average should
be higher than those of pairs belonging to different classes, according to some
affinity functions. We build the GOGGLES system that implements affinity coding
for labeling image datasets by designing a novel set of reusable affinity
functions for images, and propose a novel hierarchical generative model for
class inference using a small development set.
We compare GOGGLES with existing data programming systems on 5 image labeling
tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a
minimum of 71% to a maximum of 98% without requiring any extensive human
annotation. In terms of end-to-end performance, GOGGLES outperforms the
state-of-the-art data programming system Snuba by 21% and a state-of-the-art
few-shot learning technique by 5%, and is only 7% away from the fully
supervised upper bound. Comment: Published at 2020 ACM SIGMOD International Conference on Management
of Dat
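The core premise of affinity coding stated in the abstract above can be checked on toy data with a short Python sketch. It is only an illustration of the premise, not the GOGGLES system or its generative model: the function name `mean_affinities`, the affinity matrix, and the labels are hypothetical.

```python
def mean_affinities(affinity, labels):
    """Return (mean same-class affinity, mean different-class affinity).

    `affinity` is a symmetric matrix of pairwise scores and `labels`
    gives the class of each instance; both are toy inputs here.
    """
    same, diff = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once
            bucket = same if labels[i] == labels[j] else diff
            bucket.append(affinity[i][j])
    mean = lambda values: sum(values) / len(values) if values else float("nan")
    return mean(same), mean(diff)


if __name__ == "__main__":
    # Hypothetical affinity scores for three instances.
    affinity = [
        [1.0, 0.9, 0.2],
        [0.9, 1.0, 0.1],
        [0.2, 0.1, 1.0],
    ]
    labels = [0, 0, 1]
    same_mean, diff_mean = mean_affinities(affinity, labels)
    # The premise holds on this toy data if same_mean > diff_mean.
    print(same_mean > diff_mean)
```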
Towards Real-Time Detection and Tracking of Spatio-Temporal Features: Blob-Filaments in Fusion Plasma
A novel algorithm and implementation of real-time identification and tracking
of blob-filaments in fusion reactor data is presented. Similar spatio-temporal
features are important in many other applications, for example, ignition
kernels in combustion and tumor cells in a medical image. This work presents an
approach for extracting these features by dividing the overall task into three
steps: local identification of feature cells, grouping feature cells into
extended features, and tracking the movement of features through overlap in
space. Through our extensive work in parallelization, we demonstrate that this
approach can effectively make use of a large number of compute nodes to detect
and track blob-filaments in real time in fusion plasma. On a set of 30GB fusion
simulation data, we observed linear speedup on 1024 processes and completed
blob detection in less than three milliseconds using Edison, a Cray XC30 system
at NERSC. Comment: 14 pages, 40 figure
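The three steps named in the abstract above can be sketched serially in Python. This is a minimal sketch under stated assumptions, not the paper's parallel implementation: a simple threshold stands in for the identification criterion, grouping uses 4-connected neighbors on a 2D grid, and the function names `find_feature_cells`, `group_cells`, and `track_by_overlap` are hypothetical.

```python
from collections import deque


def find_feature_cells(frame, threshold):
    """Step 1: local identification. Assumed criterion: value > threshold."""
    return {
        (r, c)
        for r, row in enumerate(frame)
        for c, value in enumerate(row)
        if value > threshold
    }


def group_cells(cells):
    """Step 2: group 4-connected feature cells into extended features."""
    remaining = set(cells)
    features = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill from the seed cell
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        features.append(component)
    return features


def track_by_overlap(prev_features, curr_features):
    """Step 3: link features in consecutive frames that share a cell."""
    return [
        (i, j)
        for i, prev in enumerate(prev_features)
        for j, curr in enumerate(curr_features)
        if prev & curr
    ]


if __name__ == "__main__":
    frame0 = [[0, 5, 5, 0], [0, 0, 0, 0], [0, 0, 0, 7]]
    frame1 = [[0, 0, 5, 5], [0, 0, 0, 0], [0, 0, 0, 0]]
    # Sort features by their smallest cell so indices are deterministic.
    f0 = sorted(group_cells(find_feature_cells(frame0, 1)), key=min)
    f1 = sorted(group_cells(find_feature_cells(frame1, 1)), key=min)
    print(track_by_overlap(f0, f1))
```

Steps 1 and 2 operate on one frame at a time, so frames can be processed independently before the overlap links are computed.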
Radar-based Feature Design and Multiclass Classification for Road User Recognition
The classification of individual traffic participants is a complex task,
especially for challenging scenarios with multiple road users or under bad
weather conditions. Radar sensors provide a way of measuring such scenes that is
orthogonal to well-established camera systems. In order to gain
accurate classification results, 50 different features are extracted from the
measurement data and tested on their performance. From these features a
suitable subset is chosen and passed to random forest and long short-term
memory (LSTM) classifiers to obtain class predictions for the radar input.
Moreover, it is shown why data imbalance is an inherent problem in automotive
radar classification when the dataset is not sufficiently large. To overcome
this issue, classifier binarization is used among other techniques in order to
better account for underrepresented classes. A new method to couple the
resulting probabilities is proposed and compared to others with great success.
Final results show substantial improvements when compared to ordinary
multiclass classification. Comment: 8 pages, 6 figure
Non-Abelian Quantum Hall States and their Quasiparticles: from the Pattern of Zeros to Vertex Algebra
In the pattern-of-zeros approach to quantum Hall states, a set of data
{n;m;S_a|a=1,...,n; n,m,S_a in N} (called the pattern of zeros) is introduced
to characterize a quantum Hall wave function. In this paper we find sufficient
conditions on the pattern of zeros so that the data correspond to a valid wave
function. Sometimes a set of data {n;m;S_a} corresponds to a unique quantum
Hall state, while other times, a set of data corresponds to several different
quantum Hall states. In the latter cases, the pattern of zeros alone does
not completely characterize the quantum Hall states. In this paper, we find
that the following expanded set of data {n;m;S_a;c|a=1,...,n; n,m,S_a in N; c
in R} provides a more complete characterization of quantum Hall states. Each
expanded set of data completely characterizes a unique quantum Hall state, at
least for the examples discussed in this paper. The result is obtained by
combining the pattern of zeros and Z_n simple-current vertex algebra which
describes a large class of Abelian and non-Abelian quantum Hall states
\Phi_{Z_n}^sc. The more complete characterization in terms of {n;m;S_a;c}
allows us to obtain more topological properties of those states, which include
the central charge c of edge states, the scaling dimensions and the statistics
of quasiparticle excitations. Comment: 42 pages. RevTeX