Shuffled Multi-Channel Sparse Signal Recovery
Mismatches between samples and their respective channel or target commonly
arise in several real-world applications. For instance, whole-brain calcium
imaging of freely moving organisms, multiple-target tracking or multi-person
contactless vital sign monitoring may be severely affected by mismatched
sample-channel assignments. To systematically address this fundamental problem,
we pose it as a signal reconstruction problem where we have lost
correspondences between the samples and their respective channels. Assuming
that we have a sensing matrix for the underlying signals, we show that the
problem is equivalent to a structured unlabeled sensing problem, and establish
sufficient conditions for unique recovery. To the best of our knowledge, a
sampling result for the reconstruction of shuffled multi-channel signals has
not been considered in the literature and existing methods for unlabeled
sensing cannot be directly applied. We extend our results to the case where the
signals admit a sparse representation in an overcomplete dictionary (i.e., the
sensing matrix is not precisely known), and derive sufficient conditions for
the reconstruction of shuffled sparse signals. We propose a robust
reconstruction method that combines sparse signal recovery with robust linear
regression for the two-channel case. The performance and robustness of the
proposed approach are illustrated in an application related to whole-brain
calcium imaging. The proposed methodology can be generalized to sparse signal
representations other than the ones considered in this work and applied to a
variety of real-world problems with imprecise measurement or channel
assignment.
Comment: Submitted to TS
Shuffled linear regression through graduated convex relaxation
The shuffled linear regression problem aims to recover linear relationships
in datasets where the correspondence between input and output is unknown. This
problem arises in a wide range of applications including survey data, in which
one needs to decide whether the anonymity of the responses can be preserved
while uncovering significant statistical connections. In this work, we propose
a novel optimization algorithm for shuffled linear regression based on a
posterior-maximizing objective function assuming a Gaussian noise prior. We
compare and contrast our approach with existing methods on synthetic and real
data. We show that our approach performs competitively while achieving
empirical running-time improvements. Furthermore, we demonstrate that our
algorithm is able to utilize side information in the form of seeds, which
recently came to prominence in related problems.
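To make the problem setting concrete, here is a minimal sketch of shuffled linear regression with scalar outputs. It is not the graduated convex relaxation proposed in the abstract above. It uses a generic alternating heuristic instead: for fixed weights, the best assignment of shuffled outputs to predictions is found by sorting, and the weights are then refit by least squares. The function name, the parameters, and the random-restart scheme are assumptions made for illustration, and the heuristic can stall in local minima.

```python
import numpy as np

def shuffled_regression_alt_min(X, y_shuffled, n_iters=50, n_restarts=20, seed=0):
    """Alternating heuristic (illustrative, not the paper's algorithm).

    X          : (n, d) inputs in their original order
    y_shuffled : (n,) outputs whose order has been lost
    Returns the best weights found and their squared-error loss.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    y_sorted = np.sort(y_shuffled)
    best_w, best_loss = None, np.inf
    for _ in range(n_restarts):
        w = rng.standard_normal(d)  # random initial guess
        for _ in range(n_iters):
            pred = X @ w
            # For scalar outputs, the squared-error-optimal matching pairs
            # the k-th smallest prediction with the k-th smallest output.
            order = np.argsort(pred)
            y_aligned = np.empty(n)
            y_aligned[order] = y_sorted
            # Refit the weights against the currently aligned outputs.
            w_new, *_ = np.linalg.lstsq(X, y_aligned, rcond=None)
            if np.allclose(w_new, w):
                break
            w = w_new
        # Loss under the optimal matching for the final weights.
        loss = float(np.sum((np.sort(X @ w) - y_sorted) ** 2))
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss
```

Calling `shuffled_regression_alt_min(X, y)` with a permuted `y` returns a weight vector and the residual under the best sorted matching. Seeds, as mentioned in the abstract, would correspond to fixing some of the pairings in advance, which this sketch does not do.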
Homomorphic Sensing of Subspace Arrangements
Homomorphic sensing is a recent algebraic-geometric framework that studies
the unique recovery of points in a linear subspace from their images under a
given collection of linear maps. It has been successful in interpreting such a
recovery in the case of permutations composed with coordinate projections, an
important instance in applications known as unlabeled sensing, which models
data that are out of order and have missing values. In this paper, we provide
tighter and simpler conditions that guarantee the unique recovery for the
single-subspace case, extend the result to the case of a subspace arrangement,
and show that the unique recovery in a single subspace is locally stable under
noise. We specialize our results to several examples of homomorphic sensing
such as real phase retrieval and unlabeled sensing. In so doing, in a unified
way, we obtain conditions that guarantee the unique recovery for those
examples, typically known via diverse techniques in the literature, as well as
novel conditions for sparse and unsigned versions of unlabeled sensing.
Similarly, our noise result also implies that the unique recovery in unlabeled
sensing is locally stable.
Comment: 18 page
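The unique-recovery property described in the abstract above can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the abstract: $\mathcal{V}$ is the linear subspace, $\mathcal{T}$ is the given collection of linear maps, and $A$ is a matrix whose range is $\mathcal{V}$. This is one common way to state the property, not necessarily the paper's exact formulation.

```latex
% Setting (assumed notation): a linear subspace V of R^m and a finite
% collection T of linear maps on R^m.
\[
  \mathcal{V} \subseteq \mathbb{R}^m, \qquad
  \mathcal{T} \subset \{\, \tau : \mathbb{R}^m \to \mathbb{R}^m \ \text{linear} \,\}.
\]
% Unique recovery: no two distinct points of V can produce the same image,
% even under different maps from T.
\[
  \forall\, v_1, v_2 \in \mathcal{V},\ \forall\, \tau_1, \tau_2 \in \mathcal{T}:
  \qquad \tau_1(v_1) = \tau_2(v_2) \;\Longrightarrow\; v_1 = v_2 .
\]
% Unlabeled sensing as a special case: V = range(A), and each map is a
% permutation \Pi followed by a coordinate projection P (missing entries).
\[
  y = P\,\Pi\,A\,x, \qquad \text{recover } x \text{ from } y
  \text{ without knowing } P \text{ or } \Pi .
\]
```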
A Deep and Autoregressive Approach for Topic Modeling of Multimodal Data
Topic modeling based on latent Dirichlet allocation (LDA) has been a
framework of choice to deal with multimodal data, such as in image annotation
tasks. Another popular approach to modeling multimodal data is through deep
neural networks, such as the deep Boltzmann machine (DBM). Recently, a new type
of topic model called the Document Neural Autoregressive Distribution Estimator
(DocNADE) was proposed and demonstrated state-of-the-art performance for text
document modeling. In this work, we show how to successfully apply and extend
this model to multimodal data, such as simultaneous image classification and
annotation. First, we propose SupDocNADE, a supervised extension of DocNADE,
that increases the discriminative power of the learned hidden topic features
and show how to employ it to learn a joint representation from image visual
words, annotation words and class label information. We test our model on the
LabelMe and UIUC-Sports data sets and show that it compares favorably to other
topic models. Second, we propose a deep extension of our model and provide an
efficient way of training the deep model. Experimental results show that our
deep model outperforms its shallow version and reaches state-of-the-art
performance on the Multimedia Information Retrieval (MIR) Flickr data set.
Comment: 24 pages, 10 figures. A version has been accepted by TPAMI on Aug
4th, 2015. Add footnote about how to train the model in practice in Section
5.1. arXiv admin note: substantial text overlap with arXiv:1305.530
VIBE: Video Inference for Human Body Pose and Shape Estimation
Human motion is fundamental to understanding behavior. Despite progress on
single-image 3D pose and shape estimation, existing video-based
state-of-the-art methods fail to produce accurate and natural motion sequences
due to a lack of ground-truth 3D motion data for training. To address this
problem, we propose Video Inference for Body Pose and Shape Estimation (VIBE),
which makes use of an existing large-scale motion capture dataset (AMASS)
together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty
is an adversarial learning framework that leverages AMASS to discriminate
between real human motions and those produced by our temporal pose and shape
regression networks. We define a temporal network architecture and show that
adversarial training, at the sequence level, produces kinematically plausible
motion sequences without in-the-wild ground-truth 3D labels. We perform
extensive experimentation to analyze the importance of motion and demonstrate
the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving
state-of-the-art performance. Code and pretrained models are available at
https://github.com/mkocabas/VIBE.
Comment: CVPR-2020 camera ready. Code is available at
https://github.com/mkocabas/VIB
Deep Unsupervised Similarity Learning using Partially Ordered Sets
Unsupervised learning of visual similarities is of paramount importance to
computer vision, particularly due to the lack of training data for fine-grained
similarities. Deep learning of similarities is often based on relationships
between pairs or triplets of samples. Many of these relations are unreliable
and mutually contradicting, implying inconsistencies when trained without
supervision information that relates different tuples or triplets to each
other. To overcome this problem, we use local estimates of reliable
(dis-)similarities to initially group samples into compact surrogate classes
and use local partial orders of samples to classes to link classes to each
other. Similarity learning is then formulated as a partial ordering task with
soft correspondences of all samples to classes. Adopting a strategy of
self-supervision, a CNN is trained to optimally represent samples in a mutually
consistent manner while updating the classes. The similarity learning and
grouping procedures are integrated in a single model and optimized jointly. The
proposed unsupervised approach shows competitive performance on detailed pose
estimation and object classification.
Comment: Accepted for publication at IEEE Computer Vision and Pattern
Recognition 201
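For context on the triplet-based similarity learning mentioned in the abstract above, here is a minimal sketch of a standard triplet hinge loss. This is the conventional baseline formulation, not the partial-ordering objective the paper proposes. The function name and the default margin are assumptions chosen for illustration.

```python
import numpy as np

def triplet_hinge_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet hinge loss on embedding vectors (illustrative).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin` in squared Euclidean distance.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return np.maximum(0.0, d_pos - d_neg + margin)
```

Each triplet is scored independently of the others, which is one way to see why unreliable or contradictory triplets can pull the embedding in inconsistent directions when no supervision links them.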