Learning Multimodal Latent Attributes
The rapid development of social media sharing has created a huge demand for automatic media classification and annotation techniques. Attribute learning has emerged as a promising paradigm for bridging the semantic gap and addressing data sparsity via transferring attribute knowledge in object recognition and relatively simple action classification. In this paper, we address the task of attribute learning for understanding multimedia data with sparse and incomplete labels. In particular, we focus on videos of social group activities, which are particularly challenging and topical examples of this task because of their multi-modal content and complex and unstructured nature relative to the density of annotations. To solve this problem, we (1) introduce a concept of semi-latent attribute space, expressing user-defined and latent attributes in a unified framework, and (2) propose a novel scalable probabilistic topic model for learning multi-modal semi-latent attributes, which dramatically reduces requirements for an exhaustive accurate attribute ontology and expensive annotation effort. We show that our framework is able to exploit latent attributes to outperform contemporary approaches for addressing a variety of realistic multimedia sparse data learning tasks including: multi-task learning, learning with label noise, N-shot transfer learning and, importantly, zero-shot learning.
Intra-Camera Supervised Person Re-Identification
Existing person re-identification (re-id) methods mostly exploit a large set of cross-camera identity labelled training data. This requires a tedious data collection and annotation process, leading to poor scalability in practical re-id applications. On the other hand, unsupervised re-id methods do not need identity label information, but they usually suffer from much inferior model performance. To overcome these fundamental limitations, we propose a novel person re-identification paradigm based on an idea of independent per-camera identity annotation. This eliminates the most time-consuming and tedious inter-camera identity labelling process, significantly reducing the amount of human annotation effort. Consequently, it gives rise to a more scalable and more feasible setting, which we call Intra-Camera Supervised (ICS) person re-id, for which we formulate a Multi-tAsk mulTi-labEl (MATE) deep learning method. Specifically, MATE is designed for self-discovering the cross-camera identity correspondence in a per-camera multi-task inference framework. Extensive experiments demonstrate the superior cost-effectiveness of our method over alternative approaches on three large person re-id datasets. For example, MATE yields an 88.7% rank-1 score on Market-1501 in the proposed ICS person re-id setting, significantly outperforming unsupervised learning models and closely approaching conventional fully supervised learning competitors.
Intra-Camera Supervised Person Re-Identification: A New Benchmark
Existing person re-identification (re-id) methods rely mostly on a large set
of inter-camera identity labelled training data, requiring a tedious data
collection and annotation process and therefore leading to poor scalability in
practical re-id applications. To overcome this fundamental limitation, we
consider person re-identification without inter-camera identity association but
only with identity labels independently annotated within each individual
camera-view. This eliminates the most time-consuming and tedious inter-camera
identity labelling process in order to significantly reduce the amount of human
effort required during annotation. It hence gives rise to a more scalable and
more feasible learning scenario, which we call Intra-Camera Supervised (ICS)
person re-id. Under this ICS setting with weaker label supervision, we
formulate a Multi-Task Multi-Label (MTML) deep learning method. Given no
inter-camera association, MTML is specially designed for self-discovering the
inter-camera identity correspondence. This is achieved by inter-camera
multi-label learning under a joint multi-task inference framework. In addition,
MTML can also efficiently learn the discriminative re-id feature
representations by fully using the available identity labels within each
camera-view. Extensive experiments demonstrate the performance superiority of
our MTML model over the state-of-the-art alternative methods on three
large-scale person re-id datasets in the proposed intra-camera supervised
learning setting.
Comment: 9 pages, 3 figures, accepted by ICCV Workshop on Real-World
Recognition from Low-Quality Images and Videos, 201
Bringing Structure into Summaries: Crowdsourcing a Benchmark Corpus of Concept Maps
Concept maps can be used to concisely represent important information and
bring structure into large document collections. Therefore, we study a variant
of multi-document summarization that produces summaries in the form of concept
maps. However, suitable evaluation datasets for this task are currently
missing. To close this gap, we present a newly created corpus of concept maps
that summarize heterogeneous collections of web documents on educational
topics. It was created using a novel crowdsourcing approach that allows us to
efficiently determine important elements in large document collections. We
release the corpus along with a baseline system and proposed evaluation
protocol to enable further research on this variant of summarization.
Comment: Published at EMNLP 201
Interpretable classification of Alzheimer's disease pathologies with a convolutional neural network pipeline.
Neuropathologists assess vast brain areas to identify diverse and subtly differentiated morphologies. Standard semi-quantitative scoring approaches, however, are coarse-grained and lack precise neuroanatomic localization. We report a proof-of-concept deep learning pipeline that identifies specific neuropathologies (amyloid plaques and cerebral amyloid angiopathy) in immunohistochemically stained archival slides. Using automated segmentation of stained objects and a cloud-based interface, we annotate >70,000 plaque candidates from 43 whole slide images (WSIs) to train and evaluate convolutional neural networks. Networks achieve strong plaque classification on a 10-WSI hold-out set (0.993 and 0.743 areas under the receiver operating characteristic and precision-recall curves, respectively). Prediction confidence maps visualize morphology distributions at high resolution. Resulting network-derived amyloid beta (Aβ)-burden scores correlate well with established semi-quantitative scores on a 30-WSI blinded hold-out. Finally, saliency mapping demonstrates that networks learn patterns agreeing with accepted pathologic features. This scalable means to augment a neuropathologist's ability suggests a route to neuropathologic deep phenotyping.
Highly Efficient Regression for Scalable Person Re-Identification
Existing person re-identification models scale poorly to the large
data required in real-world applications due to: (1) Complexity: They employ
complex models for optimal performance resulting in high computational cost for
training at a large scale; (2) Inadaptability: Once trained, they are
unsuitable for incremental update to incorporate any new data available. This
work proposes a truly scalable solution to re-id by addressing both problems.
Specifically, a Highly Efficient Regression (HER) model is formulated by
embedding Fisher's criterion into a ridge regression model for very fast
re-id model learning with scalable memory/storage usage. Importantly, this new
HER model supports faster-than-real-time incremental model updates, therefore
making real-time active learning feasible in re-id with human-in-the-loop.
Extensive experiments show that such a simple and fast model not only
notably outperforms the state-of-the-art re-id methods, but also is more
scalable to large data with additional benefits to active learning for reducing
human labelling effort in re-id deployment.