281 research outputs found
Binary Representation Learning for Large Scale Visual Data
The exponentially growing modern media created large amount of multimodal or multidomain visual data, which usually reside in high dimensional space. And it is crucial to provide not only effective but also efficient understanding of the data.In this dissertation, we focus on learning binary representation of visual dataset, whose primary use has been hash code for retrieval purpose. Simultaneously it serves as multifunctional feature that can also be used for various computer vision tasks. Essentially, this is achieved by discriminative learning that preserves the supervision information in the binary representation.By using deep networks such as convolutional neural networks (CNNs) as backbones, and effective binary embedding algorithm that is seamlessly integrated into the learning process, we achieve state-of-the art performance on several settings. First, we study the supervised binary representation learning problem by using label information directly instead of pairwise similarity or triplet loss. By considering images and associated textual information, we study the cross-modal representation learning. CNNs are used in both image and text embedding, and we are able to perform retrieval and prediction across these modalities. Furthermore, by utilizing unlabeled images from a different domain, we propose to use adversarial learning to connect these domains. Finally, we also consider progressive learning for more efficient learning and instance-level representation learning to provide finer granularity understanding. This dissertation demonstrates that binary representation is versatile and powerful under various circumstances with different tasks
Content-Based Video Retrieval in Historical Collections of the German Broadcasting Archive
The German Broadcasting Archive (DRA) maintains the cultural heritage of
radio and television broadcasts of the former German Democratic Republic (GDR).
The uniqueness and importance of the video material stimulates a large
scientific interest in the video content. In this paper, we present an
automatic video analysis and retrieval system for searching in historical
collections of GDR television recordings. It consists of video analysis
algorithms for shot boundary detection, concept classification, person
recognition, text recognition and similarity search. The performance of the
system is evaluated from a technical and an archival perspective on 2,500 hours
of GDR television recordings.Comment: TPDL 2016, Hannover, Germany. Final version is available at Springer
via DO
Ranking-based Deep Cross-modal Hashing
Cross-modal hashing has been receiving increasing interests for its low
storage cost and fast query speed in multi-modal data retrievals. However, most
existing hashing methods are based on hand-crafted or raw level features of
objects, which may not be optimally compatible with the coding process.
Besides, these hashing methods are mainly designed to handle simple pairwise
similarity. The complex multilevel ranking semantic structure of instances
associated with multiple labels has not been well explored yet. In this paper,
we propose a ranking-based deep cross-modal hashing approach (RDCMH). RDCMH
firstly uses the feature and label information of data to derive a
semi-supervised semantic ranking list. Next, to expand the semantic
representation power of hand-crafted features, RDCMH integrates the semantic
ranking information into deep cross-modal hashing and jointly optimizes the
compatible parameters of deep feature representations and of hashing functions.
Experiments on real multi-modal datasets show that RDCMH outperforms other
competitive baselines and achieves the state-of-the-art performance in
cross-modal retrieval applications
Deep Extreme Multi-label Learning
Extreme multi-label learning (XML) or classification has been a practical and
important problem since the boom of big data. The main challenge lies in the
exponential label space which involves possible label sets especially
when the label dimension is huge, e.g., in millions for Wikipedia labels.
This paper is motivated to better explore the label space by originally
establishing an explicit label graph. In the meanwhile, deep learning has been
widely studied and used in various classification problems including
multi-label classification, however it has not been properly introduced to XML,
where the label space can be as large as in millions. In this paper, we propose
a practical deep embedding method for extreme multi-label classification, which
harvests the ideas of non-linear embedding and graph priors-based label space
modeling simultaneously. Extensive experiments on public datasets for XML show
that our method performs competitive against state-of-the-art result
- …