Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.
Comment: 10 pages, 19 figures
Improving speaker turn embedding by crossmodal transfer learning from face embedding
Learning speaker turn embeddings has shown considerable improvement in
situations where conventional speaker modeling approaches fail. However, this
improvement is relatively limited when compared to the gain observed in face
embedding learning, which has been proven very successful for face verification
and clustering tasks. Assuming that faces and voices from the same identities
share some latent properties (like age, gender, ethnicity), we propose three
transfer learning approaches to leverage the knowledge from the face domain
(learned from thousands of images and identities) for tasks in the speaker
domain. These approaches, namely target embedding transfer, relative distance
transfer, and clustering structure transfer, utilize the structure of the
source face embedding space at different granularities to regularize the target
speaker turn embedding space through regularizing terms added to the
optimization objective. Our methods are evaluated on
two public broadcast corpora and yield promising advances over competitive
baselines in verification and audio clustering tasks, especially when dealing
with short speaker utterances. The analysis of the results also gives insight
into characteristics of the embedding spaces and shows their potential
applications.
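The abstract gives no formulas, so the following is only a minimal sketch of one possible reading of target embedding transfer: a penalty that pulls each speaker turn embedding toward the face embedding of the same identity, added to the task loss as a regularizer. The function names, the squared Euclidean distance, and the `weight` parameter are all assumptions made for illustration and are not taken from the paper.

```python
# Hypothetical sketch, not the authors' implementation.
# Assumption: row i of both lists belongs to the same identity.

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def target_embedding_penalty(speaker_embeddings, face_embeddings):
    """Mean squared distance between paired speaker and face embeddings."""
    if len(speaker_embeddings) != len(face_embeddings):
        raise ValueError("expected one face embedding per speaker embedding")
    total = sum(
        squared_distance(s, f)
        for s, f in zip(speaker_embeddings, face_embeddings)
    )
    return total / len(speaker_embeddings)


def regularized_loss(task_loss, speaker_embeddings, face_embeddings, weight=0.1):
    """Task loss plus the weighted transfer penalty (weight is an assumed knob)."""
    penalty = target_embedding_penalty(speaker_embeddings, face_embeddings)
    return task_loss + weight * penalty
```

In this reading, the face embeddings stay fixed and only the speaker model would be trained against the combined loss.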
Evidence Transfer for Improving Clustering Tasks Using External Categorical Evidence
In this paper we introduce evidence transfer for clustering, a deep learning
method that can incrementally manipulate the latent representations of an
autoencoder, according to external categorical evidence, in order to improve a
clustering outcome. By evidence transfer we define the process by which the
categorical outcome of an external, auxiliary task is exploited to improve a
primary task, in this case representation learning for clustering. Our proposed
method makes no assumptions regarding the categorical evidence presented, nor
the structure of the latent space. We compare our method against the baseline
solution by performing k-means clustering before and after its deployment.
Experiments with three different kinds of evidence show that our method
effectively manipulates the latent representations when introduced with real
corresponding evidence, while remaining robust when presented with low-quality
evidence.
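The before-and-after comparison described in this abstract can be illustrated with a small sketch: run k-means on latent vectors before and after some transformation and compare how well the clusters match known labels. Everything here is an assumption for illustration: the toy one-dimensional latent vectors, the fixed initial centroids, and the use of cluster purity as the score (the abstract does not name a metric).

```python
from collections import Counter

# Hypothetical sketch, not the authors' evaluation code.

def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_assign(points, initial_centroids, iterations=10):
    """Plain Lloyd-style k-means; returns the final cluster index per point."""
    centroids = [list(c) for c in initial_centroids]

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda k: squared_distance(p, centroids[k]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        for k in range(len(centroids)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p) for p in points]


def purity(cluster_ids, true_labels):
    """Fraction of points that share their cluster's majority label."""
    correct = 0
    for c in set(cluster_ids):
        members = [t for cid, t in zip(cluster_ids, true_labels) if cid == c]
        correct += Counter(members).most_common(1)[0][1]
    return correct / len(true_labels)


# Toy latent vectors (assumed): classes interleaved before, separated after.
true_labels = [0, 1, 0, 1]
latent_before = [[0.0], [1.0], [2.0], [3.0]]
latent_after = [[0.0], [3.0], [0.2], [3.2]]
init = [[0.0], [3.0]]

purity_before = purity(kmeans_assign(latent_before, init), true_labels)
purity_after = purity(kmeans_assign(latent_after, init), true_labels)
```

With these toy inputs the purity rises from 0.5 to 1.0, which is the kind of improvement such a comparison is meant to expose.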
Learning Social Relation Traits from Face Images
Social relation defines the association, e.g., warmth, friendliness, and
dominance, between two or more people. Motivated by psychological studies, we
investigate if such fine-grained and high-level relation traits can be
characterised and quantified from face images in the wild. To address this
challenging problem we propose a deep model that learns a rich face
representation to capture gender, expression, head pose, and age-related
attributes, and then performs pairwise-face reasoning for relation prediction.
To learn from heterogeneous attribute sources, we formulate a new network
architecture with a bridging layer to leverage the inherent correspondences
among these datasets. It can also cope with missing target attribute labels.
Extensive experiments show that our approach is effective for fine-grained
social relation learning in images and videos.
Comment: To appear in International Conference on Computer Vision (ICCV) 201