Deep Multimodal Speaker Naming
Automatic speaker naming is the problem of localizing as well as identifying
each speaking character in a TV/movie/live show video. This is a challenging
problem mainly because of its multimodal nature: face cues alone are
insufficient to achieve good performance. Previous multimodal approaches to
this problem usually process the data of different modalities individually and
merge them using handcrafted heuristics. Such approaches work well for simple
scenes, but fail to achieve high performance for speakers with large appearance
variations. In this paper, we propose a novel convolutional neural network
(CNN) based learning framework to automatically learn the fusion function of
both face and audio cues. We show that without using face tracking, facial
landmark localization or subtitle/transcript, our system with robust multimodal
feature extraction is able to achieve state-of-the-art speaker naming
performance evaluated on two diverse TV series. The dataset and implementation
of our algorithm are publicly available online.
Nematic crossover in BaFe$_2$As$_2$ under uniaxial stress
Raman scattering can detect spontaneous point-group symmetry breaking without
resorting to single-domain samples. Here we use this technique to study
BaFe$_2$As$_2$, the parent compound of the "122" Fe-based
superconductors. We show that an applied compression along the Fe-Fe direction,
which is commonly used to produce untwinned orthorhombic samples, changes the
structural phase transition at temperature $T_s$ into a crossover
that spans a considerable temperature range above $T_s$. Even in
crystals that are not subject to any applied force, a distribution of
substantial residual stress remains, which may explain phenomena that are
seemingly indicative of symmetry breaking above $T_s$. Our results
are consistent with an onset of spontaneous nematicity only below
$T_s$.
Comment: 4 pages, 4 figures