Deep Multimodal Speaker Naming
Automatic speaker naming is the problem of localizing as well as identifying
each speaking character in a TV/movie/live show video. This is a challenging
problem mainly because of its multimodal nature: face cues alone are
insufficient to achieve good performance. Previous multimodal approaches to
this problem usually process the data of different modalities individually and
merge them using handcrafted heuristics. Such approaches work well for simple
scenes, but fail to achieve high performance for speakers with large appearance
variations. In this paper, we propose a novel convolutional neural network
(CNN) based learning framework to automatically learn the fusion function of
both face and audio cues. We show that without using face tracking, facial
landmark localization or subtitle/transcript, our system with robust multimodal
feature extraction is able to achieve state-of-the-art speaker naming
performance on two diverse TV series. The dataset and implementation
of our algorithm are publicly available online.
Look, Listen and Learn - A Multimodal LSTM for Speaker Identification
Speaker identification refers to the task of localizing the face of a person
who has the same identity as the ongoing voice in a video. This task requires
not only collective perception over both visual and auditory signals but also
robustness to severe quality degradations and unconstrained content
variations. In this paper, we describe a novel
multimodal Long Short-Term Memory (LSTM) architecture which seamlessly unifies
both visual and auditory modalities from the beginning of each sequence input.
The key idea is to extend the conventional LSTM by not only sharing weights
across time steps, but also sharing weights across modalities. We show that
modeling the temporal dependency across face and voice can significantly
improve the robustness to content quality degradations and variations. We also
found that our multimodal LSTM is robust to distractors, namely the
non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory
dataset and showed that our system outperforms the state-of-the-art systems in
speaker identification with lower false alarm rate and higher recognition
accuracy.
Comment: The 30th AAAI Conference on Artificial Intelligence (AAAI-16)
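To make the weight-sharing idea in the abstract above concrete, here is a minimal sketch in plain Python. The abstract does not say which weights are shared across modalities, so this sketch assumes (purely for illustration) that each modality keeps its own input weights while one set of recurrent weights is reused by both streams and at every time step. The class name `SharedRecurrentLSTM`, the dimensions, and the random initialization are all hypothetical; biases and training are omitted.

```python
import math
import random

GATES = ("i", "f", "o", "g")  # input, forget, output gates and candidate cell


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class SharedRecurrentLSTM:
    """LSTM streams with per-modality input weights and shared recurrent weights.

    Assumption: which weights are shared is an illustrative choice, not a
    description of the architecture in the paper.
    """

    def __init__(self, input_dims, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # One set of input weights per modality (e.g. "face", "audio").
        self.W_in = {
            m: {g: rand_matrix(hidden_dim, d, rng) for g in GATES}
            for m, d in input_dims.items()
        }
        # A single set of recurrent weights, reused by every modality
        # and at every time step.
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in GATES}

    def step(self, modality, x, h, c):
        pre = {}
        for g in GATES:
            from_input = matvec(self.W_in[modality][g], x)
            from_state = matvec(self.U[g], h)
            pre[g] = [a + b for a, b in zip(from_input, from_state)]
        i = [sigmoid(z) for z in pre["i"]]
        f = [sigmoid(z) for z in pre["f"]]
        o = [sigmoid(z) for z in pre["o"]]
        g_cand = [math.tanh(z) for z in pre["g"]]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g_cand)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, modality, sequence):
        """Run one modality's sequence from a zero state; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(modality, x, h, c)
        return h


# Hypothetical toy inputs: 3-dimensional face features, 2-dimensional audio features.
model = SharedRecurrentLSTM({"face": 3, "audio": 2}, hidden_dim=4)
face_state = model.run("face", [[0.1, 0.2, 0.3], [0.0, 0.1, 0.2]])
audio_state = model.run("audio", [[0.5, -0.5], [0.4, -0.3]])
```

Because both streams pass through the same `U` matrices, their hidden states live in a common space, which is one way the two modalities could be compared or fused downstream.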
Two-Stage Predict+Optimize for Mixed Integer Linear Programs with Unknown Parameters in Constraints
Consider the setting of constrained optimization, with some parameters
unknown at solving time and requiring prediction from relevant features.
Predict+Optimize is a recent framework for end-to-end training of supervised
learning models for such predictions, incorporating information about the
optimization problem in the training process in order to yield better
predictions in terms of the quality of the predicted solution under the true
parameters. Almost all prior works have focused on the special case where the
unknowns appear only in the optimization objective and not the constraints. Hu
et al. proposed the first adaptation of Predict+Optimize to handle unknowns
appearing in constraints, but the framework has somewhat ad-hoc elements, and
they provided a training algorithm only for covering and packing linear
programs. In this work, we give a new, simpler and more powerful
framework called Two-Stage Predict+Optimize, which we believe should be
the canonical framework for the Predict+Optimize setting. We also give a
training algorithm usable for all mixed integer linear programs, vastly
generalizing the applicability of the framework. Experimental results
demonstrate the superior prediction performance of our training framework over
all classical and state-of-the-art methods.
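The phrase "quality of the predicted solution under the true parameters" in the abstract above can be illustrated with a small sketch. It covers only the special case the abstract attributes to prior work (unknowns in the objective, not the constraints) and is not the Two-Stage framework itself. The toy 0/1 knapsack, the brute-force solver, and all function names are assumptions chosen for illustration.

```python
from itertools import product


def solve_knapsack(values, weights, capacity):
    """Brute-force 0/1 knapsack: return the best feasible selection as a 0/1 tuple."""
    best, best_val = None, float("-inf")
    for x in product((0, 1), repeat=len(values)):
        if sum(w * xi for w, xi in zip(weights, x)) <= capacity:
            val = sum(v * xi for v, xi in zip(values, x))
            if val > best_val:
                best, best_val = x, val
    return best


def objective(values, x):
    return sum(v * xi for v, xi in zip(values, x))


def regret(true_values, predicted_values, weights, capacity):
    """True objective lost by optimizing with predicted rather than true values."""
    x_pred = solve_knapsack(predicted_values, weights, capacity)
    x_true = solve_knapsack(true_values, weights, capacity)
    return objective(true_values, x_true) - objective(true_values, x_pred)


# Hypothetical toy instance.
true_values = [10, 6, 5]
weights = [4, 3, 2]
capacity = 5

# Prediction A is exact on item 0 but leads to a worse decision.
regret_a = regret(true_values, [10, 1, 1], weights, capacity)
# Prediction B is far off in magnitude but yields the optimal decision.
regret_b = regret(true_values, [2, 3, 2.5], weights, capacity)
```

In this toy instance, prediction B has the larger numerical error yet zero regret, while prediction A incurs positive regret. That gap between prediction error and decision quality is what training against the optimization problem is meant to address.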
Generalized and Scalable Optimal Sparse Decision Trees
Decision tree optimization is notoriously difficult from a computational
perspective but essential for the field of interpretable machine learning.
Despite efforts over the past 40 years, only recently have optimization
breakthroughs been made that have allowed practical algorithms to find optimal
decision trees. These new techniques have the potential to trigger a paradigm
shift where it is possible to construct sparse decision trees to efficiently
optimize a variety of objective functions without relying on greedy splitting
and pruning heuristics that often lead to suboptimal solutions. The
contribution in this work is to provide a general framework for decision tree
optimization that addresses the two significant open problems in the area:
treatment of imbalanced data and fully optimizing over continuous variables. We
present techniques that produce optimal decision trees over a variety of
objectives including F-score, AUC, and partial area under the ROC convex hull.
We also introduce a scalable algorithm that produces provably optimal results
in the presence of continuous variables and speeds up decision tree
construction by several orders of magnitude relative to the state of the art.
Comment: This paper was published in ICML 202
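As a small illustration of why the abstract above singles out imbalanced data and objectives such as F-score, the sketch below compares accuracy with the F1 score on a hypothetical imbalanced label set. It is not the paper's algorithm; the labels, predictions, and function names are made up for the example.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(y_true, y_pred):
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A trivial model (think of a single-leaf tree) that always predicts negative.
always_negative = [0] * 100

# A model that finds 4 of the 5 positives at the cost of 6 false positives.
finds_positives = [1, 1, 1, 1, 0] + [1] * 6 + [0] * 89

acc_trivial = accuracy(y_true, always_negative)
acc_useful = accuracy(y_true, finds_positives)
f1_trivial = f1_score(y_true, always_negative)
f1_useful = f1_score(y_true, finds_positives)
```

On this data, accuracy ranks the trivial model above the one that actually detects positives, while F1 ranks them the other way, so the choice of objective changes which tree counts as optimal.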
Working with the homeless: The case of a non-profit organisation in Shanghai
This article addresses a two-pronged objective, namely to bring to the fore the much neglected social issue of homelessness, and to explore the dynamics of state-society relations in contemporary China, through a case study of a non-profit organisation (NPO) working with the homeless in Shanghai. It shows that the largely invisible homelessness in Chinese cities was substantially due to exclusionary institutions, such as the combined household registration and 'detention and deportation' systems. Official policy has become much more supportive since 2003, when the latter was replaced with government-run shelters, but we argue that the NPO case demonstrates the potential for enhanced longer-term support and for enabling active citizenship for homeless people. By analysing the ways in which the NPO offers services through collaboration and partnership with public (and private) actors, we also argue that the transformations in post-reform China and the changes within the state and civil society have significantly blurred their boundaries, rendering state-society relations much more complex, dynamic, fluid and mutually embedded.
Phosphorothioate DNA Mediated Sequence-Insensitive Etching and Ripening of Silver Nanoparticles
Many DNA-functionalized nanomaterials and biosensors have been reported, but most have ignored the influence of DNA on the stability of nanoparticles. We observed that cytosine-rich DNA oligonucleotides can etch silver nanoparticles (AgNPs). In this work, we showed that phosphorothioate (PS)-modified DNA (PS-DNA) can etch AgNPs independently of DNA sequence, suggesting that the thio-modifications play the major role in etching. Compared to unmodified DNA (e.g., poly-cytosine DNA), the required concentration of PS-DNA decreases sharply, and the reaction rate increases. Furthermore, etching by PS-DNA occurs largely independently of pH, which also differs from unmodified DNA. The PS-DNA mediated etching could also be controlled well by varying DNA length and conformation, and the number and location of PS modifications. With the higher activity of PS-DNA, etching, ripening, and further etching took place sequentially. The etching ability is inhibited by forming duplex DNA, and thus etching can be used to measure the concentration of complementary DNA.