Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt crowds
to comment on social media. In this paper, we propose to leverage these
behavioral dynamics to estimate the most relevant time periods for an event
(i.e., a query). Recent advances have shown how to improve the estimation of
the temporal relevance of such topics. Our approach builds on two major
novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.
Comment: To appear in WSDM 201
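The abstract describes re-ranking results using temporal evidence, interpolating a query-likelihood score with a temporal prior over relevant time periods. A minimal sketch of that idea follows; the Gaussian kernel, the interpolation weight `alpha`, and all function names are illustrative assumptions, not the paper's actual model.

```python
import math

def temporal_prior(doc_time, peaks, bandwidth=1.0):
    """Smooth density over relevant time periods (peaks), which the paper
    mines from external collections. Kernel choice here is an assumption."""
    return sum(math.exp(-0.5 * ((doc_time - p) / bandwidth) ** 2)
               for p in peaks) / len(peaks)

def rerank(results, peaks, alpha=0.7):
    """Interpolate the lexical score with the temporal prior and re-sort.
    results: list of (doc_id, lexical_score, doc_timestamp)."""
    rescored = [(d, alpha * s + (1 - alpha) * temporal_prior(t, peaks))
                for d, s, t in results]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

results = [("d1", 0.90, 10.0), ("d2", 0.85, 2.1), ("d3", 0.50, 2.0)]
ranked = rerank(results, peaks=[2.0])  # d2 overtakes d1: closer to the peak
```

A document slightly weaker lexically but posted inside the estimated relevant period can outrank a stronger off-peak one, which is the behavior the re-ranking step exploits.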
Deep Multimodal Speaker Naming
Automatic speaker naming is the problem of localizing as well as identifying
each speaking character in a TV/movie/live show video. This problem is
challenging mainly because of its multimodal nature: the face cue alone is
insufficient to achieve good performance. Previous multimodal approaches to
this problem usually process the data of different modalities individually and
merge them using handcrafted heuristics. Such approaches work well for simple
scenes, but fail to achieve high performance for speakers with large appearance
variations. In this paper, we propose a novel convolutional neural network
(CNN) based learning framework to automatically learn the fusion function of
both face and audio cues. We show that without using face tracking, facial
landmark localization or subtitle/transcript, our system with robust multimodal
feature extraction is able to achieve state-of-the-art speaker naming
performance evaluated on two diverse TV series. The dataset and implementation
of our algorithm are publicly available online.
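The key contrast the abstract draws is between handcrafted fusion heuristics and a learned fusion of face and audio cues. A toy sketch of learned fusion follows: the two feature vectors are concatenated and scored by a small MLP. The feature dimensions, the number of candidate speakers, and the random placeholder weights are all assumptions; in the paper the fusion is learned end-to-end inside a CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def fuse(face_feat, audio_feat, W1, W2):
    """Learned fusion: concatenate face and audio features, then score
    each candidate speaker with a two-layer network. Weights here are
    random placeholders standing in for trained parameters."""
    x = np.concatenate([face_feat, audio_feat])
    return relu(W1 @ x) @ W2  # logits over candidate speakers

face = rng.normal(size=128)          # hypothetical face embedding
audio = rng.normal(size=64)          # hypothetical audio embedding
W1 = rng.normal(size=(32, 192)) * 0.1
W2 = rng.normal(size=(32, 5)) * 0.1  # 5 hypothetical candidate speakers
logits = fuse(face, audio, W1, W2)
speaker = int(np.argmax(logits))     # predicted speaking character
```

The point of learning `W1` and `W2` jointly is that the model decides how much to trust each cue per input, rather than merging modalities with a fixed handcrafted rule.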
Look, Listen and Learn - A Multimodal LSTM for Speaker Identification
Speaker identification refers to the task of localizing the face of a person
who has the same identity as the ongoing voice in a video. This task requires
not only collective perception over both visual and auditory signals but also
robustness to severe quality degradations and unconstrained content
variations. In this paper, we describe a novel
multimodal Long Short-Term Memory (LSTM) architecture which seamlessly unifies
both visual and auditory modalities from the beginning of each sequence input.
The key idea is to extend the conventional LSTM by not only sharing weights
across time steps, but also sharing weights across modalities. We show that
modeling the temporal dependency across face and voice can significantly
improve the robustness to content quality degradations and variations. We also
found that our multimodal LSTM is robust to distractors, namely the
non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory
dataset and showed that our system outperforms the state-of-the-art systems in
speaker identification with lower false alarm rate and higher recognition
accuracy.
Comment: The 30th AAAI Conference on Artificial Intelligence (AAAI-16)
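The architectural idea named in the abstract is sharing LSTM weights not only across time steps but also across modalities. A simplified sketch follows: one set of LSTM-cell weights processes both the face sequence and the voice sequence. The cell here is a plain single-layer LSTM and the per-modality inputs are assumed already projected to a common dimension; the paper's architecture unifies the modalities more tightly than this.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step; W maps [x; h] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

d, hid = 16, 8
W = rng.normal(size=(4 * hid, d + hid)) * 0.1  # ONE weight matrix

def run(seq, W):
    h, c = np.zeros(hid), np.zeros(hid)
    for x in seq:
        h, c = lstm_step(x, h, c, W)
    return h

face_seq = rng.normal(size=(5, d))   # projected face features per frame
voice_seq = rng.normal(size=(5, d))  # projected voice features per frame
h_face = run(face_seq, W)            # same W used for both modalities
h_voice = run(voice_seq, W)
score = float(h_face @ h_voice)      # illustrative cross-modal match score
```

Sharing `W` across modalities forces both streams into a common temporal dynamics, which is the mechanism the abstract credits for robustness to degradations.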
Accurate Single Stage Detector Using Recurrent Rolling Convolution
Most of the recent successful methods in accurate object detection and
localization used some variants of R-CNN style two stage Convolutional Neural
Networks (CNN) where plausible regions were proposed in the first stage then
followed by a second stage for decision refinement. Despite their simplicity
of training and efficiency in deployment, single stage detection methods have
not been as competitive when evaluated on benchmarks that consider mAP at high
IoU thresholds. In this paper, we proposed a novel single stage end-to-end
trainable object detection network to overcome this limitation. We achieved
this by introducing Recurrent Rolling Convolution (RRC) architecture over
multi-scale feature maps to construct object classifiers and bounding box
regressors which are "deep in context". We evaluated our method on the
challenging KITTI dataset, which measures methods at an IoU threshold of 0.7. We
showed that with RRC, a single reduced VGG-16 based model already significantly
outperformed all the previously published results. At the time this paper was
written, our models ranked first in KITTI car detection (the hard level),
first in cyclist detection, and second in pedestrian detection. These
results were not reached by the previous single stage methods. The code is
publicly available.
Comment: CVPR 201
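The benchmark criterion the abstract emphasizes is mAP at a high IoU threshold: on KITTI (hard level, cars) a detection counts as correct only if its intersection-over-union with the ground-truth box is at least 0.7. The standard IoU computation can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by half their width overlap on 50 of 150
# units of combined area, giving IoU = 1/3 -- a match at a 0.5
# threshold but a miss at KITTI's 0.7.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

Tightening the threshold from 0.5 to 0.7 is why precise localization, the focus of the RRC architecture, matters so much on this benchmark.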
Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks
Predicting the future health information of patients from the historical
Electronic Health Records (EHR) is a core research task in the development of
personalized healthcare. Patient EHR data consist of sequences of visits over
time, where each visit contains multiple medical codes, including diagnosis,
medication, and procedure codes. The most important challenges for this task
are to model the temporality and high dimensionality of sequential EHR data and
to interpret the prediction results. Existing work solves this problem by
employing recurrent neural networks (RNNs) to model EHR data and utilizing a
simple attention mechanism to interpret the results. However, RNN-based
approaches suffer from the problem that the performance of RNNs drops when
sequences are long, and the relationships between subsequent visits
are ignored by current RNN-based approaches. To address these issues, we
propose {\sf Dipole}, an end-to-end, simple and robust model for predicting
patients' future health information. Dipole employs bidirectional recurrent
neural networks to remember all the information of both the past visits and the
future visits, and it introduces three attention mechanisms to measure the
relationships of different visits for the prediction. With the attention
mechanisms, Dipole can interpret the prediction results effectively. Dipole
also allows us to interpret the learned medical code representations, which
medical experts have positively confirmed. Experimental results on two real-world
EHR datasets show that the proposed Dipole can significantly improve the
prediction accuracy compared with the state-of-the-art diagnosis prediction
approaches and provide clinically meaningful interpretations.
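The abstract attributes Dipole's interpretability to attention mechanisms that weight past visits. A minimal sketch of one such mechanism (location-based attention over per-visit hidden states) follows; the shapes, the scoring vector `w`, and the function names are illustrative assumptions, and the paper proposes three attention variants, not just this one.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def location_attention(H, w):
    """Score each visit's hidden state with a learned vector w, normalize
    the scores, and return the weighted sum as the context used for the
    next-visit prediction. The attention weights alpha are what make the
    prediction inspectable: they say which visits mattered."""
    alpha = softmax(H @ w)   # one weight per visit, sums to 1
    return alpha, alpha @ H  # context vector for the prediction layer

T, d = 6, 10                 # 6 historical visits, hidden size 10
H = rng.normal(size=(T, d))  # e.g., bidirectional RNN states per visit
w = rng.normal(size=d)       # placeholder for a trained parameter
alpha, context = rng and location_attention(H, w)
```

Because `alpha` is a proper distribution over visits, a clinician can read it directly as "which past visits drove this diagnosis prediction", which is the interpretability claim in the abstract.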
Bridging Parametric and Nonparametric Methods in Cognitive Diagnosis
A number of parametric and nonparametric methods for estimating cognitive
diagnosis models (CDMs) have been developed and applied in a wide range of
contexts. However, in the literature, a wide chasm exists between these two
families of methods, and their relationship to each other is not well
understood. In this paper, we propose a unified estimation framework to bridge
the divide between parametric and nonparametric methods in cognitive diagnosis
to better understand their relationship. We also develop iterative joint
estimation algorithms and establish consistency properties within the proposed
framework. Lastly, we present comprehensive simulation results to compare
different methods, and provide practical recommendations on the appropriate use
of the proposed framework in various CDM contexts.
Effects of Polyethylene Glycol on DNA Adsorption and Hybridization on Gold Nanoparticles and Graphene Oxide
This document is the Accepted Manuscript version of a Published Work that appeared in final form in Langmuir, copyright © American Chemical Society, after peer review and technical editing by the publisher. To access the final edited and published work see http://dx.doi.org/10.1021/la302799s

Understanding the interface between DNA and nanomaterials is crucial for the rational design and optimization of biosensors and drug delivery systems. For detection and delivery into cells, where high concentrations of cellular proteins are present, another layer of complexity is added. In this context, we employ polyethylene glycol (PEG) as a model polymer to mimic the excluded volume effect of cellular proteins and to test its effects on DNA adsorption and hybridization on gold nanoparticles (AuNPs) and graphene oxide (GO), both of which show great promise for designing intracellular biosensors and drug delivery systems. We show that PEG 20000 (e.g., 4%) accelerates DNA hybridization to DNA-functionalized AuNPs by 50–100%, but this enhanced hybridization kinetics has not been observed with free DNA. Therefore, this rate enhancement is attributed to the surface blocking effect of PEG rather than the macromolecular crowding effect. On the other hand, DNA adsorption on citrate-capped AuNP surfaces is impeded even in the presence of a trace level (i.e., parts per billion) of PEG, confirming that PEG competes with DNA for surface binding sites. Additional insights have been obtained by studying the adsorption of a thiolated DNA and a peptide nucleic acid. In these cases, the steric effects of PEG impeding adsorption are observed. Similar observations have also been made with GO. Therefore, PEG may be used as an effective blocking agent both for hydrophilic AuNPs and for GO, which also contains hydrophobic domains.

Funding: University of Waterloo || Canadian Foundation for Innovation || Ontario Ministry of Research & Innovation || Canadian Institutes of Health Research || Natural Sciences and Engineering Research Council