Modeling Temporal Evidence from External Collections
Newsworthy events are broadcast through multiple mediums and prompt crowds
to comment on social media. In this paper, we propose to leverage these
behavioral dynamics to estimate the most relevant time periods for an event
(i.e., a query). Recent advances have shown how to improve the estimation of
the temporal relevance of such topics. Our approach builds on two major
novelties. First, we mine temporal evidence from hundreds of
external sources into topic-based external collections to improve the
robustness of the detection of relevant time periods. Second, we propose a
formal retrieval model that generalizes the use of the temporal dimension
across different aspects of the retrieval process. In particular, we show that
temporal evidence of external collections can be used to (i) infer a topic's
temporal relevance, (ii) select the query expansion terms, and (iii) re-rank
the final results for improved precision. Experiments with TREC Microblog
collections show that the proposed time-aware retrieval model makes an
effective and extensive use of the temporal dimension to improve search results
over the most recent temporal models. Interestingly, we observe a strong
correlation between precision and the temporal distribution of retrieved and
relevant documents.
Comment: To appear in WSDM 201
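The abstract describes re-ranking results using temporal evidence, interpolating a query-likelihood score with a temporal prior over relevant time periods. A minimal sketch of that idea follows; the Gaussian kernel, the interpolation weight `alpha`, and all function names are illustrative assumptions, not the paper's actual model.

```python
import math

def temporal_prior(doc_time, peaks, bandwidth=1.0):
    """Smooth density over relevant time periods (peaks), which the paper
    mines from external collections. Kernel choice here is an assumption."""
    return sum(math.exp(-0.5 * ((doc_time - p) / bandwidth) ** 2)
               for p in peaks) / len(peaks)

def rerank(results, peaks, alpha=0.7):
    """Interpolate the lexical score with the temporal prior and re-sort.
    results: list of (doc_id, lexical_score, doc_timestamp)."""
    rescored = [(d, alpha * s + (1 - alpha) * temporal_prior(t, peaks))
                for d, s, t in results]
    return sorted(rescored, key=lambda x: x[1], reverse=True)

results = [("d1", 0.90, 10.0), ("d2", 0.85, 2.1), ("d3", 0.50, 2.0)]
ranked = rerank(results, peaks=[2.0])  # d2 overtakes d1: closer to the peak
```

A document slightly weaker lexically but posted inside the estimated relevant period can outrank a stronger off-peak one, which is the behavior the re-ranking step exploits.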
Deep Multimodal Speaker Naming
Automatic speaker naming is the problem of localizing as well as identifying
each speaking character in a TV/movie/live show video. This problem is
challenging mainly because of its multimodal nature: the face cue alone is
insufficient to achieve good performance. Previous multimodal approaches to
this problem usually process the data of different modalities individually and
merge them using handcrafted heuristics. Such approaches work well for simple
scenes, but fail to achieve high performance for speakers with large appearance
variations. In this paper, we propose a novel convolutional neural network
(CNN) based learning framework to automatically learn the fusion function of
both face and audio cues. We show that without using face tracking, facial
landmark localization or subtitle/transcript, our system with robust multimodal
feature extraction is able to achieve state-of-the-art speaker naming
performance evaluated on two diverse TV series. The dataset and implementation
of our algorithm are publicly available online.
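The key contrast the abstract draws is between handcrafted fusion heuristics and a learned fusion of face and audio cues. A toy sketch of learned fusion follows: the two feature vectors are concatenated and scored by a small MLP. The feature dimensions, the number of candidate speakers, and the random placeholder weights are all assumptions; in the paper the fusion is learned end-to-end inside a CNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def fuse(face_feat, audio_feat, W1, W2):
    """Learned fusion: concatenate face and audio features, then score
    each candidate speaker with a two-layer network. Weights here are
    random placeholders standing in for trained parameters."""
    x = np.concatenate([face_feat, audio_feat])
    return relu(W1 @ x) @ W2  # logits over candidate speakers

face = rng.normal(size=128)          # hypothetical face embedding
audio = rng.normal(size=64)          # hypothetical audio embedding
W1 = rng.normal(size=(32, 192)) * 0.1
W2 = rng.normal(size=(32, 5)) * 0.1  # 5 hypothetical candidate speakers
logits = fuse(face, audio, W1, W2)
speaker = int(np.argmax(logits))     # predicted speaking character
```

The point of learning `W1` and `W2` jointly is that the model decides how much to trust each cue per input, rather than merging modalities with a fixed handcrafted rule.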
Look, Listen and Learn - A Multimodal LSTM for Speaker Identification
Speaker identification refers to the task of localizing the face of a person
who has the same identity as the ongoing voice in a video. This task requires
not only collective perception over both visual and auditory signals but also
robustness to severe quality degradations and unconstrained content
variations. In this paper, we describe a novel
multimodal Long Short-Term Memory (LSTM) architecture which seamlessly unifies
both visual and auditory modalities from the beginning of each sequence input.
The key idea is to extend the conventional LSTM by not only sharing weights
across time steps, but also sharing weights across modalities. We show that
modeling the temporal dependency across face and voice can significantly
improve the robustness to content quality degradations and variations. We also
found that our multimodal LSTM is robust to distractors, namely the
non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory
dataset and showed that our system outperforms the state-of-the-art systems in
speaker identification with lower false alarm rate and higher recognition
accuracy.
Comment: The 30th AAAI Conference on Artificial Intelligence (AAAI-16)
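The architectural idea named in the abstract is sharing LSTM weights not only across time steps but also across modalities. A simplified sketch follows: one set of LSTM-cell weights processes both the face sequence and the voice sequence. The cell here is a plain single-layer LSTM and the per-modality inputs are assumed already projected to a common dimension; the paper's architecture unifies the modalities more tightly than this.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W):
    """One LSTM step; W maps [x; h] to the four gate pre-activations."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

d, hid = 16, 8
W = rng.normal(size=(4 * hid, d + hid)) * 0.1  # ONE weight matrix

def run(seq, W):
    h, c = np.zeros(hid), np.zeros(hid)
    for x in seq:
        h, c = lstm_step(x, h, c, W)
    return h

face_seq = rng.normal(size=(5, d))   # projected face features per frame
voice_seq = rng.normal(size=(5, d))  # projected voice features per frame
h_face = run(face_seq, W)            # same W used for both modalities
h_voice = run(voice_seq, W)
score = float(h_face @ h_voice)      # illustrative cross-modal match score
```

Sharing `W` across modalities forces both streams into a common temporal dynamics, which is the mechanism the abstract credits for robustness to degradations.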
Accurate Single Stage Detector Using Recurrent Rolling Convolution
Most of the recent successful methods in accurate object detection and
localization used some variants of R-CNN style two stage Convolutional Neural
Networks (CNN) where plausible regions were proposed in the first stage then
followed by a second stage for decision refinement. Despite their simplicity
of training and efficiency in deployment, single stage detection methods have
not been as competitive when evaluated on benchmarks that consider mAP at high
IoU thresholds. In this paper, we proposed a novel single stage end-to-end
trainable object detection network to overcome this limitation. We achieved
this by introducing Recurrent Rolling Convolution (RRC) architecture over
multi-scale feature maps to construct object classifiers and bounding box
regressors which are "deep in context". We evaluated our method on the
challenging KITTI dataset, which measures methods at an IoU threshold of 0.7. We
showed that with RRC, a single reduced VGG-16 based model already significantly
outperformed all the previously published results. At the time this paper was
written, our models ranked first in KITTI car detection (the hard level),
first in cyclist detection, and second in pedestrian detection. These
results were not reached by the previous single stage methods. The code is
publicly available.
Comment: CVPR 201
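The benchmark criterion the abstract emphasizes is mAP at a high IoU threshold: on KITTI (hard level, cars) a detection counts as correct only if its intersection-over-union with the ground-truth box is at least 0.7. The standard IoU computation can be sketched as:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes offset by half their width overlap on 50 of 150
# units of combined area, giving IoU = 1/3 -- a match at a 0.5
# threshold but a miss at KITTI's 0.7.
overlap = iou((0, 0, 10, 10), (5, 0, 15, 10))
```

Tightening the threshold from 0.5 to 0.7 is why precise localization, the focus of the RRC architecture, matters so much on this benchmark.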
Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks
Predicting the future health information of patients from the historical
Electronic Health Records (EHR) is a core research task in the development of
personalized healthcare. Patient EHR data consist of sequences of visits over
time, where each visit contains multiple medical codes, including diagnosis,
medication, and procedure codes. The most important challenges for this task
are to model the temporality and high dimensionality of sequential EHR data and
to interpret the prediction results. Existing work solves this problem by
employing recurrent neural networks (RNNs) to model EHR data and utilizing a
simple attention mechanism to interpret the results. However, RNN-based
approaches suffer from the problem that the performance of RNNs drops when
sequences are long, and the relationships between subsequent visits
are ignored by current RNN-based approaches. To address these issues, we
propose {\sf Dipole}, an end-to-end, simple and robust model for predicting
patients' future health information. Dipole employs bidirectional recurrent
neural networks to remember all the information of both the past visits and the
future visits, and it introduces three attention mechanisms to measure the
relationships of different visits for the prediction. With the attention
mechanisms, Dipole can interpret the prediction results effectively. Dipole
also allows us to interpret the learned medical code representations, which
medical experts have positively confirmed. Experimental results on two real-world
EHR datasets show that the proposed Dipole can significantly improve the
prediction accuracy compared with the state-of-the-art diagnosis prediction
approaches and provide clinically meaningful interpretations.
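The abstract attributes Dipole's interpretability to attention mechanisms that weight past visits. A minimal sketch of one such mechanism (location-based attention over per-visit hidden states) follows; the shapes, the scoring vector `w`, and the function names are illustrative assumptions, and the paper proposes three attention variants, not just this one.

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def location_attention(H, w):
    """Score each visit's hidden state with a learned vector w, normalize
    the scores, and return the weighted sum as the context used for the
    next-visit prediction. The attention weights alpha are what make the
    prediction inspectable: they say which visits mattered."""
    alpha = softmax(H @ w)   # one weight per visit, sums to 1
    return alpha, alpha @ H  # context vector for the prediction layer

T, d = 6, 10                 # 6 historical visits, hidden size 10
H = rng.normal(size=(T, d))  # e.g., bidirectional RNN states per visit
w = rng.normal(size=d)       # placeholder for a trained parameter
alpha, context = rng and location_attention(H, w)
```

Because `alpha` is a proper distribution over visits, a clinician can read it directly as "which past visits drove this diagnosis prediction", which is the interpretability claim in the abstract.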
Bridging Parametric and Nonparametric Methods in Cognitive Diagnosis
A number of parametric and nonparametric methods for estimating cognitive
diagnosis models (CDMs) have been developed and applied in a wide range of
contexts. However, in the literature, a wide chasm exists between these two
families of methods, and their relationship to each other is not well
understood. In this paper, we propose a unified estimation framework to bridge
the divide between parametric and nonparametric methods in cognitive diagnosis
to better understand their relationship. We also develop iterative joint
estimation algorithms and establish consistency properties within the proposed
framework. Lastly, we present comprehensive simulation results to compare
different methods, and provide practical recommendations on the appropriate use
of the proposed framework in various CDM contexts.
Effects of Polyethylene Glycol on DNA Adsorption and Hybridization on Gold Nanoparticles and Graphene Oxide
This document is the Accepted Manuscript version of a Published Work that appeared in final form in Langmuir, copyright © American Chemical Society, after peer review and technical editing by the publisher. To access the final edited and published work see http://dx.doi.org/10.1021/la302799s

Understanding the interface between DNA and nanomaterials is crucial for the rational design and optimization of biosensors and drug delivery systems. For detection and delivery into cells, where high concentrations of cellular proteins are present, another layer of complexity is added. In this context, we employ polyethylene glycol (PEG) as a model polymer to mimic the excluded volume effect of cellular proteins and to test its effects on DNA adsorption and hybridization on gold nanoparticles (AuNPs) and graphene oxide (GO), both of which show great promise for designing intracellular biosensors and drug delivery systems. We show that PEG 20000 (e.g., 4%) accelerates DNA hybridization to DNA-functionalized AuNPs by 50–100%, but this enhanced hybridization kinetics has not been observed with free DNA. Therefore, this rate enhancement is attributed to the surface blocking effect of PEG rather than the macromolecular crowding effect. On the other hand, DNA adsorption on citrate-capped AuNP surfaces is impeded even in the presence of a trace level (i.e., parts per billion) of PEG, confirming that PEG competes with DNA for surface binding sites. Additional insights have been obtained by studying the adsorption of a thiolated DNA and a peptide nucleic acid. In these cases, the steric effects of PEG impeding adsorption are observed. Similar observations have also been made with GO. Therefore, PEG may be used as an effective blocking agent both for hydrophilic AuNPs and for GO, which also contains hydrophobic domains.

Funding: University of Waterloo || Canadian Foundation for Innovation || Ontario Ministry of Research & Innovation || Canadian Institutes of Health Research || Natural Sciences and Engineering Research Council