    Deep Multimodal Speaker Naming

    Automatic speaker naming is the problem of localizing and identifying each speaking character in a TV show, movie, or live show video. The problem is challenging mainly because of its multimodal nature: the face cue alone is insufficient to achieve good performance. Previous multimodal approaches usually process the data of each modality separately and merge the results using handcrafted heuristics. Such approaches work well for simple scenes but fail to achieve high performance for speakers with large appearance variations. In this paper, we propose a novel convolutional neural network (CNN) based learning framework that automatically learns the fusion function of both face and audio cues. We show that, without using face tracking, facial landmark localization, or subtitles/transcripts, our system with robust multimodal feature extraction achieves state-of-the-art speaker naming performance on two diverse TV series. The dataset and implementation of our algorithm are publicly available online.

    Look, Listen and Learn - A Multimodal LSTM for Speaker Identification

    Speaker identification refers to the task of localizing the face of a person who has the same identity as the ongoing voice in a video. This task requires not only joint perception of visual and auditory signals but also robustness to severe quality degradation and unconstrained content variation. In this paper, we describe a novel multimodal Long Short-Term Memory (LSTM) architecture that seamlessly unifies the visual and auditory modalities from the beginning of each input sequence. The key idea is to extend the conventional LSTM by sharing weights not only across time steps but also across modalities. We show that modeling the temporal dependency across face and voice can significantly improve robustness to content quality degradation and variation. We also found that our multimodal LSTM is robust to distractors, namely non-speaking identities. We applied our multimodal LSTM to The Big Bang Theory dataset and showed that our system outperforms state-of-the-art systems in speaker identification, with a lower false alarm rate and higher recognition accuracy. Comment: The 30th AAAI Conference on Artificial Intelligence (AAAI-16)
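
As a minimal sketch of the weight-sharing idea, the Python code below runs one LSTM, with a single set of gate parameters, over both a face-feature sequence and a voice-feature sequence. It assumes both modalities have already been projected to a common feature dimension and shares all gate weights, which is the simplest reading of the abstract; the paper's architecture may share only part of its parameters. The class `SharedLSTM` and the random inputs are hypothetical illustrations.

```python
import math
import random

GATES = ("input", "forget", "output", "cell")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class SharedLSTM:
    """One LSTM whose parameters are reused across time steps AND modalities (hypothetical)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W = {gate: init(hidden_dim, input_dim) for gate in GATES}   # input-to-hidden
        self.U = {gate: init(hidden_dim, hidden_dim) for gate in GATES}  # hidden-to-hidden
        self.b = {gate: [0.0] * hidden_dim for gate in GATES}

    def step(self, x, h, c):
        # Pre-activations for all four gates, using the shared weights.
        pre = {
            gate: [wx + uh + bias for wx, uh, bias in
                   zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]
            for gate in GATES
        }
        i = [sigmoid(v) for v in pre["input"]]
        f = [sigmoid(v) for v in pre["forget"]]
        o = [sigmoid(v) for v in pre["output"]]
        cand = [math.tanh(v) for v in pre["cell"]]
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return h, c

    def encode(self, sequence):
        """Run over a sequence of feature vectors and return the last hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


rng = random.Random(1)
# Hypothetical inputs: 5 time steps of face and voice features, both projected to 4 dims.
face_seq = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
voice_seq = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]

lstm = SharedLSTM(input_dim=4, hidden_dim=3)
face_state = lstm.encode(face_seq)    # the same weights ...
voice_state = lstm.encode(voice_seq)  # ... are applied to the other modality
```

Because both states are produced by the same parameters, a downstream classifier could plausibly compare them to decide whether a face matches the ongoing voice; that matching step is not shown.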

    Bias-correction and Test for Mark-point Dependence with Replicated Marked Point Processes

    Mark-point dependence plays a critical role in research problems that fit into the general framework of marked point processes. In this work, we focus on adjusting for mark-point dependence when estimating the mean and covariance functions of the mark process, given independent replicates of the marked point process. We assume that the mark process is a Gaussian process and the point process is a log-Gaussian Cox process, where the mark-point dependence is generated through the dependence between two latent Gaussian processes. Under this framework, naive local linear estimators that ignore the mark-point dependence can be severely biased. We show that this bias can be corrected using a local linear estimator of the cross-covariance function, and we establish uniform convergence rates of the bias-corrected estimators. Furthermore, we propose a test statistic for mark-point independence based on local linear estimators, which is shown to be asymptotically normal with a parametric $\sqrt{n}$ convergence rate. Model diagnostic tools are developed for key model assumptions, and a robust functional permutation test is proposed for a more general class of marked point processes. The effectiveness of the proposed methods is demonstrated through extensive simulations and applications to two real data examples.
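
The source of the bias can be seen in a toy calculation. As a minimal sketch, assume a single location where the latent mark value X and the latent log-intensity Z are jointly Gaussian and points appear with intensity proportional to exp(Z). Averaging marks over observed points then weights each draw by exp(Z), and for jointly Gaussian variables this tilted mean equals E[X] + Cov(X, Z), so subtracting the cross-covariance removes the bias. Here the cross-covariance is known exactly, whereas the paper estimates it; the function `tilted_mark_mean` and all parameter values are hypothetical, not the paper's local linear estimators.

```python
import math
import random


def tilted_mark_mean(mu_x, sigma_x, sigma_z, rho, n=200_000, seed=0):
    """Monte Carlo mean of the mark X as seen at points of a log-Gaussian intensity.

    (X, Z) is bivariate Gaussian with correlation rho and E[Z] = 0; a point is observed
    with intensity proportional to exp(Z), so each draw is weighted by exp(Z).
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    total_weight = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, sigma_z)
        # Draw X | Z = z from its Gaussian conditional distribution.
        cond_mean = mu_x + rho * sigma_x / sigma_z * z
        cond_sd = sigma_x * math.sqrt(1.0 - rho ** 2)
        x = rng.gauss(cond_mean, cond_sd)
        weight = math.exp(z)  # relative intensity of the point process at this draw
        weighted_sum += weight * x
        total_weight += weight
    return weighted_sum / total_weight


mu_x, sigma_x, sigma_z, rho = 1.0, 1.0, 1.0, 0.5
cross_cov = rho * sigma_x * sigma_z  # Cov(X, Z)
naive = tilted_mark_mean(mu_x, sigma_x, sigma_z, rho)  # close to mu_x + cross_cov
corrected = naive - cross_cov                          # close to mu_x
print(f"naive={naive:.3f}  corrected={corrected:.3f}  true mean={mu_x}")
```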

    Group Network Hawkes Process

    In this work, we study the event occurrences of individuals interacting in a network. To characterize the dynamic interactions among the individuals, we propose a group network Hawkes process (GNHP) model whose network structure is observed and fixed. In particular, we introduce a latent group structure among individuals to account for heterogeneous user-specific characteristics. A maximum likelihood approach is proposed to simultaneously cluster individuals in the network and estimate model parameters. A fast EM algorithm is subsequently developed by utilizing the branching representation of the proposed GNHP model. Theoretical properties of the resulting estimators of group memberships and model parameters are investigated in both settings, when the number of latent groups $G$ is over-specified and when it is correctly specified. A data-driven criterion that can consistently identify the true $G$ under mild conditions is derived. Extensive simulation studies and an application to a data set collected from Sina Weibo are used to illustrate the effectiveness of the proposed methodology. Comment: 35 pages
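
To make the model concrete, here is a minimal sketch of a conditional intensity in the spirit of a group network Hawkes process. It assumes a hypothetical parameterization (the paper's exact form may differ) in which user $i$ in latent group $g(i)$ has intensity $\lambda_i(t) = \mu_{g(i)} + a_{g(i)} \sum_{s \in H_i(t)} \phi(t-s) + \frac{b_{g(i)}}{|N_i|} \sum_{j \in N_i} \sum_{s \in H_j(t)} \phi(t-s)$, where $H_j(t)$ are user $j$'s events before $t$, $N_i$ is the observed neighbor set, and $\phi(s) = \beta e^{-\beta s}$ is an exponential kernel. Group membership thus sets the baseline rate and the self- and network-excitation strengths. All function and parameter names are illustrative.

```python
import math


def exp_kernel(lag, beta):
    """Exponential triggering kernel phi(s) = beta * exp(-beta * s) for s > 0."""
    return beta * math.exp(-beta * lag)


def gnhp_intensity(i, t, events, neighbors, group, mu, self_exc, net_exc, beta):
    """Conditional intensity of user i at time t (hypothetical GNHP-style form).

    events[j]    : event times of user j
    neighbors[i] : users whose events can excite user i (fixed, observed network)
    group[i]     : latent group label of user i; parameters are indexed by group
    """
    g = group[i]
    # Self-excitation from user i's own past events.
    own = sum(exp_kernel(t - s, beta) for s in events[i] if s < t)
    # Network excitation, averaged over the observed neighbors of user i.
    nbrs = neighbors[i]
    if nbrs:
        net = sum(exp_kernel(t - s, beta)
                  for j in nbrs for s in events[j] if s < t) / len(nbrs)
    else:
        net = 0.0
    return mu[g] + self_exc[g] * own + net_exc[g] * net


events = {0: [1.0], 1: [2.0]}
neighbors = {0: [1], 1: []}
group = {0: 0, 1: 1}
params = dict(mu=[0.5, 0.2], self_exc=[0.3, 0.1], net_exc=[0.4, 0.6], beta=1.0)
lam = gnhp_intensity(0, 3.0, events, neighbors, group, **params)
print(round(lam, 4))  # prints 0.6878
```

Because every term is a nonnegative contribution from either the baseline or one earlier event, each event can be attributed to a single "parent" (or to the baseline); this is the kind of branching representation that EM algorithms for Hawkes processes typically exploit.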

    Second order semi-parametric inference for multivariate log Gaussian Cox processes

    This paper introduces a new approach to inferring the second-order properties of a multivariate log Gaussian Cox process (LGCP) with a complex intensity function. We assume a semi-parametric model for the multivariate intensity function containing an unspecified complex factor common to all types of points. Given this model, we exploit the availability of several types of points to construct a second-order conditional composite likelihood to infer the pair correlation and cross pair correlation functions of the LGCP. Crucially, this likelihood does not depend on the unspecified part of the intensity function. We also introduce a cross-validation method for model selection and an algorithm for regularized inference that can be used to obtain sparse models for cross pair correlation functions. The methodology is applied to simulated data as well as to data examples from microscopy and criminology, showing that the new approach outperforms existing alternatives in which the intensity functions are estimated non-parametrically. Comment: 32 pages including appendix
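
Why the unspecified factor drops out can be sketched numerically. Assume (hypothetically) one-dimensional locations, a type-$i$ intensity $\rho_i(u) = \lambda_0(u) \exp(\beta_i z(u))$ with an unknown common factor $\lambda_0$ and a known covariate $z$, and LGCP pair and cross pair correlation functions $g_{ij}(r) = \exp(c_{ij}(r))$, where $c_{ij}$ is the cross-covariance of the latent Gaussian fields (a standard LGCP identity). Conditional on a pair of points at $u$ and $v$, the probability that they have types $(i, j)$ is proportional to $\rho_i(u)\rho_j(v) g_{ij}(|u-v|)$, so $\lambda_0(u)\lambda_0(v)$ cancels after normalization. The exponential covariance model, covariate, and parameter values below are illustrative assumptions, not the paper's.

```python
import math


def cross_cov(i, j, r, sigma2, scale):
    """Hypothetical separable (cross-)covariance c_ij(r) = sigma2[i][j] * exp(-r / scale)."""
    return sigma2[i][j] * math.exp(-r / scale)


def pair_type_probs(u, v, beta, z, lam0, sigma2, scale):
    """P(types = (i, j) | a pair of points at u and v) for a multivariate LGCP.

    rho_i(u) = lam0(u) * exp(beta[i] * z(u)) and g_ij(r) = exp(c_ij(r)).
    """
    r = abs(u - v)
    p = len(beta)
    weights = {}
    for i in range(p):
        for j in range(p):
            rho_i = lam0(u) * math.exp(beta[i] * z(u))
            rho_j = lam0(v) * math.exp(beta[j] * z(v))
            g_ij = math.exp(cross_cov(i, j, r, sigma2, scale))  # LGCP (cross) pair correlation
            weights[(i, j)] = rho_i * rho_j * g_ij
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


beta = [0.2, -0.5]
sigma2 = [[1.0, 0.3], [0.3, 0.8]]  # positive definite, so the covariance model is valid
z = math.cos                       # hypothetical covariate
flat = pair_type_probs(0.4, 1.1, beta, z, lambda u: 1.0, sigma2, scale=2.0)
bumpy = pair_type_probs(0.4, 1.1, beta, z,
                        lambda u: 50.0 * math.exp(math.sin(3 * u)), sigma2, scale=2.0)
# Changing the common factor lam0 leaves the type probabilities unchanged (up to rounding).
print(max(abs(flat[k] - bumpy[k]) for k in flat))
```

A composite likelihood in this spirit would combine the log of such probabilities over observed pairs of points within some distance; that step is omitted here.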

    Reducing series resistance in Cu2ZnSn(S,Se)4 nanoparticle ink solar cells on flexible molybdenum foil substrates

    Earth-abundant Cu2ZnSnS4 nanoparticle inks were deposited on molybdenum foil substrates and subsequently converted to high-quality thin-film Cu2ZnSn(S,Se)4 photovoltaic absorbers. Integrating these absorbers into a thin-film solar cell device structure yields a solar energy conversion efficiency comparable to that of identical devices processed on rigid glass substrates. Importantly, this is achieved only when a thin layer of molybdenum is first applied directly to the foil. This layer limits the formation of a thick Mo(S,Se)x layer, resulting in a substantially reduced series resistance.

    XGBoostPP: Tree-based Estimation of Point Process Intensity Functions

    We propose a novel tree-based ensemble method, named XGBoostPP, to nonparametrically estimate the intensity of a point process as a function of covariates. It extends the use of gradient-boosted regression trees (Chen & Guestrin, 2016) to the point process literature via two carefully designed loss functions. The first loss is based on the Poisson likelihood and works for general point processes. The second loss is based on the weighted Poisson likelihood, where spatially dependent weights are introduced to further improve the estimation efficiency for clustered processes. An efficient greedy search algorithm is developed for model estimation, and the effectiveness of the proposed method is demonstrated through extensive simulation studies and two real data analyses. In particular, we report that XGBoostPP outperforms existing approaches when the dimension of the covariate space is high, revealing the advantages of tree-based ensemble methods in estimating complex intensity functions.
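
The Poisson-likelihood loss can be made concrete with a common discretization. As a sketch, assume the Berman-Turner quadrature scheme (an assumption; the paper's exact construction may differ): data points and dummy points receive quadrature weights $w_j$, with response $y_j = 1/w_j$ at data points and $0$ at dummy points, so the negative log-likelihood of a log-intensity $f$ is approximately $\sum_j w_j (e^{f_j} - y_j f_j)$. Second-order gradient boosting, as in XGBoost, needs the per-point gradient $w_j(e^{f_j} - y_j)$ and Hessian $w_j e^{f_j}$. The helper names below are hypothetical.

```python
import math


def poisson_pp_loss(f, y, w):
    """Berman-Turner approximation to the Poisson process negative log-likelihood.

    f : current log-intensity predictions at the quadrature points
    y : 1 / w at data points, 0 at dummy points
    w : quadrature weights (their sum approximates the area of the observation window)
    """
    return sum(wj * (math.exp(fj) - yj * fj) for fj, yj, wj in zip(f, y, w))


def poisson_pp_grad_hess(f, y, w):
    """Per-point gradient and Hessian of poisson_pp_loss with respect to each f_j."""
    grad = [wj * (math.exp(fj) - yj) for fj, yj, wj in zip(f, y, w)]
    hess = [wj * math.exp(fj) for fj, wj in zip(f, w)]
    return grad, hess


# Two quadrature points of weight 0.5: one data point (y = 1 / 0.5) and one dummy point.
f = [0.0, 0.5]
y = [2.0, 0.0]
w = [0.5, 0.5]
grad, hess = poisson_pp_grad_hess(f, y, w)
print(grad, hess)
```

These lists have the shape a custom XGBoost objective returns (one gradient and one Hessian entry per row); wiring them into a booster is not shown.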

    Association of tissue lineage and gene expression: conservatively and differentially expressed genes define common and special functions of tissues

    Background: Embryogenesis is the process by which the embryo is formed and develops and by which the developmental hierarchy of tissues is established. Recent advances in microarray technology have made it possible to investigate tissue-specific patterns of gene expression and their relationship with tissue lineages. This study focuses on how tissue-specific functions, tissue lineage, and cell differentiation are correlated, which is essential for understanding embryonic development and organism complexity.

    Results: We performed individual-gene and gene-set based analyses on multiple-tissue expression data in association with the classic topology of mammalian fate maps of embryogenesis. For each subgroup of tissues on the fate map, conservatively, differentially, and correlatively expressed genes or gene sets were identified. Tissue distance was found to correlate with gene expression divergence. Tissues of ectodermal or mesodermal origin from the same segments of the fate map shared more similar expression patterns than those of different origins. Conservatively expressed genes or gene sets define common functions in a tissue group and are related to tissue-specific diseases, which is supported by Gene Ontology and KEGG pathway analyses. Gene expression divergence is larger in certain human tissues than in the homologous mouse tissues.

    Conclusion: The results of the tissue lineage and gene expression analyses indicate that common functional features of neighboring tissue groups are defined by conservatively expressed genes and are related to tissue-specific diseases, whereas differentially expressed genes contribute to the functional divergence of tissues. The difference in gene expression divergence between human and mouse homologous tissues reflects organism complexity, i.e., distinct levels of neural development and different body sizes.

    Modeling Social Media User Content Generation Using Interpretable Point Process Models

    In this article, we study the activity patterns of modern social media users on platforms such as Twitter and Facebook. To characterize the complex patterns we observe in users' interactions with social media, we describe a new class of point process models. The components of the model have straightforward interpretations and can thus provide meaningful insights into user activity patterns. A composite likelihood approach and a composite EM estimation procedure are developed to overcome the challenges that arise in parameter estimation. Using the proposed method, we analyze Donald Trump's Twitter data and study whether and how his tweeting behavior evolved before, during, and after the presidential campaign. Additionally, we analyze a large-scale social media dataset from Sina Weibo and identify interesting groups of users with distinct behaviors; in this analysis, we also discuss the effect of social ties on a user's online content generation behavior.