149 research outputs found

    Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

    We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that depends only on the number of nonzeros in the tensor. The model also makes the factors interpretable: each factor corresponds to a "topic". We develop a set of online inference algorithms that allow further scaling up the model to massive tensors, for which batch inference methods may be infeasible. We apply our framework to diverse real-world applications, such as multiway topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set. Comment: ECML PKDD 201
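
The Poisson-to-multinomial reparameterization mentioned in this abstract rests on a standard identity, sketched below. The symbols $x_r$, $\lambda_r$, and $R$ are notation assumed here for illustration and are not taken from the paper.

```latex
x_r \sim \mathrm{Poisson}(\lambda_r)\ \text{independently},\quad r = 1,\dots,R
\;\Longrightarrow\;
x = \sum_{r=1}^{R} x_r \sim \mathrm{Poisson}\Big(\sum_{r=1}^{R} \lambda_r\Big),
\qquad
(x_1,\dots,x_R) \mid x \sim \mathrm{Multinomial}\Big(x;\ \frac{\lambda_1}{\sum_{r} \lambda_r},\dots,\frac{\lambda_R}{\sum_{r} \lambda_r}\Big).
```

When an observed count $x$ is zero, every latent $x_r$ is zero as well, so nothing needs to be sampled for that entry. This is consistent with the stated cost depending only on the nonzeros.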

    Male mice song syntax depends on social contexts and influences female preferences.

    In 2005, Holy and Guo advanced the idea that male mice produce ultrasonic vocalizations (USVs) with some features similar to the courtship songs of songbirds. Since then, studies have shown that male mice emit USV songs in different contexts (sexual and other) and possess a multisyllabic repertoire. Whether their vocalizations are plastic is still debated. However, the use of a multisyllabic repertoire can increase potential flexibility and information through how elements are organized and recombined, namely syntax. In many bird species, modulating song syntax has ethological relevance for sexual behavior and mate preferences. In this study we exposed adult male mice to different social contexts and developed a new approach to analyzing their USVs based on songbird syntax analysis. We found that male mice modify their syntax, including specific sequences, length of sequence, repertoire composition, and spectral features, according to stimulus and social context. Males emit longer and simpler syllables and sequences when singing to females, but more complex syllables and sequences in response to fresh female urine. Playback experiments show that females prefer the complex songs over the simpler ones. We propose that the complex songs serve to lure females in, whereas the directed simpler sequences are used for direct courtship. These results suggest that although mice have a much more limited ability to modify their songs, they could still be used as animal models for understanding some vocal communication features that songbirds are used for.

    Bayesian Gaussian Copula Factor Models for Mixed Data.

    Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

    Using quantile regression to investigate racial disparities in medication non-adherence

    Background: Many studies have investigated racial/ethnic disparities in medication non-adherence in patients with type 2 diabetes using common measures such as medication possession ratio (MPR) or gaps between refills. All these measures, including MPR, are quasi-continuous and bounded, and their distribution is usually skewed. Analysis of such measures using traditional regression methods that model mean changes in the dependent variable may fail to provide a full picture of differential patterns in non-adherence between groups. Methods: A retrospective cohort of 11,272 veterans with type 2 diabetes was assembled from Veterans Administration datasets from April 1996 to May 2006. The main outcome measure was MPR with quantile cutoffs Q1-Q4 taking values of 0.4, 0.6, 0.8 and 0.9. Quantile regression (QReg) was used to model the association between MPR and race/ethnicity after adjusting for covariates. Comparison was made with commonly used ordinary least squares (OLS) and generalized linear mixed models (GLMM). Results: Quantile regression showed that Non-Hispanic-Black (NHB) patients had statistically significantly lower MPR compared to Non-Hispanic-White (NHW) patients, holding all other variables constant, across all quantiles, with estimates and p-values given as -3.4% (p = 0.11), -5.4% (p = 0.01), -3.1% (p = 0.001), and -2.00% (p = 0.001) for Q1 to Q4, respectively. Other racial/ethnic groups had lower adherence than NHW only in the lowest quantile (Q1), by about -6.3% (p = 0.003). In contrast, OLS and GLMM only showed differences in mean MPR between NHB and NHW, while the mean MPR difference between other racial groups and NHW was not significant. Conclusion: Quantile regression is recommended for analysis of data that are heterogeneous such that the tails and the central location of the conditional distributions vary differently with the covariates. QReg provides a comprehensive view of the relationships between independent and dependent variables (i.e. not just centrally but also in the tails of the conditional distribution of the dependent variable). Indeed, without performing QReg at different quantiles, an investigator would have no way of assessing whether a difference in these relationships might exist.
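
To make the contrast with mean-based regression concrete, the following is a minimal sketch of the check (pinball) loss that quantile regression minimizes, reduced to the intercept-only case. It is not the study's model: the function names and the sample values are hypothetical, and no covariates are included.

```python
import numpy as np

def pinball_loss(residuals, tau):
    """Total check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    residuals = np.asarray(residuals, dtype=float)
    return np.where(residuals >= 0, tau * residuals, (tau - 1) * residuals).sum()

def best_constant(y, tau):
    """Return the observed value that minimizes the total pinball loss."""
    y = np.asarray(y, dtype=float)
    losses = [pinball_loss(y - c, tau) for c in y]
    return y[int(np.argmin(losses))]

# Hypothetical skewed, bounded adherence-like values (not data from the study).
mpr = np.array([0.20, 0.45, 0.60, 0.75, 0.80, 0.85, 0.90, 0.95, 1.00])

median_fit = best_constant(mpr, 0.5)   # minimizer for tau = 0.5 is the sample median
lower_fit = best_constant(mpr, 0.25)   # minimizer for tau = 0.25 sits in the lower tail
print(median_fit, lower_fit)
```

Changing `tau` moves the fitted value through the distribution, which is why fitting at several quantiles can reveal group differences that a single mean estimate hides.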

    Prevention of Neural-Tube Defects with Periconceptional Folic Acid, Methylfolate, or Multivitamins?

    Background/Aims: To review the main results of intervention trials which showed the efficacy of periconceptional folic acid-containing multivitamin and folic acid supplementation in the prevention of neural-tube defects (NTD). Methods and Results: The main findings of 5 intervention trials are known: (i) a multivitamin containing 0.36 mg folic acid produced an 83-91% reduction in NTD recurrence in a UK nonrandomized controlled trial, while the Hungarian (ii) randomized controlled trial and (iii) cohort-controlled trial using a multivitamin containing 0.8 mg folic acid showed 93 and 89% reductions in the first occurrence of NTD, respectively. On the other hand, (iv) another multicenter randomized controlled trial proved a 71% efficacy of 4 mg folic acid in the reduction of recurrent NTD, while (v) a public health-oriented Chinese-US trial showed a 41-79% reduction in the first occurrence of NTD depending on the incidence of NTD. Conclusions: Translational application of these findings could result in a breakthrough in the primary prevention of NTD, but so far this is not widely applied in practice. The benefits and drawbacks of 4 main possible uses of periconceptional folic acid/multivitamin supplementation, i.e. (i) dietary intake, (ii) periconceptional supplementation, (iii) flour fortification, and (iv) the recent attempt to combine oral contraceptives with 6S-5-methyltetrahydrofolate (methylfolate), are discussed. Obviously, prevention of NTD is much better than the frequent elective termination of pregnancies after prenatal diagnosis of NTD fetuses.

    Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

    Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Notably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) by automatically up-weighting the data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable prediction may provide a solid platform for other studies, such as protein function prediction and the roles of PPIs in disease susceptibility.

    Bayesian lasso binary quantile regression

    In this paper, a Bayesian hierarchical model for variable selection and estimation in the context of binary quantile regression is proposed. Existing approaches to variable selection in a binary classification context are sensitive to outliers, heteroskedasticity or other anomalies of the latent response. The method proposed in this study overcomes these problems in an attractive and straightforward way. A Laplace likelihood and Laplace priors for the regression parameters are proposed and estimated with Bayesian Markov Chain Monte Carlo. The resulting model is equivalent to the frequentist lasso procedure. A conceptual result is that by doing so, the binary regression model is moved from a Gaussian to a full Laplacian framework without sacrificing much computational efficiency. In addition, an efficient Gibbs sampler to estimate the model parameters is proposed that is superior to the Metropolis algorithm used in previous studies on Bayesian binary quantile regression. Both the simulation studies and the real data analysis indicate that the proposed method performs well in comparison to the other methods. Moreover, as the base model is binary quantile regression, the approach provides much more detailed insight into the effects of the covariates. An implementation of the lasso procedure for binary quantile regression models is available in the R package bayesQR.

    Bayesian mapping of pulmonary tuberculosis in Antananarivo, Madagascar

    Background: Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis, is endemic in Madagascar. The capital, Antananarivo, is the most seriously affected area. TB had a non-random spatial distribution in this setting, with clustering in the poorer areas. The aim of this study was to explore this pattern further with a Bayesian approach, and to measure the associations between the spatial variation of TB risk and national control program indicators for all neighbourhoods. Methods: A combination of a Bayesian approach and a generalized linear mixed model (GLMM) was developed to produce smooth risk maps of TB and to model relationships between new TB cases and national TB control program indicators. The new TB cases were collected from records of the 16 Tuberculosis Diagnostic and Treatment Centres (DTC) of the city from 2004 to 2006. Five TB indicators were considered in the analysis: number of cases undergoing retreatment, number of patients with treatment failure and those suffering relapse after the completion of treatment, number of households with more than one case, number of patients lost to follow-up, and proximity to a DTC. Results: In Antananarivo, 43.23% of the neighbourhoods had a standardized incidence ratio (SIR) above 1, of which 19.28% had a TB risk significantly higher than the average. Identified high TB risk areas were clustered, and the distribution of TB was found to be associated mainly with the number of patients lost to follow-up (SIR: 1.10, CI 95%: 1.02-1.19) and the number of households with more than one case (SIR: 1.13, CI 95%: 1.03-1.24). Conclusion: The spatial pattern of TB in Antananarivo and the contribution of national control program indicators to this pattern highlight the importance of the data recorded in the TB registry and the use of spatial approaches for assessing the epidemiological situation for TB. Including these variables in the model increases the reproducibility, as these data are already available for individual DTCs. These findings may also be useful for guiding decisions related to disease control strategies.
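
For readers unfamiliar with the standardized incidence ratio reported above, here is a minimal sketch of a crude SIR per area: observed cases divided by the cases expected if the overall rate applied everywhere. It assumes no age standardization and no Bayesian smoothing, both of which the study may handle differently. The function name and all numbers are hypothetical.

```python
import numpy as np

def standardized_incidence_ratio(observed, population):
    """Crude SIR per area: observed cases / cases expected under the overall rate."""
    observed = np.asarray(observed, dtype=float)
    population = np.asarray(population, dtype=float)
    overall_rate = observed.sum() / population.sum()  # pooled incidence across areas
    expected = population * overall_rate              # expected cases in each area
    return observed / expected

# Hypothetical neighbourhood counts (not data from the study).
cases = [30, 10, 20]
people = [10000, 10000, 20000]

sir = standardized_incidence_ratio(cases, people)
print(sir)
```

An SIR above 1 marks an area with more cases than the pooled rate predicts. With small populations these raw ratios are noisy, which is the usual motivation for smoothing them in a Bayesian model.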