149 research outputs found

Scalable Bayesian Non-Negative Tensor Factorization for Massive Count Data

Author: DB Dunson
EC Chi
G Heinrich
MD Hoffman
MI Jordan
O Cappé
TG Kolda
Publication date: 18/08/2015

We present a Bayesian non-negative tensor factorization model for count-valued tensor data, and develop scalable inference algorithms (both batch and online) for dealing with massive tensors. Our generative model can handle overdispersed counts as well as infer the rank of the decomposition. Moreover, leveraging a reparameterization of the Poisson distribution as a multinomial facilitates conjugacy in the model and enables simple and efficient Gibbs sampling and variational Bayes (VB) inference updates, with a computational cost that depends only on the number of nonzeros in the tensor. The factors are also readily interpretable: in our model, each factor corresponds to a "topic". We develop a set of online inference algorithms that allow the model to scale further to massive tensors, for which batch inference methods may be infeasible. We apply our framework to diverse real-world applications, such as multiway topic modeling on a scientific publications database, analyzing a political science data set, and analyzing a massive household transactions data set. Comment: ECML PKDD 201
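
The Poisson-multinomial reparameterization mentioned in this abstract rests on a standard identity, sketched here in generic notation. The symbols $y_r$, $\lambda_r$, and $R$ are illustrative assumptions and not necessarily the paper's own notation.

```latex
y_r \sim \mathrm{Poisson}(\lambda_r)\ \text{independently},\quad r = 1,\dots,R
\quad\Longleftrightarrow\quad
y = \sum_{r=1}^{R} y_r \sim \mathrm{Poisson}\Big(\sum_{r=1}^{R} \lambda_r\Big),
\qquad
(y_1,\dots,y_R) \mid y \sim \mathrm{Multinomial}\Big(y;\ \frac{\lambda_1}{\sum_{r}\lambda_r},\dots,\frac{\lambda_R}{\sum_{r}\lambda_r}\Big)
```

Under this identity, an observed count of zero forces every latent component count to zero, which is one plausible reading of why the inference cost can scale with the number of nonzero entries only.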

arXiv.org e-Print Archive

Crossref

Male mice song syntax depends on social contexts and influences female preferences.

Author: Chabout J
Dunson DB
Jarvis ED
Sarkar A
Publication date: 01/01/2015

In 2005, Holy and Guo advanced the idea that male mice produce ultrasonic vocalizations (USV) with some features similar to courtship songs of songbirds. Since then, studies have shown that male mice emit USV songs in different contexts (sexual and other) and possess a multisyllabic repertoire. Debate still exists for and against plasticity in their vocalizations. But the use of a multisyllabic repertoire can increase potential flexibility and information in how elements are organized and recombined, namely syntax. In many bird species, modulating song syntax has ethological relevance for sexual behavior and mate preferences. In this study we exposed adult male mice to different social contexts and developed a new approach to analyzing their USVs based on songbird syntax analysis. We found that male mice modify their syntax, including specific sequences, length of sequence, repertoire composition, and spectral features, according to stimulus and social context. Males emit longer and simpler syllables and sequences when singing to females, but more complex syllables and sequences in response to fresh female urine. Playback experiments show that the females prefer the complex songs over the simpler ones. We propose the complex songs are to lure females in, whereas the directed simpler sequences are used for direct courtship. These results suggest that although mice have a much more limited ability for song modification, they could still be used as animal models for understanding some vocal communication features that songbirds are used for.

DukeSpace

Bayesian Gaussian Copula Factor Models for Mixed Data.

Author: Carin L
Dunson DB
Lucas JE
Murray JS

Gaussian factor models have proven widely useful for parsimoniously characterizing dependence in multivariate data. There is a rich literature on their extension to mixed categorical and continuous variables, using latent Gaussian variables or through generalized latent trait models accommodating measurements in the exponential family. However, when generalizing to non-Gaussian measured variables, the latent variables typically influence both the dependence structure and the form of the marginal distributions, complicating interpretation and introducing artifacts. To address this problem we propose a novel class of Bayesian Gaussian copula factor models which decouple the latent factors from the marginal distributions. A semiparametric specification for the marginals based on the extended rank likelihood yields straightforward implementation and substantial computational gains. We provide new theoretical and empirical justifications for using this likelihood in Bayesian inference. We propose new default priors for the factor loadings and develop efficient parameter-expanded Gibbs sampling for posterior computation. The methods are evaluated through simulations and applied to a dataset in political science. The models in this paper are implemented in the R package bfa.

DukeSpace

Using quantile regression to investigate racial disparities in medication non-adherence

Author: A Marshall
AM Peterson
AS Adams
C Chen
Carrae Echols
Centers for Disease Control and Prevention
Cheryl P Lynch
DB Dunson
DB Rubin
DR Miller
DT Lau
EP Box George
ES Huang
FE Harrell
Gregory E Gilbert
H Quan
JE Aikens
JE Kurlander
JK Kirk
K Madsen
KC Farmer
L Hao
Leonard E Egede
LM Hess
M Pladevall
Martina Mueller
Mulugeta Gebregziabher
P Diggle
P Holland
P McCullagh
P Royston
PJ Rousseeuw
PM Ho
R Balkrishnan
R Koenker
R Scott Leslie
RA Shenolikar
RP Hertz
RW Koenker
S Karve
WC Lee
Y Rozenfeld
Y Yang
Yumin Zhao
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Many studies have investigated racial/ethnic disparities in medication non-adherence in patients with type 2 diabetes using common measures such as medication possession ratio (MPR) or gaps between refills. All these measures, including MPR, are quasi-continuous and bounded, and their distribution is usually skewed. Analysis of such measures using traditional regression methods that model mean changes in the dependent variable may fail to provide a full picture of differential patterns in non-adherence between groups. Methods: A retrospective cohort of 11,272 veterans with type 2 diabetes was assembled from Veterans Administration datasets from April 1996 to May 2006. The main outcome measure was MPR with quantile cutoffs Q1-Q4 taking values of 0.4, 0.6, 0.8 and 0.9. Quantile regression (QReg) was used to model the association between MPR and race/ethnicity after adjusting for covariates. Comparison was made with commonly used ordinary least squares (OLS) and generalized linear mixed models (GLMM). Results: Quantile regression showed that Non-Hispanic-Black (NHB) had statistically significantly lower MPR compared to Non-Hispanic-White (NHW) holding all other variables constant across all quantiles with estimates and p-values given as -3.4% (p = 0.11), -5.4% (p = 0.01), -3.1% (p = 0.001), and -2.00% (p = 0.001) for Q1 to Q4, respectively. Other racial/ethnic groups had lower adherence than NHW only in the lowest quantile (Q1) of about -6.3% (p = 0.003). In contrast, OLS and GLMM only showed differences in mean MPR between NHB and NHW while the mean MPR difference between other racial groups and NHW was not significant. Conclusion: Quantile regression is recommended for analysis of data that are heterogeneous such that the tails and the central location of the conditional distributions vary differently with the covariates. QReg provides a comprehensive view of the relationships between independent and dependent variables (i.e. not just centrally but also in the tails of the conditional distribution of the dependent variable). Indeed, without performing QReg at different quantiles, an investigator would have no way of assessing whether a difference in these relationships might exist.
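
The contrast this abstract draws between mean-based regression and quantile regression can be illustrated with the check (pinball) loss that quantile regression minimizes. The following is a minimal sketch in pure Python, not the study's analysis: the function names `pinball_loss` and `best_constant` and the toy data are illustrative assumptions, and the "model" is an intercept only.

```python
def pinball_loss(residuals, tau):
    """Average check loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    return sum(max(tau * r, (tau - 1.0) * r) for r in residuals) / len(residuals)


def best_constant(y, tau):
    """Intercept-only quantile fit: search the observed values for the
    constant c that minimizes the pinball loss of y - c."""
    return min(y, key=lambda c: pinball_loss([v - c for v in y], tau))


# Hypothetical skewed sample: one extreme value pulls the mean upward.
sample = [1, 2, 3, 4, 100]
mean_fit = sum(sample) / len(sample)      # what a least-squares intercept estimates
median_fit = best_constant(sample, 0.5)   # what a tau = 0.5 quantile fit estimates
```

On skewed data like this toy sample, the least-squares intercept follows the outlier while the tau = 0.5 fit stays at the middle observation. Repeating the fit at several values of tau is what gives a view of the tails.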

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Sampling from Dirichlet process mixture models with unknown concentration parameter: mixing issues in large data implementations

Author: A Fritsch
A Jasra
C Yau
CE Antoniak
D Blackwell
David I. Hastie
DB Dunson
DI Hastie
H Ishwaran
J Molitor
J Pitman
J Sethuraman
JL Bigelow
M Kalli
M Papathomas
MD Escobar
S Jain
S Richardson
SG Walker
Silvia Liverani
Sylvia Richardson
TS Ferguson
Y Ulker
Publication venue: Springer Science and Business Media LLC

Crossref

Prevention of Neural-Tube Defects with Periconceptional Folic Acid, Methylfolate, or Multivitamins?

Author: Andrew E. Czeizel
Berry RJ
Brämswig S
Busby A
Cronin M
Cuskelly GJ
Czeizel AE
Daily LE
De Wals P
Dunson DB
Fenech M
Ferenc Bánhidy
Grosse SD
Hoffbrand AV
Holzgreve W
István Dudás
Lamers Y
László Paput
McPartlin J
Nevin NC
Oakley D
Oakley GP
Pietrzik K
Prinz-Langenohl R
Rosenberg MJ
Smithells RW
Szarevsky A
Valek A
Wald NJ
Wilcken B
Williams LJ
Wills L
Wright AJ
Publication venue: S. Karger AG
Publication date: 18/12/2014

Background/Aims: To review the main results of intervention trials which showed the efficacy of periconceptional folic acid-containing multivitamin and folic acid supplementation in the prevention of neural-tube defects (NTD). Methods and Results: The main findings of 5 intervention trials are known: (i) the efficacy of a multivitamin containing 0.36 mg folic acid in a UK nonrandomized controlled trial resulted in an 83-91% reduction in NTD recurrence, while the results of the Hungarian (ii) randomized controlled trial and (iii) cohort-controlled trial using a multivitamin containing 0.8 mg folic acid showed 93 and 89% reductions in the first occurrence of NTD, respectively. On the other hand, (iv) another multicenter randomized controlled trial proved a 71% efficacy of 4 mg folic acid in the reduction of recurrent NTD, while (v) a public health-oriented Chinese-US trial showed a 41-79% reduction in the first occurrence of NTD depending on the incidence of NTD. Conclusions: Translational application of these findings could result in a breakthrough in the primary prevention of NTD, but so far this is not widely applied in practice. The benefits and drawbacks of 4 main possible uses of periconceptional folic acid/multivitamin supplementation, i.e. (i) dietary intake, (ii) periconceptional supplementation, (iii) flour fortification, and (iv) the recent attempt to combine oral contraceptives with 6S-5-methyltetrahydrofolate (methylfolate), are discussed. Obviously, prevention of NTD is much better than the frequent elective termination of pregnancies after prenatal diagnosis of NTD fetuses.

Crossref

Semmelweis Repository

Bayesian Inference for Genomic Data Integration Reduces Misclassification Rate in Predicting Protein-Protein Interactions

Author: A Elefsinioti
A Valencia
AJ Enright
AK Ramani
AL Hopkins
BA Shoemaker
C von Mering
CC Wu
Christos A. Ouzounis
Chuanhua Xing
CS Goh
David B. Dunson
DB Dunson
DR Rhodes
EC Butcher
EM Marcotte
F Browne
F Pazos
GT Hart
H Huang
H Ishwaran
H Yu
I Lee
IW Taylor
J Saric
J Sun
JS Bader
L Hakes
L Hood
L Lu
LJ Jensen
LJ Lu
LV Zhang
M Huang
M Persico
MA Yildirim
MP Brown
MS Scott
N Lin
OG Troyanskaya
P Aloy
P Bork
P Pagel
P Sham
R Chowdhary
R Jansen
R Malik
R Mrowka
S Dolma
S Kim
S Tsoka
SV Date
Y Qi
Publication venue: Public Library of Science
Publication date: 01/07/2011

Protein-protein interactions (PPIs) are essential to most fundamental cellular processes. There has been increasing interest in reconstructing PPI networks. However, several critical difficulties exist in obtaining reliable predictions. Notably, false positive rates can be as high as >80%. Error correction from each generating source can be both time-consuming and inefficient due to the difficulty of covering the errors from multiple levels of data processing procedures within a single test. We propose a novel Bayesian integration method, termed nonparametric Bayes ensemble learning (NBEL), to lower the misclassification rate (both false positives and negatives) through automatically up-weighting data sources that are most informative, while down-weighting less informative and biased sources. Extensive studies indicate that NBEL is significantly more robust than the classic naïve Bayes to unreliable, error-prone and contaminated data. On a large human data set our NBEL approach predicts many more PPIs than naïve Bayes. This suggests that previous studies may have large numbers of not only false positives but also false negatives. Validation on two high-quality human PPI datasets supports our observations. Our experiments demonstrate that it is feasible to predict high-throughput PPIs computationally with substantially reduced false positives and false negatives. The ability to predict large numbers of PPIs both reliably and automatically may inspire people to use computational approaches to correct data errors in general, and may speed up high-quality PPI prediction. Such reliable prediction may provide a solid platform for other studies such as protein function prediction and the roles of PPIs in disease susceptibility.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Bayesian lasso binary quantile regression

Author: Dries F. Benoit
Rahim Alhamzawi
Keming Yu
R Alhamzawi
DF Andrews
OE Barndorff-Nielsen
DB Dunson
K Florios
F Hoti
R Koenker
RW Koenker
G Kordas
H Kozumi
T Lancaster
Q Li
Y Li
CF Manski
JR Michael
T Park
W Sun
R Tibshirani
EG Tsionas
H Wang
Y Wu
N Yi
K Yu
S Zheng
H Zou
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

In this paper, a Bayesian hierarchical model for variable selection and estimation in the context of binary quantile regression is proposed. Existing approaches to variable selection in a binary classification context are sensitive to outliers, heteroskedasticity or other anomalies of the latent response. The method proposed in this study overcomes these problems in an attractive and straightforward way. A Laplace likelihood and Laplace priors for the regression parameters are proposed and estimated with Bayesian Markov Chain Monte Carlo. The resulting model is equivalent to the frequentist lasso procedure. A conceptual result is that by doing so, the binary regression model is moved from a Gaussian to a full Laplacian framework without sacrificing much computational efficiency. In addition, an efficient Gibbs sampler to estimate the model parameters is proposed that is superior to the Metropolis algorithm used in previous studies on Bayesian binary quantile regression. Both the simulation studies and the real data analysis indicate that the proposed method performs well in comparison to the other methods. Moreover, as the base model is binary quantile regression, the approach provides much more detailed insight into the effects of the covariates. An implementation of the lasso procedure for binary quantile regression models is available in the R package bayesQR.

CiteSeerX

Crossref

Ghent University Academic Bibliography

HKU Scholars Hub

Bayesian semiparametric modeling for stochastic precedence, with applications in epidemiology and survival analysis

Author: A Erkanli
A Gelfand
A Kottas
AE Gelfand
Athanasios Kottas
CE Antoniak
D Bamber
D Blackwell
DB Dunson
E Arjas
G Karabatsos
J Sethuraman
JP Klein
M Evans
M Shaked
MA Arcones
MD Escobar
MS Pepe
MT Collins
P Müller
PD Hoff
RM Neal
TE Hanson
TS Ferguson
Z Chen
Publication venue: Springer Science and Business Media LLC

Crossref

Bayesian mapping of pulmonary tuberculosis in Antananarivo, Madagascar

Author: A Gemperli
A Odoi
A Senkowski
AB Lawson
AC Clements
C Nunes
CJL Murray
D Onozuka
D Spiegelhalter
DB Dunson
DJ Bicout
Dominique J Bicout
Fanjasoa Rakotomanana
G Aamodt
G Auregan
GA Tipple
GD Johnson
GJ Yang
Ministère de la Santé et du Planning Familial Madagascar
P Grassberger
Philippe Sabatier
R Bergamaschi
R Dufournet
Rindra V Randremanana
RV Randremanana
S Richardson
T Tango
Vincent Richard
WV Souza
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Tuberculosis (TB), an infectious disease caused by Mycobacterium tuberculosis, is endemic in Madagascar. The capital, Antananarivo, is the most seriously affected area. TB had a non-random spatial distribution in this setting, with clustering in the poorer areas. The aim of this study was to explore this pattern further by a Bayesian approach, and to measure the associations between the spatial variation of TB risk and national control program indicators for all neighbourhoods. Methods: A combination of a Bayesian approach and a generalized linear mixed model (GLMM) was developed to produce smooth risk maps of TB and to model relationships between new TB cases and national TB control program indicators. The new TB cases were collected from records of the 16 Tuberculosis Diagnostic and Treatment Centres (DTC) of the city from 2004 to 2006. Five TB indicators were considered in the analysis: number of cases undergoing retreatment, number of patients with treatment failure and those suffering relapse after the completion of treatment, number of households with more than one case, number of patients lost to follow-up, and proximity to a DTC. Results: In Antananarivo, 43.23% of the neighbourhoods had a standardized incidence ratio (SIR) above 1, of which 19.28% had a TB risk significantly higher than the average. Identified high TB risk areas were clustered and the distribution of TB was found to be associated mainly with the number of patients lost to follow-up (SIR: 1.10, CI 95%: 1.02-1.19) and the number of households with more than one case (SIR: 1.13, CI 95%: 1.03-1.24). Conclusion: The spatial pattern of TB in Antananarivo and the contribution of national control program indicators to this pattern highlight the importance of the data recorded in the TB registry and the use of spatial approaches for assessing the epidemiological situation for TB. Including these variables in the model increases the reproducibility, as these data are already available for individual DTCs. These findings may also be useful for guiding decisions related to disease control strategies.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central