1,191 research outputs found

Global consensus Monte Carlo

Author: Johansen Adam M.
Lee Anthony
Rendell Lewis J.
Whiteley Nick
Publication date: 07/04/2020

To conduct Bayesian inference with large data sets, it is often convenient or necessary to distribute the data across multiple machines. We consider a likelihood function expressed as a product of terms, each associated with a subset of the data. Inspired by global variable consensus optimisation, we introduce an instrumental hierarchical model associating auxiliary statistical parameters with each term, which are conditionally independent given the top-level parameters. One of these top-level parameters controls the unconditional strength of association between the auxiliary parameters. This model leads to a distributed MCMC algorithm on an extended state space yielding approximations of posterior expectations. A trade-off between computational tractability and fidelity to the original model can be controlled by changing the association strength in the instrumental model. We further propose the use of an SMC sampler with a sequence of association strengths, allowing both the automatic determination of appropriate strengths and the application of a bias correction technique. In contrast to similar distributed Monte Carlo algorithms, this approach requires few distributional assumptions. The performance of the algorithms is illustrated with a number of simulated examples.
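
The extended-state-space idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a toy model chosen so that every conditional is Gaussian: a scalar parameter theta with a Gaussian prior, data split into blocks summarised by their means, a known observation variance, and a Gaussian link between theta and each auxiliary parameter z_j whose variance lam plays the role of the association strength (smaller lam means a tighter link). The names gcmc_gibbs, exact_posterior_mean and lam are hypothetical, as is the choice of a Gaussian link.

```python
import random
import statistics


def gcmc_gibbs(block_means, block_sizes, sigma2, prior_mean, prior_var,
               lam, n_iter, seed=0):
    """Gibbs sampler on the extended space (theta, z_1, ..., z_b).

    Toy assumption: block j contributes a Gaussian likelihood term with
    mean ybar_j and variance sigma2 / n_j, and z_j | theta ~ N(theta, lam).
    """
    rng = random.Random(seed)
    b = len(block_means)
    theta = prior_mean
    draws = []
    for _ in range(n_iter):
        # Worker step: each z_j depends only on theta and its own block,
        # so these b updates could run on separate machines.
        z = []
        for ybar, n in zip(block_means, block_sizes):
            prec = 1.0 / lam + n / sigma2
            mean = (theta / lam + n * ybar / sigma2) / prec
            z.append(rng.gauss(mean, prec ** -0.5))
        # Central step: theta given all z_j uses the prior and the links only,
        # never the raw data.
        prec = 1.0 / prior_var + b / lam
        mean = (prior_mean / prior_var + sum(z) / lam) / prec
        theta = rng.gauss(mean, prec ** -0.5)
        draws.append(theta)
    return draws


def exact_posterior_mean(block_means, block_sizes, sigma2, prior_mean, prior_var):
    """Posterior mean of theta in the original (non-extended) Gaussian model."""
    prec = 1.0 / prior_var + sum(block_sizes) / sigma2
    weighted = sum(n * y for y, n in zip(block_means, block_sizes)) / sigma2
    return (prior_mean / prior_var + weighted) / prec


# Hypothetical data: four blocks of 50 observations each.
means = [0.9, 1.1, 1.0, 1.2]
sizes = [50, 50, 50, 50]
draws = gcmc_gibbs(means, sizes, sigma2=1.0, prior_mean=0.0, prior_var=10.0,
                   lam=0.01, n_iter=20000, seed=1)
print("consensus estimate:", statistics.fmean(draws[1000:]))
print("exact posterior mean:", exact_posterior_mean(means, sizes, 1.0, 0.0, 10.0))
```

In this sketch, shrinking lam moves the theta-marginal closer to the original posterior but strengthens the coupling between theta and the z_j, so the chain mixes more slowly. This mirrors the trade-off between fidelity and tractability that the abstract describes. The SMC sampler and bias correction are not covered here.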

arXiv.org e-Print Archive

Warwick Research Archives Portal Repository

Explore Bristol Research

Challenges of Big Data Analysis

Author: Fan Jianqing
Han Fang
Liu Han
Publication venue: Oxford University Press (OUP)
Publication date: 05/02/2014

Big Data bring new opportunities to modern society and challenges to data scientists. On the one hand, Big Data hold great promise for discovering subtle population patterns and heterogeneities that cannot be detected with small-scale data. On the other hand, the massive sample size and high dimensionality of Big Data introduce unique computational and statistical challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. These challenges are distinctive and require new computational and statistical paradigms. This article gives an overview of the salient features of Big Data and how these features drive paradigm change in statistical and computational methods as well as computing architectures. We also provide various new perspectives on Big Data analysis and computation. In particular, we emphasize the viability of the sparsest solution in a high-confidence set and point out that the exogenous assumptions in most statistical methods for Big Data cannot be validated due to incidental endogeneity. When violated, these assumptions can lead to wrong statistical inferences and, consequently, wrong scientific conclusions.

arXiv.org e-Print Archive

CiteSeerX

Princeton University Open Access Repository

Crossref

PubMed Central

A Survey of Bayesian Statistical Approaches for Big Data

Author: A Akusok
A Baldominos
A Belle
A Beskos
A Bouchard-Côté
A De Mauro
A Fahad
A Gandomi
A Lee
A Marshall
A O’Driscoll
A Siddiqa
A Vyas
AB Owen
AF Wise
AR Linero
AT Azar
AT Porter
AÇ Pehlivanlı
B Franke
B Liquet
B Liu
B Oancea
C Loebbecke
C Wang
C Yang
CA McGrory
CC Drovandi
CE Rasmussen
Changwon Yoo
CK Emani
D Apiletti
D Oprea
D Talia
DB Dunson
DM Blei
DN Politis
DT Frazier
DV Shah
DW Bates
E Raguseo
ED Schifano
ET Bradlow
F Lindsten
Florian Buettner
Florian Maire
G Bello-Orgaz
G Jifa
GI Allen
GJ Lasinio
GM Allenby
H Cai
H Demirkan
H Hassani
H Kousar
HA Chipman
HH Huang
HJ Watson
I Ben-Gal
J Fan
J Roski
J Zhu
Jake Luo
JE Bibault
JJ Chen
JN Cappella
JS Rumsfeld
K Chalupka
Kath Albury
KL Mengersen
KS Divya
L Breiman
L Liu
L Mählmann
L Wang
L Yu
L Zhang
L Zhou
LG Nongxa
M Hilbert
M Viceconti
MA Suchard
Matias Quiroz
MD Assunção
MD Hoffman
MT Moores
N Moustafa
N. Chopin
NA Lazar
O Sysoev
Oliver Müller
OY Al-Jarrah
P Ducange
P Müller
P Pudlo
PF Brennan
R Bardenet
R Burrows
R Guhaniyogi
R Izbicki
RF Mansour
Richard Branch
Robin Genuer
RW Hoerl
S Atkinson
S Castruccio
S Chaudhuri
S Fosso Wamba
S Guha
S Kaisler
S Li
S Minsker
S Pandey
S Sagiroglu
S Sisson
S Srivastava
S Suthaharan
S White
SF Wamba
Shahriar Akter
Shweta Bansal
Simon I. Hay
SL Scott
SM Schennach
Sudipto Banerjee
T Magdon-Ismail
T Zhang
Tengyao Wang
TH McCormick
TJ McKinley
U Sivarajah
VD Katkar
X Zhang
XF Wang
XG Xia
Xing Ju Lee
Y Tang
Y Webb-Vargas
Y Zhang
Yang Ni
YW Teh
Z Ma
Z Sun
Z Zhang
Ziad Obermeyer
Zoubin Ghahramani
Publication venue: Springer Science and Business Media LLC
Publication date: 28/05/2020

The modern era is characterised as an era of information, or Big Data. This has motivated a huge literature on new methods for extracting information and insights from these data. A natural question is how these approaches differ from those that were available prior to the advent of Big Data. We review published studies that present Bayesian statistical approaches specifically for Big Data and discuss the reported and perceived benefits of these approaches. We conclude by addressing the question of whether focusing only on improving computational algorithms and infrastructure will be enough to face the challenges of Big Data.

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Machine Learning and Integrative Analysis of Biomedical Big Data

Author: Choi Howard
Chung Neo Christopher
Mirza Bilal
Ping Peipei
Wang Jie
Wang Wei
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.

Multidisciplinary Digital Publishing Institute

Ezid

Directory of Open Access Journals

eScholarship - University of California