
    Particle MCMC algorithms and architectures for accelerating inference in state-space models.

    Particle Markov Chain Monte Carlo (pMCMC) is a stochastic algorithm designed to generate samples from a probability distribution when the density of the distribution does not admit a closed-form expression. pMCMC is most commonly used to sample from the Bayesian posterior distribution in State-Space Models (SSMs), a class of probabilistic models used in numerous scientific applications. However, this task becomes prohibitive for complex SSMs with massive data, due to the high computational cost of pMCMC and its poor performance when the posterior exhibits multi-modality. This paper aims to address both issues by: 1) proposing a novel pMCMC algorithm (denoted ppMCMC), which uses multiple Markov chains (instead of the single chain used by pMCMC) to improve sampling efficiency for multi-modal posteriors; 2) introducing custom, parallel hardware architectures tailored for pMCMC and ppMCMC. The architectures are implemented on Field Programmable Gate Arrays (FPGAs), a type of hardware accelerator with massive parallelization capabilities. The new algorithm and the two FPGA architectures are evaluated using a large-scale case study from genetics. Results indicate that ppMCMC achieves 1.96x higher sampling efficiency than pMCMC when using sequential CPU implementations. The FPGA architecture of pMCMC is 12.1x and 10.1x faster than state-of-the-art parallel CPU and GPU implementations of pMCMC, respectively, and up to 53x more energy efficient; the FPGA architecture of ppMCMC increases these speedups to 34.9x and 41.8x, respectively, and is 173x more power efficient, bringing previously intractable SSM-based data analyses within reach. The authors would like to thank the Wellcome Trust (Grant reference 097816/Z/11/A) and the EPSRC (Grant reference EP/I012036/1) for the financial support given to this research project.
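
To make the core idea concrete, here is a minimal sketch of plain single-chain pMCMC in its particle marginal Metropolis-Hastings form: a bootstrap particle filter produces an unbiased likelihood estimate, which stands in for the intractable likelihood inside a Metropolis-Hastings acceptance ratio. Everything here is an assumption for illustration: the toy linear-Gaussian SSM, the function names (simulate_ssm, particle_loglik, pmmh), the Uniform(-1, 1) prior on the hypothetical parameter phi, and all tuning values. It is not the paper's ppMCMC algorithm or its FPGA design.

```python
import numpy as np

def simulate_ssm(phi, T, sigma_x=1.0, sigma_y=0.5, seed=0):
    """Simulate a toy SSM (assumed): x_t = phi*x_{t-1} + v_t, y_t = x_t + w_t."""
    rng = np.random.default_rng(seed)
    x, y = np.zeros(T), np.zeros(T)
    prev = 0.0
    for t in range(T):
        prev = phi * prev + sigma_x * rng.standard_normal()
        x[t] = prev
        y[t] = prev + sigma_y * rng.standard_normal()
    return x, y

def particle_loglik(y, phi, n_particles, rng, sigma_x=1.0, sigma_y=0.5):
    """Bootstrap particle filter estimate of log p(y | phi)."""
    particles = np.zeros(n_particles)
    loglik = 0.0
    for obs in y:
        # Propagate every particle through the transition density.
        particles = phi * particles + sigma_x * rng.standard_normal(n_particles)
        # Log Gaussian observation weights, shifted by the max for stability.
        logw = -0.5 * ((obs - particles) / sigma_y) ** 2 - np.log(sigma_y * np.sqrt(2 * np.pi))
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())
        # Multinomial resampling proportional to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        particles = particles[idx]
    return loglik

def pmmh(y, n_iter=2000, n_particles=200, step=0.1, seed=1):
    """Random-walk particle marginal Metropolis-Hastings for phi, prior Uniform(-1, 1)."""
    rng = np.random.default_rng(seed)
    phi = 0.0
    ll = particle_loglik(y, phi, n_particles, rng)
    samples = np.empty(n_iter)
    for i in range(n_iter):
        prop = phi + step * rng.standard_normal()
        if -1.0 < prop < 1.0:  # proposals outside the prior support are rejected
            ll_prop = particle_loglik(y, prop, n_particles, rng)
            # Flat prior + symmetric proposal: accept on the likelihood-estimate ratio.
            if np.log(rng.uniform()) < ll_prop - ll:
                phi, ll = prop, ll_prop
        samples[i] = phi
    return samples
```

For example, `_, y = simulate_ssm(0.8, 50)` followed by `pmmh(y, n_iter=500, n_particles=100)` gives a chain whose post-burn-in values should concentrate near the generating phi. Every iteration reruns the particle filter over the whole series, which is the computational cost the abstract refers to; a multi-chain variant in the spirit of ppMCMC would run several such chains, but how they interact in the paper is not stated here.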

    A Provable Smoothing Approach for High Dimensional Generalized Regression with Applications in Genomics

    In many applications, linear models fit the data poorly. This article studies an appealing alternative, the generalized regression model. This model only assumes that there exists an unknown monotonically increasing link function connecting the response $Y$ to a single index $X^T\beta^*$ of explanatory variables $X\in\mathbb{R}^d$. The generalized regression model is flexible and covers many widely used statistical models. It fits the data-generating mechanisms of many real problems well, which makes it useful in a variety of applications where regression models are regularly employed. In low dimensions, rank-based M-estimators are recommended for the generalized regression model, giving root-$n$ consistent estimators of $\beta^*$. The application of these estimators to high-dimensional data, however, is questionable. This article studies, both theoretically and practically, a simple yet powerful smoothing approach to handle the high-dimensional generalized regression model. Theoretically, a family of smoothing functions is provided, and the amount of smoothing necessary for efficient inference is carefully calculated. Practically, our study is motivated by an important and challenging scientific problem: decoding gene regulation by predicting transcription factors that bind to cis-regulatory elements. Applying our proposed method to this problem shows substantial improvement over the state-of-the-art alternative in real data. Comment: 53 pages
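
The abstract does not spell out the estimator, so the following is only a hedged sketch of one well-known instance of the idea: the rank-based maximum rank correlation objective, with its indicator $1\{(X_i - X_j)^T\beta > 0\}$ replaced by a logistic sigmoid with bandwidth h, plus an L1 penalty for the high-dimensional setting. The sigmoid, the bandwidth h, the penalty lam, fixing beta[0] = 1 (the single index is identified only up to scale), the proximal gradient ascent loop, and the name smoothed_rank_regression are all assumptions, not the paper's method or its family of smoothing functions.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def smoothed_rank_regression(X, y, h=0.5, lam=0.01, lr=0.5, n_iter=300):
    """Hypothetical L1-penalised smoothed maximum rank correlation estimator.

    Maximises (1/(n(n-1))) * sum_{i != j} 1{y_i > y_j} * sigmoid((x_i - x_j)^T beta / h)
    minus lam * ||beta[1:]||_1, with beta[0] held fixed at 1.
    """
    n, d = X.shape
    diffs = X[:, None, :] - X[None, :, :]               # (n, n, d): x_i - x_j
    ordered = (y[:, None] > y[None, :]).astype(float)   # 1{y_i > y_j}; diagonal is 0
    beta = np.zeros(d)
    beta[0] = 1.0
    for _ in range(n_iter):
        s = sigmoid(diffs @ beta / h)                    # (n, n) smoothed indicators
        weights = ordered * s * (1.0 - s) / h            # derivative of sigmoid(z / h)
        grad = np.einsum("ij,ijk->k", weights, diffs) / (n * (n - 1))
        # Ascent step on the free coordinates, then soft-thresholding (the L1 prox).
        step = beta[1:] + lr * grad[1:]
        beta[1:] = np.sign(step) * np.maximum(np.abs(step) - lr * lam, 0.0)
    return beta
```

Because only the ordering of the responses enters the objective, the same estimate is obtained for any monotonically increasing link, e.g. `y = np.exp(X @ beta_true + noise)` or `y = X @ beta_true + noise`. The bandwidth h trades bias (large h) against a rougher, harder-to-optimise objective (small h); choosing it well is the question the abstract says the authors analyse.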

    Extracting transcription factor binding sites from unaligned gene sequences with statistical models

    Background: Transcription factor binding sites (TFBSs) are crucial in the regulation of gene transcription. Recently, chromatin immunoprecipitation followed by cDNA microarray hybridization (ChIP-chip array) has been used to identify potential regulatory sequences, but the procedure can only map probable protein-DNA interaction loci at 1–2 kb resolution. To identify the exact binding motifs, a computational method is needed that examines the ChIP-chip array binding sequences and searches for possible motifs representing the transcription factor binding sites. Results: We developed a program to find accurate motif sites in a set of unaligned DNA sequences from the yeast genome. Overall, the prediction results suggest that our algorithm outperforms MDscan, since the predicted motifs are more consistent with previously known specificities reported in the literature and have better prediction ranks. Our program also outperforms the constraint-less Cosmo program, especially in the elimination of false positives. Conclusion: In this study, an improved sampling algorithm is proposed that incorporates the binomial probability model to build significant initial candidate motif sets. By investigating the statistical dependence between base positions in TFBSs, the method combines dependency graphs and their expanded Bayesian networks. The results show that our program satisfactorily extracts transcription factor binding sites from unaligned gene sequences.
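
The abstract does not give the exact binomial model, so this is a hedged sketch of how a binomial probability model can flag significant initial candidate motifs: count how many sequences contain each k-mer and score that count against a Binomial(n, p) null, where p is the chance that a random sequence of the same length contains the k-mer. The i.i.d. uniform background, the neglect of overlapping occurrences, and the names binomial_sf and score_kmers are assumptions; this does not reproduce the authors' sampler or the dependency-graph and Bayesian-network stage.

```python
from itertools import product
from math import comb

def binomial_sf(k, n, p):
    """Upper tail P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def score_kmers(seqs, k):
    """Rank all DNA k-mers by how surprising their sequence-level hit count is.

    Returns a list of (p_value, kmer, hits), most significant first.
    """
    n = len(seqs)
    # Assumed background: i.i.d. uniform bases, so a given k-mer occurs at a
    # given position with probability 0.25**k; overlaps are ignored.
    p_seq = sum(1 - (1 - 0.25**k) ** max(len(s) - k + 1, 0) for s in seqs) / n
    results = []
    for kmer in ("".join(t) for t in product("ACGT", repeat=k)):
        hits = sum(kmer in s for s in seqs)  # number of sequences containing the k-mer
        results.append((binomial_sf(hits, n, p_seq), kmer, hits))
    results.sort()
    return results
```

A k-mer planted in every one of ten random 100-bp sequences receives a tail probability many orders of magnitude below that of any background k-mer, so it tops the ranking. Enumerating all 4^k k-mers is only practical for short k; a real motif finder would seed from observed words and allow degenerate positions.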

    Gene regulation and epigenotype in Friedreich's ataxia

    Friedreich's ataxia (FRDA) is known to be caused by an abnormal GAA-repeat expansion located in the first intron of the FXN gene. As a result of the GAA expansion, patients exhibit low levels of FXN mRNA, leading to FRDA. Here, chromatin immunoprecipitation (ChIP) demonstrated the presence of an RNA pol II transcriptional pausing site at exon 1 of the FXN gene. At this site, FRDA EBV cell lines exhibited elevated levels of the negative elongation factor NELF-E compared to controls, dependent on the presence of a GAA repeat expansion. This site may represent a rate-limiting step for FXN transcription and consequently provide a means to modify transcription levels in FRDA. Moreover, RNA pol II pausing site binding factors, such as NELF-E, were influenced by treatment with nicotinamide, a class III HDAC inhibitor. Therefore, factors sensitive to chromatin changes may influence the regulation of RNA pol II pausing and also balance otherwise positive chromatin changes. This new finding could explain the relatively minor effects of different drug approaches to up-regulate this gene. Furthermore, CTCF and the histone demethylase LSD1 were also found to be located at the FXN pausing site. Results suggest a function for LSD1 in demethylating H3K4me2 at the pausing site and potentially also in demethylating H3K9me3 in the case of frequently transcribed expanded GAA repeats. Therefore, LSD1 might play a crucial role in preventing heterochromatinisation of a euchromatic gene. Using primary transcript RNA-FISH, a delay in RNA pol II release from the pausing site and, furthermore, a dramatic loss of RNA pol II elongation in the presence of expanded GAA repeats were observed. The identified and characterised transcriptional pausing site at FXN is likely to play a repressive role and to participate in the pathogenesis of FRDA. Imperial Users only