
    Detection of regulator genes and eQTLs in gene networks

    Genetic differences between individuals associated with quantitative phenotypic traits, including disease states, are usually found in non-coding genomic regions. These genetic variants are often also associated with differences in expression levels of nearby genes (they are "expression quantitative trait loci", or eQTLs for short) and presumably play a gene regulatory role, affecting the status of molecular networks of interacting genes, proteins and metabolites. Computational systems biology approaches to reconstruct causal gene networks from large-scale omics data have therefore become essential to understand the structure of networks controlled by eQTLs together with other regulatory genes, and to generate detailed hypotheses about the molecular mechanisms that lead from genotype to phenotype. Here we review the main analytical methods and software tools to identify eQTLs and their associated genes, to reconstruct co-expression networks and modules, to reconstruct causal Bayesian gene and module networks, and to validate predicted networks in silico.
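
    As a toy illustration of the eQTL-detection step, the Python sketch below tests each SNP for association with a single gene's expression level via simple linear regression and flags Bonferroni-significant markers as candidate eQTLs. It is a minimal sketch with simulated data, not one of the software tools reviewed in the article.

        # Minimal single-gene eQTL scan on simulated data (illustrative only).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n_samples, n_snps = 200, 50

        # Genotypes coded 0/1/2 (minor-allele counts) and one gene's expression
        # levels; SNP 10 is planted as a true eQTL.
        genotypes = rng.integers(0, 3, size=(n_samples, n_snps))
        expression = 0.8 * genotypes[:, 10] + rng.normal(size=n_samples)

        # Association test for each SNP via simple linear regression.
        pvalues = np.array([
            stats.linregress(genotypes[:, j], expression).pvalue
            for j in range(n_snps)
        ])

        # Bonferroni-corrected hits are candidate eQTLs for this gene.
        print("candidate eQTL SNPs:", np.where(pvalues < 0.05 / n_snps)[0])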

    Faster inference from state space models via GPU computing

    Funding: C.F.-J. is funded via a doctoral scholarship from the University of St Andrews, School of Mathematics and Statistics. Inexpensive Graphics Processing Units (GPUs) offer the potential to greatly speed up computation by employing their massively parallel architecture to perform arithmetic operations more efficiently. Population dynamics models are important tools in ecology and conservation. Modern Bayesian approaches allow biologically realistic models to be constructed and fitted to multiple data sources in an integrated modelling framework based on a class of statistical models called state space models. However, model fitting is often slow, requiring hours to weeks of computation. We demonstrate the benefits of GPU computing using a model for the population dynamics of British grey seals, fitted with a particle Markov chain Monte Carlo algorithm. Speed-ups of two orders of magnitude were obtained for estimation of the log-likelihood, compared to a traditional ‘CPU-only’ implementation, allowing an accurate method of inference to be used where this was previously too computationally expensive to be viable. GPU computing has enormous potential, but one barrier to further adoption is a steep learning curve, due to GPUs' unique hardware architecture. We provide a detailed description of hardware and software setup, and our case study provides a template for other similar applications. We also provide a detailed tutorial-style description of GPU hardware architectures, and examples of important GPU-specific programming practices.
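
    As a minimal sketch of the kind of computation involved, the code below runs a bootstrap particle filter to estimate the log-likelihood of a toy linear-Gaussian state space model; the model, parameter values and observations are assumptions for illustration, not the seal population model from the paper. The vectorised particle operations are the ones that map naturally onto a GPU, for example by swapping numpy for a drop-in GPU array library such as CuPy.

        # Bootstrap particle filter log-likelihood estimate for a toy
        # linear-Gaussian state space model (illustration only).
        import numpy as np

        def particle_loglik(y, n_particles=10_000, phi=0.9, sigma=1.0, tau=0.5, seed=0):
            rng = np.random.default_rng(seed)
            x = rng.normal(0.0, sigma, n_particles)                   # initial particles
            loglik = 0.0
            for yt in y:
                x = phi * x + rng.normal(0.0, sigma, n_particles)     # propagate states
                logw = -0.5 * ((yt - x) / tau) ** 2 - np.log(tau * np.sqrt(2 * np.pi))
                m = logw.max()
                w = np.exp(logw - m)
                loglik += m + np.log(w.mean())                        # log of mean weight
                x = x[rng.choice(n_particles, n_particles, p=w / w.sum())]  # resample
            return loglik

        y_obs = np.random.default_rng(1).normal(size=50)   # placeholder observations
        print(particle_loglik(y_obs))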

    Statistical Methods in Integrative Genomics

    Statistical methods in integrative genomics aim to answer important biological questions by jointly analyzing multiple types of genomic data (vertical integration) or by aggregating the same type of data across multiple studies (horizontal integration). In this article, we introduce different types of genomic data and data resources, and then review statistical methods of integrative genomics, with emphasis on the motivation and rationale behind these methods. We conclude with some summary points and future research directions.
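
    As a small example of horizontal integration (not a method proposed in the article), the sketch below combines per-study p-values for the same gene using Fisher's method; the three p-values are made-up numbers for illustration.

        # Fisher's method for combining p-values across studies (toy example).
        import numpy as np
        from scipy import stats

        p_per_study = [0.04, 0.11, 0.008]   # same gene tested in three independent studies
        stat, p_combined = stats.combine_pvalues(p_per_study, method="fisher")

        # Equivalent manual computation: -2 * sum(log p_i) ~ chi-squared with 2k d.o.f.
        stat_manual = -2 * np.sum(np.log(p_per_study))
        p_manual = stats.chi2.sf(stat_manual, df=2 * len(p_per_study))
        print(p_combined, p_manual)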

    11th German Conference on Chemoinformatics (GCC 2015): Fulda, Germany, 8-10 November 2015


    An Adaptive Lightweight Security Framework Suited for IoT

    Standard security systems are widely implemented in industry, but they consume considerable computational resources. Devices in the Internet of Things (IoT) have very limited processing capacity, memory and storage, so existing security systems are not applicable to the IoT. To cope with this, we propose downsizing existing security processes. In this chapter, we describe three areas in which we reduce the required storage space and processing power. The first is the classification process required for ongoing anomaly detection, whereby values accepted or generated by a sensor are classified as valid or abnormal. We collect historic data and analyze them using machine learning techniques to draw a contour within which all valid streaming values are expected to fall. Hence, the detailed data collected from the sensors are no longer required for real-time anomaly detection. The second area involves the implementation of the Random Forest algorithm to apply distributed and parallel processing for anomaly discovery. The third area is downsizing cryptographic calculations to fit IoT limitations without compromising security. For each area, we present experimental results supporting our approach and implementation.
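
    The sketch below illustrates the first area with a one-class model: a compact classifier is fitted offline to historic sensor readings and then labels streaming values as valid or abnormal, so the raw history no longer needs to be stored on the device. IsolationForest and the simulated temperature/humidity data are assumptions for illustration; the chapter's actual classifier and features may differ.

        # One-class "contour" for sensor anomaly detection (illustrative only).
        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(0)
        # Historic readings, e.g. (temperature, humidity) pairs collected offline.
        historic = rng.normal(loc=[20.0, 50.0], scale=[1.5, 5.0], size=(5000, 2))

        contour = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
        contour.fit(historic)              # compact model replaces the raw history

        stream = np.array([[20.5, 52.0],   # typical reading
                           [35.0, 10.0]])  # abnormal reading
        print(contour.predict(stream))     # +1 = inside the contour (valid), -1 = anomaly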

    A multibranch, multitarget neural network for rapid point-source inversion in a microseismic environment: examples from the Hengill Geothermal Field, Iceland

    Despite advanced seismological techniques, automatic source characterization for microseismic earthquakes remains challenging, since current inversion and modelling of high-frequency signals are complex and time-consuming. For real-time applications such as induced seismicity monitoring, standard methods are often not fast enough to deliver complete real-time information on seismic sources. In this paper, we present an alternative approach based on recent advances in deep learning for rapid source-parameter estimation of microseismic earthquakes. The seismic inversion is represented in compact form by two convolutional neural networks, for individual feature extraction, and a fully connected neural network, for feature aggregation, to simultaneously obtain the full moment tensor and spatial location of microseismic sources. Specifically, a multibranch neural network is trained to encapsulate the relationship between seismic waveforms and the underlying point-source mechanisms and locations. The learning-based model allows rapid inversion (within a fraction of a second) once input data are available. A key advantage of the algorithm is that it can be trained using synthetic seismic data only, so it is directly applicable to scenarios where there are insufficient real data for training. Moreover, we find that the method is robust with respect to perturbations such as observational noise and data incompleteness (missing stations). We apply the new approach to synthesized and recorded small-magnitude (M <= 1.6) example earthquakes at the Hellisheiði geothermal field in the Hengill area, Iceland. For the examined events, the model achieves excellent performance and shows very good agreement with the inverted solutions determined through standard methodology. In this study, we seek to demonstrate that this approach is viable for real-time estimation of microseismic source parameters and can be integrated into advanced decision-support tools for controlling induced seismicity.
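
    The sketch below shows the general shape of such a multibranch, multitarget network in PyTorch: two convolutional branches extract features from separate waveform inputs, a fully connected trunk aggregates them, and two heads output the moment tensor and the source location. Layer sizes, input shapes and the split of the input into two branches are illustrative assumptions, not the configuration used in the paper.

        # Multibranch, multitarget point-source inversion network (illustrative sketch).
        import torch
        import torch.nn as nn

        class ConvBranch(nn.Module):
            def __init__(self, n_channels):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv1d(n_channels, 16, kernel_size=7, padding=3), nn.ReLU(),
                    nn.MaxPool1d(4),
                    nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
                    nn.AdaptiveAvgPool1d(8), nn.Flatten(),
                )

            def forward(self, x):
                return self.net(x)

        class MultiBranchInversion(nn.Module):
            def __init__(self, n_stations=10):
                super().__init__()
                self.branch_z = ConvBranch(n_stations)   # e.g. vertical components
                self.branch_h = ConvBranch(n_stations)   # e.g. horizontal components
                self.trunk = nn.Sequential(nn.Linear(2 * 32 * 8, 128), nn.ReLU())
                self.head_mt = nn.Linear(128, 6)         # full moment tensor (6 components)
                self.head_loc = nn.Linear(128, 3)        # x, y, z source location

            def forward(self, wave_z, wave_h):
                feats = torch.cat([self.branch_z(wave_z), self.branch_h(wave_h)], dim=1)
                h = self.trunk(feats)
                return self.head_mt(h), self.head_loc(h)

        model = MultiBranchInversion()
        z = torch.randn(4, 10, 512)      # batch of 4 events, 10 stations, 512 samples
        mt, loc = model(z, torch.randn(4, 10, 512))
        print(mt.shape, loc.shape)       # torch.Size([4, 6]) torch.Size([4, 3])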

    Simulated Annealing

    The book contains 15 chapters presenting recent contributions of top researchers working with Simulated Annealing (SA). Although it represents only a small sample of the research activity on SA, the book will certainly serve as a valuable tool for researchers interested in getting involved in this multidisciplinary field. In fact, one of its salient features is that the book is highly multidisciplinary in terms of application areas, since it assembles experts from the fields of Biology, Telecommunications, Geology, Electronics and Medicine.

    Bayesian Federated Learning in Predictive Space

    Federated Learning (FL) involves training a model over a dataset distributed among clients, with the constraint that each client's data is private. This paradigm is useful in settings where different entities own different training points, such as when training on data stored on multiple edge devices. Within this setting, small and noisy datasets are common, which highlights the need for well-calibrated models that are able to represent the uncertainty in their predictions. Alongside this, two other important goals for a practical FL algorithm are 1) that it has low communication costs, operating over only a few rounds of communication, and 2) that it achieves good performance when client datasets are distributed differently from each other (i.e., are heterogeneous). Among existing FL techniques, the closest to achieving such goals are Bayesian FL methods, which collect parameter samples from local posteriors and aggregate them to approximate the global posterior. These provide uncertainty estimates, handle data heterogeneity more naturally owing to their Bayesian nature, and can operate in a single round of communication. Of these techniques, many make inaccurate approximations to the high-dimensional posterior over parameters, which in turn negatively affects their uncertainty estimates. A Bayesian technique known as the "Bayesian Committee Machine" (BCM), originally introduced outside the FL context, remedies some of these issues by aggregating the Bayesian posteriors in the lower-dimensional predictive space instead. The BCM, in its original form, is impractical for FL because it requires a large ensemble for inference. We first argue that it is well suited for heterogeneous FL, then propose a modification to the BCM algorithm, involving distillation, to make it practical for FL. We demonstrate that this modified method outperforms other techniques as heterogeneity increases. We then demonstrate theoretical issues with the calibration of the BCM, namely that it is systematically overconfident. We remedy this by proposing β-Predictive Bayes, a Bayesian FL algorithm which performs a modified aggregation of the local predictive posteriors, using a tunable parameter β. β is tuned to improve the global model's calibration before it is distilled. We empirically evaluate this method on a number of regression and classification datasets to demonstrate that it is generally better calibrated than other baselines, over a range of heterogeneous data partitions.
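
    As a rough sketch of aggregation in predictive space for classification, the code below combines client predictive class probabilities BCM-style, i.e. as a normalised product of predictives divided by the prior, and also shows a β-weighted blend between that product and a simple mixture. The blend is an illustrative stand-in for a tunable aggregation, not the exact β-Predictive Bayes rule from the thesis, and all numbers are made up.

        # BCM-style aggregation of client predictive distributions (illustration).
        import numpy as np

        def bcm_aggregate(client_probs, prior):
            """client_probs: (K clients, C classes); prior: (C,)."""
            K = client_probs.shape[0]
            log_post = np.sum(np.log(client_probs), axis=0) - (K - 1) * np.log(prior)
            post = np.exp(log_post - log_post.max())
            return post / post.sum()

        def beta_aggregate(client_probs, prior, beta):
            geometric = bcm_aggregate(client_probs, prior)            # product-style (BCM)
            arithmetic = client_probs.mean(axis=0)                    # mixture-style
            blended = geometric ** beta * arithmetic ** (1.0 - beta)  # illustrative blend
            return blended / blended.sum()

        prior = np.array([0.5, 0.5])
        client_probs = np.array([[0.9, 0.1],   # three clients' predictive probabilities
                                 [0.7, 0.3],
                                 [0.6, 0.4]])
        print(bcm_aggregate(client_probs, prior))
        print(beta_aggregate(client_probs, prior, beta=0.5))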