93 research outputs found

    Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library

    Remote data access for data analysis in high performance computing is commonly done with specialized data access protocols and storage systems. These protocols are highly optimized for high throughput on very large datasets, multi-streams, high availability, low latency and efficient parallel I/O. The purpose of this paper is to describe how we have adapted a generic protocol, the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative for high performance I/O and data analysis applications in a global computing grid: the Worldwide LHC Computing Grid. In this work, we first analyze the design differences between the HTTP protocol and the most common high performance I/O protocols, pointing out the main performance weaknesses of HTTP. Then, we describe in detail how we solved these issues. Our solutions have been implemented in a toolkit called davix, available through several recent Linux distributions. Finally, we describe the results of our benchmarks, where we compare the performance of davix against an HPC-specific protocol for a data analysis use case.
    Comment: Presented at Very Large Data Bases (VLDB) 2014, Hangzhou
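    One common way for HTTP to approximate the vectored reads of HPC I/O protocols is a single HTTP/1.1 multi-range request, which uses the `Range` header of RFC 7233 (now RFC 9110). Nearby byte ranges are first coalesced and then encoded as one `bytes=` value. The following is a minimal sketch under that assumption. The abstract does not say which mechanism davix uses, and the helpers `coalesce` and `range_header` and the `max_gap` parameter are hypothetical.

```python
from typing import List, Tuple


def coalesce(ranges: List[Tuple[int, int]], max_gap: int = 0) -> List[Tuple[int, int]]:
    """Merge (offset, length) reads into (start, end_exclusive) spans.

    Two reads are merged when the gap between them is at most `max_gap` bytes,
    so overlapping and adjacent reads always end up in the same span.
    """
    spans: List[List[int]] = []
    for offset, length in sorted(ranges):
        if length <= 0:
            continue  # nothing to fetch for this read
        end = offset + length
        if spans and offset - spans[-1][1] <= max_gap:
            spans[-1][1] = max(spans[-1][1], end)  # extend the previous span
        else:
            spans.append([offset, end])  # start a new span
    return [(start, end) for start, end in spans]


def range_header(ranges: List[Tuple[int, int]], max_gap: int = 0) -> str:
    """Build an HTTP Range header value; HTTP byte positions are inclusive."""
    spans = coalesce(ranges, max_gap)
    if not spans:
        raise ValueError("no bytes requested")
    return "bytes=" + ",".join(f"{start}-{end - 1}" for start, end in spans)


# Three scattered reads become one request covering two byte ranges.
print(range_header([(1000, 10), (0, 100), (100, 50)]))  # bytes=0-149,1000-1009
```

    A server that supports multiple ranges answers with `206 Partial Content` and a `multipart/byteranges` body. Raising `max_gap` fetches a few unneeded bytes in exchange for fewer, larger ranges. This is the usual trade-off on high-latency wide-area links.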

    A machine learning case–control classifier for schizophrenia based on DNA methylation in blood

    Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case–control classifier based on DNA methylation in blood, therefore, we focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for a classification algorithm with built-in feature selection, sparse partial least squares discriminant analysis (SPLS-DA); application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions, we calculated a "risk distance" to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set of 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10⁻¹²), and mediational analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic variants and the other on systemic epigenetic variants.
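    The positive predictive value is the fraction of individuals classified as cases who really are cases. The sketch below computes it from paired predicted and true labels. The helper name and the toy labels are illustrative and do not come from the study's data.

```python
from typing import Sequence


def positive_predictive_value(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """PPV = true positives / (true positives + false positives)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("no individuals were classified as cases")
    return tp / (tp + fp)


# Toy example: four predicted cases, three of which are true cases.
print(positive_predictive_value([True, True, True, True, False],
                                [True, True, True, False, True]))  # 0.75
```

    Applied to the reported figures, a PPV of 80% among 303 predicted cases corresponds to roughly 0.80 × 303 ≈ 242 true SZ cases.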

    Ronin Governs Early Heart Development by Controlling Core Gene Expression Programs.

    Ronin (THAP11), a DNA-binding protein that evolved from a primordial DNA transposon by molecular domestication, recognizes a hyperconserved promoter sequence to control developmentally and metabolically essential genes in pluripotent stem cells. However, it remains unclear whether Ronin or related THAP proteins perform similar functions in development. Here, we present evidence that Ronin functions within the nascent heart as it arises from the mesoderm and forms a four-chambered organ. We show that Ronin is vital for cardiogenesis during midgestation by controlling a set of critical genes. The activity of Ronin coincided with the recruitment of its cofactor, Hcf-1, and the elevation of H3K4me3 levels at specific target genes, suggesting the involvement of an epigenetic mechanism. On the strength of these findings, we propose that Ronin activity during cardiogenesis offers a template for understanding how important gene programs are sustained across different cell types within a developing organ such as the heart.

    Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random Satisfiability problem, and its application to stop-and-restart resolutions

    A large deviation analysis of the solving complexity of random 3-Satisfiability instances slightly below threshold is presented. While finding a solution for such instances demands an exponential effort with high probability, we show that an exponentially small fraction of resolutions require a computation scaling only linearly in the size of the instance. This exponentially small probability of easy resolutions is analytically calculated, and the corresponding exponent is shown to be smaller (in absolute value) than the growth exponent of the typical resolution time. Our study therefore gives some theoretical basis to heuristic stop-and-restart solving procedures, and suggests a natural cut-off (the size of the instance) for the restart.
    Comment: Revtex file, 4 figures
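    The stop-and-restart idea can be sketched as a randomized backtracking (DPLL-style) search that is abandoned and restarted with fresh random choices once it exceeds a cutoff equal to the instance size. Here the cutoff is assumed to be the number of variables, counted in visited search nodes. This is a toy solver built on those assumptions, not the specific search heuristics analyzed in the paper, and all names are hypothetical.

```python
import random
from typing import Dict, List, Optional

Clause = List[int]  # literals: +v means variable v is True, -v means False


class BudgetExceeded(Exception):
    """Raised when a single run visits more search nodes than its cutoff."""


def simplify(clauses: List[Clause], lit: int) -> Optional[List[Clause]]:
    """Make `lit` true: drop satisfied clauses, shorten the rest; None on conflict."""
    out = []
    for clause in clauses:
        if lit in clause:
            continue
        reduced = [l for l in clause if l != -lit]
        if not reduced:
            return None  # empty clause: contradiction
        out.append(reduced)
    return out


def search(clauses: List[Clause], assignment: Dict[int, bool], counter: List[int],
           budget: int, rng: random.Random) -> Optional[Dict[int, bool]]:
    if not clauses:
        return assignment  # every clause is satisfied
    counter[0] += 1  # one more search node visited in this run
    if counter[0] > budget:
        raise BudgetExceeded
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    if unit is not None:
        choices = [unit]  # unit propagation: this literal is forced
    else:
        var = abs(rng.choice(rng.choice(clauses)))
        choices = [var, -var]
        rng.shuffle(choices)  # random branching order
    for lit in choices:
        reduced = simplify(clauses, lit)
        if reduced is not None:
            result = search(reduced, {**assignment, abs(lit): lit > 0},
                            counter, budget, rng)
            if result is not None:
                return result
    return None  # all branches failed below this node


def solve_with_restarts(clauses: List[Clause], n_vars: int, max_restarts: int,
                        seed: int = 0) -> Optional[Dict[int, bool]]:
    """None means refuted within the cutoff, or every restart ran out of budget."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        try:
            # Cutoff equal to the instance size (number of variables).
            return search(clauses, {}, [0], n_vars, rng)
        except BudgetExceeded:
            continue  # too slow: restart with fresh random choices
    return None
```

    A run that never backtracks assigns each variable at most once, so a budget of `n_vars` nodes admits exactly the linear-time resolutions the abstract describes. Slower runs are cut off and retried.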

    Functional annotation of the human brain methylome identifies tissue-specific epigenetic variation across brain and blood

    Notes: PMCID: PMC3446315. © 2012 Davies et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
    Dynamic changes to the epigenome play a critical role in establishing and maintaining cellular phenotype during differentiation, but little is known about the normal methylomic differences that occur between functionally distinct areas of the brain. We characterized intra- and inter-individual methylomic variation across whole blood and multiple regions of the brain from multiple donors.

    ReadDepth: A Parallel R Package for Detecting Copy Number Alterations from Short Sequencing Reads

    Copy number alterations are important contributors to many genetic diseases, including cancer. We present the readDepth package for R, which can detect these aberrations by measuring the depth of coverage obtained by massively parallel sequencing of the genome. In addition to achieving higher accuracy than existing packages, our tool runs much faster by utilizing multi-core architectures to parallelize the processing of these large data sets. In contrast to other published methods, readDepth does not require the sequencing of a reference sample, and it uses a robust statistical model that accounts for overdispersed data. It includes a method for effectively increasing the resolution obtained from low-coverage experiments by using breakpoint information from paired-end sequencing to do positional refinement. We also demonstrate a method for inferring copy number using reads generated by whole-genome bisulfite sequencing, thus enabling integrative study of epigenomic and copy number alterations. Finally, we apply this tool to two genomes, showing that it performs well on genomes sequenced to both low and high coverage. The readDepth package runs on Linux and Mac OS X, is released under the Apache 2.0 license, and is available at http://code.google.com/p/readdepth/
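    The depth-of-coverage principle can be illustrated with a deliberately naive sketch. It counts read start positions in fixed-width windows, then scales each window by the median window count so that a typical window maps to copy number 2 in a diploid genome. readDepth itself fits a statistical model for overdispersed counts, which this sketch does not reproduce. The function names, window size, and `ploidy` argument are illustrative assumptions.

```python
from statistics import median
from typing import List


def bin_read_starts(read_starts: List[int], chrom_length: int, window: int) -> List[int]:
    """Count reads whose start position falls in each fixed-width window."""
    n_bins = (chrom_length + window - 1) // window  # ceiling division
    counts = [0] * n_bins
    for pos in read_starts:
        if 0 <= pos < chrom_length:  # ignore positions off the chromosome
            counts[pos // window] += 1
    return counts


def naive_copy_number(counts: List[int], ploidy: int = 2) -> List[float]:
    """Scale counts so that the median window corresponds to `ploidy` copies."""
    typical = median(counts)
    if typical == 0:
        raise ValueError("median window count is zero; use larger windows")
    return [ploidy * c / typical for c in counts]


# Toy data: window 2 has double coverage (a gain), window 4 has none (a loss).
starts = (list(range(0, 10)) + list(range(100, 110))
          + list(range(200, 220)) + list(range(300, 310)))
counts = bin_read_starts(starts, chrom_length=500, window=100)
print(counts)                     # [10, 10, 20, 10, 0]
print(naive_copy_number(counts))  # [2.0, 2.0, 4.0, 2.0, 0.0]
```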

    Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications.

    Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant, and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme sequencing (MRE-seq) for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
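    The concordance criterion used for the two bisulfite methods counts a site as concordant when its two methylation levels differ by no more than a threshold. A minimal sketch of that criterion follows. The abstract does not state the threshold, so `threshold` is a free, hypothetical parameter. Levels are assumed to be fractions in [0, 1] keyed by site identifier.

```python
from typing import Dict, Hashable


def concordance(levels_a: Dict[Hashable, float], levels_b: Dict[Hashable, float],
                threshold: float) -> float:
    """Fraction of sites measured by both methods whose levels differ by <= threshold."""
    shared = levels_a.keys() & levels_b.keys()  # only compare sites both methods cover
    if not shared:
        raise ValueError("no sites measured by both methods")
    agreeing = sum(1 for site in shared
                   if abs(levels_a[site] - levels_b[site]) <= threshold)
    return agreeing / len(shared)


# Toy data: sites 1 and 3 agree within 0.1, site 2 does not, site 4 is unshared.
a = {1: 0.90, 2: 0.10, 3: 0.50}
b = {1: 0.85, 2: 0.40, 3: 0.50, 4: 0.20}
print(concordance(a, b, threshold=0.1))
```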

    Identification of Genes and Pathways Regulated by Lamin A in Heart

    Background: Mutations in the LMNA gene, encoding LMNA (lamin A/C), cause distinct disorders, including dilated cardiomyopathies, collectively referred to as laminopathies. The genes (coding and noncoding) and regulatory pathways controlled by LMNA in the heart are not completely defined. Methods and Results: We analyzed the cardiac transcriptome from wild-type, loss-of-function (Lmna-/-), and gain-of-function (Lmna-/- injected with adeno-associated virus serotype 9 expressing LMNA) mice with normal cardiac function. Deletion of Lmna (Lmna-/-) led to differential expression of 2193 coding and 629 long noncoding RNA (lncRNA) genes in the heart (q<0.05). Re-expression of LMNA in the Lmna-/- mouse heart completely rescued 501 coding and 208 noncoding genes and partially rescued 1862 coding and 607 lncRNA genes. Pathway analysis of differentially expressed genes predicted activation of the transcriptional regulators lysine-specific demethylase 5A, lysine-specific demethylase 5B, and tumor protein 53, and suppression of retinoblastoma 1, paired-like homeodomain 2, and melanocyte-inducing transcription factor, which were completely or partially rescued upon re-expression of LMNA. Furthermore, lysine-specific demethylase 5A and 5B protein levels were increased in the Lmna-/- hearts and were partially rescued upon LMNA re-expression. Analysis of biological function for rescued genes identified activation of tumor necrosis factor-α and epithelial to mesenchymal transition and suppression of the oxidative phosphorylation pathway upon Lmna deletion, and their restoration upon LMNA reintroduction in the heart. Restoration of the gene expression and transcriptional regulators in the heart was associated with improved cardiac function and increased survival of the Lmna-/- mice.
    Conclusions: The findings identify LMNA-regulated cardiac genes and their upstream transcriptional regulators in the heart and implicate lysine-specific demethylase 5A and B as epigenetic regulators of a subset of the dysregulated genes in laminopathies.
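    The abstract reports differential expression at q<0.05 but does not name the multiple-testing procedure. A common way to obtain such q-values is the Benjamini-Hochberg adjustment. The sketch below implements that procedure as an illustration only, not as the authors' confirmed method. The function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (q-values), in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity: q_(k) = min_{j>=k} p_(j) * m / j
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Genes with q < 0.05 would be called differentially expressed.
qvals = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [i for i, qv in enumerate(qvals) if qv < 0.05]
```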