313 research outputs found
Efficient HTTP based I/O on very large datasets for high performance computing with the libdavix library
Remote data access for data analysis in high performance computing is
commonly done with specialized data access protocols and storage systems. These
protocols are highly optimized for high throughput on very large datasets,
multi-streams, high availability, low latency and efficient parallel I/O. The
purpose of this paper is to describe how we have adapted a generic protocol,
the Hypertext Transfer Protocol (HTTP), to make it a competitive alternative
for high performance I/O and data analysis applications in a global computing
grid: the Worldwide LHC Computing Grid. In this work, we first analyze the
design differences between the HTTP protocol and the most common high
performance I/O protocols, pointing out the main performance weaknesses of
HTTP. Then, we describe in detail how we solved these issues. Our solutions
have been implemented in a toolkit called davix, available through several
recent Linux distributions. Finally, we describe the results of our benchmarks
where we compare the performance of davix against an HPC-specific protocol for a
data analysis use case. Comment: Presented at: Very Large Data Bases (VLDB) 2014, Hangzho
The repertoire and features of human platelet microRNAs
Playing a central role in the maintenance of hemostasis as well as in thrombotic disorders, platelets contain a relatively diverse messenger RNA (mRNA) transcriptome as well as functional mRNA-regulatory microRNAs, suggesting that platelet mRNAs may be regulated by microRNAs. Here, we elucidated the complete repertoire and features of human platelet microRNAs by high-throughput sequencing. More than 492 different mature microRNAs were detected in human platelets, whereas the list of known human microRNAs was expanded further by the discovery of 40 novel microRNA sequences. As in nucleated cells, platelet microRNAs bear signs of post-transcriptional modifications, mainly terminal adenylation and uridylation. In vitro enzymatic assays demonstrated the ability of human platelets to uridylate microRNAs, which correlated with the presence of the uridyltransferase enzyme TUT4. We also detected numerous microRNA isoforms (isomiRs) resulting from imprecise Drosha and/or Dicer processing, in some cases more frequently than the reference microRNA sequence, including 5′ shifted isomiRs with redirected mRNA targeting abilities. This study unveils the existence of a relatively diverse and complex microRNA repertoire in human platelets, and represents a mandatory step towards elucidating the intraplatelet and extraplatelet role, function and importance of platelet microRNAs
Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing
Background: Massively parallel sequencing readouts of epigenomic assays are enabling integrative genome-wide analyses of genomic and epigenomic variation. Pash 3.0 performs sequence comparison and read mapping and can be employed as a module within diverse configurable analysis pipelines, including ChIP-Seq and methylome mapping by whole-genome bisulfite sequencing. Results: Pash 3.0 generally matches the accuracy and speed of niche programs for fast mapping of short reads, and exceeds their performance on longer reads generated by a new generation of massively parallel sequencing technologies. By exploiting longer read lengths, Pash 3.0 maps reads onto the large fraction of genomic DNA that contains repetitive elements and polymorphic sites, including indel polymorphisms. Conclusions: We demonstrate the versatility of Pash 3.0 by analyzing the interaction between CpG methylation, CpG SNPs, and imprinting based on publicly available whole-genome shotgun bisulfite sequencing data. Pash 3.0 makes use of gapped k-mer alignment, a non-seed based comparison method, which is implemented using multi-positional hash tables. This allows Pash 3.0 to run on diverse hardware platforms, including individual computers with standard RAM capacity, multi-core hardware architectures and large clusters.
A machine learning case–control classifier for schizophrenia based on DNA methylation in blood
Epigenetic dysregulation is thought to contribute to the etiology of schizophrenia (SZ), but the cell type-specificity of DNA methylation makes population-based epigenetic studies of SZ challenging. To train an SZ case–control classifier based on DNA methylation in blood, therefore, we focused on human genomic regions of systemic interindividual epigenetic variation (CoRSIVs), a subset of which are represented on the Illumina Human Methylation 450K (HM450) array. HM450 DNA methylation data on whole blood of 414 SZ cases and 433 non-psychiatric controls were used as training data for a classification algorithm with built-in feature selection, sparse partial least squares discriminant analysis (SPLS-DA); application of SPLS-DA to HM450 data has not been previously reported. Using the first two SPLS-DA dimensions we calculated a “risk distance” to identify individuals with the highest probability of SZ. The model was then evaluated on an independent HM450 data set on 353 SZ cases and 322 non-psychiatric controls. Our CoRSIV-based model classified 303 individuals as cases with a positive predictive value (PPV) of 80%, far surpassing the performance of a model based on polygenic risk score (PRS). Importantly, risk distance (based on CoRSIV methylation) was not associated with medication use, arguing against reverse causality. Risk distance and PRS were positively correlated (Pearson r = 0.28, P = 1.28 × 10^−12), and mediational analysis suggested that genetic effects on SZ are partially mediated by altered methylation at CoRSIVs. Our results indicate two innate dimensions of SZ risk: one based on genetic, and the other on systemic epigenetic variants
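As a reading aid for the PPV figure in the abstract above, the following minimal sketch shows the arithmetic behind a positive predictive value. The function names are hypothetical, and the implied count of true cases is an approximation derived from the rounded 80% figure, not a number reported by the authors.

```python
def positive_predictive_value(true_positives: int, predicted_positives: int) -> float:
    """PPV = true positives / all individuals classified as cases."""
    return true_positives / predicted_positives


def implied_true_positives(predicted_positives: int, ppv: float) -> int:
    """Approximate number of true cases implied by a reported (rounded) PPV."""
    return round(predicted_positives * ppv)


# Figures taken from the abstract: 303 individuals classified as cases, PPV of 80%.
predicted_cases = 303
reported_ppv = 0.80

# Assumption: the 80% is rounded, so this count is only approximate.
approx_true_cases = implied_true_positives(predicted_cases, reported_ppv)
recomputed_ppv = positive_predictive_value(approx_true_cases, predicted_cases)
```

Under this reading, roughly four out of five individuals flagged by the model in the independent data set would be actual SZ cases.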
Ronin Governs Early Heart Development by Controlling Core Gene Expression Programs.
Ronin (THAP11), a DNA-binding protein that evolved from a primordial DNA transposon by molecular domestication, recognizes a hyperconserved promoter sequence to control developmentally and metabolically essential genes in pluripotent stem cells. However, it remains unclear whether Ronin or related THAP proteins perform similar functions in development. Here, we present evidence that Ronin functions within the nascent heart as it arises from the mesoderm and forms a four-chambered organ. We show that Ronin is vital for cardiogenesis during midgestation by controlling a set of critical genes. The activity of Ronin coincided with the recruitment of its cofactor, Hcf-1, and the elevation of H3K4me3 levels at specific target genes, suggesting the involvement of an epigenetic mechanism. On the strength of these findings, we propose that Ronin activity during cardiogenesis offers a template to understand how important gene programs are sustained across different cell types within a developing organ such as the heart
Exponentially hard problems are sometimes polynomial, a large deviation analysis of search algorithms for the random Satisfiability problem, and its application to stop-and-restart resolutions
A large deviation analysis of the solving complexity of random
3-Satisfiability instances slightly below threshold is presented. While finding
a solution for such instances demands an exponential effort with high
probability, we show that an exponentially small fraction of resolutions
require a computation scaling linearly in the size of the instance only. This
exponentially small probability of easy resolutions is analytically calculated,
and the corresponding exponent shown to be smaller (in absolute value) than the
growth exponent of the typical resolution time. Our study therefore gives some
theoretical basis to heuristic stop-and-restart solving procedures, and
suggests a natural cut-off (the size of the instance) for the restart. Comment: Revtex file, 4 figure
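The stop-and-restart argument in the abstract above can be sketched as a short estimate. The symbols here are assumptions introduced for illustration and do not come from the abstract: N is the instance size, ω > 0 is the growth exponent of the typical resolution time, and ζ > 0 is the decay exponent of the probability of an easy (linear-time) resolution, with ζ < ω as the abstract states. Base 2 is an arbitrary choice.

```latex
\begin{align*}
  T_{\text{typical}} &\sim 2^{N\omega},
  & P_{\text{easy}} &\sim 2^{-N\zeta},
  & \zeta &< \omega. \\
  \intertext{Cutting each run off after about $N$ steps and restarting, the expected
  number of runs before an easy one occurs is about $1/P_{\text{easy}}$, so}
  T_{\text{restart}} &\sim N \cdot \frac{1}{P_{\text{easy}}} = N \, 2^{N\zeta}
  \;\ll\; 2^{N\omega} \sim T_{\text{typical}}.
\end{align*}
```

The total cost remains exponential, but with the smaller exponent ζ, which is the sense in which restarting with a cut-off of order N would pay off.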
DNA methylation in AgRP neurons regulates voluntary exercise behavior in mice.
DNA methylation regulates cell type-specific gene expression. Here, in a transgenic mouse model, we show that deletion of the gene encoding DNA methyltransferase Dnmt3a in hypothalamic AgRP neurons causes a sedentary phenotype characterized by reduced voluntary exercise and increased adiposity. Whole-genome bisulfite sequencing (WGBS) and transcriptional profiling in neuronal nuclei from the arcuate nucleus of the hypothalamus (ARH) reveal differentially methylated genomic regions and reduced expression of AgRP neuron-associated genes in knockout mice. We use read-level analysis of WGBS data to infer putative ARH neural cell types affected by the knockout, and to localize promoter hypomethylation and increased expression of the growth factor Bmp7 to AgRP neurons, suggesting a role for aberrant TGF-β signaling in the development of this phenotype. Together, these data demonstrate that DNA methylation in AgRP neurons is required for their normal epigenetic development and neuron-specific gene expression profiles, and regulates voluntary exercise behavior
Plasma Urea Cycle Metabolites May Be Useful Biomarkers in Children With Eosinophilic Esophagitis
Background: Eosinophilic esophagitis (EoE) is a disorder of the esophagus that has become increasingly recognized in children. Because these children undergo multiple endoscopies, discovering a non-invasive biomarker of disease activity is highly desirable. The aim of this study was to use targeted plasma metabolomics to identify potential biomarker candidates for EoE in a discovery phase. Methods: A prospective, single-center clinical trial was performed on 24 children ages 2–18 years with and without EoE undergoing upper endoscopy for any indication. Blood samples were collected for metabolomics profiling using the subclasses: amino acids, tricarboxylic acid cycle, acetylation, and methylation. Using mass spectrometry and systematic bioinformatics analysis, 48 metabolites were measured and compared between children with active EoE (+EoE) and controls (–EoE). To investigate the effect of proton pump inhibitor (PPI) use on metabolites, patients were also stratified based on PPI use (+PPI, –PPI). Results: Seven children had active EoE at the time of endoscopy. Eleven children were on PPI (4 with EoE). Of the 48 metabolites measured, 8 plasma metabolites showed statistically significant differences (p < 0.05) comparing +EoE –PPI to –EoE –PPI, a few of which were upregulated metabolites involved in the urea cycle. There were 14 significant differences comparing +EoE +PPI to +EoE –PPI. This demonstrated that in EoE patients, PPI use upregulated metabolites involved in the urea cycle, while it downregulated metabolites involved in methylation. Comparison among all four groups, +EoE +PPI, +EoE –PPI, –EoE +PPI, and –EoE –PPI, revealed 27 significantly different metabolites.
+EoE +PPI had downregulated methionine and N-acetyl methionine, while both +EoE groups and –EoE +PPI had upregulated homocysteine, N-acetylputrescine, N-acetylornithine, arginine, and ornithine. Conclusion: The present study revealed key plasma metabolite differences in children with EoE compared to unaffected controls. Notable candidate biomarkers include dimethylarginine, putrescine, and N-acetylputrescine. PPI use was shown to influence these urea cycle metabolites, regardless of EoE presence. Therefore, future studies should distinguish patients based on PPI use or determine metabolites while not on treatment. These findings will be confirmed in a larger validation phase, as this may represent a significant discovery in the search for a non-invasive biomarker for EoE. Clinical Trial Registration: This clinical trial was registered with ClinicalTrials.gov, identifier: NCT 03107819