206 research outputs found

    11th German Conference on Chemoinformatics (GCC 2015): Fulda, Germany, 8-10 November 2015

    Dinucleotide controlled null models for comparative RNA gene prediction

    Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model including stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While there have been randomization algorithms for single sequences for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model, which is used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine learning based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.
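    To make concrete the quantity that a dinucleotide-preserving null model holds fixed, the following minimal Python sketch computes the average dinucleotide content of an alignment. It is not the SISSIz algorithm; the function names are hypothetical, and removing gap characters row by row before counting is an assumption made only for illustration.

```python
from collections import Counter, defaultdict


def dinucleotide_frequencies(seq):
    """Relative frequencies of overlapping dinucleotides in one ungapped sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: n / total for pair, n in counts.items()}


def average_dinucleotide_content(alignment):
    """Mean dinucleotide frequencies over alignment rows.

    Assumption: gaps ('-') are stripped from each row before counting.
    """
    sums = defaultdict(float)
    rows = 0
    for row in alignment:
        freqs = dinucleotide_frequencies(row.replace("-", ""))
        if not freqs:
            continue  # skip rows too short to contain a dinucleotide
        rows += 1
        for pair, value in freqs.items():
            sums[pair] += value
    if rows == 0:
        return {}
    return {pair: value / rows for pair, value in sums.items()}


# Toy alignment, invented for illustration only.
toy_alignment = ["ACG-T", "ACGGT"]
content = average_dinucleotide_content(toy_alignment)
```

    A randomized alignment would count as dinucleotide-controlled in this simple sense if `average_dinucleotide_content` returned approximately the same values for it as for the original alignment.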

    Overview of ASDEX Upgrade results

    Recent results from the ASDEX Upgrade experimental campaigns of 2001 and 2002 are presented. An improved understanding of energy and particle transport emerges in terms of a 'critical gradient' model for the temperature gradients. Coupling this to particle diffusion explains most of the observed behaviour of the density profiles, in particular, the finding that strong central heating reduces the tendency for density profile peaking. Internal transport barriers (ITBs) with electron and ion temperatures in excess of 20 keV (but not simultaneously) have been achieved. By shaping the plasma, a regime with small type II edge localized modes (ELMs) has been established. Here, the maximum power deposited on the target plates was greatly reduced at constant average power. Also, an increase of the ELM frequency by injection of shallow pellets was demonstrated. ELM-free operation is possible in the quiescent H-mode regime previously found in DIII-D, which has also been established on ASDEX Upgrade. Regarding stability, a regime with benign neoclassical tearing modes (NTMs) was found. During electron cyclotron current drive (ECCD) stabilization of NTMs, βN could be increased well above the usual onset level without a reappearance of the NTM. Electron cyclotron resonance heating and ECCD have also been used to control the sawtooth repetition frequency at a moderate fraction of the total heating power. The inner wall of the ASDEX Upgrade vessel has increasingly been covered with tungsten without causing detrimental effects on the plasma performance. Regarding scenario integration, a scenario with a large fraction of noninductively driven current (≥50%), but without an ITB, has been established. It combines improved confinement (τE/τITER98 ≈ 1.2) and stability (βN ≤ 3.5) at high Greenwald fraction (ne/nGW ≈ 0.85) in steady state and with type II ELMy edge, and would offer the possibility for long pulses with high fusion power at reduced current in ITER.

    A Machine Learning Approach for Identifying Novel Cell Type–Specific Transcriptional Regulators of Myogenesis

    Transcriptional enhancers integrate the contributions of multiple classes of transcription factors (TFs) to orchestrate the myriad spatio-temporal gene expression programs that occur during development. A molecular understanding of enhancers with similar activities requires the identification of both their unique and their shared sequence features. To address this problem, we combined phylogenetic profiling with a DNA-based enhancer sequence classifier that analyzes the TF binding sites (TFBSs) governing the transcription of a co-expressed gene set. We first assembled a small number of enhancers that are active in Drosophila melanogaster muscle founder cells (FCs) and other mesodermal cell types. Using phylogenetic profiling, we increased the number of enhancers by incorporating orthologous but divergent sequences from other Drosophila species. Functional assays revealed that the diverged enhancer orthologs were active in patterns largely similar to those of their D. melanogaster counterparts, although there was extensive evolutionary shuffling of known TFBSs. We then built and trained a classifier using this enhancer set and identified additional related enhancers based on the presence or absence of known and putative TFBSs. Predicted FC enhancers were over-represented in proximity to known FC genes, and many of the TFBSs learned by the classifier were found to be critical for enhancer activity, including POU homeodomain, Myb, Ets, Forkhead, and T-box motifs. Empirical testing also revealed that the T-box TF encoded by org-1 is a previously uncharacterized regulator of muscle cell identity. Finally, we found extensive diversity in the composition of TFBSs within known FC enhancers, suggesting that motif combinatorics plays an essential role in the cellular specificity exhibited by such enhancers. In summary, machine learning combined with evolutionary sequence analysis is useful for recognizing novel TFBSs and for facilitating the identification of cognate TFs that coordinate cell type–specific developmental gene expression patterns.

    Transcriptome Profiling of Testis during Sexual Maturation Stages in Eriocheir sinensis Using Illumina Sequencing

    The testis is a highly specialized tissue that plays dual roles in ensuring fertility by producing spermatozoa and hormones. Spermatogenesis is a complex process, resulting in the production of mature sperm from primordial germ cells. Significant structural and biochemical changes take place in the seminiferous epithelium of the adult testis during spermatogenesis. The gene expression pattern of testis in Chinese mitten crab (Eriocheir sinensis) has not been extensively studied, and limited genetic research has been performed on this species. The advent of high-throughput sequencing technologies enables the generation of genomic resources within a short period of time and at minimal cost. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive transcript dataset for testis of E. sinensis. In two runs, we produced 25,698,778 sequencing reads corresponding to 2.31 Gb total nucleotides. These reads were assembled into 342,753 contigs or 141,861 scaffold sequences, which identified 96,311 unigenes. Based on similarity searches with known proteins, 39,995 unigenes were annotated based on having a BLAST hit in the non-redundant database or ESTscan results with a cut-off E-value above 10⁻⁵. This is the first report of a mitten crab transcriptome using high-throughput sequencing technology, and all these testis transcripts can help us understand the molecular mechanisms involved in spermatogenesis and testis maturation.

    HP1a Targets the Drosophila KDM4A Demethylase to a Subset of Heterochromatic Genes to Regulate H3K36me3 Levels

    The KDM4 subfamily of JmjC domain-containing demethylases mediates demethylation of histone H3K36me3/me2 and H3K9me3/me2. Several studies have shown that human and yeast KDM4 proteins bind to specific gene promoters and regulate gene expression. However, the genome-wide distribution of KDM4 proteins and the mechanism of genomic targeting remain elusive. We have previously identified Drosophila KDM4A (dKDM4A) as a histone H3K36me3 demethylase that directly interacts with HP1a. Here, we performed H3K36me3 ChIP-chip analysis in wild-type and dkdm4a mutant embryos to identify genes regulated by dKDM4A demethylase activity in vivo. A subset of heterochromatic genes that show increased H3K36me3 levels in dkdm4a mutant embryos overlaps with HP1a target genes. More importantly, binding to HP1a is required for dKDM4A-mediated H3K36me3 demethylation at a subset of heterochromatic genes. Collectively, these results show that HP1a functions to target the H3K36 demethylase dKDM4A to heterochromatic genes in Drosophila.

    Axially Symmetric Divertor Experiment (ASDEX) Upgrade Team (vol 81, 033507, 2010)

    De Novo Transcriptomic Analysis of an Oleaginous Microalga: Pathway Description and Gene Discovery for Production of Next-Generation Biofuels

    Background: Eustigmatos cf. polyphem is a yellow-green unicellular soil microalga belonging to the eustigmatophytes with high biomass and considerable production of triacylglycerols (TAGs) for biofuels, which is thus referred to as an oleaginous microalga. The paucity of microalgae genome sequences, however, limits development of gene-based biofuel feedstock optimization studies. Here we describe the sequencing and de novo transcriptome assembly for a non-model microalga species, E. cf. polyphem, and identify pathways and genes of importance related to biofuel production. Results: We performed the de novo assembly of the E. cf. polyphem transcriptome using Illumina paired-end sequencing technology. In a single run, we produced 29,199,432 sequencing reads corresponding to 2.33 Gb total nucleotides. These reads were assembled into 75,632 unigenes with a mean size of 503 bp and an N50 of 663 bp, ranging from 100 bp to >3,000 bp. Assembled unigenes were subjected to BLAST similarity searches and annotated with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) orthology identifiers. These analyses identified the majority of carbohydrate, fatty acid, TAG and carotenoid biosynthesis and catabolism pathways in E. cf. polyphem. Conclusions: Our data provide the construction of metabolic pathways involved in the biosynthesis and catabolism of carbohydrates, fatty acids, TAGs and carotenoids in E. cf. polyphem and provide a foundation for the molecular genetics and functional genomics required to direct metabolic engineering efforts that seek to enhance the quantity and character o

    Importance of patient bed pathways and length of stay differences in predicting COVID-19 hospital bed occupancy in England.

    Background: Predicting bed occupancy for hospitalised patients with COVID-19 requires understanding of length of stay (LoS) in particular bed types. LoS can vary depending on the patient’s “bed pathway” - the sequence of transfers of individual patients between bed types during a hospital stay. In this study, we characterise these pathways, and their impact on predicted hospital bed occupancy. Methods: We obtained data from University College Hospital (UCH) and the ISARIC4C COVID-19 Clinical Information Network (CO-CIN) on hospitalised patients with COVID-19 who required care in general ward or critical care (CC) beds to determine possible bed pathways and LoS. We developed a discrete-time model to examine the implications of using either bed pathways or only average LoS by bed type to forecast bed occupancy. We compared model-predicted bed occupancy to publicly available bed occupancy data on COVID-19 in England between March and August 2020. Results: In both the UCH and CO-CIN datasets, 82% of hospitalised patients with COVID-19 only received care in general ward beds. We identified four other bed pathways, present in both datasets: “Ward, CC, Ward”, “Ward, CC”, “CC” and “CC, Ward”. Mean LoS varied by bed type, pathway, and dataset, between 1.78 and 13.53 days. For UCH, we found that using bed pathways improved the accuracy of bed occupancy predictions, while only using an average LoS for each bed type underestimated true bed occupancy. However, using the CO-CIN LoS dataset we were not able to replicate past data on bed occupancy in England, suggesting regional LoS heterogeneities. Conclusions: We identified five bed pathways, with substantial variation in LoS by bed type, pathway, and geography. This might be caused by local differences in patient characteristics, clinical care strategies, or resource availability, and suggests that national LoS averages may not be appropriate for local forecasts of bed occupancy for COVID-19. 
Trial registration: The ISARIC WHO CCP-UK study ISRCTN66726260 was retrospectively registered on 21/04/2020 and designated an Urgent Public Health Research Study by NIHR.
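    As a rough illustration of how a discrete-time model can turn bed pathways into occupancy forecasts, the sketch below spreads daily admissions over pathways and counts occupied beds per day and bed type. It is a deterministic simplification under stated assumptions, not the model used in the study: the function name, the fixed integer lengths of stay, and all numbers in the example are hypothetical.

```python
def forecast_occupancy(admissions, pathways, horizon):
    """Expected occupied beds per day and bed type.

    admissions: number of patients admitted on each day (index = day).
    pathways:   list of (fraction_of_patients, [(bed_type, los_days), ...]);
                each stage has a fixed integer length of stay (assumption).
    horizon:    number of days to forecast.
    """
    beds = {}
    for day, admitted in enumerate(admissions):
        for fraction, stages in pathways:
            start = day
            for bed_type, los_days in stages:
                series = beds.setdefault(bed_type, [0.0] * horizon)
                # The patient group occupies this bed type for los_days days.
                for t in range(start, min(start + los_days, horizon)):
                    series[t] += admitted * fraction
                start += los_days  # next stage begins after this one ends
    return beds


# Illustrative inputs only; fractions and stays are not taken from the study.
example_pathways = [
    (0.8, [("ward", 2)]),                            # "Ward" only
    (0.2, [("ward", 1), ("cc", 2), ("ward", 1)]),    # "Ward, CC, Ward"
]
example = forecast_occupancy([10, 0, 0, 0, 0], example_pathways, horizon=5)
```

    Replacing the pathway list with a single stage per bed type, each using an average length of stay, gives the simpler alternative that the abstract compares against.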