20 research outputs found

    Table_2_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.XLSX

    No full text
    Here we present MARVEL, a tool for prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest machine learning approach. We trained the program on a dataset with 1,247 phage and 1,029 bacterial genomes, and tested it on a dataset with 335 bacterial and 177 phage genomes. We show that three simple genomic features extracted from contig sequences were sufficient to achieve a good performance in separating bacterial from phage sequences: gene density, strand shifts, and fraction of significant hits to a viral protein database. We compared the performance of MARVEL to that of VirSorter and VirFinder, two popular programs for predicting viral sequences. Our results show that all three programs have comparable specificity, but MARVEL achieves much better performance on the recall (sensitivity) measure. This means that MARVEL should be able to identify many more phage sequences in metagenomic bins than heretofore has been possible. In a simple test with real data, containing mostly bacterial sequences, MARVEL classified 58 out of 209 bins as phage genomes; other evidence suggests that 57 of these 58 bins are novel phage sequences. MARVEL is freely available at https://github.com/LaboratorioBioinformatica/MARVEL.
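As a rough illustration of the three features and the two evaluation measures named in the abstract above, the following is a minimal Python sketch. It is not MARVEL's code: the `Gene` record, the function names, and the exact feature definitions (genes per kilobase, fraction of adjacent gene pairs that change strand, fraction of genes with a viral database hit) are assumptions chosen for illustration, and the numbers in the usage note are made up.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical predicted-gene record; not MARVEL's internal format."""
    start: int   # start coordinate on the contig
    end: int     # end coordinate on the contig
    strand: str  # "+" or "-"


def gene_density(genes, contig_length):
    """Genes per kilobase of contig (assumed definition)."""
    return len(genes) / (contig_length / 1000)


def strand_shift_fraction(genes):
    """Fraction of adjacent gene pairs that switch strand (assumed definition)."""
    ordered = sorted(genes, key=lambda g: g.start)
    if len(ordered) < 2:
        return 0.0
    shifts = sum(a.strand != b.strand for a, b in zip(ordered, ordered[1:]))
    return shifts / (len(ordered) - 1)


def viral_hit_fraction(genes_with_hit, total_genes):
    """Fraction of genes with a significant hit to a viral protein database."""
    return genes_with_hit / total_genes if total_genes else 0.0


def recall(true_pos, false_neg):
    """Sensitivity: share of real phage sequences that were predicted as phage."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of bacterial sequences that were correctly not called phage."""
    return true_neg / (true_neg + false_pos)
```

For example, a 4,000 bp contig carrying four genes on strands `+ + - -` would give a gene density of 1.0 genes per kb and a strand shift fraction of 1/3 under these assumed definitions. In a tool of this kind, such per-contig values would form the feature vector passed to the random forest classifier.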

    Image_1_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.pdf

    No full text

    Table_1_MARVEL, a Tool for Prediction of Bacteriophage Sequences in Metagenomic Bins.XLSX

    No full text

    Table_1_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.XLSX

    No full text
    Cell signaling events triggered by androgen hormone in prostate cells are dependent on activation of the androgen receptor (AR) transcription factor. Androgen hormone binding to AR promotes its displacement from the cytoplasm to the nucleus and AR binding to DNA motifs, thus inducing activatory and inhibitory transcriptional programs through a complex regulatory mechanism not yet fully understood. In this work, we performed RNA-seq deep-sequencing of LNCaP prostate cancer cells and found over 7000 expressed long intergenic non-coding RNAs (lincRNAs), of which ∼4000 are novel lincRNAs, and 258 lincRNAs have their expression activated by androgen. Immunoprecipitation of AR, followed by large-scale sequencing of co-immunoprecipitated RNAs (RIP-Seq), identified in the LNCaP cell line a total of 619 lincRNAs that were significantly enriched (FDR < 10%, DESeq2) in the anti-Androgen Receptor (antiAR) fraction relative to the control fraction (non-specific IgG), and we named them Androgen-Receptor-Associated lincRNAs (ARA-lincRNAs). A genome-wide analysis showed that protein-coding genes neighboring ARA-lincRNAs had a significantly higher androgen-induced change in expression than protein-coding genes neighboring lincRNAs not associated with AR. To find relevant epigenetic signatures enriched at the ARA-lincRNAs’ transcription start sites (TSSs), we used a machine learning approach and found that the ARA-lincRNA genomic loci in LNCaP cells are significantly enriched with epigenetic marks that are characteristic of in cis enhancer RNA regulators, and that the H3K27ac mark of active enhancers is conspicuously enriched at the TSS of ARA-lincRNAs adjacent to androgen-activated protein-coding genes. In addition, LNCaP topologically associating domains (TADs) that comprise chromatin regions with ARA-lincRNAs exhibit transcription factor contents, epigenetic marks and gene transcriptional activities that are significantly different from TADs not containing ARA-lincRNAs. This work highlights the possible involvement of hundreds of lincRNAs working in synergy with the AR on the genome-wide androgen-induced gene regulatory program in prostate cells.

    Table_9_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.docx

    No full text

    Table_8_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.XLSX

    No full text

    Table_3_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.xlsx

    No full text

    Data_Sheet_1_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.pdf

    No full text

    Table_2_Chromatin Landscape Distinguishes the Genomic Loci of Hundreds of Androgen-Receptor-Associated LincRNAs From the Loci of Non-associated LincRNAs.XLSX

    No full text

    Schistosoma mansoni Egg, Adult Male and Female Comparative Gene Expression Analysis and Identification of Novel Genes by RNA-Seq

    No full text
    Background: Schistosomiasis is one of the most prevalent parasitic diseases worldwide and is a public health problem. Schistosoma mansoni is the most widespread species responsible for schistosomiasis in the Americas, Middle East and Africa. Adult female worms (mated to males) release eggs in the hepatic portal vasculature and are the principal cause of morbidity. Comparative separate transcriptomes of female and male adult worms were previously assessed using microarrays and Serial Analysis of Gene Expression (SAGE), thus limiting the possibility of finding novel genes. Moreover, the egg transcriptome was analyzed only once with limited bacterially cloned cDNA libraries.

    Methodology/Principal findings: To compare the gene expression of S. mansoni eggs, females, and males, we performed RNA-Seq on these three parasite forms using 454/Roche technology and reconstructed the transcriptome using Trinity de novo assembly. The resulting contigs were mapped to the genome and were cross-referenced with predicted Smp genes and H3K4me3 ChIP-Seq public data. For the first time, we obtained separate, unbiased gene expression profiles for S. mansoni eggs and female and male adult worms, identifying enriched biological processes and specific enriched functions for each of the three parasite forms. Transcripts with no match to predicted genes were analyzed for their protein-coding potential and the presence of an encoded conserved protein domain. A set of 232 novel protein-coding genes with putative functions related to reproduction, metabolism, and cell biogenesis was detected, which contributes to the understanding of parasite biology.

    Conclusions/Significance: Large-scale RNA-Seq analysis using de novo assembly associated with genome-wide information for histone marks in the vicinity of gene models constitutes a new approach to transcriptome analysis that has not yet been explored in schistosomes. Importantly, all data have been consolidated into a UCSC Genome Browser search- and download-tool (http://schistosoma.usp.br/). This database provides new ways to explore the schistosome genome and transcriptome and will facilitate molecular research on this important parasite.