Assessment of genetic variation for the LINE-1 retrotransposon from next generation sequence data
Background: In humans, copies of the Long Interspersed Nuclear Element 1 (LINE-1) retrotransposon comprise 21% of the reference genome. They have been shown to modulate expression of genes that span or neighbor the LINE-1 insertion site and to produce novel splice isoforms of those genes' transcripts.
Results: In this work, newly released pilot data from the 1000 Genomes Project were analyzed to detect previously unreported full-length insertions of the LINE-1 retrotransposon. By direct analysis of the sequence data, we identified 22 previously unreported LINE-1 insertion sites in a mother/father/daughter trio.
Conclusions: We demonstrate that next generation sequencing data, together with emerging high-quality datasets from individual genome projects, allow us to assess the heterogeneity of the LINE-1 retrotransposon among humans and provide a wealth of testable hypotheses about the impact this diversity may have on the health of individuals and populations.
rMotifGen: random motif generator for DNA and protein sequences
Background: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains, such as transcription factor and DNA binding sites, as well as conserved protein domains. To help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed rMotifGen, a computational tool whose sole purpose is to generate random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms.
Results: Two implementations of rMotifGen have been created: one provides a graphical user interface (GUI) for random motif construction, and the other is a command line interface. The command line implementation has the added advantages of platform independence and the ability to run in batch mode. rMotifGen was used to construct sample sets of sequences containing DNA and amino acid motifs, which were then tested against the Gibbs sampler and MEME packages.
Conclusion: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where each motif instance can be generated from a position-specific scoring matrix (PSSM) or mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: http://bioinformatics.louisville.edu/brg/rMotifGen/.
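The rMotifGen abstract describes planting motif instances, either mutated from a consensus or drawn from a PSSM, into random background sequences. The following is a minimal, hypothetical Python sketch of that idea, not rMotifGen's own code or interface. All function names are invented for illustration, and it assumes a uniform i.i.d. background, a uniform substitution model in place of the substitution matrices the tool uses, a "PSSM" supplied as per-position probabilities rather than log-odds scores, and no motif insertions (indels).

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

DNA = "ACGT"  # use "ACDEFGHIKLMNPQRSTVWY" for protein sequences

def random_background(length: int, alphabet: str, rng: random.Random) -> str:
    """Uniform i.i.d. background (assumption: the tool's background model may differ)."""
    return "".join(rng.choice(alphabet) for _ in range(length))

def mutate_consensus(consensus: str, rate: float, alphabet: str,
                     rng: random.Random) -> str:
    """Replace each position, with probability `rate`, by a different symbol.
    Uniform substitution is a stand-in for a real substitution matrix."""
    out = []
    for ch in consensus:
        if rng.random() < rate:
            out.append(rng.choice([a for a in alphabet if a != ch]))
        else:
            out.append(ch)
    return "".join(out)

def sample_from_profile(profile: Sequence[Dict[str, float]],
                        rng: random.Random) -> str:
    """Draw one motif instance column by column from per-position probabilities
    (assumes a probability matrix, not log-odds PSSM scores)."""
    return "".join(
        rng.choices(list(col), weights=list(col.values()), k=1)[0]
        for col in profile
    )

def plant(background: str, motif: str, rng: random.Random) -> Tuple[str, int]:
    """Overwrite a random window of the background with the motif; return (sequence, start)."""
    start = rng.randrange(len(background) - len(motif) + 1)
    return background[:start] + motif + background[start + len(motif):], start

def generate_set(n_seqs: int, seq_len: int, consensus: str, rate: float,
                 profile: Optional[Sequence[Dict[str, float]]] = None,
                 seed: int = 0, alphabet: str = DNA) -> List[Tuple[str, int, str]]:
    """Return (sequence, motif_start, motif_instance) records, one motif per sequence."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_seqs):
        background = random_background(seq_len, alphabet, rng)
        if profile is not None:
            instance = sample_from_profile(profile, rng)  # profile-based instance
        else:
            instance = mutate_consensus(consensus, rate, alphabet, rng)  # mutated consensus
        seq, start = plant(background, instance, rng)
        records.append((seq, start, instance))
    return records

if __name__ == "__main__":
    for seq, start, inst in generate_set(3, 40, "TATAAT", rate=0.2, seed=42):
        print(f"{seq}  motif={inst} at {start}")
```

Because each record keeps the planted start position and the exact instance, the output can serve as ground truth when checking whether a motif finder such as MEME or the Gibbs sampler recovers the planted sites at a given mutation rate.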
Systematic assessment of long-read RNA-seq methods for transcript identification and quantification
The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
Prenatal Arsenic Exposure Alters Gene Expression in the Adult Liver to a Proinflammatory State Contributing to Accelerated Atherosclerosis
The mechanisms by which environmental toxicants alter developmental processes predisposing individuals to adult onset chronic disease are not well understood. Transplacental arsenic exposure promotes atherogenesis in apolipoprotein E-knockout (ApoE−/−) mice. Because the liver plays a central role in atherosclerosis, diabetes and metabolic syndrome, we hypothesized that accelerated atherosclerosis may be linked to altered hepatic development. This hypothesis was tested in ApoE−/− mice exposed to 49 ppm arsenic in utero from gestational day (GD) 8 to term. GD18 hepatic arsenic was 1.2 µg/g in dams and 350 ng/g in fetuses. The hepatic transcriptome was evaluated by microarray analysis to assess mRNA and microRNA abundance in control and exposed pups at postnatal day (PND) 1 and PND70. Arsenic exposure altered the postnatal developmental trajectory of mRNA and microRNA profiles. We identified an arsenic exposure-related 51-gene signature at PND1 and PND70 with several hubs of interaction (Hspa8, IgM and Hnf4a). Gene ontology (GO) annotation analyses indicated that pathways for gluconeogenesis and glycolysis were suppressed in exposed pups at PND1, and pathways for protein export, ribosome, antigen processing and presentation, and complement and coagulation cascades were induced by PND70. Promoter analysis of differentially expressed transcripts identified enriched transcription factor binding sites and clustering to common regulatory sites. SREBP1 binding sites were identified in about 16% of PND70 differentially expressed genes. Western blot analysis confirmed changes in the liver at PND70 that included increases in heat shock protein 70 (Hspa8) and active SREBP1. Plasma AST and ALT levels were increased at PND70. These results suggest that transplacental arsenic exposure alters developmental programming in fetal liver, leading to an enduring stress and proinflammatory response postnatally that may contribute to early onset of atherosclerosis. Genes containing SREBP1 binding sites also implicate pathways for diabetes mellitus and rheumatoid arthritis, both diseases that contribute to increased cardiovascular disease in humans.
Systematic assessment of long-read RNA-seq methods for transcript identification and quantification
The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.
- …