Paired-End Mappability of Transposable Elements in the Human Genome
Although transposable elements make up around half of the human genome, the repetitive nature of their sequences makes it difficult to align conventional sequencing reads to them accurately. However, recent advances in sequencing technology, such as longer reads and paired-end libraries, are making these repetitive regions easier to align to. This study investigates the mappability of transposable elements with 50 bp, 76 bp and 100 bp paired-end read libraries. At those read lengths, and allowing up to 3 mismatches during alignment, over 68%, 85% and 88%, respectively, of all transposable elements in the RepeatMasker database are uniquely mappable, suggesting that accurate locus-specific mapping of older transposable elements is well within reach.
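A minimal sketch of why paired-end reads help: a single read drawn from one copy of a repeat matches every copy, whereas a read pair is placed uniquely if only one placement of mate 1 has mate 2 at the expected distance. The sketch assumes a fixed insert size, same-strand mates, mismatch-only (Hamming) alignment and a brute-force scan. The function names are hypothetical, and real aligners (and the study's own pipeline) work differently.

```python
from typing import List


def hamming_hits(genome: str, read: str, max_mismatches: int) -> List[int]:
    """Start positions where `read` matches `genome` with at most
    `max_mismatches` substitutions (no indels, forward strand only)."""
    n = len(read)
    hits = []
    for i in range(len(genome) - n + 1):
        mismatches = 0
        for a, b in zip(genome[i:i + n], read):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # this window can no longer match
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits


def pair_is_unique(genome: str, start: int, read_len: int,
                   insert_size: int, max_mismatches: int) -> bool:
    """True if the read pair sampled at `start` has exactly one placement
    in which mate 2 begins `insert_size - read_len` bases after mate 1."""
    if insert_size < read_len or start + insert_size > len(genome):
        raise ValueError("pair does not fit in the genome")
    offset = insert_size - read_len  # mate-1 start to mate-2 start
    mate1 = genome[start:start + read_len]
    mate2 = genome[start + offset:start + offset + read_len]
    hits1 = hamming_hits(genome, mate1, max_mismatches)
    hits2 = set(hamming_hits(genome, mate2, max_mismatches))
    placements = [h for h in hits1 if h + offset in hits2]
    return len(placements) == 1


# Toy genome (hypothetical): the 7-bp "repeat" GATTACA occurs twice.
genome = "GATTACA" + "CCGG" + "GATTACA" + "TTAA"
print(hamming_hits(genome, "GATTACA", 0))   # [0, 11]: single-end read is ambiguous
print(pair_is_unique(genome, 0, 7, 11, 0))  # True: mate 2 reaches the unique flank
print(pair_is_unique(genome, 0, 3, 7, 0))   # False: both mates sit inside the repeat
```

Raising `max_mismatches` (the study allowed 3) enlarges both hit sets, which is why longer reads are needed to keep placements unique.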
Gene Family Evolution across 12 Drosophila Genomes
Comparison of whole genomes has revealed large and frequent changes in the size of gene families. These changes occur because of high rates of both gene gain (via duplication) and loss (via deletion or pseudogenization), as well as the evolution of entirely new genes. Here we use the genomes of 12 fully sequenced Drosophila species to study the gain and loss of genes at unprecedented resolution. We find large numbers of both gains and losses, with over 40% of all gene families differing in size among the 12 species. Approximately 17 genes are estimated to be duplicated and fixed in a genome every million years, a rate on par with those previously found in both yeast and mammals. We find many instances of extreme expansions or contractions in the size of gene families, including the expansion of several sex- and spermatogenesis-related families in D. melanogaster that also evolve under positive selection at the nucleotide level. Newly evolved gene families in our dataset are associated with a class of testes-expressed genes known to have evolved de novo in a number of cases. Gene family comparisons also allow us to identify a number of annotated D. melanogaster genes that are unlikely to encode functional proteins, as well as dozens of previously unannotated D. melanogaster genes with conserved homologs in the other Drosophila species. Taken together, our results demonstrate that the apparent stasis in total gene number among species has masked rapid turnover in individual gene gain and loss. It is likely that this genomic revolving door has played a large role in shaping the morphological, physiological, and metabolic differences among species.
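The headline statistic (over 40% of gene families differ in size) reduces to a simple count once each family's copy number per species is known. A minimal sketch, assuming families have already been clustered and that a family absent from a species has zero copies there; the family names and counts below are hypothetical. Rate estimates such as the ~17 duplications per million years need a phylogenetic model of gene gain and loss, which this sketch does not attempt.

```python
from typing import Mapping, Sequence


def fraction_variable(families: Mapping[str, Mapping[str, int]],
                      species: Sequence[str]) -> float:
    """Fraction of gene families whose size is not identical in every species.

    A species missing from a family's mapping counts as 0 copies, so families
    gained or lost in some lineages are counted as variable."""
    if not families:
        raise ValueError("no gene families given")
    variable = 0
    for sizes in families.values():
        counts = {sizes.get(sp, 0) for sp in species}
        if len(counts) > 1:
            variable += 1
    return variable / len(families)


# Hypothetical copy numbers for three families in two species.
toy = {
    "famA": {"dmel": 2, "dsim": 2},  # same size in both species
    "famB": {"dmel": 3, "dsim": 1},  # expansion or contraction
    "famC": {"dmel": 1},             # absent from dsim (gain or loss)
}
print(fraction_variable(toy, ["dmel", "dsim"]))  # two of three families vary
```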
Transcriptome Analyses of Tumor-Adjacent Somatic Tissues Reveal Genes Co-Expressed with Transposable Elements
Background: Despite the long-held assumption that transposons are normally expressed only in the germ line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent to which TE transcript levels vary across tissues and individuals is unknown, and co-expression between TEs and host gene mRNAs has not been examined. Results: Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, some individual TE loci exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules consisting of immune response genes involved in interferon signaling. KRAB zinc finger proteins (KZFPs) were over-represented among the gene members of the TE modules, showing positive correlation across multiple tissues. However, we found no overlap between the TE-KZFP pairs that are co-expressed and the TE-KZFP pairs that are bound in published ChIP-seq studies. Conclusions: We find unexpected variation in TE-derived transcripts within and across non-tumorous tissues, and we describe a broad view of the RNA state of non-tumorous tissues that exhibit higher levels of TE transcripts. Such tissues have a broad range of TEs co-expressed, high expression of a large number of KZFPs, and lower RNA levels of immune genes.
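The abstract does not say how modules or correlations were computed. As an assumption, the sketch below summarises a TE module as the per-sample mean of its members' z-scored expression profiles and correlates each host gene with that score using Pearson's r; a positive r would correspond to the KZFP pattern described above and a negative r to the immune-gene pattern. Gene names and values are placeholders.

```python
import math
from typing import Dict, List, Sequence


def zscore(xs: Sequence[float]) -> List[float]:
    """Standardise one expression profile across samples (population SD)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if sd == 0:
        raise ValueError("constant profile has no defined correlation")
    return [(x - mean) / sd for x in xs]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r computed as the mean product of z-scores."""
    zx, zy = zscore(xs), zscore(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def module_score(te_profiles: Sequence[Sequence[float]]) -> List[float]:
    """Per-sample mean of the z-scored TE profiles (one value per sample)."""
    zs = [zscore(p) for p in te_profiles]
    return [sum(col) / len(zs) for col in zip(*zs)]


def correlate_with_module(te_profiles: Sequence[Sequence[float]],
                          gene_profiles: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Pearson's r between the TE module score and each gene's profile."""
    score = module_score(te_profiles)
    return {gene: pearson(score, prof) for gene, prof in gene_profiles.items()}


tes = [[1, 2, 3, 4], [2, 4, 6, 8]]                         # two TE loci, four samples
genes = {"gene_a": [10, 20, 30, 40], "gene_b": [4, 3, 2, 1]}  # placeholder genes
print(correlate_with_module(tes, genes))  # gene_a near +1, gene_b near -1
```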
Correction to: Transcriptome Analyses of Tumor-Adjacent Somatic Tissues Reveal Genes Co-Expressed with Transposable Elements
Following publication of the original article [1], the authors reported errors in Table 2, in which every instance of “KZFP” in the gene names should be changed to “ZNF”.
Machine Learning Approaches for the Prediction of Bone Mineral Density by Using Genomic and Phenotypic Data of 5130 Older Men
The study aimed to use machine learning (ML) approaches and genomic data to develop a prediction model for bone mineral density (BMD) and to identify the best modeling approach for BMD prediction. Genomic and phenotypic data from the Osteoporotic Fractures in Men Study (n = 5130) were analyzed. After comprehensive genotype imputation, a genetic risk score (GRS) was calculated for each participant from 1103 associated SNPs. Data were normalized and divided into a training set (80%) and a validation set (20%). Random forest, gradient boosting, neural network, and linear regression models were developed separately for BMD prediction. Ten-fold cross-validation was used for hyperparameter optimization, and mean squared error and mean absolute error were used to assess model performance. When GRS and phenotypic covariates were used as predictors, all ML models and linear regression performed similarly in BMD prediction. However, when GRS was replaced with the 1103 individual SNPs, the ML models performed significantly better than linear regression (with lasso regularization), and the gradient boosting model performed best. Our study suggests that ML models, especially gradient boosting, can improve BMD prediction from genomic data.
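The abstract does not give the GRS formula. The sketch below assumes the common weighted form, GRS = Σ w_i·d_i, where d_i is the risk-allele dosage (0 to 2 after imputation) and w_i a per-SNP effect size, and also shows a reproducible 80/20 split and the two error metrics named above. Fitting the random forest, gradient boosting, neural network and lasso models would normally be delegated to a library such as scikit-learn and is not reproduced here; all function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def genetic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Assumed weighted GRS: sum over SNPs of risk-allele dosage x weight."""
    if len(dosages) != len(weights):
        raise ValueError("one weight per SNP is required")
    return sum(d * w for d, w in zip(dosages, weights))


def train_validation_split(n: int, train_frac: float = 0.8,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle participant indices reproducibly and cut them train/validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    return idx[:n_train], idx[n_train:]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


train, valid = train_validation_split(5130)
print(len(train), len(valid))  # 4104 1026
print(genetic_risk_score([0, 1, 2], [0.1, 0.2, 0.3]))  # hypothetical dosages/weights
print(mse([1.0, 2.0], [1.0, 4.0]), mae([1.0, 2.0], [1.0, 4.0]))  # 2.0 1.0
```

Because MSE squares each residual, it penalises a few large BMD errors more heavily than MAE does, which is one reason to report both.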
phyloXML: XML for evolutionary biology and comparative genomics
Background: Evolutionary trees are central to a wide range of biological studies. In many of these studies, tree nodes and branches need to be associated (or annotated) with various attributes. For example, in studies concerned with organismal relationships, tree nodes are associated with taxonomic names, whereas tree branches have lengths and often support values. Gene trees used in comparative genomics or phylogenomics are usually annotated with taxonomic information, genome-related data such as gene names and functional annotations, and events such as gene duplications, speciations, or exon shufflings, combined with information related to the evolutionary tree itself. The data standards currently used for evolutionary trees have limited capacity to incorporate such annotations of different data types. Results: We developed an XML language, named phyloXML, for describing evolutionary trees as well as various associated data items. PhyloXML provides elements for commonly used items, such as branch lengths, support values, taxonomic names, and gene names and identifiers. By using "property" elements, phyloXML can be adapted to novel and unforeseen use cases. We also developed various software tools for reading, writing, converting, and visualizing phyloXML-formatted data. Conclusion: PhyloXML is an XML language, defined by a complete schema in XSD, that allows storing and exchanging the structures of evolutionary trees as well as associated data. More information about phyloXML itself, the XSD schema, and tools implementing and supporting phyloXML is available at http://www.phyloxml.org.
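To make the element structure concrete, here is a minimal sketch that reads a hand-written phyloXML document with Python's standard `xml.etree.ElementTree`: nested `clade` elements carry `name`, `branch_length`, a bootstrap `confidence`, and one `property` element of the kind the abstract mentions. The document and its `hypothetical:expression_level` property ref are made up for illustration, and the sketch does not validate against the XSD.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple

PX = "http://www.phyloxml.org"
NS = {"px": PX}

# Hand-written toy tree (A,(B,C)) with branch lengths, one bootstrap value
# and one custom property; the "hypothetical:" ref is illustrative only.
SAMPLE = """<phyloxml xmlns="http://www.phyloxml.org">
  <phylogeny rooted="true">
    <clade>
      <clade>
        <name>A</name>
        <branch_length>0.1</branch_length>
      </clade>
      <clade>
        <branch_length>0.2</branch_length>
        <confidence type="bootstrap">90</confidence>
        <clade>
          <name>B</name>
          <branch_length>0.3</branch_length>
        </clade>
        <clade>
          <name>C</name>
          <branch_length>0.4</branch_length>
          <property ref="hypothetical:expression_level" datatype="xsd:double" applies_to="clade">2.5</property>
        </clade>
      </clade>
    </clade>
  </phylogeny>
</phyloxml>"""


def leaf_depths(xml_text: str) -> Dict[str, float]:
    """Root-to-tip distance of every leaf in the first phylogeny."""
    root = ET.fromstring(xml_text)
    top = root.find("px:phylogeny/px:clade", NS)
    if top is None:
        raise ValueError("no phylogeny/clade element found")
    depths: Dict[str, float] = {}

    def walk(clade: ET.Element, depth: float) -> None:
        # A missing <branch_length> (e.g. on the root clade) counts as 0.
        depth += float(clade.findtext("px:branch_length", default="0", namespaces=NS))
        children = clade.findall("px:clade", NS)
        if not children:
            depths[clade.findtext("px:name", default="", namespaces=NS)] = depth
        for child in children:
            walk(child, depth)

    walk(top, 0.0)
    return depths


def clade_properties(xml_text: str) -> List[Tuple[Optional[str], Optional[str], Optional[str]]]:
    """(ref, datatype, value) for every <property> element in the document."""
    root = ET.fromstring(xml_text)
    return [(p.get("ref"), p.get("datatype"), p.text)
            for p in root.iter(f"{{{PX}}}property")]


print(leaf_depths(SAMPLE))
print(clade_properties(SAMPLE))
```

In practice, Biopython's `Bio.Phylo` module reads and writes phyloXML directly; the standard-library version above only makes the nesting of elements visible.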
A Complex Suite of Forces Drives Gene Traffic from Drosophila X Chromosomes
Theoretical studies predict that X chromosomes and autosomes should be under different selection pressures, and that there should therefore be differences in sex-specific and sexually antagonistic gene content between the X and the autosomes. Previous analyses have identified an excess of genes duplicated by retrotransposition from the X chromosome in Drosophila melanogaster. A number of hypotheses may explain this pattern, including mutational bias, escape from X-inactivation during spermatogenesis, and the movement of male-favored (sexually antagonistic) genes from a chromosome that is predominantly carried by females. To distinguish among these processes and to examine the generality of these patterns, we identified duplicated genes in nine sequenced Drosophila genomes. We find that, as in D. melanogaster, there is an excess of genes duplicated from the X chromosome across the genus Drosophila. This excess is due almost entirely to genes duplicated by retrotransposition, with little to no excess from the X among genes duplicated via DNA intermediates. The only exception to this pattern occurs in the burst of duplication that followed the creation of the Drosophila pseudoobscura neo-X chromosome. Additionally, we examined genes relocated among chromosomal arms (i.e., genes duplicated to new locations coupled with the loss of the copy at the ancestral locus) and found an excess of genes relocated off the ancestral X and neo-X chromosomes. Interestingly, many of the same genes were duplicated or relocated from the independently derived neo-X chromosomes of D. pseudoobscura and Drosophila willistoni, suggesting that natural selection favors the traffic of genes from X chromosomes. Overall, we find that the forces driving gene duplication from X chromosomes depend on the lineage in question, the molecular mechanism of duplication, the preservation of the ancestral copy, and the age of the X chromosome.
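An "excess" of duplicates from the X is a comparison against an expectation. A minimal sketch, assuming a simple null model that is not necessarily the study's: each duplication's parental gene is drawn independently, lying on the X with probability equal to the X's share of genes. The function returns the expected count and a one-sided binomial p-value; the function name and the counts in the usage line are hypothetical.

```python
from math import comb  # Python 3.8+
from typing import Tuple


def excess_from_x(n_duplicates: int, n_from_x: int,
                  x_fraction: float) -> Tuple[float, float]:
    """Expected number of X-derived parental genes under the assumed null,
    and P(at least `n_from_x` X-derived parents) under Binomial(n, p)."""
    if not 0 <= n_from_x <= n_duplicates:
        raise ValueError("need 0 <= n_from_x <= n_duplicates")
    if not 0.0 < x_fraction < 1.0:
        raise ValueError("x_fraction must be strictly between 0 and 1")
    expected = n_duplicates * x_fraction
    p_value = sum(
        comb(n_duplicates, k) * x_fraction ** k * (1.0 - x_fraction) ** (n_duplicates - k)
        for k in range(n_from_x, n_duplicates + 1)
    )
    return expected, p_value


# Hypothetical counts: 40 retroposed duplicates, 12 with an X-linked parent,
# X carrying 15% of genes.
print(excess_from_x(40, 12, 0.15))
```

Running the same test separately for retroposed and DNA-mediated duplicates mirrors the contrast the abstract draws between the two mechanisms.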
Models of Neutrino Masses and Mixings
We review theoretical ideas, problems and implications of neutrino masses and mixing angles. We give a general discussion of schemes with three light neutrinos. Several specific examples are analyzed in some detail, particularly those that can be embedded into grand unified theories.
Comment: 44 pages, 2 figures; version accepted for publication in the Focus Issue on 'Neutrino Physics' edited by F. Halzen, M. Lindner and A. Suzuki, to be published in New Journal of Physics.
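For orientation, three-neutrino schemes are usually written with the standard (PDG) parametrization of the lepton mixing matrix: three angles θ12, θ23, θ13 and one Dirac CP phase δ, with Majorana phases omitted here. The sketch below builds that matrix and checks that it is unitary; the angle values in the usage line are arbitrary illustrations, not values taken from the review.

```python
import cmath
import math
from typing import List

Matrix = List[List[complex]]


def pmns(theta12: float, theta23: float, theta13: float, delta: float) -> Matrix:
    """Standard (PDG) parametrization of the 3x3 lepton mixing matrix.

    Rows are flavours (e, mu, tau); columns are mass states (1, 2, 3)."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s23, c23 = math.sin(theta23), math.cos(theta23)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    ep = cmath.exp(1j * delta)   # e^{+i delta}
    em = cmath.exp(-1j * delta)  # e^{-i delta}
    return [
        [complex(c12 * c13), complex(s12 * c13), s13 * em],
        [-s12 * c23 - c12 * s23 * s13 * ep,
         c12 * c23 - s12 * s23 * s13 * ep,
         complex(s23 * c13)],
        [s12 * s23 - c12 * c23 * s13 * ep,
         -c12 * s23 - s12 * c23 * s13 * ep,
         complex(c23 * c13)],
    ]


def is_unitary(u: Matrix, tol: float = 1e-12) -> bool:
    """Check U U^dagger = I entry by entry."""
    n = len(u)
    for i in range(n):
        for j in range(n):
            entry = sum(u[i][k] * u[j][k].conjugate() for k in range(n))
            target = 1.0 if i == j else 0.0
            if abs(entry - target) > tol:
                return False
    return True


U = pmns(0.59, 0.84, 0.15, 1.2)  # illustrative angles (radians) and phase
print(is_unitary(U))              # True
print(abs(U[0][2]) ** 2)          # |U_e3|^2 equals sin^2(theta13)
```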
Genome Sequence of Fusobacterium nucleatum Subspecies Polymorphum — a Genetically Tractable Fusobacterium
Fusobacterium nucleatum is a prominent member of the oral microbiota and a common cause of human infection. F. nucleatum includes five subspecies: polymorphum, nucleatum, vincentii, fusiforme, and animalis. F. nucleatum subsp. polymorphum ATCC 10953 has been well characterized phenotypically and, in contrast to previously sequenced strains, is amenable to gene transfer. We sequenced and annotated the 2,429,698 bp genome of F. nucleatum subsp. polymorphum ATCC 10953. Plasmid pFN3 from the strain was also sequenced and analyzed. Comparison with the other two available fusobacterial genomes (F. nucleatum subsp. nucleatum and F. nucleatum subsp. vincentii) identified 627 open reading frames unique to F. nucleatum subsp. polymorphum ATCC 10953. A large percentage of these mapped within one of 28 regions or islands containing five or more genes. Seventeen percent of the clustered proteins that demonstrated similarity were most similar to proteins from the clostridia, and others were most similar to proteins from other Gram-positive organisms such as Bacillus and Streptococcus. A 10-kb region homologous to the Salmonella typhimurium propanediol utilization locus was identified, as were a prophage and an integrated conjugal plasmid. The genome contains five composite ribozyme/transposons, similar to the CdISt IStrons described in Clostridium difficile; IStrons are not present in the other fusobacterial genomes. These findings indicate that F. nucleatum subsp. polymorphum is proficient at horizontal gene transfer and that exchange with the Firmicutes, particularly the Clostridia, is common.
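The abstract groups unique ORFs into regions or islands containing five or more genes. A minimal sketch, assuming each ORF has already been flagged as unique or shared (for example, from a homology search against the other two genomes, not reproduced here) and that ORFs are listed in chromosomal order. Treating an island as an uninterrupted run of unique ORFs is a simplification made here; the study may have tolerated gaps. The function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def find_islands(is_unique: Sequence[bool],
                 min_genes: int = 5) -> List[Tuple[int, int]]:
    """Inclusive (first, last) ORF indices of every run of at least
    `min_genes` consecutive unique ORFs."""
    islands: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, flag in enumerate(is_unique):
        if flag and start is None:
            start = i                      # a run of unique ORFs begins
        elif not flag and start is not None:
            if i - start >= min_genes:     # run covered indices start..i-1
                islands.append((start, i - 1))
            start = None
    # close a run that reaches the end of the list
    if start is not None and len(is_unique) - start >= min_genes:
        islands.append((start, len(is_unique) - 1))
    return islands


# Hypothetical flags: runs of 5, 3 and 6 unique ORFs separated by shared ones.
flags = [False] + [True] * 5 + [False] + [True] * 3 + [False] + [True] * 6
print(find_islands(flags))  # [(1, 5), (11, 16)]
```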
- …