650 research outputs found

    MetWAMer: eukaryotic translation initiation site prediction

    Background: Translation initiation site (TIS) identification is an important aspect of the gene annotation process, requisite for the accurate delineation of protein sequences from transcript data. We have developed the MetWAMer package for TIS prediction in eukaryotic open reading frames of non-viral origin. MetWAMer can be used as a stand-alone, third-party tool for post-processing gene structure annotations generated by external computational programs and/or pipelines, or directly integrated into gene structure prediction software implementations. Results: MetWAMer currently implements five distinct methods for TIS prediction, the most accurate of which is a routine that uses a perceptron to combine weighted, signal-based translation initiation site scores with the contrast in coding potential of sequences flanking TISs. Also, our program implements clustering capabilities through use of the k-medoids algorithm, thereby enabling cluster-specific TIS parameter utilization. In practice, our static weight array matrix-based indexing method for parameter set lookup can be used with good results in data sets exhibiting moderate levels of 5'-complete coverage. Conclusion: We demonstrate that improvements in statistically based models for TIS prediction can be achieved by taking the class of each potential start-methionine into account pending certain testing conditions, and that our perceptron-based model is suitable for the TIS identification task. MetWAMer represents a well-documented, extensible, and freely available software system that can be readily re-trained for differing target applications and/or extended with existing and novel TIS prediction methods, to support further research efforts in this area.

    Identification and assessment of variable single-copy orthologous (SCO) nuclear loci for low-level phylogenomics: a case study in the genus Rosa (Rosaceae)

    Background: With an ever-growing number of published genomes, many low levels of the Tree of Life now contain several species with enough molecular data to perform shallow-scale phylogenomic studies. Moving away from using just a few universal phylogenetic markers, we can now target thousands of other loci to decipher taxa relationships. Making the best possible selection of informative sequences for the taxa studied has emerged as a new issue. Here, we developed a general procedure to mine genomic data, looking for orthologous single-copy loci capable of deciphering phylogenetic relationships below the generic rank. To develop our strategy, we chose the genus Rosa, a rapidly evolving lineage of the Rosaceae family in which the genomes of several species have recently been sequenced. We also compared our loci to conventional plastid markers, commonly used for phylogenetic inference in this genus.

    Polyploidy, base composition bias, and incomplete lineage sorting in fish phylogenetics

    Thesis (Ph.D.) University of Alaska Fairbanks, 2014. Understanding the evolutionary relationships between organisms is of fundamental importance in biology. Originally based on overall similarity in morphological traits, depiction of evolutionary relationships is now often pursued by constructing trees based on molecular data: molecular phylogenetics. Molecular phylogenetic inference uses variation in molecular data in a variety of frameworks to produce hypothetical relationships between organisms. As with many practices making use of biological data, the inherent noise and complexity challenge phylogeneticists. In this dissertation, I examine three empirical datasets while addressing three possible issues in phylogenetic inference: polyploidy, base composition bias and incomplete lineage sorting. Polyploidy leads to incorrect genes (paralogs) being analyzed, since it is often impossible to distinguish between gene copies generated as a result of polyploidization. My analysis indicates that incorrect assumptions of orthology have led to incorrect conclusions being drawn from phylogenetic studies including the polyploid salmons (Salmoniformes). Results indicate not only that pikes (Esociformes) and the polyploid salmons are sister taxa, but also that the graylings (Thymallinae) and whitefishes (Coregoninae) are most closely related to each other. Base composition bias misleads inference because the overall similarity between sequences results from changes in base composition, not shared evolutionary history. Incomplete lineage sorting refers to the fact that the reconstructed relationships of different genes do not agree. Genetic variants may persist through speciation events without being completely "sorted" between lineages, so a methodology is required to reconcile the different genealogies. In two chapters I focused on base composition bias and incomplete lineage sorting in a detailed study of flatfish (Pleuronectiformes) origins. A major issue in fish phylogenetics is whether flatfish are monophyletic, a hypothesis with poor support from both morphological and molecular data. Often it appears that cranial asymmetry is the only characteristic uniting the group. I found very little evidence for a single evolutionary origin of the extant flatfishes. Base composition bias appears not to be a major contributor to flatfish non-monophyly; however, incomplete lineage sorting likely results in the inability to generate robust statistical support for inferred relationships of flatfishes and relatives. Results of my work indicate that more care should be exercised in phylogenetics in determining orthology of genes. I also find that not acknowledging the presence of paralogs does indeed mislead analyses. With increased data availability and computational capabilities, non-neutral models of nucleotide evolution should be developed and included in further studies. Presenting the heterogeneity of datasets and actively accounting for incomplete lineage sorting will definitively improve the field of phylogenetics as well. Chapter 1. Introduction -- Chapter 2. Pike and Salmon as Sister Taxa: Detailed Intraclade Resolution and Divergence Time Estimation of Esociformes + Salmoniformes Based on Whole Mitochondrial Genome Sequences -- Chapter 3. Are Flatfishes (Pleuronectiformes) Monophyletic? -- Chapter 4. Mitochondrial Genomic Investigation of Flatfish (Pleuronectiformes) Monophyly -- Chapter 5. Conclusion

    Phylogenomics and Historical Biogeography of the Gooseneck Barnacle Pollicipes elegans

    This dissertation explores the systematics, biogeography, and genomics of the gooseneck barnacle Pollicipes elegans, a marine crustacean of the tropical Eastern Pacific. In Chapter 1, I provide a broad framework for my research by introducing and focusing on the long-standing debate over the mechanisms behind the latitudinal gradient in species diversity, which provided the initial motivation for using Pollicipes elegans as a model system to study the mechanisms leading to genetic differentiation and speciation in tropical regions. In Chapter 2, I examine the genetic structure, infer patterns of connectivity across the warm tropical waters of the eastern Pacific, and reconstruct the biogeographic history of P. elegans using a statistical phylogeographic framework. Using mitochondrial DNA sequences, I found strong evidence supporting an out-of-the-tropics model of speciation in P. elegans, with a clear phylogeographical break between populations in Mexico and all populations to the south. In Chapter 3, I added sequence data from six nuclear genes to the analysis of genetic structure and found strong evidence for two cryptic species within the nominal P. elegans that likely originated by allopatric speciation across the Central American Gap. I estimated the divergence times between peripheral and central populations, and the effective population sizes of these populations, and again found support for an out-of-the-tropics model of diversification. In Chapter 4, I used RNA sequencing of individuals of P. elegans from each cryptic species to assemble the first transcriptome for this taxon. Data mining of the transcriptome allowed me to identify microsatellite and single nucleotide polymorphism (SNP) markers to be used in future research. Analyses using the SNP dataset revealed evidence for 11 genes under natural selection between the two cryptic species; the genes that were identified may be influenced by spatial variation in sea surface temperature in the tropical eastern Pacific. Lastly, in Chapter 5, I provide guidelines for future studies that should be pursued to help elucidate patterns, mechanisms, and consequences of latitudinal gradients of temperature in the process of allopatric speciation. The phylogeographic and demographic reconstructions for P. elegans in this dissertation provide evidence of the role that temperature may play in population differentiation associated with speciation. The transcriptome analyses provided a large set of genetic markers and a list of candidate genes under selection, a crucial first step in the description of the genetic basis of local thermal adaptation in tropical regions. The information generated in this dissertation provides a novel empirical system that can help elucidate the evolution of tropical diversity and can be used to potentially predict the future impacts of climate change on tropical species.