
1,629 research outputs found

Synthesis of phylogeny and taxonomy into a comprehensive tree of life

Author: Allman James
Chaudhary Ruchi
Coghill Lyndon
Crandall Keith
Cranston Karen
Deng Jiabin
Drew Bryan
Emily Jane McTavish
Gazis Romina
Gude Karl
H. Dail Laughinghouse IV
Hibbett David
Hinchliff Cody
J. Gordon Burleigh
Katz Laura
Midford Peter
Owen Christopher
Ree Richard
Rees Jonathan
Smith Stephen
Soltis Douglas
Williams Tiffani
Publication venue: Proceedings of the National Academy of Sciences
Publication date: 01/01/2015

An approximate search engine for structure

Author: Shan Huiyuan
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2004

As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art. In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label and the left-to-right order among siblings matters. Unordered labeled trees are trees in which the parent-child relationship is significant, but the order among siblings is unimportant. Free trees (unrooted unordered trees) are acyclic graphs. These trees find many applications in bioinformatics, Web log analysis, phyloinformatics, XML processing, etc. Two types of similarity measures are investigated: (i) counting the mismatching paths in the query tree and a data tree, and (ii) measuring the topological relationship between the trees. The proposed approaches include storing the paths of trees in a suffix array, employing hashing techniques to speed up retrieval, and counting the number of up-down operations to move a token from one node to another node in a tree. Various filters for accelerating a search, different strategies for parallelizing these search algorithms and applications of these algorithms to XML and phylogenetic data management are discussed. The proposed techniques have been implemented into a phylogenetic search engine which is fully operational and is available on the World Wide Web. Experimental results on comparing the similarity measures with existing tree metrics and on evaluating the efficiency of the search techniques demonstrate the effectiveness of the search engine. 
Future work includes extending the techniques to other structural data, as well as developing new filters and algorithms for speeding up searching and mining in complex structures.
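The first similarity measure mentioned in the abstract, counting mismatching paths between a query tree and a data tree, can be illustrated with a minimal sketch. The abstract does not define the measure precisely, so this version assumes trees are given as `(label, children)` tuples and that a "path" means a root-to-leaf sequence of labels; the function names are hypothetical and this is not the dissertation's implementation.

```python
from collections import Counter


def root_to_leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf label path of a (label, children) tree."""
    label, children = tree
    path = prefix + (label,)
    if not children:
        yield path
    for child in children:
        yield from root_to_leaf_paths(child, path)


def mismatching_paths(query, data):
    """Count paths that occur in one tree but not in the other.

    Assumption: paths are compared as multisets, so a path that appears
    twice in one tree and once in the other contributes one mismatch.
    """
    q = Counter(root_to_leaf_paths(query))
    d = Counter(root_to_leaf_paths(data))
    return sum(((q - d) + (d - q)).values())


# Two small trees that differ in a single leaf label.
query = ("a", [("b", []), ("c", [("d", [])])])
data = ("a", [("b", []), ("c", [("e", [])])])

print(mismatching_paths(query, data))   # 2
print(mismatching_paths(query, query))  # 0
```

A lower count means the data tree is closer to the query, so a search would rank database trees by ascending count.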

Digital Commons @ New Jersey Institute of Technology (NJIT)

Creation, evaluation, and use of PSI, a program for identifying protein-phenotype relationships and comparing protein content in groups of organisms

Author: Trost Brett
Publication venue: University of Saskatchewan Library
Publication date: 01/01/2009

Recent advances in DNA sequencing technology have enabled entire genomes to be sequenced quickly and accurately, resulting in an exponential increase in the number of organisms whose genome sequences have been elucidated. While the genome sequence of a given organism represents an important starting point in understanding its physiology, the functions of the protein products of many genes are still unknown; as such, computational methods for studying protein function are becoming increasingly important. In addition, this wealth of genomic information has created an unprecedented opportunity to compare the protein content of different organisms; among other applications, this can enable us to improve taxonomic classifications, to develop more accurate diagnostic tests for identifying particular bacteria, and to better understand protein content relationships in both closely-related and distantly-related organisms. This thesis describes the design, evaluation, and use of a program called Proteome Subtraction and Intersection (PSI) that uses an idea called genome subtraction for discovering protein-phenotype relationships and for characterizing differences in protein content in groups of organisms. PSI takes as input a set of proteomes, as well as a partitioning of that set into a subset of "included" proteomes and a subset of "excluded" proteomes. Using reciprocal BLAST hits, PSI finds orthologous relationships among all the proteins in the proteomes from the original set, and then finds groups of orthologous proteins containing at least one orthologue from each of the proteomes in the "included" subset, and none from any of the proteomes in the "excluded" subset. PSI is first applied to finding protein-phenotype relationships. 
By identifying proteins that are present in all sequenced isolates of the genus Lactobacillus, but not in the related bacterium Pediococcus pentosaceus, proteins are discovered that are likely to be responsible for the difference in cell shape between the lactobacilli and P. pentosaceus. In addition, proteins are identified that may be responsible for resistance to the antibiotic gatifloxacin in some lactic acid bacteria. This thesis also explores the use of PSI for comparing protein content in groups of organisms. Based on the idea of genome subtraction, a novel metric is proposed for comparing the difference in protein content between two organisms. This metric is then used to create a phylogenetic tree for a large set of bacteria, which to the author's knowledge represents the largest phylogenetic tree created to date using protein content. In addition, PSI is used to find the proteomic cohesiveness of isolates of several bacterial species in order to support or refute their current taxonomic classifications. Overall, PSI is a versatile tool with many interesting applications, and should become more and more valuable as additional genomic information becomes available.
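The selection step described in the abstract (keep orthologue groups that have at least one member from every "included" proteome and none from any "excluded" proteome) can be sketched as follows. This is only an illustration under stated assumptions: orthologue groups are assumed to be already computed (for example from reciprocal BLAST hits) and represented as sets of `(proteome, protein)` pairs; `select_groups` and all identifiers in the example are hypothetical, not PSI's actual interface.

```python
def select_groups(groups, included, excluded):
    """Return the orthologue groups that cover every included proteome
    and contain no protein from any excluded proteome."""
    included, excluded = set(included), set(excluded)
    selected = []
    for group in groups:
        # Proteomes represented in this orthologue group.
        proteomes = {proteome for proteome, _protein in group}
        if included <= proteomes and not (proteomes & excluded):
            selected.append(group)
    return selected


# Hypothetical data: two "included" proteomes (L1, L2), one "excluded" (P).
groups = [
    {("L1", "p1"), ("L2", "p7")},               # kept
    {("L1", "p2"), ("L2", "p8"), ("P", "p3")},  # dropped: hits excluded P
    {("L1", "p4")},                             # dropped: L2 is missing
]

result = select_groups(groups, included={"L1", "L2"}, excluded={"P"})
print(len(result))  # 1
```

Representing each group as a set of pairs keeps the inclusion test a plain subset check and the exclusion test a set intersection.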

eCommons@USASK

University of Saskatchewan Research Archive

Computational Methods for Identifying Conserved Protein Complexes between Species from Protein Interaction Data

Author: NGUYEN PHI VU
Publication venue
Publication date: 28/08/2013

Master's (MASTER OF SCIENCE)

ScholarBank@NUS

A Network Synthesis Model for Generating Protein Interaction Network Families

Author: Fraternali Franca
Sahraeian Sayed Mohammad Ebrahim
Yoon Byung-Jun
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2012

Texas A&M Repository

NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

Author: Anurag Priyam
Arlin Stoltzfus
Hilmar Lapp
James P. Balhoff
Jason A. Caravas
Jeet Sukumaran
Mark T. Holder
Peter E. Midford
Rutger A. Vos
Wayne P. Maddison
Xuhua Xia
Publication venue: Oxford University Press
Publication date: 01/01/2012

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

Crossref

KU ScholarWorks

PubMed Central

Carolina Digital Repository

NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

Author: Balhoff James P.
Caravas Jason A.
Holder Mark T.
Lapp Hilmar
Maddison Wayne P.
Midford Peter E.
Priyam Anurag
Stoltzfus Arlin
Sukumaran Jeet
Vos Rutger A.
Xia Xuhua
Publication venue: Oxford University Press (OUP)
Publication date: 10/04/2014

KU ScholarWorks

Computational Methods for Comparative Non-coding RNA Analysis: from Secondary Structures to Tertiary Structures

Author: Ge Ping
Publication venue: University of Central Florida
Publication date: 01/01/2016

Unlike messenger RNAs (mRNAs), whose information is encoded in their primary sequences, non-coding RNAs (ncRNAs) derive their cellular roles from their structures. Therefore, studying structural conservation in ncRNAs is important for an in-depth understanding of their functions. In recent years, many computational methods have been proposed to analyze the common structural patterns in ncRNAs using comparative methods. However, RNA structural comparison is not a trivial task, and the existing approaches still have numerous issues in efficiency and accuracy. In this dissertation, we introduce a suite of novel computational tools that extend the classic models for ncRNA secondary and tertiary structure comparison. For RNA secondary structure analysis, we first developed a computational tool, named PhyloRNAalifold, to integrate phylogenetic information into consensus structural folding. The underlying idea of this algorithm is that the importance of a co-varying mutation should be determined by its position on the phylogenetic tree. By assigning high scores to the critical covariances, the prediction of RNA secondary structure can be made more accurate. Besides structure prediction, we also developed a computational tool, named ProbeAlign, to improve the efficiency of genome-wide ncRNA screening by using high-throughput RNA structural probing data. It treats the chemical reactivities embedded in the probing information as pairing attributes of the search targets. This approach avoids the time-consuming base pair matching in secondary structure alignment. The application of ProbeAlign to the FragSeq datasets shows its capability for genome-wide ncRNA analysis. For RNA tertiary structure analysis, we first developed a computational tool, named STAR3D, to find the global conservation in RNA 3D structures. STAR3D aims at finding the consensus of stacks by using 2D topology and 3D geometry together. Then, the loop regions can be ordered and aligned according to their relative positions in the consensus. This stack-guided alignment method brings a divide-and-conquer strategy to RNA 3D structural alignment, which dramatically improves its efficiency. Furthermore, we have clustered all loop regions in non-redundant RNA 3D structures to detect plausible RNA structural motifs de novo. The computational pipeline, named RNAMSC, was extended to handle large-scale PDB datasets, and solid downstream analysis was performed to ensure that the clustering results are valid and easy to apply to further research. The final results contain many interesting variations of known motifs, such as the GNAA tetraloop, kink-turn, sarcin-ricin and t-loops. We also discovered novel functional motifs that are conserved in a wide range of ncRNAs, including ribosomal RNA, sgRNA, SRP RNA, GlmS riboswitch and twister ribozyme.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

A genome phylogeny for mitochondria among alpha-proteobacteria and a predominantly eubacterial ancestry of yeast nuclear genes

Author: Ahmadinejad Nahal
Bryant David
Esser Christian
Gelius-Dietrich Gabriel
Henze Katrin
Kretschmann Ernst
Leister Dario
Lockhart Peter J.
Martin William
Penny David
Richly Erik
Rotte Carmen
Sebastiani Federico
Steel Michael A.
Wiegand Christian
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2004

Analyses of 55 individual and 31 concatenated protein data sets encoded in Reclinomonas americana and Marchantia polymorpha mitochondrial genomes revealed that current methods for constructing phylogenetic trees are insufficiently sensitive (or artifact-insensitive) to ascertain the sister of mitochondria among the current sample of eight alpha-proteobacterial genomes using mitochondrially-encoded proteins. However, Rhodospirillum rubrum came as close to mitochondria as any alpha-proteobacterium investigated. This prompted a search for methods to directly compare eukaryotic genomes to their prokaryotic counterparts to investigate the origin of the mitochondrion and its host from the standpoint of nuclear genes. We examined pairwise amino acid sequence identity in comparisons of 6,214 nuclear protein-coding genes from Saccharomyces cerevisiae to 177,117 proteins encoded in sequenced genomes from 45 eubacteria and 15 archaebacteria. The results reveal that approximately 75% of yeast genes having homologues among the present prokaryotic sample share greater amino acid sequence identity to eubacterial than to archaebacterial homologues. At high stringency comparisons, only the eubacterial component of the yeast genome is detectable. Our findings indicate that at the levels of overall amino acid sequence identity and gene content, yeast shares a sister-group relationship with eubacteria, not with archaebacteria, in contrast to the current phylogenetic paradigm based on ribosomal RNA. Among eubacteria and archaebacteria, proteobacterial and methanogen genomes, respectively, shared more similarity with the yeast genome than other prokaryotic genomes surveyed.

Crossref

Open Access LMU

MPG.PuRe