NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata
In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. To make such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute in regard to evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages used commonly in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.
R.A.V. received support from the CIPRES project (NSF #EF-03314953 to W.P.M.), the FP7 Marie Curie Programme (Call FP7-PEOPLE-IEF-2008, Proposal No. 237046) and, for the NeXML implementation in TreeBASE, the pPOD project (NSF IIS 0629846); P.E.M. and J.S. received support from CIPRES (NSF #EF-0331495, #EF-0715370); M.T.H. was supported by NSF (DEB-ATOL-0732920); X.X. received support from NSERC (Canada) Discovery and RTI grants; W.P.M. received support from an NSERC (Canada) Discovery grant; J.C. received support from a Google Summer of Code 2007 grant; A.P. received support from a Google Summer of Code 2010 grant.
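The element types named in the abstract (operational taxonomic units and trees) can be illustrated with a minimal Python sketch. The embedded document is a simplified, hand-written fragment modeled on NeXML; it has not been validated against the NeXML schema, and the identifiers, labels, and branch lengths are hypothetical. The helper names `read_otus` and `read_edges` are illustrative and do not belong to any NeXML toolkit.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by NeXML documents; the "nex" prefix is our local alias.
NS = {"nex": "http://www.nexml.org/2009"}

# Hypothetical, simplified NeXML-style fragment (not schema-validated).
DOC = """<nex:nexml xmlns:nex="http://www.nexml.org/2009"
           xmlns="http://www.nexml.org/2009"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           version="0.9">
  <otus id="otus1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
  <trees id="trees1" otus="otus1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2" otu="t1"/>
      <node id="n3" otu="t2"/>
      <edge id="e1" source="n1" target="n2" length="0.5"/>
      <edge id="e2" source="n1" target="n3" length="0.7"/>
    </tree>
  </trees>
</nex:nexml>"""


def read_otus(root):
    """Map each OTU id to its label."""
    return {o.get("id"): o.get("label")
            for o in root.iterfind("nex:otus/nex:otu", NS)}


def read_edges(root):
    """List (source, target, length, otu-of-target) for every edge."""
    edges = []
    for tree in root.iterfind("nex:trees/nex:tree", NS):
        # Nodes that represent tips carry an "otu" reference; others map to None.
        node_otu = {n.get("id"): n.get("otu")
                    for n in tree.iterfind("nex:node", NS)}
        for e in tree.iterfind("nex:edge", NS):
            target = e.get("target")
            edges.append((e.get("source"), target,
                          float(e.get("length")), node_otu.get(target)))
    return edges


root = ET.fromstring(DOC)
otus = read_otus(root)
edges = read_edges(root)
```

The tree is stored as separate node and edge lists rather than nested parentheses, which is why the sketch resolves each edge's target node back to its OTU through a lookup table.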
From PPROM to caul: The evolution of membrane rupture in mammals
Rupture of the extraembryonic membranes that form the gestational sac in humans is a typical feature of human parturition. However, preterm premature rupture of membranes (PPROM) occurs in approximately 1% of pregnancies, and is a leading cause of preterm birth. Conversely, retention of an intact gestational sac during parturition in the form of a caul is a rare occurrence. Understanding the molecular and evolutionary underpinnings of these disparate phenotypes can provide insight into both normal pregnancy and PPROM. Using phylogenetic techniques, we reconstructed the evolution of the gestational sac phenotype at parturition in 55 mammal species representing all major viviparous mammal groups. We infer that the ancestral state in therians, eutherians, and primates, as in humans, is a ruptured gestational sac at parturition. We present evidence that intact membranes at parturition have evolved convergently in diverse mammals including horses, elephants, and bats. In order to gain insight into the molecular underpinnings of the evolution of enhanced membrane integrity, we also used comparative genomics techniques to reconstruct the evolution of a subset of genes implicated in PPROM, and find that four genes (ADAMTS2, COL1A1, COL5A1, LEPRE1) show significant evidence of increased nonsynonymous rates of substitution on lineages with intact membranes as compared to those with ruptured membranes. Among these genes, we also discovered that 17 human SNPs are associated with or near amino acid replacement sites in those mammals with intact membranes. These SNPs are candidate functional variants within humans, which may play roles in PPROM, in the retention of the gestational sac at birth, or in both.
Bio::Phylo - phyloinformatic analysis using Perl
Background: Phyloinformatic analyses involve large amounts of data and metadata of complex structure. Collecting, processing, analyzing, visualizing and summarizing these data and metadata should be done in steps that can be automated and reproduced. This requires flexible, modular toolkits that can represent, manipulate and persist phylogenetic data and metadata as objects with programmable interfaces. Results: This paper presents Bio::Phylo, a Perl5 toolkit for phyloinformatic analysis. It implements classes and methods that are compatible with the well-known BioPerl toolkit, but is independent from it (making it easy to install) and features a richer API and a data model that is better able to manage the complex relationships between different fundamental data and metadata objects in phylogenetics. It supports commonly used file formats for phylogenetic data including the novel NeXML standard, which allows rich annotations of phylogenetic data to be stored and shared. Bio::Phylo can interact with BioPerl, thereby giving access to the file formats that BioPerl supports. Many methods for data simulation, transformation and manipulation, the analysis of tree shape, and tree visualization are provided. Conclusions: Bio::Phylo is composed of 59 richly documented Perl5 modules. It has been deployed successfully on a variety of computer architectures (including various Linux distributions, Mac OS X versions, Windows, Cygwin and UNIX-like systems). It is available as open source (GPL) software from http://search.cpan.org/dist/Bio-Phylo
Genomic heterogeneity differentiates clinical and environmental subgroups of Legionella pneumophila sequence type 1.
Legionella spp. are the cause of a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. In order to characterize the ST1 population, we sequenced 289 outbreak and non-outbreak associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level; 98% of core nucleotide positions were invariant and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity compared to clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than other subgroups and was enriched for transposition and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations
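The abstract compares nucleotide diversity between subgroups without stating how it was computed. As a hedged illustration only, the sketch below uses the common pairwise definition (the mean proportion of differing sites over all pairs of aligned sequences) and assumes equal-length sequences with no gaps or ambiguity codes. The function names and the toy sequences are hypothetical and are not taken from the study.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    pairs = 0
    for a, b in combinations(seqs, 2):
        # Count mismatching positions for this pair of sequences.
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        total += diffs / length
        pairs += 1
    return total / pairs


def relative_increase(group, baseline):
    """Fractional excess of one group's diversity over a baseline group."""
    return (group - baseline) / baseline


# Toy alignment (hypothetical): one of three sequences differs at one site.
toy = ["ACGT", "ACGA", "ACGT"]
pi_toy = nucleotide_diversity(toy)
```

Under this definition, a statement such as "~65% more nucleotide diversity" would correspond to `relative_increase` returning roughly 0.65 for the two subgroup values being compared.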
Primate 13-way alignments
This dataset includes 3,130 13-way one-to-one orthologs built from Cebus apella and published data for Homo sapiens, Pongo abelii, Papio anubis, Colobus angolensis, Saimiri boliviensis, Pan troglodytes, Gorilla gorilla, Nomascus leucogenys, Chlorocebus aethiops, Macaca mulatta, Callithrix jacchus, and Saguinus midas. The alignments are unfiltered PRANK alignments. The dataset also includes FASTA ortholog files that include IDs.
Primate 6-way alignments
This dataset includes 4,770 six-way one-to-one orthologs built from Cebus apella and published data for Homo sapiens, Pongo abelii, Papio anubis, Colobus angolensis, and Saimiri boliviensis. The alignments are unfiltered PRANK alignments. The dataset also includes FASTA ortholog files that include IDs.
Supplemental material for: Characterization of Legionella from watersheds in British Columbia, Canada
Supplemental material for the paper: Characterization of Legionella from watersheds in British Columbia, Canada. Contains Tables S1-S3 and Figures S1-S3
Cox2-alnFLYTREE
Alignment of dipteran mitochondrial COII (cox2) gene sequences from the FLYTREE project, in FASTA format.