47 research outputs found

NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

Author: Balhoff James P.
Caravas Jason A.
Holder Mark T.
Lapp Hilmar
Maddison Wayne P.
Midford Peter E.
Priyam Anurag
Stoltzfus Arlin
Sukumaran Jeet
Vos Rutger A.
Xia Xuhua
Publication venue: 'Oxford University Press (OUP)'
Publication date: 10/04/2014

In scientific research, integration and synthesis require a common understanding of where data come from, how much they can be trusted, and what they may be used for. Making such an understanding computer-accessible requires standards for exchanging richly annotated data. The challenges of conveying reusable data are particularly acute for evolutionary comparative analysis, which comprises an ever-expanding list of data types, methods, research aims, and subdisciplines. To facilitate interoperability in evolutionary comparative analysis, we present NeXML, an XML standard (inspired by the current standard, NEXUS) that supports exchange of richly annotated comparative data. NeXML defines syntax for operational taxonomic units, character-state matrices, and phylogenetic trees and networks. Documents can be validated unambiguously. Importantly, any data element can be annotated, to an arbitrary degree of richness, using a system that is both flexible and rigorous. We describe how the use of NeXML by the TreeBASE and Phenoscape projects satisfies user needs that cannot be satisfied with other available file formats. By relying on XML Schema Definition, the design of NeXML facilitates the development and deployment of software for processing, transforming, and querying documents. The adoption of NeXML for practical use is facilitated by the availability of (1) an online manual with code samples and a reference to all defined elements and attributes, (2) programming toolkits in most of the languages commonly used in evolutionary informatics, and (3) input–output support in several widely used software applications. An active, open, community-based development process enables future revision and expansion of NeXML.

R.A.V. received support from the CIPRES project (NSF #EF-03314953 to W.P.M.), the FP7 Marie Curie Programme (Call FP7-PEOPLE-IEF-2008, Proposal No. 237046) and, for the NeXML implementation in TreeBASE, the pPOD project (NSF IIS 0629846); P.E.M. and J.S. received support from CIPRES (NSF #EF-0331495, #EF-0715370); M.T.H. was supported by NSF (DEB-ATOL-0732920); X.X. received support from NSERC (Canada) Discovery and RTI grants; W.P.M. received support from an NSERC (Canada) Discovery grant; J.C. received support from a Google Summer of Code 2007 grant; A.P. received support from a Google Summer of Code 2010 grant.
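NeXML documents are plain XML, so any namespace-aware XML parser can read them. The following is a minimal sketch in Python using only the standard library's `xml.etree.ElementTree`. It assumes the NeXML 2009 namespace URI (`http://www.nexml.org/2009`) and a small hand-written document with hypothetical OTU ids, labels, and tree structure. It has not been validated against the NeXML schema and is not one of the toolkits the abstract refers to; it only shows how OTUs, tree nodes, and edges reference each other.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the NeXML schema (assumed here: the 2009 namespace).
NEX_NS = "http://www.nexml.org/2009"
NS = {"nex": NEX_NS}

# Hypothetical, hand-written example; not validated against the NeXML schema.
DOC = f"""<nex:nexml xmlns:nex="{NEX_NS}" xmlns="{NEX_NS}"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="0.9">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
    <otu id="t3" label="Gorilla gorilla"/>
  </otus>
  <trees id="trees1" otus="taxa1">
    <tree id="tree1" xsi:type="nex:FloatTree">
      <node id="n1" root="true"/>
      <node id="n2"/>
      <node id="n3" otu="t1"/>
      <node id="n4" otu="t2"/>
      <node id="n5" otu="t3"/>
      <edge id="e1" source="n1" target="n2"/>
      <edge id="e2" source="n2" target="n3"/>
      <edge id="e3" source="n2" target="n4"/>
      <edge id="e4" source="n1" target="n5"/>
    </tree>
  </trees>
</nex:nexml>"""


def summarize(doc: str) -> dict:
    """Collect OTU labels and, for each tree, its edge count and tip labels."""
    root = ET.fromstring(doc)
    # Map OTU ids to human-readable labels, wherever the <otu> elements occur.
    otus = {o.get("id"): o.get("label") for o in root.iter(f"{{{NEX_NS}}}otu")}
    trees = []
    for tree in root.iter(f"{{{NEX_NS}}}tree"):
        nodes = tree.findall("nex:node", NS)
        edges = tree.findall("nex:edge", NS)
        # Tip nodes are those that point at an OTU via the otu attribute.
        tips = [otus[n.get("otu")] for n in nodes if n.get("otu")]
        trees.append({"id": tree.get("id"), "n_edges": len(edges), "tips": tips})
    return {"otus": otus, "trees": trees}


if __name__ == "__main__":
    print(summarize(DOC))
```

Resolving tips through the `otu` attribute, rather than reading labels off the nodes, mirrors the separation between taxa and trees that the abstract describes: the same OTU block can be shared by several trees and matrices.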

KU ScholarWorks

NeXML: Rich, Extensible, and Verifiable Representation of Comparative Data and Metadata

Author: Adida
Anurag Priyam
Arlin Stoltzfus
Ashburner
Balhoff
Beaman
Beckett
Benson
Biron
Bisby
Brandes
Cardona
Connelly
Constable
Dahdul
Dahdul
Drummond
Fallside
Felsenstein
Felsenstein
Gkoutos
Gopalan
Han
Hilmar Lapp
Hladish
Hyam
James P. Balhoff
Jason A. Caravas
Jeet Sukumaran
Johnson
Jordan
Leary
Leebens-Mack
Lewis
Li
Maddison
Maddison
Maddison
Maddison
Mark T. Holder
Matthews
McEntire
Miller
Moore
Mungall
O'Leary
Page
Parks
Peter E. Midford
Piel
Prosdocimi
Rausher
Rice
Ronquist
Rutger A. Vos
Sanderson
Schmitt
Sidlauskas
Smits
Stoesser
Sukumaran
Swofford
Taylor
Than
Thompson
Wayne P. Maddison
Whelan
Whitlock
Xia
Xuhua Xia
Zmasek
Publication venue: Oxford University Press
Publication date: 01/01/2012

Crossref

KU ScholarWorks

PubMed Central

Carolina Digital Repository

From PPROM to caul: The evolution of membrane rupture in mammals

Author: Derek E. Wildman
Gregory Stempfle
Jason A. Caravas
Michael R. McGowen
Publication venue: 'Elsevier BV'
Publication date: 28/08/2013

Rupture of the extraembryonic membranes that form the gestational sac in humans is a typical feature of human parturition. However, preterm premature rupture of membranes (PPROM) occurs in approximately 1% of pregnancies and is a leading cause of preterm birth. Conversely, retention of an intact gestational sac during parturition, in the form of a caul, is rare. Understanding the molecular and evolutionary underpinnings of these disparate phenotypes can provide insight into both normal pregnancy and PPROM. Using phylogenetic techniques, we reconstructed the evolution of the gestational sac phenotype at parturition in 55 mammal species representing all major viviparous mammal groups. We infer that the ancestral state in therians, eutherians, and primates is, as in humans, a ruptured gestational sac at parturition. We present evidence that intact membranes at parturition have evolved convergently in diverse mammals, including horses, elephants, and bats. To gain insight into the molecular basis of the evolution of enhanced membrane integrity, we also used comparative genomic techniques to reconstruct the evolution of a subset of genes implicated in PPROM, and found that four genes (ADAMTS2, COL1A1, COL5A1, LEPRE1) show significant evidence of increased nonsynonymous substitution rates on lineages with intact membranes compared with lineages with ruptured membranes. Among these genes, we also discovered 17 human SNPs that are associated with, or lie near, amino acid replacement sites in mammals with intact membranes. These SNPs are candidate functional variants in humans that may play roles in PPROM and/or the retention of the gestational sac at birth.
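The abstract does not say which ancestral-state reconstruction method was used. As a generic illustration of inferring an ancestral binary state on a tree, the sketch below implements the bottom-up pass of Fitch parsimony in Python on a hypothetical seven-taxon tree (`sp1` to `sp7`, with `R` for ruptured and `I` for intact membranes). The taxa, topology, and states are invented for the example; they are not the study's data, and Fitch parsimony is not claimed to be the authors' method.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up (Fitch) pass for one discrete character.

    Returns the set of most-parsimonious states at this node and the
    minimum number of state changes needed within this subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change is needed.
        return shared, left_cost + right_cost
    # Children disagree: keep both candidate states and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical tree and tip states (R = ruptured, I = intact); not real data.
TREE: Tree = (((("sp1", "sp2"), "sp3"), (("sp4", "sp5"), "sp6")), "sp7")
STATES = {"sp1": "R", "sp2": "R", "sp3": "R", "sp4": "I",
          "sp5": "R", "sp6": "I", "sp7": "R"}

if __name__ == "__main__":
    root_states, changes = fitch(TREE, STATES)
    print(root_states, changes)
```

On this toy input the root set is `{'R'}` with two changes, i.e. one most-parsimonious reading is a ruptured ancestor with intact membranes gained in the `sp4`/`sp6` clade. Model-based (likelihood or Bayesian) reconstructions are common alternatives and can give different answers.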

Elsevier - Publisher Connector

Directory of Open Access Journals

PubMed Central

Bio::Phylo - phyloinformatic analysis using Perl

Author: Caravas Jason
Hartmann Klaas
Jensen Mark A
Miller Chase
Vos Rutger A
Publication venue: BMC
Publication date: 01/01/2011

Background: Phyloinformatic analyses involve large amounts of data and metadata of complex structure. Collecting, processing, analyzing, visualizing and summarizing these data and metadata should be done in steps that can be automated and reproduced. This requires flexible, modular toolkits that can represent, manipulate and persist phylogenetic data and metadata as objects with programmable interfaces.

Results: This paper presents Bio::Phylo, a Perl5 toolkit for phyloinformatic analysis. It implements classes and methods that are compatible with the well-known BioPerl toolkit, but it is independent of BioPerl (making it easy to install) and features a richer API and a data model that is better able to manage the complex relationships between different fundamental data and metadata objects in phylogenetics. It supports commonly used file formats for phylogenetic data, including the novel NeXML standard, which allows rich annotations of phylogenetic data to be stored and shared. Bio::Phylo can interact with BioPerl, thereby giving access to the file formats that BioPerl supports. Many methods are provided for data simulation, transformation and manipulation, analysis of tree shape, and tree visualization.

Conclusions: Bio::Phylo is composed of 59 richly documented Perl5 modules. It has been deployed successfully on a variety of computer architectures (including various Linux distributions, Mac OS X versions, Windows, Cygwin and UNIX-like systems). It is available as open source (GPL) software from http://search.cpan.org/dist/Bio-Phylo
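Tree-shape analysis is one of the features listed above. Purely as an illustration of what such a statistic computes, and not Bio::Phylo's Perl API, the Python sketch below calculates Colless's imbalance index: the sum, over all internal nodes of a binary tree, of the absolute difference between the number of leaves in the left and right subtrees. Trees are written as nested pairs of hypothetical leaf names.

```python
from typing import Tuple, Union

# A rooted binary tree: a leaf name, or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def colless(tree: Tree) -> Tuple[int, int]:
    """Return (number of leaves, Colless imbalance index) for a binary tree."""
    if isinstance(tree, str):
        # A single leaf has no internal nodes, hence no imbalance.
        return 1, 0
    left, right = tree
    n_left, i_left = colless(left)
    n_right, i_right = colless(right)
    # Add this node's contribution |L - R| to the subtrees' totals.
    return n_left + n_right, i_left + i_right + abs(n_left - n_right)


if __name__ == "__main__":
    caterpillar = ((("a", "b"), "c"), "d")  # maximally unbalanced, 4 leaves
    balanced = (("a", "b"), ("c", "d"))     # perfectly balanced, 4 leaves
    print(colless(caterpillar), colless(balanced))
```

For four leaves the caterpillar reaches the maximum value, (n - 1)(n - 2)/2 = 3, while the balanced tree scores 0; comparisons across trees of different sizes usually use a normalized version of the index.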

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Digital Commons@Wayne State University

Genomic heterogeneity differentiates clinical and environmental subgroups of Legionella pneumophila sequence type 1

Author: Brian H Raphael
Jason A Caravas
Jeffrey W Mercante
Jonas M Winchell
Maliha K Ishaq
Natalia A Kozak-Muiznieks
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Legionella spp. cause a severe bacterial pneumonia known as Legionnaires' disease (LD). In some cases, current genetic subtyping methods cannot resolve LD outbreaks caused by common, potentially endemic L. pneumophila (Lp) sequence types (ST), which complicates laboratory investigations and environmental source attribution. In the United States (US), ST1 is the most prevalent clinical and environmental Lp sequence type. To characterize the ST1 population, we sequenced 289 outbreak-associated and non-outbreak-associated clinical and environmental ST1 and ST1-variant Lp strains from the US and, together with international isolate sequences, explored their genetic and geographic diversity. The ST1 population was highly conserved at the nucleotide level: 98% of core nucleotide positions were invariant, and environmental isolates unassociated with human disease (n = 99) contained ~65% more nucleotide diversity than the clinical-sporadic (n = 139) or outbreak-associated (n = 28) ST1 subgroups. The accessory pangenome of environmental isolates was also ~30-60% larger than those of the other subgroups and was enriched for transposition- and conjugative transfer-associated elements. Up to ~10% of US ST1 genetic variation could be explained by geographic origin, but considerable genetic conservation existed among strains isolated from geographically distant states and from different decades. These findings provide new insight into the ST1 population structure and establish a foundation for interpreting genetic relationships among ST1 strains; these data may also inform future analyses for improved outbreak investigations.
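The two summary statistics quoted above, the share of invariant core positions and nucleotide diversity, can be sketched on a toy alignment. The Python below assumes equal-length, gap-free sequences and uses the simple definition of nucleotide diversity (π) as the mean proportion of differing sites over all sequence pairs. The sequences are hypothetical, and the study's actual core-genome and variant-calling pipeline is not described in this abstract.

```python
from itertools import combinations
from typing import List


def _check(aln: List[str], min_seqs: int) -> int:
    """Validate the alignment and return its length (assumes no gaps)."""
    if len(aln) < min_seqs or len({len(s) for s in aln}) != 1:
        raise ValueError("need enough sequences, all of equal length")
    return len(aln[0])


def invariant_fraction(aln: List[str]) -> float:
    """Fraction of alignment columns where every sequence has the same base."""
    length = _check(aln, 1)
    invariant = sum(1 for i in range(length) if len({s[i] for s in aln}) == 1)
    return invariant / length


def nucleotide_diversity(aln: List[str]) -> float:
    """Mean pairwise proportion of differing sites (pi) over all sequence pairs."""
    length = _check(aln, 2)
    pairs = list(combinations(aln, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


# Hypothetical toy alignment: two variable columns out of ten.
TOY_ALN = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGTACGTTC"]

if __name__ == "__main__":
    print(invariant_fraction(TOY_ALN), nucleotide_diversity(TOY_ALN))
```

A relative comparison such as "~65% more diversity" would then correspond to a ratio of π values between subgroups minus one, computed on the same set of core positions.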

Directory of Open Access Journals

The Francis Crick Institute

Primate 13-way alignments

Author: Boddy Amy M.
Caravas Jason A.
Harrison Peter W.
Montgomery Stephen H.
Mundy Nicholas I
Phillips Kimberley A.
Raghanti Mary Ann
Wildman Derek E
Publication date: 14/03/2017

This dataset includes 3,130 13-way one-to-one orthologs, combining sequences from Cebus apella with published data for Homo sapiens, Pongo abelii, Papio anubis, Colobus angolensis, Saimiri boliviensis, Pan troglodytes, Gorilla gorilla, Nomascus leucogenys, Chlorocebus aethiops, Macaca mulatta, Callithrix jacchus, and Saguinus midas. These alignments are unfiltered PRANK alignments. The dataset also includes FASTA ortholog files that include IDs.
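A one-to-one ortholog set should contain exactly one sequence per species. The Python sketch below parses FASTA text and checks that condition. It assumes a hypothetical header convention (`>Species_name|ortholog_id`) and hypothetical example sequences; the dataset description only says the files include IDs, so the real headers may be formatted differently.

```python
from typing import Iterable, List, Tuple


def read_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA lines into (header, sequence) pairs; multi-line sequences are joined."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_one_to_one(records: List[Tuple[str, str]], species: List[str]) -> bool:
    """True if every expected species appears exactly once and nothing else appears.

    Assumes the hypothetical header format ">Species_name|ortholog_id".
    """
    found = [header.split("|")[0] for header, _ in records]
    return sorted(found) == sorted(species)


# Hypothetical three-species ortholog file, for illustration only.
EXAMPLE = """>Homo_sapiens|orth0001
ATGAAA
CCC
>Pan_troglodytes|orth0001
ATGAAACCC
>Gorilla_gorilla|orth0001
ATGAAACCC
"""

if __name__ == "__main__":
    recs = read_fasta(EXAMPLE.splitlines())
    print(is_one_to_one(recs, ["Homo_sapiens", "Pan_troglodytes", "Gorilla_gorilla"]))
```

For a 13-way file, the `species` argument would list all 13 taxa named above, and a missing or duplicated species makes the check fail.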

Dryad Digital Repository (Duke University)

Primate 6-way alignments

Author: Boddy Amy M.
Caravas Jason A.
Harrison Peter W.
Montgomery Stephen H.
Mundy Nicholas I
Phillips Kimberley A.
Raghanti Mary Ann
Wildman Derek E
Publication date: 14/03/2017

This dataset includes 4,770 six-way one-to-one orthologs, combining sequences from Cebus apella with published data for Homo sapiens, Pongo abelii, Papio anubis, Colobus angolensis, and Saimiri boliviensis. These alignments are unfiltered PRANK alignments. The dataset also includes FASTA ortholog files that include IDs.

Dryad Digital Repository (Duke University)

Supplemental material for: Characterization of Legionella from watersheds in British Columbia, Canada

Author: Brian H. Raphael (3185922)
Fiona S.L. Brinkman (4051117)
Jason A. Caravas (4051123)
Jeffrey W. Mercante (3185916)
Michael A. Peabody (4051114)
Natalie A. Prystajecky (4051120)
Shatavia S. Morrison (161716)

Supplemental material for the paper: Characterization of Legionella from watersheds in British Columbia, Canada. Contains Tables S1-S3 and Figures S1-S3.

The Francis Crick Institute

Cox2-alnFLYTREE

Alignment of dipteran mitochondrial COII (cox2) gene sequences from the FLYTREE project, in FASTA format.

Dryad Digital Repository (Duke University)

The Francis Crick Institute