TreeViewJ: An Application for Viewing and Analyzing Phylogenetic Trees

Author: Marc E Colosimo
Matthew W Peterson
Publication venue: Springer Science and Business Media LLC
Publication date: 01/10/2007

BACKGROUND. Phylogenetic trees are widely used to visualize evolutionary relationships between different organisms or samples of the same organism. A variety of free and commercial tree visualization software is available, but limitations in these programs often require researchers to use multiple programs for analysis, annotation, and the production of publication-ready images. RESULTS. We present TreeViewJ, a Java tool for visualizing, editing, and analyzing phylogenetic trees. The software allows researchers to color and change the width of branches that they wish to highlight, and to add names to nodes. If collection dates are available for taxa, the software can map them onto a timeline and sort the tree in ascending or descending date order. CONCLUSION. TreeViewJ is a tool for researchers to visualize, edit, "decorate," and produce publication-ready images of phylogenetic trees. It is open source, released under a GPL license, and available at http://treeviewj.sourceforge.net

Boston University Institutional Repository (OpenBU)

Directory of Open Access Journals

PubMed Central

Data preparation and interannotator agreement: BioCreAtIvE Task 1B

Author: Jeffrey B Colombe
Marc E Colosimo
Lynette Hirschman
Alexander A Morgan
Alexander S Yeh
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND. We prepared and evaluated training and test materials for an assessment of text mining methods in molecular biology. The goal of the assessment was to evaluate the ability of automated systems to generate a list of unique gene identifiers from PubMed abstracts for the three model organisms Fly, Mouse, and Yeast. This paper describes the preparation and evaluation of answer keys for training and testing. These consisted of lists of normalized gene names found in the abstracts, generated by adapting the gene list for the full journal articles found in the model organism databases. For the training dataset, the gene list was pruned automatically to remove gene names not found in the abstract; for the testing dataset, it was further refined through manual annotation by annotators provided with guidelines. A critical step in interpreting the results of an assessment is to evaluate the quality of the data preparation. We did this by careful assessment of interannotator agreement and the use of answer pooling of participant results to improve the quality of the final testing dataset. RESULTS. Interannotator analysis on a small dataset showed that our gene lists for Fly and Yeast were good (87% and 91% three-way agreement) but the Mouse gene list had many conflicts (mostly omissions), which resulted in errors (69% interannotator agreement). By comparing and pooling answers from the participant systems, we were able to add an additional check on the test data; this allowed us to find additional errors, especially in Mouse. This led to a 1% change in the Yeast and Fly "gold standard" answer keys, but to an 8% change in the Mouse answer key. CONCLUSION. We found that clear annotation guidelines are important, along with careful interannotator experiments, to validate the generated gene lists. Also, abstracts alone are a poor resource for identifying genes in a paper, containing only a fraction of the genes mentioned in the full text (25% for Fly, 36% for Mouse). We found that there are intrinsic differences between the model organism databases related to the number of synonymous terms and also to curation criteria. Finally, we found that answer pooling was much faster and allowed us to identify more conflicting genes than interannotator analysis.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Nephele: genotyping via complete composition vectors and MapReduce

Author: A Drummond
A McKenna
A Rambaut
AL Ghindilis
AS De Groot
AS Fauci
B Budowle
BJ Frey
C Macken
C Notredame
C Ranger
CA Cummings
CB Do
D Janies
D Wang
E Gabriel
EC Holmes
G Giribet
G Lin
G Lu
HL Yang
IM Wallace
J Bullard
J Dean
JC Wilgenbusch
JD Retief
JD Thompson
KH Chu
KS Li
L Campitelli
L Gao
L Stuyver
Lynette Hirschman
M Colosimo
M Li
M Lindh
Marc E Colosimo
Matthew W Peterson
MC Schatz
MW Peterson
N Saitou
RC Edgar
SA McEwen
Scott Mardis
SJ Matthews
T Hughes
TB Reddy
TZ DeSantis Jr
U Rost
V Brendel
X Wu
XF Wan
Publication venue: BioMed Central
Publication date: 01/01/2011

BACKGROUND. Current sequencing technology makes it practical to sequence many samples of a given organism, raising new challenges for the processing and interpretation of large genomics data sets with associated metadata. Traditional computational phylogenetic methods are ideal for studying the evolution of gene/protein families and using those to infer the evolution of an organism, but are less than ideal for the study of the whole organism, mainly due to the presence of insertions/deletions/rearrangements. These methods provide the researcher with the ability to group a set of samples into distinct genotypic groups based on sequence similarity, which can then be associated with metadata, such as host information, pathogenicity, and time or location of occurrence. Genotyping is critical to understanding, at a genomic level, the origin and spread of infectious diseases. Increasingly, genotyping is coming into use for disease surveillance activities, as well as for microbial forensics. The classic genotyping approach has been based on phylogenetic analysis, starting with a multiple sequence alignment. Genotypes are then established by expert examination of phylogenetic trees. However, these traditional single-processor methods are suboptimal for the rapidly growing sequence datasets being generated by next-generation DNA sequencing machines, because their computational complexity increases quickly with the number of sequences. RESULTS. Nephele is a suite of tools that uses the complete composition vector algorithm to represent each sequence in the dataset as a vector derived from its constituent k-mers, bypassing the need for multiple sequence alignment, and affinity propagation clustering to group the sequences into genotypes based on a distance measure over the vectors. Our methods produce results that correlate well with expert-defined clades or genotypes, at a fraction of the computational cost of traditional phylogenetic methods run on traditional hardware. Nephele can use the open-source Hadoop implementation of MapReduce to parallelize execution using multiple compute nodes. We were able to generate a neighbour-joined tree of over 10,000 16S samples in less than 2 hours. CONCLUSION. We conclude that using Nephele can substantially decrease the processing time required for generating genotype trees of tens to hundreds of organisms at genome scale sequence coverage.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

TreeViewJ: An application for viewing and analyzing phylogenetic trees-2

Author: Marc E Colosimo (73998)
Matthew W Peterson (83284)

Copyright information: Taken from "TreeViewJ: An application for viewing and analyzing phylogenetic trees", http://www.scfbm.org/content/2/1/7. Source Code for Biology and Medicine 2007;2():7-7. Published online 31 Oct 2007. PMCID: PMC2170439. [...] contains multiple trees, are open. The currently displayed tree contains colored taxon names, and has had its taxa mapped to a timeline

The Francis Crick Institute

TreeViewJ: An application for viewing and analyzing phylogenetic trees-0

Author: Marc E Colosimo (73998)
Matthew W Peterson (83284)

The Francis Crick Institute

TreeViewJ: An application for viewing and analyzing phylogenetic trees-1

Author: Marc E Colosimo (73998)
Matthew W Peterson (83284)

Copyright information: Taken from "TreeViewJ: An application for viewing and analyzing phylogenetic trees", http://www.scfbm.org/content/2/1/7. Source Code for Biology and Medicine 2007;2():7-7. Published online 31 Oct 2007. PMCID: PMC2170439. [...] the taxa alphabetically. The numbers above the internal nodes show the order in which sorting is done

The Francis Crick Institute

Graph showing the differences between the participant's original F-measure and their final F-measure

Author: Alexander A Morgan (34437)
Alexander S Yeh (73999)
Jeffrey B Colombe (74000)
Lynette Hirschman (29365)
Marc E Colosimo (73998)

Copyright information: Taken from "Data preparation and interannotator agreement: BioCreAtIvE Task 1B", BMC Bioinformatics 2005;6(Suppl 1):S12-S12. Published online 24 May 2005. PMCID: PMC1869005.

The Francis Crick Institute

TreeViewJ: An application for viewing and analyzing phylogenetic trees

Author: A Drummond
CM Zmasek
DL Swofford
DR Maddison
Marc E Colosimo
Matthew W Peterson
RD Page
S Gu
T Hughes
WP Maddison
Publication venue: Springer Science and Business Media LLC

Crossref