12 research outputs found

Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

Author: Allen Gabrielle D.
Berriman Bruce
Choi Sou-Cheng T.
Elster Anne C.
Hanwell Marcus D.
Hetherington James
Howison James
Katz Daniel S.
Lapp Hilmar
Löffler Frank
Maheshwari Ketan
Swenson Shel
Turk Matthew
Venters Colin
Wilkins-Diehr Nancy
Publication venue: 'Ubiquity Press, Ltd.'
Publication date: 12/06/2014

Challenges related to the development, deployment, and maintenance of reusable software for science are a growing concern. Many scientists’ research increasingly depends on the quality and availability of the software upon which their work is built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop. Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses, and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of “software papers”, and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long term and for the use of others, and whether they work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.

arXiv.org e-Print Archive

Directory of Open Access Journals

University of Huddersfield Repository

An experimental study of Quartets MaxCut and other supertree methods

Author: A Ben-dor
A Dress
A Stamatakis
B Holland
B Rannala
BR Baum
C Randal Linder
CJ Creevey
D Chen
D Thain
DL Swofford
H Bolaender
JG Burleigh
K Strimmer
KC Nixon
KS John
LR Foulds
M Bansal
M Shel Swenson
MA Ragan
MS Swenson
ORP Bininda-Emonds
Rahul Suri
S Snir
T Jiang
Tandy Warnow
V Ranwez
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method. Results: We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore, evaluating supertree methods on biological datasets is problematic. Conclusions: Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, the development and testing of supertree methods present methodological challenges.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Texas ScholarWorks

A simulation study comparing supertree and combined analysis methods using SMIDGen

Background: Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix. Results: In this paper, we describe an extensive simulation study we performed comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or SMIDGen) that better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based upon maximum likelihood outperforms MRP and weighted MRP, giving especially big improvements when the largest subtree does not contain most of the taxa. Conclusions: This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods.

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Texas ScholarWorks

SuperFineSource

Author: Linder C. Randal
Suri Rahul
Swenson M. Shel
Warnow Tandy
Publication date: 16/05/2011

This zipped archive contains the source code necessary for installing and running SuperFine.

Dryad Digital Repository (Duke University)

Data from: SuperFine: fast and accurate supertree estimation

Author: Linder C. Randal
Suri Rahul
Swenson M. Shel
Warnow Tandy
Publication date: 16/05/2011

Many research groups are estimating trees containing anywhere from a few thousand to hundreds of thousands of species, towards the eventual goal of the estimation of a Tree of Life, containing perhaps as many as several million leaves. These phylogenetic estimations present enormous computational challenges, and current computational methods are likely to fail to run even on datasets in the low end of this range. One approach to estimate a large species tree is to use phylogenetic estimation methods (such as maximum likelihood) on a supermatrix produced by concatenating multiple sequence alignments for a collection of markers; however, the most accurate of these phylogenetic estimation methods are extremely computationally intensive for datasets with more than a few thousand sequences. Supertree methods, which assemble phylogenetic trees from a collection of trees on subsets of the taxa, are important tools for phylogeny estimation where phylogenetic analyses based upon maximum likelihood are infeasible. In this paper, we introduce SuperFine, a meta-method that utilizes a novel two-step procedure in order to improve the accuracy and scalability of supertree methods. Our study, using both simulated and empirical data, shows that SuperFine-boosted supertree methods produce more accurate trees than standard supertree methods, and run quickly on very large datasets with thousands of sequences. Furthermore, SuperFine-boosted MRP (Matrix Representation with Parsimony, the best-known supertree method) approaches the accuracy of maximum likelihood methods on supermatrix datasets under realistic conditions.

ZENODO

Electronic Archiving System

SuperFine: Fast and Accurate Supertree Estimation

Author: Bansal
Baum
Beck
Bininda-Emonds
Burleigh
C. Randal Linder
Cardillo
Chen
Cotton
Creevey
Day
Foulds
Goloboff
Holland
Huson
Jiang
Kennedy
Liu
M. Shel Swenson
McMahon
Moret
Nixon
Pisani
Price
Ragan
Rahul Suri
Ranwez
Roshan
Snir
Stamatakis
Steel
Sukumaran
Swenson
Swofford
Tandy Warnow
Wang
Warnow
Wilkinson
Wojciechowski
Zwickl
Publication venue: 'Oxford University Press (OUP)'

Crossref