
A comparison of common programming languages used in bioinformatics

Author: A Conesa
AB Clegg
D Butt
D Posada
EM Zdobnov
GPS Raghava
H Mangalam
L Prechelt
LJ McGuffin
Mathieu Fourment
Michael R Gillings
MK Kuhner
N Saitou
RA Irizarry
S Guindon
SF Altschul
Publication venue: BioMed Central
Publication date: 01/01/2008

Abstract Background: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and execution speed of three standard bioinformatics methods, each implemented in six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST output files were implemented in C, C++, C#, Java, Perl and Python. Results: Implementations in C and C++ were fastest and used the least memory. Programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change between Windows and Linux, and no clear evidence of a faster operating system was found. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. Conclusion: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that developers should choose a language carefully, taking into account the expected performance and the library availability for each language.
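The Sellers algorithm named above is the classic dynamic-programming method for approximate string matching: it fills the same table as the Levenshtein edit distance, except that the first row is all zeros, so a match may begin anywhere in the text. The following is a minimal Python sketch, not one of the benchmarked implementations; the function name `sellers` and the choice to report 1-based end positions are assumptions made for illustration.

```python
def sellers(pattern: str, text: str, k: int) -> list[tuple[int, int]]:
    """Return (end_position, distance) pairs, 1-based, for every position in
    `text` where some substring ending there is within edit distance k of
    `pattern` (Sellers' approximate string matching)."""
    m = len(pattern)
    # Column for j = 0 (empty text prefix): aligning pattern[:i] costs i deletions.
    prev = list(range(m + 1))
    hits = []
    for j, t in enumerate(text, start=1):
        # Row 0 stays 0: a match is allowed to start at any text position.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == t else 1
            cur[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra text character (insertion)
                cur[i - 1] + 1,      # skipped pattern character (deletion)
            )
        prev = cur
        if prev[m] <= k:
            hits.append((j, prev[m]))
    return hits


print(sellers("GATTACA", "XXGATTACAXX", 1))  # → [(8, 1), (9, 0), (10, 1)]
```

Only the previous column is kept, so memory use is O(m) rather than O(mn) for a pattern of length m and a text of length n.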

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Macquarie University ResearchOnline

MaxAlign: maximizing usable data in an alignment

Author: A Bateman
A Pang
Anders G Pedersen
CJ Creevey
DL Swofford
DP Robinson
G Estabrook
H Li
JL Thorne
K Katoh
MJ Bishop
MK Kuhner
Peter W Sackett
Rodrigo Gouveia-Oliveira
S Guindon
Z Yang
Publication venue: BioMed Central
Publication date: 01/01/2007

Abstract Background: The presence of gaps in an alignment of nucleotide or protein sequences is often an inconvenience for bioinformatics studies. In phylogenetic and other analyses, for instance, gapped columns are often discarded entirely from the alignment. Results: MaxAlign is a program that optimizes the alignment prior to such analyses. Specifically, it maximizes the number of nucleotide (or amino acid) symbols that are present in gap-free columns (the alignment area) by selecting the optimal subset of sequences to exclude from the alignment. MaxAlign can be used prior to phylogenetic and other bioinformatics analyses, as well as in other situations where this form of alignment improvement is useful. In this work we test MaxAlign's performance in these tasks and compare the accuracy of phylogenetic estimates obtained with and without gapped columns, with and without prior processing by MaxAlign. We also introduce a new, simple measure of tree similarity, Normalized Symmetric Similarity (NSS), which we consider useful for comparing tree topologies. Conclusion: We demonstrate how MaxAlign helps detect misaligned or defective sequences without requiring manual inspection. We also show that excluding gapped columns from phylogenetic analyses is not advisable unless MaxAlign is applied first. Finally, we find that the sequences MaxAlign removes from an alignment tend to be those that would otherwise be associated with low phylogenetic accuracy, and that the presence of gaps in any given sequence does not seem to disturb the phylogenetic estimates of other sequences. The MaxAlign web server is freely available at http://www.cbs.dtu.dk/services/MaxAlign, where supplementary information can also be found. The program is also freely available as a stand-alone Perl package.
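The alignment area defined above can be computed directly: for a set of kept sequences it is the number of kept sequences multiplied by the number of columns that contain no gap among them. The sketch below computes it and then applies a simple greedy exclusion loop. The greedy search is a hypothetical illustration of the objective only; it is not claimed to be the optimization strategy MaxAlign uses, and the functions `alignment_area` and `greedy_exclude`, the `-` gap symbol and the equal-length input are all assumptions.

```python
GAP = "-"  # assumed gap character


def alignment_area(seqs: list[str]) -> int:
    """Number of residues in gap-free columns of an aligned set of
    equal-length sequences: (#sequences) x (#columns with no gap)."""
    if not seqs:
        return 0
    gap_free_columns = sum(1 for column in zip(*seqs) if GAP not in column)
    return len(seqs) * gap_free_columns


def greedy_exclude(seqs: list[str]) -> tuple[list[str], int]:
    """Hypothetical heuristic: repeatedly drop the sequence whose removal most
    increases the alignment area; stop when no single removal helps."""
    kept = list(seqs)
    best = alignment_area(kept)
    while len(kept) > 1:
        # Score every possible single removal from the current set.
        area, idx = max(
            (alignment_area(kept[:i] + kept[i + 1:]), i) for i in range(len(kept))
        )
        if area <= best:
            break
        best = area
        del kept[idx]
    return kept, best


aln = ["ACGTACGT", "ACGTACGT", "AC--ACGT", "A-------"]
print(greedy_exclude(aln))  # → (['ACGTACGT', 'ACGTACGT', 'AC--ACGT'], 18)
```

A greedy search can stop at a local optimum; for a handful of sequences, an exhaustive search over subsets would give the true maximum.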

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Online Research Database In Technology

Evolutionary distances in the twilight zone -- a rational kernel approach

Author: A Keller
A Löytynoja
A Stamatakis
B Chor
B Schölkopf
Benjamin Merget
C Cortes
C Daskalakis
CB Do
E Rivas
F Bemm
Florian Markowetz
Frank Förster
G Talavera
HH Otu
I Ulitsky
J Felsenstein
J Friedrich
J Hein
JL Thorne
Jörg Schultz
KM Wong
LS Wang
M Höhl
M Mohri
M Wolf
MA Buchheim
MA Suchard
Matthias Wolf
MJ Bishop
MK Kuhner
MS Waterman
N Goldman
N Higham
R Durbin
RC Edgar
RF Doolittle
Roland F. Schwarz
S Roch
S Whelan
SR Eddy
T Mailund
T Müller
TH Ogden
V Levenshtein
W Fletcher
Wayne Delport
William Fletcher
Publication venue: Public Library of Science (PLoS)
Publication date: 23/11/2010

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly. Alignment-free methods, on the other hand, are based on abstract string comparisons and avoid potential alignment problems. However, in general they are not biologically motivated and ignore our knowledge about the evolution of sequences. Thus, it is still a major open question how to define an evolutionary distance metric between divergent sequences that makes use of indel information and known substitution models without the need for a multiple alignment. Here we propose a new evolutionary distance metric to close this gap. It uses finite-state transducers to create a biologically motivated similarity score that models substitutions and indels and does not depend on a multiple sequence alignment. The sequence similarity score is defined by analogy with pairwise alignments and, in addition, is positive semi-definite. We describe its derivation and show in simulation studies and real-world examples that it is more accurate in reconstructing phylogenies than competing methods. The result is a new and accurate way of determining evolutionary distances in and beyond the twilight zone of sequence alignments that is suitable for large datasets. Comment: to appear in PLoS ONE
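The positive semi-definite property matters because any PSD similarity k can be turned into a distance, d(x, y) = sqrt(k(x, x) + k(y, y) - 2k(x, y)), which is the Euclidean distance between x and y in the kernel's feature space. Reproducing the transducer-based rational kernel is beyond a short example, so the sketch below swaps in a much simpler k-mer spectrum kernel (an alignment-free kernel of the kind the abstract contrasts with, PSD by construction because it is an inner product of k-mer count vectors) purely to show the kernel-to-distance step. Whether the paper derives its distance exactly this way is an assumption here, and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def spectrum(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x: str, y: str, k: int = 3) -> int:
    """Stand-in kernel (not the paper's): inner product of k-mer count
    vectors, which is PSD by construction."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    return sum(count * sy[kmer] for kmer, count in sx.items())


def kernel_distance(x: str, y: str, k: int = 3) -> float:
    """Feature-space distance induced by a PSD kernel:
    sqrt(k(x,x) + k(y,y) - 2 k(x,y))."""
    # sq is the squared Euclidean distance between the two count vectors,
    # so with integer counts it is never negative.
    sq = spectrum_kernel(x, x, k) + spectrum_kernel(y, y, k) - 2 * spectrum_kernel(x, y, k)
    return sqrt(sq)


print(kernel_distance("ACGT", "ACGA", k=2))  # → 1.4142135623730951 (sqrt 2)
```

Any other PSD similarity, including a transducer-based one, could be dropped in place of `spectrum_kernel` without changing `kernel_distance`.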

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

MDC Repository

Mitochondrial phylogeography and demographic history of the Vicuña: implications for conservation

Author: A Prieto
A Spotorno
AN Menegaz
AR Templeton
BJ MacFadden
C Ammann
C Kull
C S Casey
CB Koford
CM Clapperton
D Hoces
D Posada
DL Swofford
F Tajima
GA Watterson
GI Molina
GS Miller Jr
H Jungius
H Schaschl
HF Stanley
HG Nami
IR Grimwood
J C Marín
J C Wheeler
J Olazabal
J Rodriguez
J Rozas
J Thornback
JA Harrison
JC Wheeler
JD Clayton
K Yaya
KA Tolley
L Excoffier
M Clement
M Kadwell
M Ubilla
M W Bruford
MC Norambuena
MK Kuhner
ML Maté
MW Bruford
N Ray
O Thomas
OS Paulo
R Hoffstetter
R Rosadio
RE Palma
RJ Sarno
S Schneider
SD Webb
T Tserenbataa
TD Dillehay
WP Madison
YX Fu
Z Lin
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

The vicuña (Vicugna vicugna; Miller, 1924) is a conservation success story, having recovered from near extinction in the 1960s to current population levels estimated at 275 000. However, lack of information about its demographic history and genetic diversity has limited both our understanding of its recovery and the development of science-based conservation measures. To examine the evolution and recent demographic history of the vicuña across its current range, and to assess its genetic variation and population structure, we sequenced the mitochondrial DNA control region (CR) of 261 individuals from 29 populations across Peru, Chile and Argentina. Our results suggest that populations currently designated as Vicugna vicugna vicugna and Vicugna vicugna mensalis comprise separate mitochondrial lineages. The current population distribution appears to be the result of a recent demographic expansion associated with the last major glacial event of the Pleistocene in the northern (18 to 22°S) dry Andes 14 000–12 000 years ago, and with the establishment of an extremely arid belt, known as the 'Dry Diagonal', extending to 29°S. Within the Dry Diagonal, small populations of V. v. vicugna appear to have survived, showing the genetic signature of demographic isolation, whereas to the north V. v. mensalis populations underwent a rapid demographic expansion before recent anthropogenic impacts.

University of Lincoln Institutional Repository

CiteSeerX

Crossref

Online Research @ Cardiff

A new, fast algorithm for detecting protein coevolution using maximum compatible cliques

Author: A Rodionov
A Valencia
AK Ramani
Alex Rodionov
Alexandr Bezginov
AM Altenhoff
D MacLeod
D Robinson
Elisabeth RM Tillier
ERM Tillier
F Pazos
GW Clark
J Felsenstein
Jonathan Rose
K Katoh
MK Kuhner
PRJ Östergård
R Jothi
RG Beiko
RM Karp
S Razick
T Sato
V Soria-Carrasco
W Li
Publication venue: BioMed Central
Publication date: 01/01/2011

Abstract Background: The MatrixMatchMaker (MMM) algorithm was recently introduced to detect the similarity between phylogenetic trees, and thus the coevolution between proteins. MMM finds the largest common submatrices between pairs of phylogenetic distance matrices and has numerous advantages over existing methods of coevolution detection. However, these advantages came at the cost of a very long execution time. Results: In this paper, we show that the problem of finding the maximum submatrix reduces to a multiple maximum clique subproblem on a graph of protein pairs. This allowed us to develop a new algorithm and program implementation, MMMvII, which achieved a more than 600× speedup with accuracy comparable to the original MMM. Conclusions: MMMvII will thus allow more extensive and intricate analyses of coevolution. Availability: An implementation of the MMMvII algorithm is available at http://www.uhnresearch.ca/labs/tillier/MMMWEBvII/MMMWEBvII.php
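The reduction described above turns the core of the problem into maximum-clique search. As a generic illustration of that subproblem only (not the MMMvII algorithm, whose graph of protein pairs and clique procedure are not reproduced here), the sketch below finds one maximum clique with the Bron-Kerbosch algorithm, using pivoting plus a simple size bound. The adjacency-dict input format and the function name `max_clique` are assumptions.

```python
def max_clique(adj: dict[int, set[int]]) -> set[int]:
    """Return one maximum clique of an undirected graph given as
    {vertex: set_of_neighbours} (symmetric, no self-loops)."""
    best: set[int] = set()

    def expand(r: set[int], p: set[int], x: set[int]) -> None:
        nonlocal best
        # Bound: even adding every candidate cannot beat the best clique so far.
        if len(r) + len(p) <= len(best):
            return
        if not p and not x:
            best = set(r)  # r is maximal and, by the bound, larger than best
            return
        # Pivot on the vertex covering most candidates to skip redundant branches.
        u = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[u]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored...
            x = x | {v}  # ...so exclude it from later branches

    expand(set(), set(adj), set())
    return best


graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(sorted(max_clique(graph)))  # → [1, 2, 3]
```

Pivoting and bounding do not change the worst-case exponential cost, but they prune many branches in practice, which is the general reason clique formulations can be fast on real data.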

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genetic Differentiation of the Western Capercaillie Highlights the Importance of South-Eastern Europe for Understanding the Species Phylogeography

Author: AR Rogers
B Blanco-Fontao
B Frenzel
B Huntley
B Krystufek
B Milá
BK Sandercock
D Gačić
D Magri
D Posada
D Radović
D Raguž
Dalibor Ballian
DM Lambert
DR Denver
E de Juana
E Heyer
E Randi
F Manni
F Tajima
G Segelbacher
G Zubić
GA Watterson
Goran Zubić
HC Harpending
HJ Bandelt
Hojka Kraigher
I Anić
I Storch
IK Petrov
J Felsenstein
J Reynolds
J Rolstad
JC Avise
K Bollmann
K Eiberle
KD Bennett
KJ Willis
L Excoffier
L Kutnar
Ladislav Paule
M Adamič
M González
M Nei
M Quevedo
M Slatkin
M Čas
MA Larkin
Marijan Grubešić
Marko Bajc
MB Horváth
Michael Hofreiter
Miran Čas
MK Kuhner
MS Monmonier
O Duriez
P Angelstam
P Berthold
P Sümegi
P Taberlet
Petar Zhelev
PH Brito
PW Hedrick
R Chakraborty
R Rodríguez-Muñoz
S Guindon
S Klaus
S Sachot
S Schneider
Saša Kunovac
SD Matvejev
SJB Cooper
SV Drovetski
T Liukkonen-Anttila
T Polzin
T Price
Tine Grebenc
V Deffontaine
V Grimm
V Lucchini
W Babik
W Suter
YX Fu
Z Boev
Publication venue: Public Library of Science
Publication date: 01/01/2011

The Western Capercaillie (Tetrao urogallus L.) is a grouse species of open boreal or high-altitude forests of Eurasia. It is endangered throughout most of its mountain habitat in Europe. Two major genetically identifiable lineages of Western Capercaillie have been described to date: the southern lineage, at the species' southernmost range of distribution in Europe, and the boreal lineage. We address the question of genetic differentiation of capercaillie populations from the Rhodope and Rila Mountains in Bulgaria, across the Dinaric Mountains, to the Slovenian Alps. The two lineages' contact zone in this so-far understudied part of the range, and the resulting conservation strategies, have not been previously determined. The results of analysis of mitochondrial DNA control region sequences of 319 samples from the studied populations show that Alpine populations were composed exclusively of boreal lineage individuals; Dinaric populations of both, but predominantly (96%) of boreal lineage individuals; and Rhodope-Rila populations predominantly (>90%) of southern lineage individuals. The Bulgarian mountains were identified as the core area of the southern lineage, and the Dinaric Mountains as the western contact zone between both lineages in the Balkans. Bulgarian populations appeared genetically distinct from Alpine and Dinaric populations and exhibited characteristics of a long-term stationary population, suggesting that they should be considered a glacial relict and probably a distinct subspecies. Although all of the studied populations suffered a decline in the past, the significantly lower genetic diversity of the isolated Dinaric capercaillie, compared with the neighbouring Alpine and Bulgarian populations, suggests that it is particularly vulnerable to continuing population decline. The results are discussed in the context of the conservation of the species in the Balkans, its principal threats and its legal protection status.
Potential conservation strategies should consider the existence of the two lineages and their vulnerable Dinaric contact zone, and should support the specific characteristics of each population.

Crossref

Directory of Open Access Journals

PubMed Central

SciVie

Digital repository of Slovenian research organizations

Controlling Population Evolution in the Laboratory to Evaluate Methods of Historical Inference

Author: A Graustein
AD Cutter
BS Weir
DE Pearse
DM Hillis
EG Williamson-Natesan
ES Dolgin
GA Wilson
IM Caldicott
J Hey
J Sambrook
J Wang
J-M Cornuet
JC Garza
K Kiontke
L Excoffier
M Nei
M Raymond
M Slatkin
M Woodworth
MA Beaumont
Marie-Anne Vaesen
Matthew W. Hahn
MC Whitlock
Michel C. Milinkovitch
MK Kuhner
MW Hahn
P Beerli
P Faubet
Patrick Mardulyn
R Frankham
S Piry
S Wright
SE Baird
TC Glenn
Z Abdo
Publication venue: Public Library of Science
Publication date: 01/01/2008

Natural populations with a known, detailed demographic history are extremely valuable for evaluating methods of historical inference, yet they are very rare. As an alternative approach, we generated multiple replicate microsatellite data sets from laboratory-cultured populations of a gonochoric free-living nematode, Caenorhabditis remanei, that were constrained to predefined demographic histories featuring different levels of migration among populations or bottleneck events of different magnitudes. These data sets were then used to evaluate the performance of two recently developed population genetics methods: BayesAss+, which estimates recent migration rates among populations, and Bottleneck, which detects the occurrence of recent bottlenecks. Migration rates inferred by BayesAss+ were generally overestimates, although the true rates often fell within the confidence interval. Analyses of data sets simulated in silico, using a model mimicking the laboratory experiments, produced less biased estimates of the migration rates and showed that the program's efficiency increased with the number of loci and of sampled genotypes per population. In the replicates for which the pre-bottleneck laboratory-cultured populations did not significantly depart from mutation/drift equilibrium, an important assumption of the program Bottleneck, only a portion of the bottleneck events were detected. This result was confirmed by in silico simulations mirroring the laboratory bottleneck experiments. More generally, our study demonstrates the feasibility, and highlights some of the limits, of generating molecular genetic data sets by controlling the evolution of laboratory-reared nematode populations in order to validate methods that infer population history.
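A bottleneck leaves a genetic signature because drift erodes heterozygosity faster in small populations. Under the standard diploid Wright-Fisher model without mutation, expected heterozygosity shrinks each generation by the factor (1 - 1/(2N_e)), so H_t = H_0 multiplied by the product of (1 - 1/(2N_e)) over generations. The sketch below applies this to a hypothetical size history to show how a short, severe bottleneck reduces expected heterozygosity. It ignores mutation and is not the test implemented in the program Bottleneck, which compares the gene diversity in a sample with the value expected at mutation/drift equilibrium given the observed number of alleles; the function name and population sizes are illustrative assumptions.

```python
def expected_heterozygosity(h0: float, sizes: list[int]) -> float:
    """Expected heterozygosity after len(sizes) generations of pure drift in a
    diploid Wright-Fisher population whose effective size in generation t is
    sizes[t]. Mutation is ignored."""
    h = h0
    for n_e in sizes:
        h *= 1 - 1 / (2 * n_e)  # fraction of heterozygosity retained this generation
    return h


# Hypothetical history: 10 generations at N_e = 1000, a 3-generation bottleneck
# at N_e = 5, then 10 more generations at N_e = 1000.
history = [1000] * 10 + [5] * 3 + [1000] * 10
print(round(expected_heterozygosity(0.8, history), 3))  # → 0.577
```

In this example the three bottleneck generations alone remove about 27% of the starting heterozygosity, while the twenty large-population generations remove about 1%.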

Crossref

Directory of Open Access Journals

PubMed Central

DI-fusion

New Insights into the Lake Chad Basin Population Structure Revealed by High-Throughput Genotyping of Mitochondrial DNA Coding SNPs

Author: A Achilli
A Brandstätter
A Mosquera-Miguel
A Olivieri
A Salas
A Torroni
Antonio Salas
B Quintáns
C Batello
C Herrnstadt
D Posada
DM Behar
JC Rando
JL Elson
L Excoffier
L Pereira
L Quintana-Murci
Lluis Quintana-Murci
M Cerezo
M Coble
M Pala
M Richards
M Tanaka
M van Oven
María Cerezo
MD Coble
MK Gonder
MK Kuhner
P Librado
P Soares
RE Bereir
RM Andrews
S Beleza
S Finnilä
S Kropelin
S Plaza
SA Tishkoff
T Güldemann
T Kivisild
UA Perego
V Álvarez-Iglesias
V Černý
Viktor Černý
Ángel Carracedo
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Located in the Sudan belt, the Chad Basin forms a remarkable ecosystem, where several unique agricultural and pastoral techniques have been developed. From both an archaeological and a genetic point of view, this region has been interpreted as the center of a bidirectional corridor connecting West and East Africa, as well as a meeting point for populations coming from North Africa through the Saharan desert. METHODOLOGY/PRINCIPAL FINDINGS: Samples from twelve ethnic groups from the Chad Basin (n = 542) were genotyped by high-throughput methods for 230 coding-region mitochondrial DNA (mtDNA) Single Nucleotide Polymorphisms (mtSNPs) using Matrix-Assisted Laser Desorption/Ionization Time-Of-Flight (MALDI-TOF) mass spectrometry. This set of mtSNPs allowed much better phylogenetic resolution than previous studies of this geographic region, enabling new insights into its population history. Notable haplogroup (hg) heterogeneity was observed in the Chad Basin, mirroring the different demographic histories of these ethnic groups. As estimated using a Bayesian framework, nomadic populations showed negative growth, which was not always correlated with their estimated effective population sizes. Nomads also showed lower diversity values than sedentary groups. CONCLUSIONS/SIGNIFICANCE: Compared with sedentary populations, nomads showed signals of stronger genetic drift occurring in their ancestral populations. These populations, however, retained more haplotype diversity in hypervariable segment I (HVS-I), but not in their mtSNPs, suggesting a more ancestral ethnogenesis. Whereas the nomadic populations showed a higher Mediterranean influence, signaled mainly by sub-lineages of M1, R0, U6 and U5, the other populations showed a more consistent sub-Saharan pattern. Although lifestyle may have an influence on diversity patterns and hg composition, analysis of molecular variance did not detect these differences.
The present study indicates that high-resolution analysis of mtSNPs could be a fast and extensive approach for screening variation in population studies where labor-intensive techniques such as entire genome sequencing remain infeasible.
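Haplotype diversity, one of the diversity measures compared between nomadic and sedentary groups, is commonly computed with Nei's unbiased estimator H = n/(n - 1) (1 - Σ x_i²), where n is the number of sampled sequences and x_i is the frequency of haplotype i. The abstract does not state which estimator the authors used, so the sketch below only illustrates the standard formula; haplotypes are assumed to be given as hashable labels or strings (for example HVS-I sequences or concatenated mtSNP calls), and the function name is hypothetical.

```python
from collections import Counter


def haplotype_diversity(haplotypes: list[str]) -> float:
    """Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum(x_i^2))."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sampled haplotypes")
    # Relative frequency of each distinct haplotype in the sample.
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


sample = ["H1", "H1", "H2", "H2"]  # hypothetical haplotype labels
print(round(haplotype_diversity(sample), 4))  # → 0.6667
```

The n/(n - 1) factor corrects the downward bias of the plain 1 - Σ x_i² value in small samples, which matters when per-group sample sizes differ.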

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Repositorio Institucional da Universidade de Santiago de Compostela

Genotyping of Bacillus cereus Strains by Microarray-Based Resequencing

Author: A Ciammaruconi
A Sorokin
Alfred Mateczun
Andrew C. Stewart
AR Hoffmaster
B Candelon
B Ewing
B Lin
BE Dutilh
D Daffonchio
DA Rasko
DJ Cutler
E Helgason
FG Priest
GB Jensen
J Dagerhamn
J Felsenstein
J Shendure
JD Thompson
KS Ko
M Doran
M Ehling-Schulz
Maureen P. Kiley
ME Zwick
Michael E. Zwick
MK Kuhner
MN Van Ert
N Berthet
NJ Tourasse
P Keim
Rosemary Jeanne Redfield
SR Klee
TD Read
Timothy D. Read
Publication venue: Public Library of Science
Publication date: 01/01/2008

The ability to distinguish microbial pathogens from closely related but nonpathogenic strains is key to understanding the population biology of these organisms. In this regard, Bacillus anthracis, the bacterium that causes inhalational anthrax, is of interest because it is closely related to, and often difficult to distinguish from, other members of the B. cereus group that can cause diverse diseases. We employed custom-designed resequencing arrays (RAs) based on the genome sequence of Bacillus anthracis to generate 422 kb of genomic sequence from a panel of 41 Bacillus cereus sensu lato strains. Here we show that RAs represent a “one reaction” genotyping technology with the ability to discriminate between highly similar B. anthracis isolates and more divergent strains of the B. cereus s.l. Clade 1. Our data show that RAs can be an efficient genotyping technology for pre-screening the genetic diversity of large strain collections to select the best candidates for whole genome sequencing.
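Discriminating strains from resequencing data ultimately comes down to counting base differences between sequences covering the same positions. The sketch below assumes each strain is represented as a string of base calls over the same tiled positions, with 'N' marking a position that could not be called (an assumption about the data format, not a description of the authors' pipeline), and counts differences only at positions called in both strains. The function name and example sequences are hypothetical.

```python
def pairwise_differences(a: str, b: str, no_call: str = "N") -> tuple[int, int]:
    """Return (differences, positions_compared) between two equal-length
    base-call strings, skipping any position where either strain has a no-call."""
    if len(a) != len(b):
        raise ValueError("sequences must cover the same positions")
    diffs = compared = 0
    for x, y in zip(a, b):
        if x == no_call or y == no_call:
            continue  # uninformative position
        compared += 1
        if x != y:
            diffs += 1
    return diffs, compared


strains = {"s1": "ACGTNACGT", "s2": "ACCTAACGT", "s3": "ACCTAACGA"}  # hypothetical
names = sorted(strains)
for i, p in enumerate(names):
    for q in names[i + 1:]:
        d, n = pairwise_differences(strains[p], strains[q])
        print(p, q, d, n)
```

This prints `s1 s2 1 8`, `s1 s3 2 8` and `s2 s3 1 9`. Reporting the number of compared positions alongside the difference count makes it possible to normalise when strains have different no-call rates.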

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central