121 research outputs found
Adding Phylogenies to QGIS and Lifemapper for Evolutionary Studies of Species Diversity
Phylogenetic data from the “Tree of Life” have explicit spatial and temporal components when paired with species distribution and ecological data for testing contributions to biological community assembly at different geographic scales of species interaction. Important questions in biology about the degree of niche suitability and whether the history of a community’s assembly for an area can affect whether the species in a community are more or less phylogenetically related can be answered using several different spatially-filtered measures of phylogenetic diversity. Phylogenetic analyses which support the description of ecological processes are usually achieved in a handful of software libraries that are narrowly focused on a single set of tasks. Very few applications scale to large datasets and most do not have an explicit spatial component without relying on external visualization packages. This prompted us to explore bringing phylogenetic data into an open-source GIS environment. The Lifemapper Macroecology/Range & Diversity QGIS plug-in is a custom plug-in which we use to calculate and map biodiversity indices that describe range-diversity relationships derived from large multi-species datasets. We describe extensions to that plug-in which expand the Lifemapper set of ecological tools to link phylogenies to spatially derived ’diversity field’ statistics that describe the phylogenetic composition of natural communities
High performance bioinformatics and computational biology on general-purpose graphics processing units
Bioinformatics and Computational Biology (BCB) is a relatively new
multidisciplinary field which brings together many aspects of the fields of
biology, computer science, statistics, and engineering. Bioinformatics extracts
useful information from biological data and makes these more intuitive and
understandable by applying principles of information sciences, while
computational biology harnesses computational approaches and technologies
to answer biological questions conveniently. Recent years have seen an
explosion of the size of biological data at a rate which outpaces the rate of
increases in the computational power of mainstream computer technologies,
namely general purpose processors (GPPs). The aim of this thesis is to explore
the use of off-the-shelf Graphics Processing Unit (GPU) technology in the high
performance and efficient implementation of BCB applications in order to meet
the demands of biological data increases at affordable cost.
The thesis presents detailed design and implementations of GPU solutions for
a number of BCB algorithms in two widely used BCB applications, namely
biological sequence alignment and phylogenetic analysis. Biological sequence
alignment can be used to determine the potential information about a newly
discovered biological sequence from other well-known sequences through
similarity comparison. On the other hand, phylogenetic analysis is concerned
with the investigation of the evolution and relationships among organisms,
and has many uses in the fields of system biology and comparative genomics.
In molecular-based phylogenetic analysis, the relationship between species is
estimated by inferring the common history of their genes and then
phylogenetic trees are constructed to illustrate evolutionary relationships
among genes and organisms. However, both biological sequence alignment
and phylogenetic analysis are computationally expensive applications as their computing and memory requirements grow polynomially or even worse with
the size of sequence databases.
The thesis firstly presents a multi-threaded parallel design of the Smith-
Waterman (SW) algorithm alongside an implementation on NVIDIA GPUs. A
novel technique is put forward to solve the restriction on the length of the
query sequence in previous GPU-based implementations of the SW algorithm.
Based on this implementation, the difference between two main task
parallelization approaches (Inter-task and Intra-task parallelization) is
presented. The resulting GPU implementation matches the speed of existing
GPU implementations while providing more flexibility, i.e. flexible length of
sequences in real world applications. It also outperforms an equivalent GPPbased
implementation by 15x-20x. After this, the thesis presents the first
reported multi-threaded design and GPU implementation of the Gapped
BLAST with Two-Hit method algorithm, which is widely used for aligning
biological sequences heuristically. This achieved up to 3x speed-up
improvements compared to the most optimised GPP implementations.
The thesis then presents a multi-threaded design and GPU implementation of
a Neighbor-Joining (NJ)-based method for phylogenetic tree construction and
multiple sequence alignment (MSA). This achieves 8x-20x speed up compared
to an equivalent GPP implementation based on the widely used ClustalW
software. The NJ method however only gives one possible tree which strongly
depends on the evolutionary model used. A more advanced method uses
maximum likelihood (ML) for scoring phylogenies with Markov Chain Monte
Carlo (MCMC)-based Bayesian inference. The latter was the subject of another
multi-threaded design and GPU implementation presented in this thesis,
which achieved 4x-8x speed up compared to an equivalent GPP
implementation based on the widely used MrBayes software.
Finally, the thesis presents a general evaluation of the designs and
implementations achieved in this work as a step towards the evaluation of
GPU technology in BCB computing, in the context of other computer technologies including GPPs and Field Programmable Gate Arrays (FPGA)
technology
Evolutionary genomics : statistical and computational methods
This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering of the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward
- …