1,441 research outputs found
πBUSS: a parallel BEAST/BEAGLE utility for sequence simulation under complex evolutionary scenarios
Background: Simulated nucleotide or amino acid sequences are frequently used
to assess the performance of phylogenetic reconstruction methods. BEAST, a
Bayesian statistical framework that focuses on reconstructing time-calibrated
molecular evolutionary processes, supports a wide array of evolutionary models,
but lacked matching machinery for simulation of character evolution along
phylogenies.
Results: We present a flexible Monte Carlo simulation tool, called piBUSS,
that employs the BEAGLE high performance library for phylogenetic computations
within BEAST to rapidly generate large sequence alignments under complex
evolutionary models. piBUSS sports a user-friendly graphical user interface
(GUI) that allows combining a rich array of models across an arbitrary number
of partitions. A command-line interface mirrors the options available through
the GUI and facilitates scripting in large-scale simulation studies. Analogous
to BEAST model and analysis setup, more advanced simulation options are
supported through an extensible markup language (XML) specification, which in
addition to generating sequence output, also allows users to combine simulation
and analysis in a single BEAST run.
Conclusions: piBUSS offers a unique combination of flexibility and
ease-of-use for sequence simulation under realistic evolutionary scenarios.
Through different interfaces, piBUSS supports simulation studies ranging from
modest endeavors for illustrative purposes to complex and large-scale
assessments of evolutionary inference procedures. The software aims at
implementing new models and data types that are continuously being developed as
part of BEAST/BEAGLE.
Comment: 13 pages, 2 figures, 1 table
Statistical approaches to viral phylodynamics
Recent years have witnessed a rapid increase in the quantity and quality of
genomic data collected from human and animal pathogens, viruses in particular.
When coupled with mathematical and statistical models, these data allow us to
combine evolutionary theory and epidemiology to understand pathogen dynamics.
While these developments led to important epidemiological questions being tackled,
they also exposed the need for improved analytical methods. In this thesis I employ
modern statistical techniques to address two pressing issues in phylodynamics: (i)
computational tools for Bayesian phylogenetics and (ii) data integration. I detail
the development and testing of new transition kernels for Markov Chain Monte
Carlo (MCMC) for time-calibrated phylogenetics in Chapter 2 and show that an
adaptive kernel leads to improved MCMC performance in terms of mixing for a
range of data sets, in particular for a challenging Ebola virus phylogeny with 1610
taxa/sequences. As a trade-off, I also found that the new adaptive kernels have
longer warm-up times in general, suggesting room for improvement. Chapter 3
shows how to apply state-of-the-art techniques to visualise and analyse phylogenetic
space and MCMC for time-calibrated phylogenies, which are crucial to the viral
phylodynamics analysis pipeline. I describe a pipeline for a typical phylodynamic
analysis which includes convergence diagnostics for continuous parameters and in
phylogenetic space, extending existing methods to deal with large time-calibrated
phylogenies. In addition I investigate different representations of phylogenetic space
through multi-dimensional scaling (MDS) or univariate distributions of distances
to a focal tree and show that even for the simplest toy examples phylogenetic
space remains complex and in particular not all metrics lead to desirable or useful
representations. On the data integration front, Chapters 4 and 5 detail the use of data
from the 2013-2016 Ebola virus disease (EVD) epidemic in West Africa to show how
one can combine phylogenetic and epidemiological data to tackle epidemiological
questions. I explore the determinants of the Ebola epidemic in Chapter 4 through a
generalised linear model framework coupled with Bayesian stochastic search variable
selection (BSSVS) to assess the relative importance of climatic and socio-economic
variables for the number of EVD cases. In Chapter 5 I tackle the question of whether
a particular glycoprotein mutation could lead to increased human mortality from
EVD. I show that a principled analysis of the available data that accounts for several
sources of uncertainty as well as shared ancestry between samples does not allow us
to ascertain the presence of such an effect of a viral mutation on mortality. Chapter
6 attempts to bring the findings of the thesis together and discuss how the field of
phylodynamics, in particular its methodological aspects, might move forward.
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST check their data using TempEst prior to analysis.
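The regression idea described in this abstract can be illustrated with a minimal sketch. This is not TempEst's implementation: it assumes the tree is already rooted and that the root-to-tip divergence of each sequence has been extracted beforehand, and the function name `root_to_tip_regression` and the toy values are hypothetical. Under those assumptions, the slope of an ordinary least-squares fit estimates the evolutionary rate, the x-intercept estimates the date of the root, and large residuals flag sequences whose divergence and sampling date are incongruent.

```python
def root_to_tip_regression(dates, divergences):
    """Least-squares fit of root-to-tip divergence against sampling date.

    Hypothetical helper, not part of TempEst. `dates` are sampling times
    (e.g. decimal years); `divergences` are root-to-tip distances
    (substitutions per site) on an already-rooted tree.
    """
    n = len(dates)
    if n != len(divergences) or n < 2:
        raise ValueError("need at least two (date, divergence) pairs")

    mean_t = sum(dates) / n
    mean_d = sum(divergences) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    syy = sum((d - mean_d) ** 2 for d in divergences)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all sampling dates are identical: no temporal signal")

    slope = sxy / sxx                      # estimated rate (subs/site/time unit)
    intercept = mean_d - slope * mean_t
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    # Date at which the fitted line crosses zero divergence (root date estimate).
    root_date = -intercept / slope if slope != 0 else None
    # Residuals: large absolute values mark potentially incongruent sequences.
    residuals = [d - (intercept + slope * t) for t, d in zip(dates, divergences)]

    return {
        "rate": slope,
        "root_date": root_date,
        "r_squared": r_squared,
        "residuals": residuals,
    }


if __name__ == "__main__":
    # Made-up toy data for illustration only.
    toy_dates = [2000.0, 2001.0, 2002.0, 2003.0]
    toy_divergences = [0.010, 0.012, 0.014, 0.016]
    fit = root_to_tip_regression(toy_dates, toy_divergences)
    print(fit["rate"], fit["root_date"], fit["r_squared"])
```

A positive slope with a reasonably high R² is usually read as evidence of temporal signal; a flat or negative slope suggests that a molecular clock analysis may not be warranted for the data at hand.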
Beyond the shortest path: the path length index as a distribution
The traditional complex network approach considers only the shortest paths
from one node to another, not taking into account several other possible paths.
This limitation is significant, for example, in urban mobility studies. In this
short report, as a first step, we present an exhaustive approach to address
that problem and show we can go beyond the shortest path, but we do not need to
go so far: we present an interactive procedure and an early stop possibility.
After presenting some fundamental concepts in graph theory, we present an
analytical solution for the problem of counting the number of possible paths
between two nodes in complete graphs, and a depth-limited approach to get all
possible paths between each pair of nodes in a general graph (an NP-hard
problem). Rather than collapsing the distribution of path lengths between a pair
of nodes into a scalar number, we look at the distribution itself, taking all
paths up to a pre-defined path length (considering a truncated distribution),
and show the impact of that approach on the most straightforward distance-based
graph index: the walk/path length.
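The two ingredients named in this abstract can be sketched as follows. This is an illustrative sketch, not the report's own code, and it assumes that "paths" means simple paths (no repeated nodes) and that path length is counted in edges. For a complete graph on n nodes, a path between two fixed nodes with k intermediate nodes can be chosen in (n-2)!/(n-2-k)! ways, so the total is the sum of that term over k = 0, ..., n-2; the report's closed form may be written differently. The function names below are hypothetical.

```python
from collections import Counter
from math import factorial


def count_simple_paths_complete(n):
    """Number of simple paths between two distinct nodes of the complete graph K_n."""
    if n < 2:
        raise ValueError("need at least two nodes")
    m = n - 2  # nodes available as intermediates
    # k intermediates can be picked and ordered in m! / (m - k)! ways.
    return sum(factorial(m) // factorial(m - k) for k in range(m + 1))


def simple_paths_up_to(adj, source, target, max_len):
    """Yield every simple path from source to target with at most max_len edges.

    `adj` maps each node to an iterable of its neighbours. The depth limit
    plays the role of the early stop: branches are abandoned once they
    would exceed max_len edges.
    """
    path = [source]
    visited = {source}

    def dfs(node):
        if node == target:
            yield list(path)
            return
        if len(path) - 1 >= max_len:
            return  # depth limit reached: stop extending this branch
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                yield from dfs(nxt)
                path.pop()
                visited.remove(nxt)

    yield from dfs(source)


def path_length_distribution(adj, source, target, max_len):
    """Truncated distribution of path lengths (in edges) between two nodes."""
    return Counter(len(p) - 1 for p in simple_paths_up_to(adj, source, target, max_len))


if __name__ == "__main__":
    # Complete graph on 4 nodes as a small check against the closed form.
    k4 = {u: [v for v in range(4) if v != u] for u in range(4)}
    print(count_simple_paths_complete(4))
    print(dict(path_length_distribution(k4, 0, 3, max_len=3)))
```

Keeping the result as a `Counter` rather than reducing it to its minimum preserves the whole truncated distribution, which is the point the abstract makes about not collapsing path lengths into a single scalar.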
A survey of the marine birds in the route Rio de Janeiro: Bahía (Brazil)
Marine birds were surveyed between Rio de Janeiro and Bahia, latitudes 24º44'S and 17º50'S, from July to September 1984. Sixteen species belonging to six families were recorded, with most sightings occurring between 24º44'S and 22º3$'S. The data suggest two distinct communities: the more southerly one is represented by Daption capense and the one further north by Puffinus gravis.
SISVEST - question bank
Advisor: Kelly Rafaela Otemaier. Monograph (undergraduate) - Universidade Federal do Paraná, Setor de Educação Profissional e Tecnológica, Curso de Tecnologia em Análise e Desenvolvimento de Sistemas. Includes bibliography.