
11 research outputs found

A Bayesian Model for Gene Family Evolution

Author: Kalavacharla Venugopal
Liu Liang
Liu Zhanji
Yu Lili
Publication venue: Digital Commons@Georgia Southern
Publication date: 01/11/2011

Background: A birth and death process is frequently used to model the size of a gene family, which may vary along the branches of a phylogenetic tree. Under the birth and death model, maximum likelihood methods have been developed to estimate the birth and death rate and the sizes of ancient gene families (numbers of gene copies at the internal nodes of the phylogenetic tree). This paper provides a Bayesian approach for estimating parameters in the birth and death model.

Results: We develop a Bayesian approach for estimating the birth and death rate and other parameters in the birth and death model. In addition, a Bayesian hypothesis test is developed to identify gene families that are unlikely under the birth and death process. Simulation results suggest that the Bayesian estimate of the birth and death rate is more accurate than the maximum likelihood estimate. The Bayesian approach was applied to a real dataset of 3517 gene families across the genomes of five yeast species. The results indicate that a Bayesian model assuming a constant birth and death rate across the branches of the phylogenetic tree cannot adequately explain the observed pattern of gene family sizes across species. The yeast dataset was therefore analyzed with a Bayesian heterogeneous-rate model that allows the birth and death rate to vary among the branches of the tree. The unlikely gene families identified by the Bayesian heterogeneous-rate model differ from those identified by the maximum likelihood method.

Conclusions: Compared to the maximum likelihood method, the Bayesian approach can produce more accurate estimates of the parameters in the birth and death model. In addition, the Bayesian hypothesis test is able to identify unlikely gene families based on Bayesian posterior p-values. As a powerful statistical technique, the Bayesian approach can effectively extract information from gene family data and thereby provide useful insight into the evolutionary process of gene families across genomes.
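The transition probability at the heart of the birth and death model, together with a Bayesian update of the rate, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single per-copy rate λ shared by birth and death (the closed form below holds only in that case), an Exponential(1) prior on λ, and a single branch whose start and end counts are both observed. A full analysis would instead sum over the unobserved ancestral counts at the internal nodes of the tree. The names `bd_transition`, `log_likelihood` and `sample_lambda` are hypothetical.

```python
import math
import random


def bd_transition(s: int, c: int, lam: float, t: float) -> float:
    """P(c copies at the end of a branch | s copies at its start).

    Assumes a linear birth-death process in which every gene copy duplicates
    and is lost at the same per-copy rate `lam`; `t` is the branch length.
    """
    if s == 0:
        return 1.0 if c == 0 else 0.0  # an extinct family stays extinct
    alpha = lam * t / (1.0 + lam * t)
    total = 0.0
    for j in range(min(s, c) + 1):
        total += (math.comb(s, j) * math.comb(s + c - j - 1, s - 1)
                  * alpha ** (s + c - 2 * j) * (1.0 - 2.0 * alpha) ** j)
    return total


def log_likelihood(lam: float, pairs, t: float) -> float:
    """Sum of log transition probabilities for (start, end) count pairs."""
    ll = 0.0
    for s, c in pairs:
        p = bd_transition(s, c, lam, t)
        if p <= 0.0:
            return -math.inf
        ll += math.log(p)
    return ll


def sample_lambda(pairs, t: float, n_iter: int = 5000,
                  step: float = 0.5, seed: int = 1) -> list:
    """Random-walk Metropolis on log(lam) with an Exponential(1) prior on lam."""
    rng = random.Random(seed)

    def log_post(theta: float) -> float:
        lam = math.exp(theta)
        # Log prior of Exponential(1) is -lam; `+ theta` is the log-Jacobian
        # of the change of variables lam = exp(theta).
        return log_likelihood(lam, pairs, t) - lam + theta

    theta = 0.0  # start at lam = 1
    current = log_post(theta)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(math.exp(theta))
    return draws
```

For example, `sample_lambda([(2, 2), (3, 4), (1, 1), (2, 1)], t=1.0)` returns draws of λ whose mean, after discarding an initial burn-in, approximates the posterior mean for that toy branch.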

Crossref

Springer - Publisher Connector

PubMed Central

Georgia Southern University: Digital Commons@Georgia Southern

A Bayesian model for gene family evolution

Author: A Fortna
A Gelman
AS Novozhilov
CJ Basten
DL Wallace
GP Karev
H Jeffreys
I Yanai
J Bernardo
JP Demuth
L Arvestad
Liang Liu
Lili Yu
M Csuros
M Lynch
M Nei
M Ridley
MA Huynen
MK Cowles
MN Barber
MW Hahn
N Metropolis
O Cohen
RD Page
RS Holmes
SL Lauritzen
T De Bie
T Ohta
VE Johnson
Venugopal Kalavacharla
W Feller
W Iwasaki
WK Hastings
X Meng
Zhanji Liu
Publication venue: Springer Science and Business Media LLC

Crossref

BadiRate Software

Author: Librado Sanz Pablo
Rozas Liras Julio A.
Vieira Filipe G.
Publication date: 02/11/2011

See the related article at: http://hdl.handle.net/2445/53350
See the software development page: http://www.ub.edu/softevol/badirate/
BadiRate is a software package to estimate, within a phylogenetic context and by likelihood-based methods (or parsimony): (1) the gain, birth, death and innovation family turnover rates; (2) the most likely number of family members in internal nodes; (3) families with turnover rates higher or lower than that estimated for the whole dat
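A birth-death-innovation (BDI) process of the kind referred to above can be simulated with the Gillespie algorithm, which is one way to generate test data for rate estimators. The sketch below is hypothetical and is not BadiRate code; it assumes that births and deaths occur at per-member rates proportional to family size while innovations arrive at a constant, size-independent rate, which is the usual birth-death-immigration formulation. The function name `simulate_bdi` is illustrative.

```python
import random


def simulate_bdi(n0: int, birth: float, death: float, innovation: float,
                 t_end: float, seed=None) -> int:
    """Gillespie simulation of a linear birth-death-innovation process.

    Each of the n current family members duplicates at rate `birth` and is
    lost at rate `death`; new members also arise de novo at the constant,
    size-independent rate `innovation` (an assumption of this sketch).
    Returns the family size at time t_end.
    """
    rng = random.Random(seed)
    n, t = n0, 0.0
    while True:
        loss = death * n
        gain = birth * n + innovation  # duplication and innovation both add one
        total = loss + gain
        if total == 0.0:
            return n  # nothing can change any more
        t += rng.expovariate(total)  # exponential waiting time to the next event
        if t > t_end:
            return n
        if rng.random() * total < loss:
            n -= 1
        else:
            n += 1
```

Because the expected size obeys dE/dt = (birth - death)·E + innovation, equal birth and death rates give E = n0 + innovation·t; averaging many runs of `simulate_bdi(5, 0.5, 0.5, 1.0, 2.0, seed=s)` over different seeds should therefore give a value close to 7.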

Diposit Digital de la Universitat de Barcelona

DiscML: an R package for estimating evolutionary rates of discrete characters using maximum likelihood

Author: B Boussau
B Wu
DM Gay
DS Hibbett
E Paradis
HC Wang
J Felsenstein
JA Nelder
L Liu
M Csűrös
M Pagel
M Spencer
MV Han
MW Hahn
MW van Passel
O Cohen
P Librado
P Lopez
PJ Waddell
PO Lewis
RJ Schmitz
Tane Kim
VB Yap
W Hao
Weilong Hao
WP Maddison
Z Yang
Publication venue: Springer Science and Business Media LLC

Crossref

BadiRate: estimating family turnover rates by likelihood-based methods

Author: Cohen
Csuros
De Bie
F. G. Vieira
Hahn
J. Rozas
Johnson
Liu
Nei
Nozawa
Ohno
P. Librado
Sanchez-Gracia
Vieira
Publication venue: Oxford University Press (OUP)
Publication date: 08/04/2014

The software is available at: http://hdl.handle.net/2445/53403

Motivation: The comparative analysis of gene gain and loss rates is critical for understanding the role of natural selection and adaptation in shaping gene family sizes. Studying complete genome data from closely related species allows accurate estimation of gene family turnover rates. Current methods and software tools, however, are not well designed for dealing with certain kinds of functional elements, such as microRNAs or transcription factor binding sites.

Results: Here, we describe BadiRate, a new software tool to estimate family turnover rates, as well as the number of elements in internal phylogenetic nodes, by likelihood-based methods and parsimony. It implements two stochastic population models, which provide the appropriate statistical framework for testing hypotheses such as lineage-specific gene family expansions or contractions. We have assessed the accuracy of BadiRate by computer simulations and have also illustrated its functionality by analyzing a representative empirical dataset.
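The parsimony route to internal-node counts can be illustrated with Wagner parsimony, in which each branch costs the absolute change in family size. This is an assumption for illustration: the abstract does not say which parsimony criterion BadiRate uses, and the function names and nested-tuple tree format below are hypothetical. The sketch runs a bottom-up dynamic program over candidate counts and then a top-down pass that picks one optimal assignment.

```python
import math


def _solve(node, counts, K):
    """Bottom-up pass: cost[k] = minimum subtree cost if `node` has k copies."""
    if isinstance(node, str):  # leaf: its observed count is fixed
        cost = [0 if k == counts[node] else math.inf for k in range(K + 1)]
        return {"cost": cost, "children": []}
    kids = [_solve(child, counts, K) for child in node]
    cost = [sum(min(kid["cost"][j] + abs(k - j) for j in range(K + 1))
                for kid in kids)
            for k in range(K + 1)]
    return {"cost": cost, "children": kids}


def _assign(info, k, K, out, path):
    """Top-down pass: give each child the count that achieves the optimum."""
    out[path] = k
    for i, kid in enumerate(info["children"]):
        best = min(range(K + 1), key=lambda j: kid["cost"][j] + abs(k - j))
        _assign(kid, best, K, out, path + (i,))


def wagner_parsimony(tree, counts):
    """Minimum total |change| in family size and one optimal set of node counts.

    `tree` is a nested tuple of leaf names; nodes in the result are keyed by
    their path of child indices from the root (the root is ()).
    """
    K = max(counts.values())  # optimal ancestral counts never exceed the max leaf count
    root = _solve(tree, counts, K)
    k0 = min(range(K + 1), key=lambda k: root["cost"][k])
    states = {}
    _assign(root, k0, K, states, ())
    return root["cost"][k0], states
```

On the toy tree `(("A", "B"), "C")` with counts 3, 5 and 1, the minimum total change is 4; several root counts tie, and this sketch keeps the smallest one.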

Crossref

Diposit Digital de la Universitat de Barcelona

A phylogenetic model for understanding the effect of gene duplication on cancer progression

Author: Chang Zheng
Cui Juan
Liberles David A.
Liu Liang
Ma Qin
Reeves Jaxk H.
Xu Ying
Yu Lili
Zhao Jing
Publication venue: DigitalCommons@University of Nebraska - Lincoln
Publication date: 25/12/2013

As biotechnology advances rapidly, a tremendous amount of cancer genetic data has become available, providing an unprecedented opportunity for understanding the genetic mechanisms of cancer. To understand the effects of duplications and deletions on cancer progression, two genomes (normal and tumor) were sequenced from each of five stomach cancer patients at different stages (I, II, III and IV). We developed a phylogenetic model for analyzing stomach cancer data. The model assumes that duplication and deletion occur according to a continuous-time Markov chain along the branches of a phylogenetic tree, to which five extended branches leading to the tumor genomes are attached. Moreover, the coalescence times of the phylogenetic tree follow a coalescent process. The simulation study suggests that the maximum likelihood approach can accurately estimate the parameters of the phylogenetic model. The phylogenetic model was applied to the stomach cancer data. We found that the expected number of changes (duplications and deletions) per gene is significantly higher for the tumor genomes than for the normal genomes. The goodness-of-fit test suggests that the phylogenetic model with constant duplication and deletion rates can adequately fit the duplication data for the normal genomes. The analysis found nine duplicated genes that are significantly associated with stomach cancer.
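Transition probabilities for a continuous-time Markov chain on gene copy number along a branch of length t are given by P(t) = exp(Qt), where Q is the rate (generator) matrix. The sketch below computes them by uniformization using only the standard library. The copy-number model itself is a hypothetical stand-in (constant duplication and deletion rates, a cap on copy number, and loss of the last copy treated as absorbing); the abstract does not specify the chain's states or rates, and the coalescent component of the model is not shown. All names are illustrative.

```python
import math


def rate_matrix(dup: float, loss: float, max_copies: int):
    """Generator Q over copy numbers 0..max_copies (hypothetical model).

    A gene with i >= 1 copies gains one at rate `dup` (until max_copies)
    and loses one at rate `loss`; 0 copies is absorbing.
    """
    n = max_copies + 1
    Q = [[0.0] * n for _ in range(n)]
    for i in range(1, n):
        if i < max_copies:
            Q[i][i + 1] = dup
        Q[i][i - 1] = loss
        Q[i][i] = -sum(Q[i])  # rows of a generator sum to zero
    return Q


def _matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transition_matrix(Q, t: float, tol: float = 1e-12, max_terms: int = 10_000):
    """P(t) = exp(Q t) by uniformization: a Poisson-weighted sum of powers of R."""
    n = len(Q)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    mu = max(-Q[i][i] for i in range(n))  # uniformization rate
    if mu == 0.0 or t == 0.0:
        return identity
    # R = I + Q / mu is a stochastic matrix.
    R = [[identity[i][j] + Q[i][j] / mu for j in range(n)] for i in range(n)]
    P = [[0.0] * n for _ in range(n)]
    term = identity             # R**k, starting at k = 0
    weight = math.exp(-mu * t)  # Poisson(mu * t) probability of k = 0
    covered = 0.0
    for k in range(max_terms):
        for i in range(n):
            for j in range(n):
                P[i][j] += weight * term[i][j]
        covered += weight
        if 1.0 - covered < tol:
            break
        weight *= mu * t / (k + 1)
        term = _matmul(term, R)
    return P
```

Uniformization is numerically stable because every term is non-negative, but the starting weight exp(-μt), with μ the largest total rate out of any state, underflows when μt is large (roughly above 700), so very long branches would need a scaling-and-squaring matrix exponential instead.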

Crossref

DigitalCommons@University of Nebraska

PubMed Central

Georgia Southern University: Digital Commons@Georgia Southern

Robustness of birth-death and gain models for inferring evolutionary events

Author: A Hughes
A Novozhilov
A Wagner
B Efron
C Chapus
Dannie Durand
DG Kendall
G Bourque
G Schwartz
H Akaike
H Mi
J Demuth
J Meunier
J Zheng
L Arvestad
L Liu
Larry Wasserman
M Csürös
M Hahn
M Han
M Spencer
Maureen Stolzer
P Librado
R Ames
R Finn
R Pastor-Satorras
SW Roy
T De Bie
W Iwasaki
W Li
YI Wolf
Publication venue: Springer Science and Business Media LLC

Crossref

A generalized birth and death process for modeling the fates of gene duplication

Author: A Force
A Fortna
A Konrad
AA Khan
AI Teufel
AL Hughes
Ashley I. Teufel
B Boussau
B Rannala
C Roth
CJ Basten
D Aldous
D Graur
David A. Liberles
DD Pollock
DG Kendall
DL Rabosky
GJ Szollosi
GP Karev
H Akaike
H Innan
H Morlon
I Yanai
J Sjostrand
JA Cotton
Jing Zhao
JW Thornton
JZ Zhang
L Arvestad
L Liu
Liang Liu
M Csuros
M Hurles
M Lynch
M Nei
MA Huynen
MD Rasmussen
MW Hahn
N Bailey
N Hallinan
O Akerborg
P Flicek
P Zhang
S Hohna
S Nee
S Ohno
S Penel
S Rastogi
T Gernhard
T Hughes
T Janzen
T Ohta
T Stadler
Thompson
W Feller
Publication venue: Springer Science and Business Media LLC

Crossref

The development of computational methods for large-scale comparisons and analyses of genome evolution

Author: Moss Stephen P.
Publication date: 01/12/2015

The last four decades have seen the development of a number of experimental methods for determining the whole genome sequences of an ever-increasing number of organisms. In the first instance, these sequences allowed their investigators to examine the molecular primary structure of regions of scientific interest; with increased sampling of organisms across the phylogenetic tree and the improved quality and coverage of genome sequences and their associated annotations, the opportunity has arisen to undertake detailed comparisons both within and between taxonomic groups. The work described in this thesis applies comparative bioinformatics analyses to inter- and intra-genomic datasets to elucidate those genomic changes which may underlie organismal adaptations and contribute to changes in the complexity of genome content and structure over time. The results contained herein demonstrate the power and flexibility of the comparative approach, utilising whole genome data, to address some of the most pressing questions in the biological sciences today.

As the volume of genomic data increases, both as a result of increased sampling of the tree of life and due to an increase in the quality and throughput of sequencing methods, it has become clear that computational analyses of these data are necessary. Manual analysis of this volume of data, which can extend beyond petabytes of storage space, is now impossible. Automated computational pipelines are therefore required to retrieve, categorise and analyse these data. Chapter two discusses the development of a computational pipeline named the Genome Comparison and Analysis Toolkit (GCAT). The pipeline was developed in the Perl programming language and is tightly integrated with the Ensembl Perl API, allowing the retrieval and analysis of Ensembl's rich genomic resources.
In the first instance, the pipeline was tested for robustness by retrieving and describing various components of genomic architecture across a number of taxonomic groups. Additionally, the need for programmatically independent means of accessing data, and in particular for Semantic Web based protocols and tools for sharing genomics resources, is highlighted. This is not only for the requirements of researchers but also for improved communication and sharing between computational infrastructures. A prototype Ensembl REST web service was developed in collaboration with the European Bioinformatics Institute (EBI) to provide a means of accessing Ensembl's genomic data without relying on the Perl API. The runtime and memory usage of the Ensembl Perl API and the prototype REST API were compared against baseline raw SQL queries, which highlights the overheads inherent in building wrappers around SQL queries. Differences in the efficiency of the approaches were highlighted, and the importance of investing in the development of Semantic Web technologies as a tool to improve access to data for the wider scientific community is discussed.

Data highlighted in chapter two led to the identification of relative differences in the intron structure of a number of organisms, including teleost fish. Chapter three encompasses a published, peer-reviewed study. Inter-genomic comparisons were undertaken using the five available teleost genome sequences to examine and describe their intron content. The number and sizes of introns were compared across these fish, and a frequency distribution of intron size was produced that identified a novel expansion, in the zebrafish lineage, of introns in the size range of approximately 500-2,000 bp. Further hypothesis-driven analyses of introns across the whole distribution of intron sizes found that the majority, but not all, of the introns were largely composed of repetitive elements.
It was concluded that the introns in the zebrafish peak were likely the result of an ancient expansion of repetitive elements that have since degraded beyond the ability of computational algorithms to identify them. Additional sampling throughout the teleost fish lineage will allow more focused, phylogenetically driven analyses to be undertaken in the future.

In chapter four, phylogenetic comparative analyses of gene duplications were undertaken across primate and rodent taxonomic groups with the intention of identifying significantly expanded or contracted gene families, since changes in the size of gene families may indicate adaptive evolution. More expansions, relative to time since the common ancestor, were identified on the branch leading to modern humans than in any other primate species. Because the human data are unique in the quantity and quality of their annotation, additional analyses were undertaken to determine whether the expansions were methodological artefacts or real biological changes. Novel approaches were developed to test the validity of the data, including comparisons to other highly annotated genomes. No similar expansion was seen in mouse when comparing with rodent data; however, as assemblies and annotations were updated, the number of significant changes differed, which calls into question the reliability of the underlying assembly and annotation data. This emphasises that computational predictions, in the absence of supporting evidence, may not represent the actual genomic structure and may instead be artefacts of the software parameter space. In particular, significant shortcomings are highlighted that arise from the assumptions and parameters of the models used by the CAFE gene family analysis software.
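The intron-size frequency distribution described for chapter three can be approximated from exon coordinates. The helpers below are hypothetical and are not part of GCAT (which was written in Perl against the Ensembl Perl API); they assume 1-based, inclusive, non-overlapping exon coordinates for a single transcript, and they bin lengths on a log10 scale because intron sizes span several orders of magnitude.

```python
import math
from collections import Counter


def intron_lengths(exons):
    """Intron lengths (bp) between consecutive exons of one transcript.

    Exons are (start, end) pairs in 1-based, inclusive coordinates and are
    assumed not to overlap; they are sorted here, so input order is irrelevant.
    """
    ordered = sorted(exons)
    return [next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def log10_histogram(lengths, bins_per_decade: int = 4):
    """Count lengths in log10-spaced bins, keyed by each bin's lower edge in bp."""
    counts = Counter()
    for length in lengths:
        if length > 0:  # skip zero-length gaps between abutting exons
            counts[math.floor(math.log10(length) * bins_per_decade)] += 1
    return {round(10 ** (b / bins_per_decade)): n for b, n in sorted(counts.items())}
```

Pooling `intron_lengths` over every transcript of a genome and comparing the resulting histograms between species is one way to make a lineage-specific peak visible.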
We must bear in mind that genome assemblies and annotations are hypotheses that themselves need to be questioned and subjected to robust controls to increase confidence in any conclusions drawn from them.

In addition, functional genomics analyses were undertaken to identify the roles of significantly changed genes and gene families in primates, testing the hypothesis that the majority of changes would involve immune, sensory or reproductive genes. Gene Ontology (GO) annotations were retrieved for these data, which made it possible to highlight both broad GO groupings and more specific functional classifications. The results showed that the majority of gene expansions were in families that may have arisen through adaptation or were maintained because of their necessary involvement in developmental and metabolic processes. Comparisons were made to previously published studies to determine whether the Ensembl functional annotations were supported by the de novo analyses undertaken in those studies. The majority were not; only a small number of previously identified functional annotations were present in the most recent Ensembl releases.

The impact of gene family evolution on intron evolution was explored in chapter five by analysing gene family data and intron characteristics across the genomes of 61 vertebrate species. General descriptive statistics and visualisations were produced, along with tests for correlation between change in gene family size and the number, size and density of the associated introns. Change in gene family size was shown to have very little impact on the underlying intron evolution, so other, non-family effects were considered. These analyses showed that introns were restricted to euchromatic regions, with heterochromatic regions such as the centromeres and telomeres being largely devoid of them.
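The abstract does not name the correlation test used in chapter five. As one possible choice, the sketch below computes Spearman's rank correlation between, for example, per-family change in size and intron count or density, with a permutation p-value; a rank-based statistic is assumed here because family-size changes are discrete and often heavily skewed. All names are hypothetical.

```python
import math
import random


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) if either input is constant


def permutation_pvalue(x, y, n_perm: int = 2000, seed: int = 0):
    """Two-sided p-value for rho, obtained by shuffling y relative to x."""
    rng = random.Random(seed)
    observed = abs(spearman(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(x, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The permutation test avoids distributional assumptions, at the cost of run time that grows with `n_perm` times the number of families.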
A greater role was therefore proposed for spatial mechanisms such as recombination, GC bias across GC-rich isochores and biased gene conversion, though this depends largely on the population genetics and life history traits of the organisms involved. Additional population-level sequencing and comparative analyses across a divergent group of species with available recombination maps and life history data would be a useful future direction for understanding the processes involved.

Repository@Hull - Worktribe


Bayesian estimation of a longitudinal mediation model with three-level clustered data

Author: Israni Anita
Publication date: 18/02/2016

Longitudinal modeling allows researchers to capture changes in variables that take time to exert their effects. Furthermore, incorporating mediation into a longitudinal model allows researchers to test causal inferences about, for example, how an independent variable might affect growth in an outcome variable through growth in a mediating variable. In scenarios in which multiple variables are measured over time, the parallel process model can be used to model the inter-relationships among the measures' trajectories, where each process is modeled with its own separate but related growth parameters. The hierarchical linear modeling (HLM) framework can be used to fit a parallel process model and allows easy extensions to handle multiple levels and non-hierarchical data, such as cross-classified or multiple membership data structures, in clustered data. This study assessed a three-level parallel process model couched in the context of longitudinal mediation where treatment was assigned at the cluster level, matching a longitudinal cluster randomized trial design. The treatment's effect on growth in an outcome is modeled as mediated by growth in a mediating variable at the cluster and individual levels, resulting in a cross-level and a cluster-level mediated effect. A simulation study and a real-data analysis were conducted using a fully Bayesian approach. In the simulation study, four factors were manipulated to assess the recovery of the parameters of interest: mediated effect size, random effects variance component values, number of measurement occasions, and number of clusters. Overall, relative parameter bias and statistical power improved as the value of each of the four factors increased. The cross-level mediated effects were less biased and had greater statistical power than the cluster-level mediated effects.
For the mediated effects that were truly zero, coverage rates based on the highest posterior density (HPD) intervals were mostly acceptable for the cross-level mediated effect, and for the cluster-level effect when path b was zero and path a was non-zero. For conditions in which both the cluster-level mediated effect and path a were zero, the cluster-level coverage rates showed over-coverage. Results are discussed along with study limitations and suggestions for future research. Recommendations for applied researchers are also noted.

Educational Psycholog
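The coverage rates described above can be computed from posterior draws in a few lines. This sketch assumes the mediated effect is the product of the path coefficients, a·b, evaluated draw by draw, and approximates the highest posterior density (HPD) interval by the shortest interval containing the requested share of draws, which matches the HPD only for a unimodal posterior. The three-level model itself is not fitted here, and the function names are hypothetical.

```python
import math


def indirect_effect_draws(a_draws, b_draws):
    """Posterior draws of the mediated effect a*b, computed draw by draw."""
    return [a * b for a, b in zip(a_draws, b_draws)]


def hpd_interval(draws, prob: float = 0.95):
    """Shortest interval containing `prob` of the draws (empirical HPD).

    Assumes a unimodal posterior, for which the shortest interval is the HPD.
    """
    xs = sorted(draws)
    n = len(xs)
    # Number of draws the interval must contain; the small epsilon guards
    # against floating-point round-up in prob * n.
    m = min(n, max(1, math.ceil(prob * n - 1e-9)))
    start = min(range(n - m + 1), key=lambda i: xs[i + m - 1] - xs[i])
    return xs[start], xs[start + m - 1]


def coverage_rate(intervals, true_value):
    """Share of replication intervals that contain the true parameter value."""
    hits = sum(1 for low, high in intervals if low <= true_value <= high)
    return hits / len(intervals)
```

In a simulation study, one would compute `hpd_interval` for each replication, collect the intervals, and pass them with the true effect to `coverage_rate`; a rate above the nominal level (for example 0.95) indicates over-coverage.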

Texas ScholarWorks