1,184 research outputs found

    Bayesian and maximum likelihood phylogenetic analyses of protein sequence data under relative branch-length differences and model violation

    Get PDF
    BACKGROUND: Bayesian phylogenetic inference holds promise as an alternative to maximum likelihood, particularly for large molecular-sequence data sets. We have investigated the performance of Bayesian inference with empirical and simulated protein-sequence data under conditions of relative branch-length differences and model violation. RESULTS: With empirical protein-sequence data, Bayesian posterior probabilities provide more-generous estimates of subtree reliability than does the nonparametric bootstrap combined with maximum likelihood inference, reaching 100% posterior probability at bootstrap proportions around 80%. With simulated 7-taxon protein-sequence datasets, Bayesian posterior probabilities are somewhat more generous than bootstrap proportions, but do not saturate. Compared with likelihood, Bayesian phylogenetic inference can be as or more robust to relative branch-length differences for datasets of this size, particularly when among-sites rate variation is modeled using a gamma distribution. When the (known) correct model was used to infer trees, Bayesian inference recovered the (known) correct tree in 100% of instances in which one or two branches were up to 20-fold longer than the others. At ratios more extreme than 20-fold, topological accuracy of reconstruction degraded only slowly when only one branch was of relatively greater length, but more rapidly when there were two such branches. Under an incorrect model of sequence change, inaccurate trees were sometimes observed at less extreme branch-length ratios, and (particularly for trees with single long branches) such trees tended to be more inaccurate. The effect of model violation on accuracy of reconstruction for trees with two long branches was more variable, but gamma-corrected Bayesian inference nonetheless yielded more-accurate trees than did either maximum likelihood or uncorrected Bayesian inference across the range of conditions we examined. Assuming an exponential Bayesian prior on branch lengths did not improve, and under certain extreme conditions significantly diminished, performance. The two topology-comparison metrics we employed, edit distance and Robinson-Foulds symmetric distance, yielded different but highly complementary measures of performance. CONCLUSIONS: Our results demonstrate that Bayesian inference can be relatively robust against biologically reasonable levels of relative branch-length differences and model violation, and thus may provide a promising alternative to maximum likelihood for inference of phylogenetic trees from protein-sequence data

    Data-driven normalization strategies for high-throughput quantitative RT-PCR

    Get PDF
    Background: High-throughput real-time quantitative reverse transcriptase polymerase chain reaction (qPCR) is a widely used technique in experiments where expression patterns of genes are to be profiled. Current stage technology allows the acquisition of profiles for a moderate number of genes (50 to a few thousand), and this number continues to grow. The use of appropriate normalization algorithms for qPCR-based data is therefore a highly important aspect of the data preprocessing pipeline

    Assessment of HIV testing among young methamphetamine users in Muse, Northern Shan State, Myanmar

    Get PDF
    Background Methamphetamine (MA) use has a strong correlation with risky sexual behaviors, and thus may be triggering the growing HIV epidemic in Myanmar. Although methamphetamine use is a serious public health concern, only a few studies have examined HIV testing among young drug users. This study aimed to examine how predisposing, enabling and need factors affect HIV testing among young MA users. Methods A cross-sectional study was conducted from January to March 2013 in Muse city in the Northern Shan State of Myanmar. Using a respondent-driven sampling method, 776 MA users aged 18-24 years were recruited. The main outcome of interest was whether participants had ever been tested for HIV. Descriptive statistics and multivariate logistic regression were applied in this study. Results Approximately 14.7% of young MA users had ever been tested for HIV. Significant positive predictors of HIV testing included predisposing factors such as being a female MA user, having had higher education, and currently living with one’s spouse/sexual partner. Significant enabling factors included being employed and having ever visited NGO clinics or met NGO workers. Significant need factors were having ever been diagnosed with an STI and having ever wanted to receive help to stop drug use. Conclusions Predisposing, enabling and need factors were significant contributors affecting uptake of HIV testing among young MA users. Integrating HIV testing into STI treatment programs, alongside general expansion of HIV testing services may be effective in increasing HIV testing uptake among young MA users

    Defining an informativeness metric for clustering gene expression data

    Get PDF
    Motivation: Unsupervised ‘cluster’ analysis is an invaluable tool for exploratory microarray data analysis, as it organizes the data into groups of genes or samples in which the elements share common patterns. Once the data are clustered, finding the optimal number of informative subgroups within a dataset is a problem that, while important for understanding the underlying phenotypes, is one for which there is no robust, widely accepted solution

    attract: A Method for Identifying Core Pathways That Define Cellular Phenotypes

    Get PDF
    attract is a knowledge-driven analytical approach for identifying and annotating the gene-sets that best discriminate between cell phenotypes. attract finds distinguishing patterns within pathways, decomposes pathways into meta-genes representative of these patterns, and then generates synexpression groups of highly correlated genes from the entire transcriptome dataset. attract can be applied to a wide range of biological systems and is freely available as a Bioconductor package and has been incorporated into the MeV software system

    Decomposition of Gene Expression State Space Trajectories

    Get PDF
    Representing and analyzing complex networks remains a roadblock to creating dynamic network models of biological processes and pathways. The study of cell fate transitions can reveal much about the transcriptional regulatory programs that underlie these phenotypic changes and give rise to the coordinated patterns in expression changes that we observe. The application of gene expression state space trajectories to capture cell fate transitions at the genome-wide level is one approach currently used in the literature. In this paper, we analyze the gene expression dataset of Huang et al. (2005) which follows the differentiation of promyelocytes into neutrophil-like cells in the presence of inducers dimethyl sulfoxide and all-trans retinoic acid. Huang et al. (2005) build on the work of Kauffman (2004) who raised the attractor hypothesis, stating that cells exist in an expression landscape and their expression trajectories converge towards attractive sites in this landscape. We propose an alternative interpretation that explains this convergent behavior by recognizing that there are two types of processes participating in these cell fate transitions—core processes that include the specific differentiation pathways of promyelocytes to neutrophils, and transient processes that capture those pathways and responses specific to the inducer. Using functional enrichment analyses, specific biological examples and an analysis of the trajectories and their core and transient components we provide a validation of our hypothesis using the Huang et al. (2005) dataset

    Viral Perturbations of Host Networks Reflect Disease Etiology

    Get PDF
    Many human diseases, arising from mutations of disease susceptibility genes (genetic diseases), are also associated with viral infections (virally implicated diseases), either in a directly causal manner or by indirect associations. Here we examine whether viral perturbations of host interactome may underlie such virally implicated disease relationships. Using as models two different human viruses, Epstein-Barr virus (EBV) and human papillomavirus (HPV), we find that host targets of viral proteins reside in network proximity to products of disease susceptibility genes. Expression changes in virally implicated disease tissues and comorbidity patterns cluster significantly in the network vicinity of viral targets. The topological proximity found between cellular targets of viral proteins and disease genes was exploited to uncover a novel pathway linking HPV to Fanconi anemia
    corecore