39 research outputs found

    Sequence embedding for fast construction of guide trees for multiple sequence alignment

    Abstract. Background: The most widely used multiple sequence alignment methods require sequences to be clustered as an initial step. Most sequence clustering methods require a full distance matrix to be computed between all pairs of sequences. This requires memory and time proportional to N² for N sequences. When N grows larger than 10,000 or so, this becomes increasingly prohibitive and can form a significant barrier to carrying out very large multiple alignments. Results: In this paper, we have tested variations on a class of embedding methods that have been designed for clustering large numbers of complex objects where the individual distance calculations are expensive. These methods involve embedding the sequences in a space where the similarities within a set of sequences can be closely approximated without having to compute all pair-wise distances. Conclusions: We show how this approach greatly reduces computation time and memory requirements for clustering large numbers of sequences and demonstrate the quality of the clusterings by benchmarking them as guide trees for multiple alignment. Source code is available for download from http://www.clustal.org/mbed.tgz.
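    The embedding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer distance, the way seed sequences are supplied, and the function names `kmer_distance` and `embed` are all assumptions made for illustration. Each sequence is represented by its distances to a small set of seed sequences, so only N × (number of seeds) distances are computed rather than all N² pairs.

```python
from collections import Counter
import math


def kmer_profile(seq, k=2):
    """Count the overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=2):
    """Hypothetical distance: 0.0 when every k-mer of the shorter
    sequence is matched in the other, 1.0 when no k-mer is shared."""
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    shared = sum((pa & pb).values())          # multiset intersection
    shortest = min(sum(pa.values()), sum(pb.values()))
    return 1.0 - shared / shortest if shortest else 1.0


def embed(seqs, seeds, k=2):
    """Map each sequence to the vector of its distances to the seeds.

    Cost is len(seqs) * len(seeds) distance evaluations, not all pairs.
    """
    return [[kmer_distance(s, seed, k) for seed in seeds] for s in seqs]


seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
seeds = [seqs[0], seqs[2]]                    # assumed seed choice
vectors = embed(seqs, seeds)

# Similar sequences land close together in the embedded space.
near = math.dist(vectors[0], vectors[1])
far = math.dist(vectors[0], vectors[2])
print(near < far)
```

    Once the vectors exist, any standard vector clustering method can group them without revisiting the original sequences; which method to use is left open here.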

    Identifying novel hypoxia-associated markers of chemoresistance in ovarian cancer

    BACKGROUND Ovarian cancer is associated with poor long-term survival due to late diagnosis and development of chemoresistance. Tumour hypoxia is associated with many features of tumour aggressiveness including increased cellular proliferation, inhibition of apoptosis, increased invasion and metastasis, and chemoresistance, mostly mediated through hypoxia-inducible factor (HIF)-1α. While HIF-1α has been associated with platinum resistance in a variety of cancers, including ovarian, relatively little is known about the importance of the duration of hypoxia. Similarly, the gene pathways activated in ovarian cancer which cause chemoresistance as a result of hypoxia are poorly understood. This study aimed first to investigate the effect of hypoxia duration on resistance to cisplatin in an ovarian cancer chemoresistance cell line model and second to identify genes whose expression was associated with hypoxia-induced chemoresistance. METHODS Cisplatin-sensitive (A2780) and cisplatin-resistant (A2780cis) ovarian cancer cell lines were exposed to various combinations of hypoxia and/or chemotherapeutic drugs as part of a 'hypoxia matrix' designed to cover clinically relevant scenarios in terms of tumour hypoxia. Response to cisplatin was measured by the MTT assay. RNA was extracted from cells treated as part of the hypoxia matrix and interrogated on Affymetrix Human Gene ST 1.0 arrays. Differential gene expression analysis was performed for cells exposed to hypoxia and/or cisplatin. From this, four potential markers of chemoresistance were selected for evaluation in a cohort of ovarian tumour samples by RT-PCR. RESULTS Hypoxia increased resistance to cisplatin in A2780 and A2780cis cells. A plethora of genes were differentially expressed in cells exposed to hypoxia and cisplatin which could be associated with chemoresistance. In ovarian tumour samples, we found trends for upregulation of ANGPTL4 in partial responders and down-regulation in non-responders compared with responders to chemotherapy; down-regulation of HER3 in partial and non-responders compared to responders; and down-regulation of HIF-1α in non-responders compared with responders. CONCLUSION This study has further characterized the relationship between hypoxia and chemoresistance in an ovarian cancer model. We have also identified many potential biomarkers of hypoxia and platinum resistance and provided an initial validation of a subset of these markers in ovarian cancer tissues.

    Coevolution with bacteriophages drives genome-wide host evolution and constrains the acquisition of abiotic-beneficial mutations

    This is the author accepted manuscript. The final version is available from OUP via the DOI in this record. Studies of antagonistic coevolution between hosts and parasites typically focus on resistance and infectivity traits. However, coevolution could also have genome-wide effects on the hosts due to pleiotropy, epistasis, or selection for evolvability. Here, we investigate these effects in the bacterium Pseudomonas fluorescens SBW25 during approximately 400 generations of evolution in the presence or absence of bacteriophage (coevolution or evolution treatments, respectively). Coevolution resulted in variable phage resistance, lower competitive fitness in the absence of phages, and greater genome-wide divergence both from the ancestor and between replicates, in part due to the evolution of increased mutation rates. Hosts from coevolution and evolution treatments had different suites of mutations. A high proportion of mutations observed in coevolved hosts were associated with a known phage target binding site, the lipopolysaccharide (LPS), and correlated with altered LPS length and phage resistance. Mutations in evolved bacteria were correlated with higher fitness in the absence of phages. However, the benefits of these growth-promoting mutations were completely lost when these bacteria were subsequently coevolved with phages, indicating that they were not beneficial in the presence of resistance mutations (consistent with negative epistasis). Our results show that in addition to affecting genome-wide evolution in loci not obviously linked to parasite resistance, coevolution can also constrain the acquisition of mutations beneficial for growth in the abiotic environment. This work was funded by European Research Council and NERC (UK).

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should choose their benchmark according to the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy. Comment: Revie

    Optimizing substitution matrix choice and gap parameters for sequence alignment

    Abstract. Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
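    To make concrete where the substitution scores and gap penalty tuned in this abstract enter a pair-wise global alignment, here is a minimal Needleman-Wunsch scoring sketch. It is not POP and not the paper's code: the linear gap penalty, the match/mismatch function `simple_score` (standing in for a real matrix such as VTML200), and all names are assumptions made for illustration.

```python
def global_alignment_score(a, b, score, gap):
    """Needleman-Wunsch global alignment score.

    score(x, y) gives the substitution score for aligning x with y;
    gap is the (negative) penalty added for each gapped position.
    Only two rows of the dynamic-programming table are kept.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # row for empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # column for empty prefix of b
        for j in range(1, len(b) + 1):
            curr.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1], b[j-1]
                prev[j] + gap,                            # gap in b
                curr[j - 1] + gap,                        # gap in a
            ))
        prev = curr
    return prev[-1]


def simple_score(x, y):
    """Placeholder for a substitution matrix: +1 match, -1 mismatch."""
    return 1 if x == y else -1


# The same pair scored under two gap penalties.
mild = global_alignment_score("ACGT", "AGT", simple_score, gap=-1)
harsh = global_alignment_score("ACGT", "AGT", simple_score, gap=-5)
print(mild, harsh)
```

    A parameter optimization procedure of the kind described would search over such scoring parameters and keep those that best reproduce reference alignments; that search is not shown here.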

    Multidimensional Scaling Reveals the Main Evolutionary Pathways of Class A G-Protein-Coupled Receptors

    Class A G-protein-coupled receptors (GPCRs) constitute the largest family of transmembrane receptors in the human genome. Understanding the mechanisms which drove the evolution of such a large family would help understand the specificity of each GPCR sub-family with applications to drug design. To gain evolutionary information on class A GPCRs, we explored their sequence space by metric multidimensional scaling analysis (MDS). Three-dimensional mapping of human sequences shows a non-uniform distribution of GPCRs, organized in clusters that lie along four privileged directions. To interpret these directions, we projected supplementary sequences from different species onto the human space used as a reference. With this technique, we can easily monitor the evolutionary drift of several GPCR sub-families from cnidarians to humans. Results support a model of radiative evolution of class A GPCRs from a central node formed by peptide receptors. The privileged directions obtained from the MDS analysis are interpretable in terms of three main evolutionary pathways related to specific sequence determinants. The first pathway was initiated by a deletion in transmembrane helix 2 (TM2) and led to three sub-families by divergent evolution. The second pathway corresponds to the differentiation of the amine receptors. The third pathway corresponds to parallel evolution of several sub-families in relation to a covarion process involving proline residues in TM2 and TM5. As exemplified with GPCRs, the MDS projection technique is an important tool to compare orthologous sequence sets and to help decipher the mutational events that drove the evolution of protein families.