121 research outputs found

    An optimized TOPS+ comparison method for enhanced TOPS models

    This article has been made available through the Brunel Open Access Publishing Fund. Background: Although methods based on highly abstract descriptions of protein structures, such as VAST and TOPS, can perform very fast protein structure comparison, the results can lack a high degree of biological significance. Previously we have discussed the basic mechanisms of our novel method for structure comparison based on our TOPS+ model (Topological descriptions of Protein Structures Enhanced with Ligand Information). In this paper we show how these results can be significantly improved using parameter optimization, and we call the resulting optimised TOPS+ method the advanced TOPS+ comparison method, i.e. advTOPS+. Results: We have developed a TOPS+ string model as an improvement to the TOPS [1-3] graph model by considering loops as secondary structure elements (SSEs) in addition to helices and strands, representing ligands as first-class objects, and describing interactions between SSEs, and between SSEs and ligands, by incoming and outgoing arcs, annotating SSEs with the interaction direction and type. Benchmarking results of an all-against-all pairwise comparison using a large dataset of 2,620 non-redundant structures from the PDB40 dataset [4] demonstrate the biological significance, in terms of SCOP classification at the superfamily level, of our TOPS+ comparison method. Conclusions: Our advanced TOPS+ comparison shows better performance on the PDB40 dataset [4] than our basic TOPS+ method, giving 90 percent accuracy for SCOP alpha+beta, a 6 percent increase in accuracy compared to the TOPS and basic TOPS+ methods. It also outperforms the TOPS, basic TOPS+ and SSAP comparison methods on the Chew-Kedem dataset [5], achieving 98 percent accuracy. Software Availability: The TOPS+ comparison server is available at http://balabio.dcs.gla.ac.uk/mallika/WebTOPS/.

    Grammar-based distance in progressive multiple sequence alignment

    Background: We propose a multiple sequence alignment (MSA) algorithm and compare its alignment quality and execution time with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment proceeds by pairwise aligning each new sequence with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign, using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to those of other programs, with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance that is particularly useful in aligning large datasets.
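
To make the idea of ordering pairwise alignments by a grammar-based distance concrete, here is a minimal sketch. It is not the authors' implementation: it assumes an LZ78-style phrase count as the complexity measure and a normalised distance of the form max(c(SQ) - c(S), c(QS) - c(Q)) / max(c(S), c(Q)), and the function names `lz78_phrase_count`, `grammar_distance` and `alignment_order` are hypothetical. Sequences are assumed to be non-empty.

```python
from itertools import combinations


def lz78_phrase_count(seq: str) -> int:
    """Number of phrases in an LZ78-style incremental parse of seq (assumed complexity measure)."""
    phrases = set()
    current = ""
    count = 0
    for ch in seq:
        current += ch
        if current not in phrases:
            # First time this fragment is seen: it becomes a new dictionary phrase.
            phrases.add(current)
            count += 1
            current = ""
    # A trailing, already-seen fragment still counts as one phrase.
    return count + (1 if current else 0)


def grammar_distance(s: str, q: str) -> float:
    """Assumed normalised distance: extra phrases needed to parse one
    sequence after the other, relative to the larger single-sequence count."""
    cs, cq = lz78_phrase_count(s), lz78_phrase_count(q)
    extra_sq = lz78_phrase_count(s + q) - cs
    extra_qs = lz78_phrase_count(q + s) - cq
    return max(extra_sq, extra_qs) / max(cs, cq)


def alignment_order(seqs):
    """Index pairs sorted from most to least similar under grammar_distance."""
    pairs = combinations(range(len(seqs)), 2)
    return sorted(pairs, key=lambda p: grammar_distance(seqs[p[0]], seqs[p[1]]))


if __name__ == "__main__":
    print(alignment_order(["ACGT", "ACGT", "TTTT"]))
```

For the toy input above, the identical pair `(0, 1)` comes first, so a progressive aligner driven by this ordering would align those two sequences before adding the third. No alignment is needed to compute the distance, which is one plausible reason such a measure can be cheap on large datasets.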

    SeqAn An efficient, generic C++ library for sequence analysis

    Background: The use of novel algorithmic techniques is pivotal to many important problems in life science. For example the sequencing of the human genome [1] would not have been possible without advanced assembly algorithms. However, owing to the high speed of technological progress and the urgent need for bioinformatics tools, there is a widening gap between state-of-the-art algorithmic techniques and the actual algorithmic components of tools that are in widespread use. Results: To remedy this trend we propose the use of SeqAn, a library of efficient data types and algorithms for sequence analysis in computational biology. SeqAn comprises implementations of existing, practical state-of-the-art algorithmic components to provide a sound basis for algorithm testing and development. In this paper we describe the design and content of SeqAn and demonstrate its use by giving two examples. In the first example we show an application of SeqAn as an experimental platform by comparing different exact string matching algorithms. The second example is a simple version of the well-known MUMmer tool rewritten in SeqAn. Results indicate that our implementation is very efficient and versatile to use. Conclusion: We anticipate that SeqAn greatly simplifies the rapid development of new bioinformatics tools by providing a collection of readily usable, well-designed algorithmic components which are fundamental for the field of sequence analysis. This leverages not only the implementation of new algorithms, but also enables a sound analysis and comparison of existing algorithms.

    An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

    Background: Aligning RNA sequences with low sequence identity has been a challenging problem, because taking structural conservation into account essentially requires an algorithm of high computational complexity. Although many sophisticated algorithms for the purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that more than 92% of the candidates are novel. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. Conclusion: The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.

    High-throughput mapping of regulatory DNA

    Quantifying the effects of cis-regulatory DNA on gene expression is a major challenge. Here, we present the multiplexed editing regulatory assay (MERA), a high-throughput CRISPR-Cas9–based approach that analyzes the functional impact of the regulatory genome in its native context. MERA tiles thousands of mutations across ~40 kb of cis-regulatory genomic space and uses knock-in green fluorescent protein (GFP) reporters to read out gene activity. Using this approach, we obtain quantitative information on the contribution of cis-regulatory regions to gene expression. We identify proximal and distal regulatory elements necessary for expression of four embryonic stem cell–specific genes. We show a consistent contribution of neighboring gene promoters to gene expression and identify unmarked regulatory elements (UREs) that control gene expression but do not have typical enhancer epigenetic or chromatin features. We compare thousands of functional and nonfunctional genotypes at a genomic location and identify the base pair–resolution functional motifs of regulatory elements. National Institutes of Health (U.S.) (1U01HG007037

    Hsp40 Couples with the CSPα Chaperone Complex upon Induction of the Heat Shock Response

    In response to a conditioning stress, the expression of a set of molecular chaperones called heat shock proteins is increased. In neurons, stress-induced and constitutively expressed molecular chaperones protect against damage induced by ischemia and neurodegenerative diseases; however, the molecular basis of this protection is not known. Here we have investigated the crosstalk between stress-induced chaperones and cysteine string protein (CSPα). CSPα is a constitutively expressed synaptic vesicle protein bearing a J domain and a cysteine-rich “string” region that has been implicated in the long-term functional integrity of synaptic transmission and the defense against neurodegeneration. We have shown previously that the CSPα chaperone complex increases isoproterenol-mediated signaling by stimulating GDP/GTP exchange of Gαs. In this report we demonstrate that in response to heat shock or treatment with the Hsp90 inhibitor geldanamycin, the J protein Hsp40 becomes a major component of the CSPα complex. Association of Hsp40 with CSPα decreases CSPα-CSPα dimerization and enhances the CSPα-induced increase in steady-state GTP hydrolysis of Gαs. This newly identified CSPα-Hsp40 association reveals a previously undescribed coupling of J proteins. In view of the crucial importance of stress-induced chaperones in the protection against cell death, our data attribute a role for Hsp40 crosstalk with CSPα in neuroprotection.

    Comprehensive Analysis of 5-Aminolevulinic Acid Dehydrogenase (ALAD) Variants and Renal Cell Carcinoma Risk among Individuals Exposed to Lead

    BACKGROUND: Epidemiologic studies are reporting associations between lead exposure and human cancers. A polymorphism in the 5-aminolevulinic acid dehydratase (ALAD) gene affects lead toxicokinetics and may modify the adverse effects of lead. METHODS: The objective of this study was to evaluate single-nucleotide polymorphisms (SNPs) tagging the ALAD region among renal cancer cases and controls to determine whether genetic variation alters the relationship between lead and renal cancer. Occupational exposure to lead and risk of cancer were examined in a case-control study of renal cell carcinoma (RCC). Variation across the ALAD gene was assessed comprehensively using a tagging SNP approach among 987 cases and 1298 controls. Occupational lead exposure was estimated using questionnaire-based exposure assessment and expert review. Odds ratios (OR) and 95% confidence intervals (CI) were calculated using logistic regression. RESULTS: The adjusted risk associated with the ALAD variant rs8177796 (CT/TT) was increased (OR = 1.35, 95% CI = 1.05-1.73, p-value = 0.02) when compared to the major allele, regardless of lead exposure. Joint effects of lead and ALAD rs2761016 suggest an increased RCC risk for the homozygous wild-type and heterozygous alleles (GG: OR = 2.68, 95% CI = 1.17-6.12, p = 0.01; GA: OR = 1.79, 95% CI = 1.06-3.04), with an interaction approaching significance (p(int) = 0.06). No significant modification in RCC risk was observed for the functional variant rs1800435 (K68N). Haplotype analysis identified a region associated with risk, supporting the tagging SNP results. CONCLUSION: A common genetic variation in ALAD may alter the risk of RCC overall, and among individuals occupationally exposed to lead. Further work in larger exposed populations is warranted to determine if ALAD modifies RCC risk associated with lead exposure.
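
As a reading aid for the odds ratios and 95% confidence intervals quoted above, the sketch below computes an unadjusted odds ratio from a 2x2 case-control table with a Woolf (log-based) interval. This is a simplification and not the study's method, which used adjusted logistic regression; the function name `odds_ratio_with_ci` and the counts in the usage example are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_with_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    All four cell counts must be positive. z=1.96 gives an approximate
    95% interval under a normal approximation on the log scale.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    or_, lo, hi = odds_ratio_with_ci(20, 80, 10, 90)
    print(f"OR = {or_:.2f}, 95% CI = {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1, such as the reported 1.05-1.73 for rs8177796, corresponds to an association that is significant at roughly the 5% level.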