18,717 research outputs found

    In search of lost introns

    Many fundamental questions concerning the emergence and subsequent evolution of eukaryotic exon-intron organization are still unsettled. Genome-scale comparative studies, which can shed light on crucial aspects of eukaryotic evolution, require adequate computational tools. We describe novel computational methods for studying spliceosomal intron evolution. Our goal is to give a reliable characterization of the dynamics of intron evolution. Our algorithmic innovations address the identification of orthologous introns, and the likelihood-based analysis of intron data. We discuss a compression method for the evaluation of the likelihood function, which is noteworthy for phylogenetic likelihood problems in general. We prove that after $O(nL)$ preprocessing time, subsequent evaluations take $O(nL/\log L)$ time almost surely in the Yule-Harding random model of $n$-taxon phylogenies, where $L$ is the input sequence length. We illustrate the practicality of our methods by compiling and analyzing a data set involving 18 eukaryotes, more than in any other study to date. The study yields the surprising result that ancestral eukaryotes were fairly intron-rich. For example, the bilaterian ancestor is estimated to have had more than 90% as many introns as vertebrates do now.
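    The idea of compressing a likelihood evaluation can be illustrated with a much simpler, standard technique: collapsing identical alignment columns so that each distinct pattern is evaluated once. This is only a sketch and is not the compression method the abstract describes; the function names, the 0/1 presence-absence encoding, and the `column_log_lik` callback are assumptions made for illustration.

```python
from collections import Counter


def compress_columns(alignment):
    """Collapse identical alignment columns into {pattern: count}.

    `alignment` is a list of equal-length strings, one per taxon
    (assumed here to encode intron presence/absence as '1'/'0').
    """
    lengths = {len(row) for row in alignment}
    if len(lengths) > 1:
        raise ValueError("all rows must have the same length")
    # zip(*alignment) yields one tuple per column (one character per taxon).
    return Counter(zip(*alignment))


def log_likelihood(alignment, column_log_lik):
    """Sum per-column log-likelihoods, evaluating each distinct pattern once.

    `column_log_lik` is a hypothetical user-supplied function that maps a
    column pattern to its log-likelihood under some model.
    """
    patterns = compress_columns(alignment)
    return sum(count * column_log_lik(p) for p, count in patterns.items())


if __name__ == "__main__":
    toy_alignment = ["1100", "1100", "0100"]
    # Toy stand-in for a real model: penalize each present intron by 1.
    toy_model = lambda pattern: -float(pattern.count("1"))
    print(len(compress_columns(toy_alignment)), "distinct patterns")
    print(log_likelihood(toy_alignment, toy_model))
```

    With many repeated columns, the number of model evaluations drops from the alignment length to the number of distinct patterns.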

    Control of DNA minor groove width and Fis protein binding by the purine 2-amino group.

    The width of the DNA minor groove varies with sequence and can be a major determinant of DNA shape recognition by proteins. For example, the minor groove within the center of the Fis-DNA complex narrows to about half the mean minor groove width of canonical B-form DNA to fit onto the protein surface. G/C base pairs within this segment, which is not contacted by the Fis protein, reduce binding affinities up to 2000-fold over A/T-rich sequences. We show here through multiple X-ray structures and binding properties of Fis-DNA complexes containing base analogs that the 2-amino group on guanine is the primary molecular determinant controlling minor groove widths. Molecular dynamics simulations of free-DNA targets with canonical and modified bases further demonstrate that sequence-dependent narrowing of minor groove widths is modulated almost entirely by the presence of purine 2-amino groups. We also provide evidence that protein-mediated phosphate neutralization facilitates minor groove compression and is particularly important for binding to non-optimally shaped DNA duplexes.

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology.
    Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
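    The coarse-then-fine search behind this kind of framework can be sketched as follows, assuming points in Euclidean space and a greedy cover by fixed-radius balls. The function names, the greedy clustering rule, and the toy data are assumptions for illustration and are not taken from Ammolite, MICA, or esFragBag.

```python
import math


def build_clusters(points, radius):
    """Greedy cover: a point joins the first center within `radius`,
    otherwise it becomes a new center. Returns [(center, members), ...]."""
    clusters = []
    for p in points:
        for center, members in clusters:
            if math.dist(center, p) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, r, radius):
    """Return every stored point within distance r of `query`."""
    hits = []
    for center, members in clusters:
        # Coarse step. By the triangle inequality, every member m satisfies
        # dist(query, m) >= dist(query, center) - radius, so a cluster whose
        # center is farther than r + radius cannot contain a hit.
        if math.dist(query, center) <= r + radius:
            # Fine step: check only the members of surviving clusters.
            hits.extend(m for m in members if math.dist(query, m) <= r)
    return hits


if __name__ == "__main__":
    data = [(0, 0), (0.5, 0), (10, 10), (10.2, 10), (5, 5)]
    clusters = build_clusters(data, radius=1.0)
    print(range_search(clusters, query=(0.4, 0), r=0.2, radius=1.0))
```

    The coarse step costs one distance computation per cluster center, which is why the running time tracks the number of covering balls rather than the number of points.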

    Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

    We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain where it is possible to define a meaningful dissimilarity measure. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application dealing with the recognition of the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules, which is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein of our dataset is initially associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results obtained here, which we stress were achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification.
    Comment: 10 pages, 49 references
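    The dissimilarity space representation mentioned above can be sketched in a few lines: each sequence is mapped to the vector of its dissimilarities to a fixed set of prototypes, and any vector-based classifier can then be trained on the result. This sketch covers only the embedding step; the edit distance is a stand-in dissimilarity, the prototype sequences are hypothetical, and the optimization that gives ODSE its name is not shown.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic program, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def embed(sequence, prototypes, dissimilarity=edit_distance):
    """Map a sequence to its vector of dissimilarities to each prototype."""
    return [dissimilarity(sequence, p) for p in prototypes]


if __name__ == "__main__":
    # Hypothetical prototype sequences over the amino acid alphabet.
    prototypes = ["MKV", "MKL", "AAAA"]
    print(embed("MKV", prototypes))
```

    The embedding dimension equals the number of prototypes, so choosing a small, informative prototype set keeps the downstream classifier cheap.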

    Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance

    This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance. At several positions in the protease gene, mutations confer differing degrees of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations.
    Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Functionalisation of Ti6Al4V and hydroxyapatite surfaces with combined peptides based on KKLPDA and EEEEEEEE peptides

    Surface modifications are usually performed on titanium alloys to improve osteo-integration and surface bioactivity. Modifications such as alkaline and acid etching, or coating with bioactive materials such as hydroxyapatite, have previously been demonstrated. The aim of this work is to develop a peptide with combined titanium oxide and hydroxyapatite binders in order to achieve a biomimetic hydroxyapatite coating on titanium surfaces. The technology would also be applicable for the functionalisation of titanium and hydroxyapatite surfaces for selective protein adsorption, conjugation of antimicrobial peptides, and adsorption of specialised drugs for drug delivery. In this work, functionalisation of Ti6Al4V and hydroxyapatite surfaces was achieved using combined titanium-hydroxyapatite (Ti-Hap) peptides based on titanium binder (RKLPDA) and hydroxyapatite binder (EEEEEEEE) peptides. Homogeneous peptide coatings on Ti6Al4V surfaces were obtained after surface chemical treatments with a 30 wt % aqueous solution of H2O2 for 24 and 48 hours. The treated titanium surfaces presented an average roughness of Sa=197 nm (24 h) and Sa=128 nm (48 h); an untreated mirror polished sample exhibited an Sa of 13 nm. The advancing water contact angle of the titanium oxide layer after 1 hour of exposure to 30 wt % aqueous solution of H2O2 was around 65°, decreasing gradually with time until it reached 35° after a 48 hour exposure, suggesting that the surface hydrophilicity increased over etching time. The presence of a lysine (K) amino acid in the sequence of the titanium binder resulted in fluorescence intensity roughly 16 % higher compared with the arginine (R) amino acid analogue, and therefore the lysine-containing titanium binder was used in this work. The Ti-Hap peptide KKLPDAEEEEEEEE (Ti-Hap1) was not adsorbed by the treated Ti6Al4V surfaces and therefore was modified. The modifications involved the inclusion of a glycine spacer between the binding terminals (Ti-Hap2) and the addition of a second titanium binder (KKLPDA) (Ti-Hap3 and Ti-Hap4). The Ti-Hap peptide aptamer that exhibited the strongest intensity after the titanium dip coating was KKLPDAKKLPDAEEEEEEEE (Ti-Hap4). On the other hand, hydroxyapatite surfaces, exhibiting an average roughness of Sa=1.42 µm, showed a higher fluorescence for all peptides compared with titanium surfaces.