In search of lost introns
Many fundamental questions concerning the emergence and subsequent evolution
of eukaryotic exon-intron organization are still unsettled. Genome-scale
comparative studies, which can shed light on crucial aspects of eukaryotic
evolution, require adequate computational tools.
We describe novel computational methods for studying spliceosomal intron
evolution. Our goal is to give a reliable characterization of the dynamics of
intron evolution. Our algorithmic innovations address the identification of
orthologous introns, and the likelihood-based analysis of intron data. We
discuss a compression method for the evaluation of the likelihood function,
which is noteworthy for phylogenetic likelihood problems in general. We prove
that after O(nℓ) preprocessing time, subsequent evaluations take O(nℓ/log ℓ) time almost surely in the Yule-Harding random model of n-taxon
phylogenies, where ℓ is the input sequence length.
We illustrate the practicality of our methods by compiling and analyzing a
data set involving 18 eukaryotes, more than in any other study to date. The
study yields the surprising result that ancestral eukaryotes were fairly
intron-rich. For example, the bilaterian ancestor is estimated to have had more
than 90% as many introns as vertebrates do now.
Control of DNA minor groove width and Fis protein binding by the purine 2-amino group.
The width of the DNA minor groove varies with sequence and can be a major determinant of DNA shape recognition by proteins. For example, the minor groove within the center of the Fis-DNA complex narrows to about half the mean minor groove width of canonical B-form DNA to fit onto the protein surface. G/C base pairs within this segment, which is not contacted by the Fis protein, reduce binding affinities up to 2000-fold relative to A/T-rich sequences. We show here through multiple X-ray structures and binding properties of Fis-DNA complexes containing base analogs that the 2-amino group on guanine is the primary molecular determinant controlling minor groove widths. Molecular dynamics simulations of free-DNA targets with canonical and modified bases further demonstrate that sequence-dependent narrowing of minor groove widths is modulated almost entirely by the presence of purine 2-amino groups. We also provide evidence that protein-mediated phosphate neutralization facilitates minor groove compression and is particularly important for binding to non-optimally shaped DNA duplexes.
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.
Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
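The coarse-then-fine idea behind the search framework in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy covering, the Euclidean metric, and the names `build_clusters`, `range_search`, and `cluster_radius` are all assumptions made for illustration. Points are grouped into balls of a fixed radius, and a query only inspects clusters whose center lies within the query radius plus the cluster radius, which the triangle inequality guarantees loses no true hits.

```python
from math import dist  # Euclidean distance, Python 3.8+


def build_clusters(points, cluster_radius):
    """Greedy cover: a point joins the first center within cluster_radius,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members) pairs
    for p in points:
        for center, members in clusters:
            if dist(center, p) <= cluster_radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters


def range_search(clusters, query, radius, cluster_radius):
    """Return every stored point within `radius` of `query`."""
    hits = []
    for center, members in clusters:
        # Coarse step: by the triangle inequality, a member within `radius`
        # of the query forces its center to lie within radius + cluster_radius.
        if dist(center, query) <= radius + cluster_radius:
            # Fine step: check only the members of the surviving cluster.
            hits.extend(m for m in members if dist(m, query) <= radius)
    return hits


def brute_force(points, query, radius):
    """Reference linear scan, used to check the clustered search."""
    return [p for p in points if dist(p, query) <= radius]
```

When the data fall into few clusters relative to the number of points, the coarse step discards most of the dataset before any per-point comparison is made; on unstructured data the sketch degrades to a linear scan.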
Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome
We evaluate a version of the recently-proposed classification system named
Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space
of sequences of generic objects. The ODSE system was originally presented
as a classification system for patterns represented as labeled graphs. However,
since ODSE is founded on the dissimilarity space representation of the input
data, the classifier can be easily adapted to any input domain where it is
possible to define a meaningful dissimilarity measure. Here we demonstrate the
effectiveness of the ODSE classifier for sequences by considering an
application dealing with the recognition of the solubility degree of the
Escherichia coli proteome. Solubility, or analogously aggregation propensity,
is an important property of protein molecules, which is intimately related to
the mechanisms underlying the chemico-physical process of folding. Each protein
of our dataset is initially associated with a solubility degree and it is
represented as a sequence of symbols denoting the 20 amino acid residues. The
computational results obtained here, which we stress were achieved
with no context-dependent tuning of the ODSE system, confirm the validity and
generality of the ODSE-based approach for structured data classification.
Comment: 10 pages, 49 references
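The dissimilarity space representation this abstract relies on can be sketched in a few lines of Python. This is a simplified illustration, not the ODSE system: ODSE optimizes its prototypes and parameters, whereas the sketch assumes a fixed prototype list, uses Levenshtein edit distance as a stand-in dissimilarity, and classifies with a plain nearest-neighbor rule. The function names and the toy labels are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (assumed dissimilarity)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def embed(seq, prototypes):
    """Map a sequence to the vector of its dissimilarities to each prototype."""
    return [edit_distance(seq, p) for p in prototypes]


def classify(seq, labeled_sequences, prototypes):
    """1-nearest-neighbor rule applied in the embedded vector space."""
    target = embed(seq, prototypes)

    def squared_distance(item):
        vector = embed(item[0], prototypes)
        return sum((x - y) ** 2 for x, y in zip(vector, target))

    return min(labeled_sequences, key=squared_distance)[1]
```

Once sequences are mapped to fixed-length vectors in this way, any standard vector-space classifier can be applied, which is why the approach carries over to any input domain with a meaningful dissimilarity measure.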
Neural Network and Bioinformatic Methods for Predicting HIV-1 Protease Inhibitor Resistance
This article presents a new method for predicting viral resistance to seven protease inhibitors from the HIV-1 genotype, and for identifying the positions in the protease gene at which the specific nature of the mutation affects resistance. The neural network Analog ARTMAP predicts protease inhibitor resistance from viral genotypes. A feature selection method detects genetic positions that contribute to resistance both alone and through interactions with other positions. This method has identified positions 35, 37, 62, and 77, where traditional feature selection methods have not detected a contribution to resistance.
At several positions in the protease gene, mutations confer differing degrees of resistance, depending on the specific amino acid to which the sequence has mutated. To find these positions, an Amino Acid Space is introduced to represent genes in a vector space that captures the functional similarity between amino acid pairs. Feature selection identifies several new positions, including 36, 37, and 43, with amino acid-specific contributions to resistance. Analog ARTMAP networks applied to inputs that represent specific amino acids at these positions perform better than networks that use only mutation locations.
Air Force Office of Scientific Research (F49620-01-1-0423); National Geospatial-Intelligence Agency (NMA 201-01-1-2016); National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
Functionalisation of Ti6Al4V and hydroxyapatite surfaces with combined peptides based on KKLPDA and EEEEEEEE peptides
Surface modifications are usually performed on titanium alloys to improve osteo-integration and surface bioactivity. Modifications such as alkaline and acid etching, or coating with bioactive materials such as hydroxyapatite, have previously been demonstrated. The aim of this work is to develop a peptide with combined titanium oxide and hydroxyapatite binders in order to achieve a biomimetic hydroxyapatite coating on titanium surfaces. The technology would also be applicable for the functionalisation of titanium and hydroxyapatite surfaces for selective protein adsorption, conjugation of antimicrobial peptides, and adsorption of specialised drugs for drug delivery. In this work, functionalisation of Ti6Al4V and hydroxyapatite surfaces was achieved using combined titanium-hydroxyapatite (Ti-Hap) peptides based on titanium binder (RKLPDA) and hydroxyapatite binder (EEEEEEEE) peptides. Homogeneous peptide coatings on Ti6Al4V surfaces were obtained after surface chemical treatments with a 30 wt % aqueous solution of H2O2 for 24 and 48 hours. The treated titanium surfaces presented an average roughness of Sa=197 nm (24 h) and Sa=128 nm (48 h); an untreated mirror polished sample exhibited an Sa of 13 nm. The advancing water contact angle of the titanium oxide layer after 1 hour of exposure to 30 wt % aqueous solution of H2O2 was around 65°, decreasing gradually with time until it reached 35° after a 48 hour exposure, suggesting that the surface hydrophilicity increased over etching time. The presence of a lysine (K) amino acid in the sequence of the titanium binder resulted in fluorescence intensity roughly 16 % higher compared with the arginine (R) amino acid analogue and therefore the lysine-containing titanium binder was used in this work. The Ti-Hap peptide KKLPDAEEEEEEEE (Ti-Hap1) was not adsorbed by the treated Ti6Al4V surfaces and therefore was modified.
The modifications involved the inclusion of a glycine spacer between the binding terminals (Ti-Hap2) and the addition of a second titanium binder (KKLPDA) (Ti-Hap3 and Ti-Hap4). The Ti-Hap peptide aptamer which exhibited the strongest intensity after the titanium dip coating was KKLPDAKKLPDAEEEEEEEE (Ti-Hap4). On the other hand, hydroxyapatite surfaces, exhibiting an average roughness of Sa=1.42 µm, showed a higher fluorescence for all peptides compared with titanium surfaces.