Search CORE

19,321 research outputs found

Computational identification and analysis of noncoding RNAs - Unearthing the buried treasures in the genome

Author: Vaidyanathan P. P.
Yoon Byung-Jun
Publication venue
Publication date: 01/01/2007
Field of study

The central dogma of molecular biology states that the genetic information flows from DNA to RNA to protein. This dogma has exerted a substantial influence on our understanding of the genetic activities in the cells. Under this influence, the prevailing assumption until the recent past was that genes are basically repositories for protein coding information, and proteins are responsible for most of the important biological functions in all cells. In the meanwhile, the importance of RNAs has remained rather obscure, and RNA was mainly viewed as a passive intermediary that bridges the gap between DNA and protein. Except for classic examples such as tRNAs (transfer RNAs) and rRNAs (ribosomal RNAs), functional noncoding RNAs were considered to be rare. However, this view has experienced a dramatic change during the last decade, as systematic screening of various genomes identified myriads of noncoding RNAs (ncRNAs), which are RNA molecules that function without being translated into proteins [11], [40]. It has been realized that many ncRNAs play important roles in various biological processes. As RNAs can interact with other RNAs and DNAs in a sequence-specific manner, they are especially useful in tasks that require highly specific nucleotide recognition [11]. Good examples are the miRNAs (microRNAs) that regulate gene expression by targeting mRNAs (messenger RNAs) [4], [20], and the siRNAs (small interfering RNAs) that take part in the RNAi (RNA interference) pathways for gene silencing [29], [30]. Recent developments show that ncRNAs are extensively involved in many gene regulatory mechanisms [14], [17]. The roles of ncRNAs known to this day are truly diverse. These include transcription and translation control, chromosome replication, RNA processing and modification, and protein degradation and translocation [40], just to name a few. These days, it is even claimed that ncRNAs dominate the genomic output of the higher organisms such as mammals, and it is being suggested that the greater portion of their genome (which does not encode proteins) is dedicated to the control and regulation of cell development [27]. As more and more evidence piles up, greater attention is paid to ncRNAs, which have been neglected for a long time. Researchers began to realize that the vast majority of the genome that was regarded as “junk,” mainly because it was not well understood, may indeed hold the key for the best kept secrets in life, such as the mechanism of alternative splicing, the control of epigenetic variations and so forth [27]. The complete range and extent of the role of ncRNAs are not so obvious at this point, but it is certain that a comprehensive understanding of cellular processes is not possible without understanding the functions of ncRNAs [47]

Caltech Authors

Computational approaches to predict protein functional families and functional sites.

Author: Abbasian M
Orengo CA
Rauer C
Sen N
Waman VP
Publication venue
Publication date: 01/10/2021
Field of study

Understanding the mechanisms of protein function is indispensable for many biological applications, such as protein engineering and drug design. However, experimental annotations are sparse, and therefore, theoretical strategies are needed to fill the gap. Here, we present the latest developments in building functional subclassifications of protein superfamilies and using evolutionary conservation to detect functional determinants, for example, catalytic-, binding- and specificity-determining residues important for delineating the functional families. We also briefly review other features exploited for functional site detection and new machine learning strategies for combining multiple features

UCL Discovery

Soft Computing Techiniques for the Protein Folding Problem on High Performance Computing Architectures

Author: Arcas Túnez Francisco
Bueno Crespo Andrés
Cecilia Canales José María
García Valverde Teresa
Llanes Antonio
Muñoz Andrés
Pérez Sánchez Horacio
Sánchez Antonia María
Publication venue: 'Bentham Science Publishers Ltd.'
Publication date: 01/01/2016
Field of study

The protein-folding problem has been extensively studied during the last fifty years. The understanding of the dynamics of global shape of a protein and the influence on its biological function can help us to discover new and more effective drugs to deal with diseases of pharmacological relevance. Different computational approaches have been developed by different researchers in order to foresee the threedimensional arrangement of atoms of proteins from their sequences. However, the computational complexity of this problem makes mandatory the search for new models, novel algorithmic strategies and hardware platforms that provide solutions in a reasonable time frame. We present in this revision work the past and last tendencies regarding protein folding simulations from both perspectives; hardware and software. Of particular interest to us are both the use of inexact solutions to this computationally hard problem as well as which hardware platforms have been used for running this kind of Soft Computing techniques.This work is jointly supported by the FundaciónSéneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grant with reference TEC2012-37945-C02-02 and TIN2012-31345, by the Nils Coordinated Mobility under grant 012-ABEL-CM-2014A, in part financed by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within UCAM GPU educational and research centers.Ingeniería, Industria y Construcció

Institutional Repository UCAM

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The interplay of descriptor-based computational analysis with pharmacophore modeling builds the basis for a novel classification scheme for feruloyl esterases

Author: Akin
Altschul
Andersen
Andreasen
Aurilia
Barnum
Bartolomé
Bendtsen
Benner
Benoit
Benoit
Bhasin
Bhasin
Blum
Cai
Cai
Castanares
Chang
Choi
Crepin
D.B.R.K. Gupta Udatha
Dodd
Donaghy
Donaghy
Dudoit
Dysvik
Ewing
Faulds
Ferguson
Fillingham
Finn
Garcia-Conesa
García-Conesa
Garrigues
Gasteiger
Gasteiger
Gianni Panagiotou
Giuliani
Goldstone
Hall
Han
Hatzakis
Henikoff
Hermoso
Hsu
Humberstone
Huson
Irene Kouskoumvekaki
Kaiser
Karchin
Keerthi
Kheder
Kikuzaki
Kim
Kohavi
Kohonen
Koseki
Koseki
Kroon
Kroon
Kumar
Lao
Larkin
Laszlo
Latha
Lee
Lesage-Meessen
Levasseur
Levasseur
Li
Lima
Lisbeth Olsson
MacKay
Marcotte
McAuley
Meinicke
Morris
Mukherjee
Nielsen
Noble
Nsereko
Oili
Ong
Platt
Prates
Pérez-Bercoff
Rashamuse
Record
Rost
Sancho
Sankararaman
Sankararaman
Schrödinger Suite 2009
Schubot
Slavin
Tarbouriech
Teodoro
Thompson
Tomoko
Topakas
Topakas
Topakas
Topakas
Topakas
Tsuchiyama
Tsuchiyama
Uestuen
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Vafiadi
Wang
Wang
Wang
Wilkinson
Publication venue
Publication date: 11/08/2010
Field of study

One of the most intriguing groups of enzymes, the feruloyl esterases (FAEs), is ubiquitous in both simple and complex organisms. FAEs have gained importance in biofuel, medicine and food industries due to their capability of acting on a large range of substrates for cleaving ester bonds and synthesizing high-added value molecules through esterification and transesterification reactions. During the past two decades extensive studies have been carried out on the production and partial characterization of FAEs from fungi, while much less is known about FAEs of bacterial or plant origin. Initial classification studies on FAEs were restricted on sequence similarity and substrate specificity on just four model substrates and considered only a handful of FAEs belonging to the fungal kingdom. This study centers on the descriptor-based classification and structural analysis of experimentally verified and putative FAEs; nevertheless, the framework presented here is applicable to every poorly characterized enzyme family. 365 FAE-related sequences of fungal, bacterial and plantae origin were collected and they were clustered using Self Organizing Maps followed by k-means clustering into distinct groups based on amino acid composition and physico-chemical composition descriptors derived from the respective amino acid sequence. A Support Vector Machine model was subsequently constructed for the classification of new FAEs into the pre-assigned clusters. The model successfully recognized 98.2% of the training sequences and all the sequences of the blind test. The underlying functionality of the 12 proposed FAE families was validated against a combination of prediction tools and published experimental data. Another important aspect of the present work involves the development of pharmacophore models for the new FAE families, for which sufficient information on known substrates existed. Knowing the pharmacophoric features of a small molecule that are essential for binding to the members of a certain family opens a window of opportunities for tailored applications of FAEs

Crossref

Chalmers Research

Nature Precedings

Online Research Database In Technology

Chalmers Publication Library

HKU Scholars Hub

Detecting and comparing non-coding RNAs in the high-throughput era.

Author: Bussotti Giovanni
Enright Anton J
Notredame Cedric
Publication venue: Int J Mol Sci
Publication date: 01/01/2013
Field of study

In recent years there has been a growing interest in the field of non-coding RNA. This surge is a direct consequence of the discovery of a huge number of new non-coding genes and of the finding that many of these transcripts are involved in key cellular functions. In this context, accurately detecting and comparing RNA sequences has become important. Aligning nucleotide sequences is a key requisite when searching for homologous genes. Accurate alignments reveal evolutionary relationships, conserved regions and more generally any biologically relevant pattern. Comparing RNA molecules is, however, a challenging task. The nucleotide alphabet is simpler and therefore less informative than that of amino-acids. Moreover for many non-coding RNAs, evolution is likely to be mostly constrained at the structural level and not at the sequence level. This results in very poor sequence conservation impeding comparison of these molecules. These difficulties define a context where new methods are urgently needed in order to exploit experimental results to their full potential. This review focuses on the comparative genomics of non-coding RNAs in the context of new sequencing technologies and especially dealing with two extremely important and timely research aspects: the development of new methods to align RNAs and the analysis of high-throughput data

CiteSeerX

Directory of Open Access Journals

PubMed Central

Apollo (Cambridge)

Ais-Psmaca: Towards Proposing an Artificial Immune System for Strengthening Psmaca: An Automated Protein Structure Prediction using Multiple Attractor Cellular Automata

Author: Dr. Inampudi Ramesh Babu
P.Kiran Sree
Publication venue: Global Journals Inc. (US)
Publication date: 21/06/2013
Field of study

Predicting the structure of proteins from their amino acid sequences has gained a remarkable attention in recent years. Even though there are some prediction techniques addressing this problem, the approximate accuracy in predicting the protein structure is closely 75%. An automated procedure was evolved with MACA (Multiple Attractor Cellular Automata) for predicting the structure of the protein. Artificial Immune System (AIS-PSMACA) a novel computational intelligence technique is used for strengthening the system (PSMACA) with more adaptability and incorporating more parallelism to the system. Most of the existing approaches are sequential which will classify the input into four major classes and these are designed for similar sequences. AIS-PSMACA is designed to identify ten classes from the sequences that share twilight zone similarity and identity with the training sequences with mixed and hybrid variations. This method also predicts three states (helix, strand, and coil) for the secondary structure. Our comprehensive design considers 10 feature selection methods and 4 classifiers to develop MACA (Multiple Attractor Cellular Automata) based classifiers that are build for each of the ten classes. We have tested the proposed classifier with twilight-zone and 1-high-similarity benchmark datasets with over three dozens of modern competing predictors shows that AIS-PSMACA provides the best overall accuracy that ranges between 80% and 89.8% depending on the dataset

Global Journal of Computer Science and Technology (GJCST)

Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction

Author: Iqbal Sumaiya
Publication venue: ScholarWorks@UNO
Publication date: 09/08/2017
Field of study

Proteins are the fundamental macromolecules within a cell that carry out most of the biological functions. The computational study of protein structure and its functions, using machine learning and data analytics, is elemental in advancing the life-science research due to the fast-growing biological data and the extensive complexities involved in their analyses towards discovering meaningful insights. Mapping of protein’s primary sequence is not only limited to its structure, we extend that to its disordered component known as Intrinsically Disordered Proteins or Regions in proteins (IDPs/IDRs), and hence the involved dynamics, which help us explain complex interaction within a cell that is otherwise obscured. The objective of this dissertation is to develop machine learning based effective tools to predict disordered protein, its properties and dynamics, and interaction paradigm by systematically mining and analyzing large-scale biological data. In this dissertation, we propose a robust framework to predict disordered proteins given only sequence information, using an optimized SVM with RBF kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of Accessible Surface Area (ASA) of protein residues, a useful structural property that defines protein’s exposure to partners, using regularized regression with 3rd-degree polynomial kernel function and genetic algorithm. As a key outcome of this research, we then introduce a novel method to extract position specific energy (PSEE) of protein residues by modeling the pairwise thermodynamic interactions and hydrophobic effect. PSEE is found to be an effective feature in identifying the enthalpy-gain of the folded state of a protein and otherwise the neutral state of the unstructured proteins. Moreover, we study the peptide-protein transient interactions that involve the induced folding of short peptides through disorder-to-order conformational changes to bind to an appropriate partner. A suite of predictors is developed to identify the residue-patterns of Peptide-Recognition Domains from protein sequence that can recognize and bind to the peptide-motifs and phospho-peptides with post-translational-modifications (PTMs) of amino acid, responsible for critical human diseases, using the stacked generalization ensemble technique. The involved biologically relevant case-studies demonstrate possibilities of discovering new knowledge using the developed tools

University of New Orleans

Classification of G-protein coupled receptors based on support vector machine with maximum relevance minimum redundancy and genetic algorithm

Author: Dai Zong
Li Zhanchao
Zhou Xuan
Zou Xiaoyong
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

Crossref

Springer - Publisher Connector

PubMed Central