Using transfer learning and loss function adaptation for RNA secondary structure prediction
The problem of predicting RNA secondary structure is a challenging topic
that involves various fields of computer science. Accurate solutions to this
problem are helpful in medicine, for vaccine development and the design of
stable mRNA molecules, and in biology, for distinguishing between the
functions of various RNA molecules according to their shape.
The objective of this project is to study an emerging Machine Learning-based
approach to the problem of RNA secondary structure prediction via integration
of deep learning techniques such as transfer learning and convolutional neural
networks, aided by adaptations made for the specific problem at hand, such as
the data representation and the loss function.
The objective of this project is to provide a new robust Machine Learning-based approach to the problem of RNA secondary structure prediction via integration and improvement of emerging Deep Learning techniques.
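The abstract above names data representation and loss function as the problem-specific adaptations but does not describe them. As a minimal sketch, assuming the structure is given in dot-bracket notation and the target is a symmetric base-pairing matrix, the following shows one common way such a target could be built and how a binary cross-entropy loss could be reweighted to compensate for the rarity of paired positions. The function names and the `pos_weight` parameter are hypothetical and are not taken from the project.

```python
import math


def dot_bracket_to_pairs(structure):
    """Return the set of (i, j) base pairs, i < j, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def contact_matrix(structure):
    """Build a symmetric 0/1 matrix with 1 wherever two positions are paired."""
    n = len(structure)
    matrix = [[0] * n for _ in range(n)]
    for i, j in dot_bracket_to_pairs(structure):
        matrix[i][j] = matrix[j][i] = 1
    return matrix


def weighted_bce(pred, target, pos_weight):
    """Mean binary cross-entropy with the positive (paired) class scaled by pos_weight.

    Assumption: paired cells are rare, so pos_weight > 1 is used to keep the
    model from predicting 'unpaired' everywhere.
    """
    eps = 1e-12  # keeps log() away from zero
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p = min(max(p, eps), 1.0 - eps)
            total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1.0 - p))
            count += 1
    return total / count


target = contact_matrix("((..))")
uniform = [[0.5] * 6 for _ in range(6)]
print(weighted_bce(uniform, target, pos_weight=1.0))
print(weighted_bce(uniform, target, pos_weight=5.0))
```

With `pos_weight=1.0` this reduces to ordinary binary cross-entropy; larger values penalize missed base pairs more heavily than spurious ones.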
Generative Tertiary Structure-based RNA Design
Learning from 3D biological macromolecules with artificial intelligence
technologies has been an emerging area. Computational protein design, known as
the inverse of protein structure prediction, aims to generate protein sequences
that will fold into the defined structure. Analogous to protein design, RNA
design is also an important topic in synthetic biology; it aims to generate
RNA sequences for given structures. However, existing RNA design methods mainly
focus on the secondary structure, ignoring the informative tertiary structure,
which is commonly used in protein design. To explore the complex coupling
between RNA sequence and 3D structure, we introduce an RNA tertiary structure
modeling method to efficiently capture useful information from the 3D structure
of RNA. For a fair comparison, we collect abundant RNA data and split the data
according to tertiary structures. With the standard dataset, we conduct a
benchmark by employing structure-based protein design approaches with our RNA
tertiary structure modeling method. We believe our work will stimulate the
future development of tertiary structure-based RNA design and bridge the gap
between RNA 3D structures and sequences.
Identification of RNA Binding Proteins and RNA Binding Residues Using Effective Machine Learning Techniques
Identification and annotation of RNA-binding proteins (RBPs) and RNA-binding residues from sequence information alone is one of the most challenging problems in computational biology. RBPs play crucial roles in several fundamental biological functions, including transcriptional regulation of RNAs, RNA metabolism, and splicing. Existing experimental techniques are time-consuming and costly. Thus, efficient computational identification of RBPs directly from the sequence can help annotate RBPs and assist experimental design. Here, we introduce AIRBP, a computational sequence-based method that uses features extracted from evolutionary information, physicochemical properties, and disordered properties to train a model built with stacking, an advanced machine learning technique, for effective prediction of RBPs. It makes use of efficient machine learning algorithms such as Support Vector Machine, Logistic Regression, K-Nearest Neighbor and XGBoost (Extreme Gradient Boosting). In this research work, we also propose a second predictor for efficient annotation of RNA-binding residues. This residue predictor also uses stacking and evolutionary algorithms for efficient annotation of RBPs and RNA-binding residues, and it likewise uses various evolutionary, physicochemical and disordered properties to train a robust model. This thesis presents a possible solution to the RBP and RNA-binding residue prediction problem through two independent predictors, both of which outperform existing state-of-the-art approaches.
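The stacking idea named in this abstract can be illustrated with a small, dependency-free sketch. It is not AIRBP's configuration: the base learners here are a k-nearest-neighbor vote and a nearest-centroid scorer (stand-ins for the SVM, Logistic Regression, K-Nearest Neighbor and XGBoost models the abstract lists), the meta-learner is a plain logistic regression, and the two-dimensional toy points stand in for the real sequence-derived features. All function names are hypothetical.

```python
import math


def fit_knn(X, y, k=3):
    """Base learner: fraction of positives among the k nearest training points."""
    def predict(x):
        nearest = sorted(range(len(X)), key=lambda i: math.dist(X[i], x))[:k]
        return sum(y[i] for i in nearest) / len(nearest)
    return predict


def fit_centroid(X, y):
    """Base learner: sigmoid of (distance to negative centroid - distance to positive)."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = mean([x for x, t in zip(X, y) if t == 1])
    neg = mean([x for x, t in zip(X, y) if t == 0])
    def predict(x):
        return 1.0 / (1.0 + math.exp(math.dist(x, pos) - math.dist(x, neg)))
    return predict


def fit_logistic(Z, y, lr=0.5, epochs=500):
    """Meta-learner: logistic regression trained with batch gradient descent."""
    w, b = [0.0] * len(Z[0]), 0.0
    def prob(z):
        return 1.0 / (1.0 + math.exp(-(sum(wi * zi for wi, zi in zip(w, z)) + b)))
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for z, t in zip(Z, y):
            err = prob(z) - t
            for j, zj in enumerate(z):
                grad_w[j] += err * zj
            grad_b += err
        for j in range(len(w)):
            w[j] -= lr * grad_w[j] / len(Z)
        b -= lr * grad_b / len(Z)
    return prob


def fit_stack(X, y, base_fitters, n_folds=3):
    """Train the meta-learner on out-of-fold base predictions, then refit bases on all data."""
    Z = [None] * len(X)
    for fold in range(n_folds):
        train = [i for i in range(len(X)) if i % n_folds != fold]
        held = [i for i in range(len(X)) if i % n_folds == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in held:
            Z[i] = [m(X[i]) for m in models]  # predictions on data the bases never saw
    meta = fit_logistic(Z, y)
    full = [fit(X, y) for fit in base_fitters]
    return lambda x: meta([m(x) for m in full])


# Toy, well-separated data (assumption: stands in for real feature vectors).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8),
     (4, 4), (4, 5), (5, 4), (5, 5), (4.5, 4.5), (4.2, 4.8)]
y = [0] * 6 + [1] * 6
stacked = fit_stack(X, y, [fit_knn, fit_centroid])
print(stacked((0.1, 0.1)), stacked((4.9, 4.9)))
```

The key design point is that the meta-learner only ever sees base predictions made on held-out folds, which keeps it from simply learning to trust whichever base model overfits the training data most.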
Deep learning models for predicting RNA degradation via dual crowdsourcing
Medicines based on messenger RNA (mRNA) hold immense potential, as evidenced by their rapid deployment as COVID-19 vaccines. However, worldwide distribution of mRNA molecules has been limited by their thermostability, which is fundamentally constrained by the intrinsic instability of RNA molecules to a chemical degradation reaction called in-line hydrolysis. Predicting the degradation of an RNA molecule is a key task in designing more stable RNA-based therapeutics. Here, we describe a crowdsourced machine learning competition (‘Stanford OpenVaccine’) on Kaggle, involving single-nucleotide resolution measurements on 6,043 diverse 102–130-nucleotide RNA constructs that were themselves solicited through crowdsourcing on the RNA design platform Eterna. The entire experiment was completed in less than 6 months, and 41% of nucleotide-level predictions from the winning model were within experimental error of the ground truth measurement. Furthermore, these models generalized to blindly predicting orthogonal degradation data on much longer mRNA molecules (504–1,588 nucleotides) with improved accuracy compared with previously published models. These results indicate that such models can represent in-line hydrolysis with excellent accuracy, supporting their use for designing stabilized messenger RNAs. The integration of two crowdsourcing platforms, one for dataset creation and another for machine learning, may be fruitful for other urgent problems that demand scientific discovery on rapid timescales.
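The "within experimental error" figure quoted above can be read as a simple per-nucleotide check. The sketch below assumes that a prediction counts as a hit when its absolute difference from the measured value is no larger than that nucleotide's reported measurement error; the abstract does not spell out the exact criterion, and the function name and the numbers are illustrative only.

```python
def fraction_within_error(predicted, measured, errors):
    """Fraction of positions where |predicted - measured| <= experimental error.

    Assumption: 'errors' holds one non-negative error estimate per nucleotide.
    """
    if not (len(predicted) == len(measured) == len(errors)):
        raise ValueError("all three sequences must have the same length")
    if not predicted:
        raise ValueError("at least one position is required")
    hits = sum(1 for p, m, e in zip(predicted, measured, errors) if abs(p - m) <= e)
    return hits / len(predicted)


# Illustrative values only, not data from the competition.
predicted = [0.50, 0.20, 0.90, 0.10]
measured = [0.55, 0.40, 0.85, 0.10]
errors = [0.10, 0.05, 0.02, 0.01]
print(fraction_within_error(predicted, measured, errors))  # 0.5
```

Under this reading, a score of 0.41 would mean that 41% of nucleotide-level predictions fall inside the measurement's own uncertainty.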
- …