
    New methods for protein structure prediction using machine learning and deep learning

    Computational protein structure prediction is one of the most challenging problems in bioinformatics. Because sampling-and-selection strategies are widely used, protein model quality assessment has become important. In this dissertation, new machine learning and deep learning methods are proposed for protein model quality assessment, protein contact prediction, protein model refinement, and loop modeling. The goal of model quality assessment (QA) is to estimate the quality of predicted protein models. First, two new single-model QA methods based on Residual Neural Networks, called PDRN and VDRN, were proposed to achieve state-of-the-art performance. They used a comprehensive set of structure features to predict a quality score in the range of [0, 1]. Next, three single-model QA methods, MMQA-1, MMQA-2, and MMQA-HE, were proposed based on the ideas of two-stage learning and hierarchical ensembles. MMQA-1 and MMQA-2 divided the entire feature set into two different sets and used different feature sets and training data in each stage of learning. In addition, MMQA-HE created ensembles of models in the first stage of learning for improved performance. In CASP14, MMQA-1 ranked No. 2 in terms of average GDT-TS difference. MMQA-2 and MMQA-HE outperformed MMQA-1 consistently across different QA performance metrics in our experiments. Furthermore, a quasi-single-model QA method called INC-QA was proposed; it trained a deep neural network as a QA predictor for each protein target, based on template structure information generated from the target sequence. Experimental results using CASP data showed that INC-QA achieved state-of-the-art results, outperforming existing methods in the CASP QA stage 2 category on CASP13 targets. With the release of the groundbreaking protein structure prediction software AlphaFold2 and RoseTTAFold, many research teams started using them to generate highly accurate protein models.
We evaluated the performance of different QA methods on models generated by these tools and randomly modified by 3DRobot, and found that multi-model QA methods were still better than single-model QA methods on these kinds of high-performance model pools. Finally, in terms of the prediction of overall folding accuracy and overall interface accuracy for protein complexes in CASP15, we found a strong correlation between the predicted folding accuracy and the predicted interface accuracy of protein models. Loop modeling tries to predict the conformation of a relatively short stretch of protein backbone and side chain. It is a difficult problem due to conformational variability. AlphaFold2 achieved outstanding results in 3-D protein structure prediction and was expected to perform well on loop modeling. We investigated the performance of AlphaFold2 variants on loop modeling benchmark datasets and proposed an efficient constant-time method of using AlphaFold2 for loop modeling, called IAFLoop. To predict the structure of a loop region, IAFLoop ran a fast version of AlphaFold2 with a reduced database and without ensembling on an extended segment of the target loop region, and used RMSD-based consensus scores to select the top models. Our experimental results showed that IAFLoop generated highly accurate loop models, outperforming basic AlphaFold2 by up to 17 percent in RMSD error while using less than half of the time. Compared to the previous best method, IAFLoop reduced the RMSD error by more than half. Contact map prediction aims to predict whether the Euclidean distance between two Cβ atoms (Cα for glycine) in a protein structure is less than 8 angstroms. Contact information can act as a powerful constraint for determining the overall structure and can assist the protein 3D structure prediction process.
Building on MUFold-Contact, a new two-stage multi-branch deep neural network based on Residual Network and Inception V3 Network was proposed to improve its performance. In the first stage, distance maps of short-range, medium-range, and long-range residue pairs were predicted separately; in the second stage, the predicted distances, along with other features, were used as input to predict a binary contact map. The role of protein structure refinement is to take models generated by the protein structure prediction process and bring them closer to the true native structure. Inspired by AlphaFold in CASP13, a new protein structure refinement process, MUFOLD-REFINE, based on the distance distribution of a template pool, was developed and achieved improved performance over the MUFOLD refinement method used in CASP13. Includes bibliographical references.
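
The contact definition quoted in the abstract above (two residues are in contact when their Cβ atoms, or Cα for glycine, are less than 8 angstroms apart) can be illustrated with a minimal sketch. This is not the dissertation's code: the function name `contact_map` and the toy coordinates are assumptions made for illustration, and the input is assumed to already hold one representative atom position per residue.

```python
import math

# Threshold taken from the abstract: contact if distance < 8 angstroms.
CONTACT_THRESHOLD = 8.0


def contact_map(coords, threshold=CONTACT_THRESHOLD):
    """Return an n x n boolean matrix of residue contacts.

    coords is a list of (x, y, z) tuples, one per residue, assumed to be
    the C-beta position (C-alpha for glycine). Hypothetical helper.
    """
    n = len(coords)
    return [
        [math.dist(coords[i], coords[j]) < threshold for j in range(n)]
        for i in range(n)
    ]


# Toy chain: four residues spaced 3.8 angstroms apart along the x axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_map = contact_map(toy_coords)
```

In this toy chain, residues 0 and 2 are 7.6 angstroms apart and count as a contact, while residues 0 and 3 are 11.4 angstroms apart and do not. A predictor such as the one described above would output this kind of binary matrix from sequence-derived features rather than from known coordinates.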

    Machine learning methods for evaluating the quality of a single protein model using energy and structural properties

    Computational protein structure prediction is one of the most important problems in bioinformatics. In the process of protein three-dimensional structure prediction, assessing the quality of generated models accurately is crucial. Although many model quality assessment (QA) methods have been developed in recent years, the accuracy of state-of-the-art single-model QA methods is still not high enough for practical applications. Consensus QA methods performed significantly better than single-model QA methods in the CASP (Critical Assessment of protein Structure Prediction) competitions, but they require a pool of models with diverse quality to perform well. In this thesis, new machine learning based methods are developed for single-model QA and top-model selection from a pool of candidates. These methods are based on a comprehensive set of model structure features, such as matching of secondary structure and solvent accessibility, as well as existing potential or energy function scores. For each model, using these features as inputs, the machine learning methods predict a bounded quality score. Five state-of-the-art machine learning algorithms are implemented, trained, and tested using CASP datasets on various QA and selection tasks. Among the five algorithms, boosting and random forest achieved the best results overall. They significantly outperform existing single-model QA methods, including DFIRE, RW, and ProQ2, by up to 10% in QA scores.
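
The abstract above notes that consensus QA methods need a pool of models to work. A minimal sketch of why: a consensus score for one model is computed from its similarity to every other model in the pool, so it is undefined for a single model. The sketch below is an assumption-laden illustration, not the thesis's method. It uses plain coordinate RMSD on models assumed to be pre-superimposed, whereas real consensus methods typically superimpose structures and use scores such as GDT-TS. The names `rmsd`, `consensus_scores`, and the toy pool are hypothetical.

```python
import math


def rmsd(a, b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Assumes the two models are already superimposed (no alignment is done).
    """
    if len(a) != len(b):
        raise ValueError("models must have the same number of atoms")
    squared = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(squared / len(a))


def consensus_scores(pool):
    """Map each model name to its mean RMSD against all other pool members.

    Lower is better: a model close to most of the pool scores low.
    """
    if len(pool) < 2:
        raise ValueError("consensus scoring needs at least two models")
    scores = {}
    for name, coords in pool.items():
        others = [rmsd(coords, c) for other, c in pool.items() if other != name]
        scores[name] = sum(others) / len(others)
    return scores


# Toy pool of three two-atom models; model_c is the outlier.
toy_pool = {
    "model_a": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "model_b": [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0)],
    "model_c": [(0.0, 0.0, 0.0), (3.8, 6.0, 0.0)],
}
toy_scores = consensus_scores(toy_pool)
best_model = min(toy_scores, key=toy_scores.get)
```

Here `model_b` sits between the other two and is selected as the consensus pick. If most of a pool is wrong in the same way, this kind of score rewards the wrong models, which is one reason single-model methods like those described above remain of interest.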

    Optimal Estimator Design and Properties Analysis for Interconnected Systems with Asymmetric Information Structure

    This paper studies the optimal state estimation problem for interconnected systems. Each subsystem can obtain its own measurement in real time, while the measurements transmitted between the subsystems suffer from random delay. The optimal estimator is analytically designed to minimize the conditional error covariance. The boundedness of the expected error covariance (EEC) is analyzed. In particular, a new, easily verifiable condition is established for the boundedness of the EEC. Further, the properties of the EEC with respect to the delay probability are studied. We found that there exists a critical probability such that the EEC is bounded if the delay probability is below the critical probability. Lower and upper bounds on the critical probability are also derived. Finally, the proposed results are applied to a power system, and the effectiveness of the designed methods is illustrated by simulations.

    Primal Dual Alternating Proximal Gradient Algorithms for Nonsmooth Nonconvex Minimax Problems with Coupled Linear Constraints

    Nonconvex minimax problems have attracted wide attention in machine learning, signal processing and many other fields in recent years. In this paper, we propose a primal dual alternating proximal gradient (PDAPG) algorithm and a primal dual proximal gradient (PDPG-L) algorithm for solving nonsmooth nonconvex-strongly concave and nonconvex-linear minimax problems with coupled linear constraints, respectively. The corresponding iteration complexities of the two algorithms are proved to be $\mathcal{O}\left( \varepsilon^{-2} \right)$ and $\mathcal{O}\left( \varepsilon^{-3} \right)$ to reach an $\varepsilon$-stationary point, respectively. To our knowledge, they are the first two algorithms with iteration complexity guarantees for solving the two classes of minimax problems.

    EdgeYOLO: An Edge-Real-Time Object Detector

    This paper proposes an efficient, low-complexity and anchor-free object detector based on the state-of-the-art YOLO framework, which can run in real time on edge computing platforms. We develop an enhanced data augmentation method to effectively suppress overfitting during training, and design a hybrid random loss function to improve the detection accuracy of small objects. Inspired by FCOS, a lighter and more efficient decoupled head is proposed, which improves inference speed with little loss of precision. Our baseline model reaches 50.6% AP50:95 and 69.8% AP50 on the MS COCO2017 dataset, and 26.4% AP50:95 and 44.8% AP50 on the VisDrone2019-DET dataset, and it meets real-time requirements (FPS>=30) on the edge-computing device Nvidia Jetson AGX Xavier. We also designed lighter models with fewer parameters for edge computing devices with lower computing power, which also show better performance. Our source code, hyper-parameters and model weights are all available at https://github.com/LSH9832/edgeyolo
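
The AP50 and AP50:95 figures quoted above follow the COCO convention: a detection counts as a match when its intersection over union (IoU) with a ground-truth box reaches a threshold, with AP50 using the single threshold 0.50 and AP50:95 averaging over ten thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU test behind those thresholds. It is not the paper's evaluation code and does not compute average precision; the helper names and the example boxes are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used by the COCO-style AP50:95 metric.
COCO_IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def thresholds_passed(pred_box, truth_box):
    """List the COCO IoU thresholds at which pred_box matches truth_box."""
    overlap = iou(pred_box, truth_box)
    return [t for t in COCO_IOU_THRESHOLDS if overlap >= t]


# Hypothetical boxes: the prediction is slightly taller than the ground truth.
example_pred = (0.0, 0.0, 10.0, 10.0)
example_truth = (0.0, 0.0, 10.0, 7.2)
example_passed = thresholds_passed(example_pred, example_truth)
```

With an IoU of 0.72, this prediction counts as correct at the looser thresholds (0.50 through 0.70) but not at the stricter ones, which is why AP50:95 values are lower than AP50 values for the same model.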