4,195 research outputs found

    Protein Secondary Structure Prediction Using Cascaded Convolutional and Recurrent Neural Networks

    Protein secondary structure prediction is an important problem in bioinformatics. Inspired by the recent successes of deep neural networks, in this paper we propose an end-to-end deep network that predicts protein secondary structures from integrated local and global contextual features. Our deep architecture leverages convolutional neural networks with different kernel sizes to extract multiscale local contextual features. In addition, considering the long-range dependencies in amino acid sequences, we set up a bidirectional neural network consisting of gated recurrent units to capture global contextual features. Furthermore, multi-task learning is utilized to predict secondary structure labels and amino-acid solvent accessibility simultaneously. Our proposed deep network demonstrates its effectiveness by achieving state-of-the-art performance, i.e., 69.7% Q8 accuracy on the public benchmark CB513, 76.9% Q8 accuracy on CASP10, and 73.1% Q8 accuracy on CASP11. Our model and results are publicly available. Comment: 8 pages, 3 figures. Accepted by the International Joint Conference on Artificial Intelligence (IJCAI).
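The multiscale local feature extraction described above can be sketched in NumPy: one-hot encoded residues are convolved with several kernel sizes and the feature maps concatenated. This is a minimal illustration of the idea, not the authors' implementation; the function names, channel counts, and kernel sizes are assumptions made up for the example, and the recurrent and multi-task parts are omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues

def one_hot(seq):
    """Encode an amino-acid sequence as an (L, 20) one-hot matrix."""
    idx = {a: i for i, a in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, idx[aa]] = 1.0
    return x

def conv1d(x, kernel):
    """'Same'-padded 1D convolution along the sequence axis.
    x: (L, C_in), kernel: (k, C_in, C_out) -> (L, C_out)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    return np.stack([np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0])])

def multiscale_features(x, kernel_sizes=(3, 7, 11), c_out=8, rng=None):
    """Concatenate conv features from several kernel sizes, mimicking
    multiscale local context extraction (weights are random placeholders)."""
    rng = rng or np.random.default_rng(0)
    feats = [conv1d(x, rng.standard_normal((k, x.shape[1], c_out)))
             for k in kernel_sizes]
    return np.concatenate(feats, axis=1)  # (L, c_out * len(kernel_sizes))
```

In the paper's setting, such per-position multiscale features would then feed a bidirectional GRU for global context and two task-specific output heads.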

    Prediction of 8-state protein secondary structures by a novel deep learning architecture

    © 2018 The Author(s). Background: Protein secondary structure can be regarded as an information bridge linking the primary sequence and the tertiary structure. Accurate 8-state secondary structure prediction enables more precise, higher-resolution analysis of structure-based properties. Results: We present a novel deep learning architecture that exploits an integrative synergy of a convolutional neural network, a residual network, and a bidirectional recurrent neural network to improve the performance of protein secondary structure prediction. A local block composed of convolutional filters and the original input is designed to capture local sequence features. The subsequent bidirectional recurrent neural network, consisting of gated recurrent units, captures global context features. Furthermore, the residual network improves the information flow between the hidden layers and the cascaded recurrent neural network. Our proposed deep network achieved 71.4% accuracy on the benchmark CB513 dataset for 8-state prediction, and ensemble learning with our model achieved 74% accuracy. The generalization capability of our model is also evaluated on three other independent datasets, CASP10, CASP11, and CASP12, for both 8- and 3-state prediction. These prediction performances are superior to the state-of-the-art methods. Conclusion: Our experiments demonstrate that this is a valuable method for predicting protein secondary structure, and that capturing local and global features concurrently is very useful in deep learning.
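The "local block" idea above, a convolution whose output is added back to its input so the raw signal keeps flowing to later layers, can be sketched in a few lines of NumPy. The function name, shapes, and activation are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def local_block(x, kernel, activation=np.tanh):
    """A residual 'local block': convolve the input, apply a nonlinearity,
    and add the original input back, so downstream layers see both the
    filtered features and the unmodified signal.
    x: (L, C), kernel: (k, C, C) -> (L, C)."""
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))
    conv = np.stack([np.tensordot(xp[t:t + k], kernel, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0])])
    return activation(conv) + x  # residual connection preserves information flow
```

With zero-valued filters the block reduces to the identity, which is exactly the property that makes residual connections ease gradient flow in deep stacks.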

    Geometric combinatorics and computational molecular biology: branching polytopes for RNA sequences

    Questions in computational molecular biology generate various discrete optimization problems, such as DNA sequence alignment and RNA secondary structure prediction. However, the optimal solutions are fundamentally dependent on the parameters used in the objective functions. The goal of a parametric analysis is to elucidate such dependencies, especially as they pertain to the accuracy and robustness of the optimal solutions. Techniques from geometric combinatorics, including polytopes and their normal fans, have been used previously to give parametric analyses of simple models for DNA sequence alignment and RNA branching configurations. Here, we present a new computational framework, and proof-of-principle results, which give the first complete parametric analysis of the branching portion of the nearest neighbor thermodynamic model for secondary structure prediction for real RNA sequences. Comment: 17 pages, 8 figures.
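A toy example of the kind of parametric dependence being analyzed: when each candidate branching configuration is scored linearly in the parameters, the region of parameter space where a given configuration is optimal is a polyhedral cone, and these cones tile the plane as the normal fan of the polytope spanned by the feature vectors. The configurations and feature vectors below are invented purely for illustration and have no connection to the paper's actual thermodynamic model.

```python
# Each candidate configuration is summarized by a toy feature vector
# (num_branches, num_unpaired); its score is linear in parameters (a, b).
candidates = {
    "hairpin":   (1, 4),
    "two-way":   (2, 2),
    "multiloop": (3, 0),
}

def optimal(a, b):
    """Return the configuration maximizing a*branches + b*unpaired.
    Which candidate wins depends only on the direction of (a, b):
    the winning regions are cones in the parameter plane."""
    return max(candidates,
               key=lambda s: a * candidates[s][0] + b * candidates[s][1])
```

Sweeping (a, b) over a grid and recording where the argmax changes recovers the cone boundaries; a full parametric analysis computes these regions exactly rather than by sampling.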

    Representation Learning for Biological Sequence Data

    Doctoral dissertation -- Seoul National University Graduate School: College of Engineering, Department of Electrical and Computer Engineering, August 2021. Advisor: Sungroh Yoon. As we are living in the era of big data, the biomedical domain is no exception. With the advent of technologies such as next-generation sequencing, developing methods to capitalize on the explosion of biomedical data is one of the major challenges in bioinformatics. Representation learning, in particular deep learning, has made significant advances in diverse fields where the artificial intelligence community had struggled for many years. However, although representation learning has also shown great promise in bioinformatics, it is not a silver bullet. Off-the-shelf applications of representation learning cannot always provide successful results for biological sequence data; many challenges and opportunities remain to be explored. This dissertation presents a set of representation learning methods that address three issues in biological sequence data analysis. First, we propose a two-stage training strategy to address throughput and information trade-offs within wet-lab CRISPR-Cpf1 activity experiments. Second, we propose an encoding scheme to model the interaction between two sequences for functional microRNA target prediction. Third, we propose a self-supervised pre-training method to bridge the exponentially growing gap between the numbers of unlabeled and labeled protein sequences. In summary, this dissertation proposes a set of representation learning methods that can derive invaluable information from biological sequence data.
    Table of contents:
    1 Introduction (1.1 Motivation; 1.2 Contents of Dissertation)
    2 Background (2.1 Representation Learning; 2.2 Deep Neural Networks: 2.2.1 Multi-layer Perceptrons, 2.2.2 Convolutional Neural Networks, 2.2.3 Recurrent Neural Networks, 2.2.4 Transformers; 2.3 Training of Deep Neural Networks; 2.4 Representation Learning in Bioinformatics; 2.5 Biological Sequence Data Analyses; 2.6 Evaluation Metrics)
    3 CRISPR-Cpf1 Activity Prediction (3.1 Methods: 3.1.1 Model Architecture, 3.1.2 Training of Seq-deepCpf1 and DeepCpf1; 3.2 Experiment Results: 3.2.1 Datasets, 3.2.2 Baselines, 3.2.3 Evaluation of Seq-deepCpf1, 3.2.4 Evaluation of DeepCpf1; 3.3 Summary)
    4 Functional microRNA Target Prediction (4.1 Methods: 4.1.1 Candidate Target Site Selection, 4.1.2 Input Encoding, 4.1.3 Residual Network, 4.1.4 Post-processing; 4.2 Experiment Results: 4.2.1 Datasets, 4.2.2 Classification of Functional and Non-functional Targets, 4.2.3 Distinguishing High-functional Targets, 4.2.4 Ablation Studies; 4.3 Summary)
    5 Self-supervised Learning of Protein Representations (5.1 Methods: 5.1.1 Pre-training Procedure, 5.1.2 Fine-tuning Procedure, 5.1.3 Model Architecture; 5.2 Experiment Results: 5.2.1 Experiment Setup, 5.2.2 Pre-training Results, 5.2.3 Fine-tuning Results, 5.2.4 Comparison with Larger Protein Language Models, 5.2.5 Ablation Studies, 5.2.6 Qualitative Interpretation Analyses; 5.3 Summary)
    6 Discussion (6.1 Challenges and Opportunities)
    7 Conclusion; Bibliography; Abstract in Korean
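The third contribution, self-supervised pre-training on unlabeled protein sequences, typically rests on a masking objective in the spirit of masked language modeling: hide some residues and train the model to recover them. The sketch below illustrates only the data side of such an objective; the mask symbol, masking rate, and function name are assumptions for illustration, not the dissertation's actual procedure.

```python
import random

MASK = "#"  # hypothetical mask symbol for illustration

def mask_sequence(seq, rate=0.15, seed=0):
    """Self-supervised masking: hide a fraction of residues and keep the
    originals as prediction targets, so unlabeled sequences supervise
    themselves. Returns (masked_sequence, {position: original_residue})."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, aa in enumerate(seq):
        if rng.random() < rate:
            masked.append(MASK)
            targets[i] = aa  # the model must recover this residue
        else:
            masked.append(aa)
    return "".join(masked), targets
```

Because the targets come from the sequence itself, every unlabeled protein sequence becomes training data, which is what lets pre-training exploit the exponentially growing pool of unlabeled sequences before fine-tuning on the much smaller labeled sets.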