
    Artificial intelligence methods enhance the discovery of RNA interactions

    Understanding how RNAs interact with proteins, RNAs, or other molecules remains a major challenge in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are becoming available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods developed in this broad research area, highlighting the feature encoding and machine learning strategies they adopt. Given the magnitude of the effect that dataset size and quality have on performance, we explore the characteristics of these datasets. Additionally, we discuss multiple approaches to generating datasets of negative examples for training. Finally, we describe the best-performing methods for predicting interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods for predicting RNA-RNA or RNA-RBP interactions independently of the RNA type.

    Graph Neural Network and Phylogenetic Tree Construction

    Deep learning has been widely used in computational biology research in the past few years. A large number of deep learning methods have been proposed to solve bioinformatics problems such as gene function prediction, protein interaction classification, and drug effect analysis; most of these methods yield better solutions than traditional computing methods. However, few methods have been proposed to solve problems encountered in evolutionary biology research. In this dissertation, two neural network learning methods are proposed to solve the problems of genome location prediction and median genome generation encountered in phylogenetic tree construction, and the ability of neural network learning models to solve evolutionary biology problems is explored. Phylogenetic tree construction based on genomic genotype yields more accurate results than construction based on genomic phenotype. The best-known phylogenetic tree construction framework uses median genome algorithms to filter tree topologies and update ancestral genomes. Currently, several median genome algorithms can be applied to short genomes and simple evolution patterns; however, when genomes become longer and evolution patterns more complex, these algorithms show unstable performance and exceptionally long running times. To lift these limitations, a novel median genome generator based on a graph neural network learning model is proposed in this research. With a graph neural network, genome rearrangement patterns and genome relations can be extracted from internal gene connections. Experimental results show that this generator obtains stable median genome results in constant time no matter how long or how complex the genomes are; its performance makes it the best choice in the GRAPPA framework for phylogenetic tree construction.

    Light Microscopy Combined with Computational Image Analysis Uncovers Virus-Specific Infection Phenotypes and Host Cell State Variability

    The study of virus infection phenotypes and variability plays a critical role in understanding viral pathogenesis and host response. Virus-host interactions can be investigated by light and various label-free microscopy methods, which provide a powerful tool for the spatiotemporal analysis of patterns at the cellular and subcellular levels in live or fixed cells. Analysis of microscopy images is increasingly complemented by sophisticated statistical methods and leverages artificial intelligence (AI) to address the tasks of image denoising, segmentation, classification, and tracking. Work in this thesis demonstrates that combining microscopy with AI techniques enables models that accurately detect and quantify viral infection due to the virus-induced cytopathic effect (CPE). Furthermore, it shows that statistical analysis of microscopy image data can disentangle stochastic and deterministic factors that contribute to viral infection variability, such as the cellular state. In summary, the integration of microscopy and computational image analysis offers a powerful and flexible approach for studying virus infection phenotypes and variability, ultimately contributing to a more advanced understanding of infection processes and creating possibilities for the development of more effective antiviral strategies.

    Identifying disease-associated genes based on artificial intelligence

    Identifying disease-gene associations can help improve the understanding of disease mechanisms, which has a variety of applications, such as early diagnosis and drug development. Although experimental techniques such as linkage analysis and genome-wide association studies (GWAS) have identified a large number of associations, identifying disease genes is still challenging because experimental methods are usually time-consuming and expensive. To address these issues, computational methods have been proposed to predict disease-gene associations. Based on the characteristics of existing computational algorithms in the literature, we can roughly divide them into three categories: network-based methods, machine learning-based methods, and other methods. Whatever model is used to predict disease genes, the proper integration of multi-level biological data is the key to improving prediction accuracy. This thesis addresses some limitations of the existing computational algorithms and integrates multi-level data via artificial intelligence techniques. The thesis starts with a comprehensive review of computational methods, databases, and evaluation methods used in predicting disease-gene associations, followed by one network-based method and four machine learning-based methods. The first chapter introduces the background information, the objectives of the studies, and the structure of the thesis. The second chapter then provides a comprehensive review of the existing algorithms as well as the databases and evaluation methods used in existing studies. With the objectives and future directions established, the thesis then presents five computational methods for predicting disease-gene associations. The first method, proposed in Chapter 3, considers the issue of non-disease gene selection. A shortest path-based strategy is used to select reliable non-disease genes from a disease gene network and a differential network. The selected genes are then used by a network-energy model to improve its performance. The second method, proposed in Chapter 4, constructs sample-based networks for case samples and uses them to predict disease genes. This strategy improves the quality of protein-protein interaction (PPI) networks, which further improves the prediction accuracy. Chapter 5 presents a generic model that applies multimodal deep belief nets (DBN) to fuse different types of data. Network embeddings extracted from PPI networks and gene ontology (GO) data are fused with the multimodal DBN to obtain cross-modality representations. Chapter 6 presents another deep learning model, which uses a convolutional neural network (CNN) to integrate gene similarities with other types of data. Finally, the fifth method, proposed in Chapter 7, is a nonnegative matrix factorization (NMF)-based method. This method maps diseases and genes onto a lower-dimensional manifold, and the geodesic distance between diseases and genes is used to predict their associations. The method can predict disease genes even if the disease under consideration has no known associated genes. In summary, this thesis proposes several artificial intelligence-based computational algorithms to address typical issues in existing computational algorithms. Experimental results show that the proposed methods can improve the accuracy of disease-gene prediction.

    Machine Learning Based Disease Gene Identification and MHC Immune Protein-peptide Binding Prediction

    Machine learning and deep learning methods have been increasingly applied to solve challenging and important bioinformatics problems such as protein structure prediction, disease gene identification, and drug discovery. However, the performance of existing machine learning-based predictive models is still not satisfactory. The question of how to exploit the specific properties of bioinformatics data and couple them with the unique capabilities of the learning algorithms remains open. In this dissertation, we propose advanced machine learning and deep learning algorithms to address two important problems: mislocation-related cancer gene identification and major histocompatibility complex (MHC)-peptide binding affinity prediction. Our first contribution is a kernel-based logistic regression algorithm for identifying potential mislocation-related genes among known cancer genes. Our algorithm takes protein-protein interaction networks, gene expression data, and subcellular location gene ontology data as input, which is particularly lightweight compared with existing methods. The experimental results demonstrate that our proposed pipeline is capable of identifying mislocation-related cancer genes. Our second contribution addresses the modeling and prediction of human leukocyte antigen (HLA) peptide binding in the human immune system. We present an allele-specific convolutional neural network model with one-hot encoding. Extensive evaluation on the standard IEDB datasets shows that the performance of our model is better than that of all existing prediction models. To achieve further improvement, we propose a novel pan-specific model for peptide-HLA class I binding affinity prediction, which allows us to exploit all the training samples of different HLA alleles. Our sequence-based pan model is currently the only algorithm that does not use pseudo-sequence encoding, a dominant structure-based encoding method in this area. The benchmark studies show that our method achieves state-of-the-art performance. Our proposed model could be integrated into existing ensemble methods to improve their overall prediction capabilities on highly diverse MHC alleles. Finally, we present an LSTM-CNN deep learning model with an attention mechanism for predicting peptide-HLA class II binding affinities and binding cores. Our model achieved very good performance and outperformed existing methods on half of the tested alleles. With the help of the attention mechanism, our model can directly output the peptide binding core based on attention weights without any additional pre- or post-processing.

    Closed-loop experiments and brain machine interfaces with multiphoton microscopy

    In the field of neuroscience, the importance of constructing closed-loop experimental systems has increased in conjunction with technological advances in measuring and controlling neural activity in live animals. This paper provides an overview of recent technological advances in the field, focusing on closed-loop experimental systems where multiphoton microscopy (the only method capable of recording and controlling targeted population activity of neurons at single-cell resolution in vivo) works through real-time feedback. Specifically, we present some examples of brain machine interfaces (BMIs) using in vivo two-photon calcium imaging and discuss applications of two-photon optogenetic stimulation and adaptive optics to real-time BMIs. We also consider conditions for realizing future optical BMIs at the synaptic level, and their possible roles in understanding the computational principles of the brain.

    Development of a deep learning-based computational framework for the classification of protein sequences

    Master's dissertation in Bioinformatics. Proteins are among the most important biological structures in living organisms, since they perform multiple biological functions. Each protein has different characteristics and properties, which can be employed in many areas, such as industrial biotechnology and clinical applications, with a positive impact. Modern high-throughput methods allow protein sequencing, which provides protein sequence data. Machine learning methodologies are applied to characterize proteins using information from the protein sequence. However, a major problem associated with this approach is how to properly encode the protein sequences without losing the biological relationship between the amino acid residues. The transformation of the protein sequence into a numeric representation is done by encoder methods. In this sense, the main objective of this project is to study different encoders and identify the methods that yield the best biological representation of the protein sequences when used in machine learning (ML) models to predict different labels related to their function. The methods were analyzed in two case studies. The first is related to enzymes, since they are a well-established case in the literature. The second used transporter sequences, a less-studied case in the literature. In both cases, the data were collected from the curated database Swiss-Prot. The encoders that were tested include calculated protein descriptors, substitution matrix methods, position-specific scoring matrices, and encoding by pre-trained transformer methods. Encoding protein sequences with state-of-the-art pre-trained transformers proved to be a good biological representation for subsequent application in state-of-the-art ML methods. Namely, the ESM-1b transformer achieved a Matthews correlation coefficient above 0.9 for any multiclassification task of the transporter classification system.

    Deep Learning for Detection and Segmentation in High-Content Microscopy Images

    High-content microscopy has led to many advances in biology and medicine. This fast-emerging technology is transforming cell biology into a big-data-driven science. Computer vision methods are used to automate the analysis of microscopy image data. In recent years, deep learning has become popular and has had major success in computer vision. Most of the available methods were developed to process natural images. Compared to natural images, microscopy images pose domain-specific challenges such as small training datasets, clustered objects, and class imbalance. In this thesis, new deep learning methods for object detection and cell segmentation in microscopy images are introduced. For particle detection in fluorescence microscopy images, a deep learning method based on a domain-adapted Deconvolution Network is presented. In addition, a method for mitotic cell detection in heterogeneous histopathology images is proposed, which combines a deep residual network with Hough voting. The method is used for grading of whole-slide histology images of breast carcinoma. Moreover, a method for both particle detection and cell detection based on object centroids is introduced, which is trainable end-to-end. It comprises a novel Centroid Proposal Network, a layer for ensembling detection hypotheses over image scales and anchors, an anchor regularization scheme which favours prior anchors over regressed locations, and an improved algorithm for Non-Maximum Suppression. Furthermore, a novel loss function based on Normalized Mutual Information is proposed, which can cope with strong class imbalance and is derived within a Bayesian framework. For cell segmentation, a deep neural network with an increased receptive field to capture rich semantic information is introduced. Moreover, a deep neural network is proposed that combines the multi-scale feature aggregation of Convolutional Neural Networks with the iterative refinement of Recurrent Neural Networks. To increase the robustness of the training and improve segmentation, a novel focal loss function is presented. In addition, a framework for black-box hyperparameter optimization for biomedical image analysis pipelines is proposed. The framework has a modular architecture that separates hyperparameter sampling and hyperparameter optimization. A visualization of the loss function based on infimum projections is suggested to obtain further insights into the optimization problem. Also, a transfer learning approach is presented, which uses only one color channel for pre-training and performs fine-tuning on more color channels. Furthermore, an approach for unsupervised domain adaptation for histopathological slides is presented. Finally, Galaxy Image Analysis is presented, a platform for web-based microscopy image analysis. Galaxy Image Analysis workflows for cell segmentation in cell cultures, particle detection in mouse brain tissue, and MALDI/H&E image registration have been developed. The proposed methods were applied to challenging synthetic as well as real microscopy image data from various microscopy modalities. The proposed methods yield state-of-the-art or improved results. The methods were benchmarked in international image analysis challenges and used in various cooperation projects with biomedical researchers.

    VPS13D REGULATES MITOCHONDRIAL MORPHOLOGY AND PROMOTES PEROXISOME BIOGENESIS IN HUMAN CELLS

    The VPS13 family is highly conserved among eukaryotes, from yeast to humans. There are four members of this family in mammals: VPS13A-D. Each gene has been linked, nearly or definitively, to a distinct neurological disorder: Chorea-Acanthocytosis (VPS13A), Cohen Syndrome (VPS13B), Parkinson’s Disease (VPS13C), and Spinocerebellar Ataxia, autosomal recessive 4 (SCAR4; VPS13D). Until recently, their cellular functions were poorly understood, and much remains to be elucidated. Recent research strongly suggests that yeast Vps13 and its orthologs could function as lipid transfer proteins at membrane contact sites (MCSs). My thesis work strengthens the argument to designate VPS13D as an MCS lipid transfer protein. We generated VPS13A-D knockout (KO) HeLa cell lines for characterization. Of the individual KOs, only VPS13D had an apparent effect on organelle phenotypes. VPS13D-KO cells exhibit abnormal mitochondrial morphology. We also found that VPS13D loss induces a partial or total loss of peroxisomes. We support these findings by establishing non-homogeneous VPS13D KO in other human cell types. This work also provides a clinical insight, as we found a similar (though less severe) peroxisomal phenotype in SCAR4 patient fibroblasts. Our data show that VPS13D specifically regulates peroxisome number through biogenesis. To study the visual mitochondrial and peroxisomal phenotypes more precisely, I developed a method that uses deep learning (DL) to generate two image classification models. These models provide an automated and less biased way to quantitate image-based phenotypes; in this case, the identification of peroxisomal loss by catalase (CAT) localization and the rounded mitochondria phenotype distinct to VPS13D KO. This tool allows for accurate assaying of VPS13D’s role in both organelles, and even their linkage to each other.