137 research outputs found

    Protein interface prediction using graph convolutional networks

    2017 Fall. Includes bibliographical references. Proteins play a critical role in processes both within and between cells, through their interactions with each other and other molecules. Proteins interact via an interface, forming a protein complex, which is difficult, expensive, and time-consuming to determine experimentally, giving rise to computational approaches. These computational approaches utilize known electrochemical properties of protein amino acid residues in order to predict whether they are part of an interface. Prediction can occur in a partner-independent fashion, where amino acid residues are considered independently of their neighbor, or in a partner-specific fashion, where pairs of potentially interacting residues are considered together. Ultimately, prediction of protein interfaces can help illuminate cellular biology, improve our understanding of diseases, and aid pharmaceutical research. Interface prediction has historically been performed with a variety of methods, including docking, template matching, and, more recently, machine learning approaches. The field of machine learning has undergone a revolution of sorts with the emergence of convolutional neural networks as the leading method of choice for a wide swath of tasks. Enabled by large quantities of data and the increasing power and availability of computing resources, convolutional neural networks efficiently detect patterns in grid-structured data and generate hierarchical representations that prove useful for many types of problems. This success has motivated the work presented in this thesis, which seeks to improve upon state-of-the-art interface prediction methods by incorporating concepts from convolutional neural networks. Proteins are inherently irregular, so they don't easily conform to a grid structure, whereas a graph representation is much more natural. Various convolution operations have been proposed for graph data, each geared towards a particular application.
We adapted these convolutions for use in interface prediction, and proposed two new variants. Neural networks were trained on the Docking Benchmark Dataset version 4.0 complexes and tested on the new complexes added in version 5.0. Results were compared against the state-of-the-art partner-specific method, PAIRpred [1]. Results show that multiple variants of graph convolution outperform PAIRpred, with no method emerging as the clear winner. In the future, additional training data may be incorporated from other sources, unsupervised pretraining such as autoencoding may be employed, and a generalization of convolution to simplicial complexes may also be explored. In addition, the various graph convolution approaches may be applied to other applications with graph-structured data, such as Quantitative Structure Activity Relationship (QSAR) learning and knowledge base inference.
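
To make the idea of convolving over a graph rather than a grid concrete, here is a minimal sketch of one generic graph-convolution step. It is not the operator used in the thesis, whose variants are not specified in this abstract. The function name `graph_conv`, the mean-over-neighbors aggregation, the ReLU nonlinearity, and the toy three-residue chain are all assumptions chosen for illustration.

```python
def graph_conv(features, neighbors, w_center, w_neigh):
    """One generic graph-convolution step (illustrative, not the thesis's operator).

    Each node's output mixes its own features with the mean of its
    neighbors' features, followed by a ReLU.
    """
    def matvec(vec, weights):
        # vec has length d_in; weights is a d_in x d_out nested list.
        return [sum(v * row[j] for v, row in zip(vec, weights))
                for j in range(len(weights[0]))]

    out = []
    for node, feat in enumerate(features):
        nbrs = neighbors[node]
        if nbrs:
            # Average each feature over the node's neighbors.
            mean = [sum(features[n][k] for n in nbrs) / len(nbrs)
                    for k in range(len(feat))]
        else:
            # Isolated node: no neighbor signal.
            mean = [0.0] * len(feat)
        center_term = matvec(feat, w_center)
        neigh_term = matvec(mean, w_neigh)
        out.append([max(0.0, c + n) for c, n in zip(center_term, neigh_term)])
    return out


# Hypothetical toy graph: three residues in a chain 0 - 1 - 2, two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic easy to follow
hidden = graph_conv(features, neighbors, identity, identity)
print(hidden)
```

Because the aggregation runs over an explicit neighbor list, the same weights apply to nodes with any number of neighbors, which is what lets such a layer handle irregular structures like proteins.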

    Cyclic multiplex fluorescent immunohistochemistry and machine learning reveal distinct states of astrocytes and microglia in normal aging and Alzheimer’s disease

    Background: Astrocytes and microglia react to Aβ plaques, neurofibrillary tangles, and neurodegeneration in the Alzheimer’s disease (AD) brain. Single-nuclei and single-cell RNA-seq have revealed multiple states or subpopulations of these glial cells but lack spatial information. We have developed a methodology of cyclic multiplex fluorescent immunohistochemistry on human postmortem brains and image analysis that enables a comprehensive morphological quantitative characterization of astrocytes and microglia in the context of their spatial relationships with plaques and tangles. Methods: Single FFPE sections from the temporal association cortex of control and AD subjects were subjected to 8 cycles of multiplex fluorescent immunohistochemistry, including 7 astroglial, 6 microglial, 1 neuronal, Aβ, and phospho-tau markers. Our analysis pipeline consisted of: (1) image alignment across cycles; (2) background subtraction; (3) manual annotation of 5172 ALDH1L1+ astrocytic and 6226 IBA1+ microglial profiles; (4) local thresholding and segmentation of profiles; (5) machine learning on marker intensity data; and (6) deep learning on image features. Results: Spectral clustering identified three phenotypes of astrocytes and microglia, which we termed “homeostatic,” “intermediate,” and “reactive.” Reactive and, to a lesser extent, intermediate astrocytes and microglia were closely associated with AD pathology (≤ 50 µm). Compared to homeostatic, reactive astrocytes contained substantially higher GFAP and YKL-40, modestly elevated vimentin and TSPO as well as EAAT1, and reduced GS. Intermediate astrocytes had markedly increased EAAT2, moderately increased GS, and intermediate GFAP and YKL-40 levels. Relative to homeostatic, reactive microglia showed increased expression of all markers (CD68, ferritin, MHC2, TMEM119, TSPO), whereas intermediate microglia exhibited increased ferritin and TMEM119 as well as intermediate CD68 levels.
Machine learning models applied on either high-plex signal intensity data (gradient boosting machines) or directly on image features (convolutional neural networks) accurately discriminated control vs. AD diagnoses at the single-cell level. Conclusions: Cyclic multiplex fluorescent immunohistochemistry combined with machine learning models holds promise to advance our understanding of the complexity and heterogeneity of glial responses as well as inform transcriptomics studies. Three distinct phenotypes emerged with our combination of markers, thus expanding the classic binary “homeostatic vs. reactive” classification to a third state, which could represent “transitional” or “resilient” glia. Funding: Spanish Ministry of Science, Innovation, and Universities FPU fellowship to CM-C; Massachusetts Alzheimer’s Disease Research Center grant P30AG062421 to BTH, and 1R56AG061196 to BTH; Alzheimer’s Association (AACF17-524184 and AACF-17-524184-RAPID to AS-P).

    Hybrid deep neural networks for mining heterogeneous data

    In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are desired to fully leverage the potential that exists within massive structured and unstructured data. However, decision-makers are often confronted with multiple diverse heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity. The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and heterogeneous meta data are modeled. Detecting Copy Number Variations (CNVs) in genetic studies is used as a motivating example. A CNN-DNN blended neural network is proposed to authenticate CNV calls made by current state-of-the-art CNV detection algorithms. It utilizes hybrid deep neural networks to leverage both scatter plot image signal and heterogeneous numerical meta data for improving CNV calling and review efficiency. The second part of this dissertation deals with data of various frequencies or scales in time series data analysis, the second kind of data heterogeneity. The stock return forecasting problem in the finance field is used as a motivating example. A hybrid framework of Long Short-Term Memory and Deep Neural Network (LSTM-DNN) is developed to enrich the time-series forecasting task with static fundamental information. The application of the proposed framework is not limited to the stock return forecasting problem, but extends to any time-series-based prediction task. The third part of this dissertation extends the LSTM-DNN framework to account for both temporal and spatial dependency among variables, common in many applications. For example, it is known that stock prices of relevant firms tend to fluctuate together.
Such coherent price changes among relevant stocks are referred to as spatial dependency. In this part, a Variational Autoencoder (VAE) is first utilized to recover the latent graphical dependency structure among variables. Then a hybrid deep neural network of Graph Convolutional Network and Long Short-Term Memory network (GCN-LSTM) is developed to model both the graph-structured spatial dependency and temporal dependency of variables at different scales. Extensive experiments are conducted to demonstrate the effectiveness of the proposed neural networks with application to solve three representative real-world problems. Additionally, the proposed frameworks can also be applied to other areas filled with similar heterogeneous inputs.

    Deep Learning Based Analysis of Prostate Cancer from MP-MRI

    The diagnosis of prostate cancer faces a problem with overdiagnosis that leads to damaging side effects due to unnecessary treatment. Research has shown that the use of multi-parametric magnetic resonance images to conduct biopsies can drastically help to mitigate overdiagnosis, thus reducing the side effects on healthy patients. This study aims to investigate the use of deep learning techniques for computer-aided diagnosis with MRI as input. Several diagnosis problems, ranging from classification of lesions as being clinically significant or not to the detection and segmentation of lesions, are addressed with deep learning based approaches. This thesis tackled two main problems regarding the diagnosis of prostate cancer. Firstly, a deep neural network architecture, XmasNet, was used to conduct two large experiments on the classification of lesions. Secondly, detection and segmentation experiments were conducted, first on the prostate and afterward on the prostate cancer lesions. The former experiments explored the lesions through a two-dimensional space, while the latter explored models to work with three-dimensional inputs. For this task, the 3D models explored were the 3D U-Net and a pretrained 3D ResNet-18. A rigorous analysis of all these problems was conducted: for lesion classification, the experiments covered two networks, two cropping techniques, two resampling techniques, two crop sizes, five input sizes, and data augmentations, while for segmentation they covered two models, two input sizes, and data augmentations. Moreover, the experiments were conducted for both sequences independently, and within the lesion classification problem, the experiments were also conducted for both sequences simultaneously.
However, while the binary classification of the clinical significance of lesions and the detection and segmentation of the prostate already achieve the desired results (0.870 AUC and 0.915 Dice score, respectively), the classification of the PIRADS score and the segmentation of lesions still have a large margin for improvement (0.664 accuracy and 0.690 Dice score, respectively). The thesis also studied how some flaws in the dataset can be addressed to improve the results of all these problems. Further research on the problem is still needed, but nonetheless, this thesis established sufficient ground for future work to be conducted.
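
For readers unfamiliar with the Dice score reported above, the following sketch shows how it is commonly computed for a pair of binary segmentation masks, as twice the overlap divided by the total foreground in both masks. The function name `dice_score`, the flat-list mask representation, and the convention of returning 1.0 for two empty masks are assumptions for illustration. The thesis's own evaluation code is not described in this abstract.

```python
def dice_score(pred, target):
    """Dice coefficient 2*|A ∩ B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count positions where both masks mark foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground count across both masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Hypothetical masks flattened to 1D: 1 = lesion voxel, 0 = background.
pred = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
score = dice_score(pred, target)  # overlap of 2 voxels, 3 + 3 foreground voxels
print(round(score, 3))
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground voxels.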