468 research outputs found

    Computational Approaches to Drug Profiling and Drug-Protein Interactions

    Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates have led to a long period of stagnation in drug approvals. Given the extreme cost of bringing a drug to market, locating and understanding the reasons for clinical failure is key to future productivity. This PhD makes three main contributions in this respect. First, the web platform LigNFam enables users to interactively explore similarity relationships between 'drug-like' molecules and the proteins they bind. Secondly, two deep-learning-based binding-site comparison tools were developed, competitive with the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold relationships; it has already been used in multiple projects, including integration into a virtual screening pipeline to increase the tractability of ultra-large screening experiments. Together with existing tools, these contributions will aid the understanding of drug-protein relationships, particularly in off-target prediction and drug repurposing, helping to design better drugs faster.
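    The molecular similarity relationships a platform like LigNFam surfaces are conventionally scored with the Tanimoto coefficient over binary fingerprints. A minimal sketch of that measure, using invented bit-sets as stand-ins for real fingerprints (this is an illustration of the standard metric, not the platform's actual code):

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not fp_a and not fp_b:
        return 1.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Hypothetical fingerprints for three 'drug-like' molecules (made-up bits).
aspirin_like = {1, 4, 7, 9, 12}
ibuprofen_like = {1, 4, 8, 9, 13}
unrelated = {2, 3, 5}

print(tanimoto(aspirin_like, ibuprofen_like))  # 3 shared / 7 total bits = 3/7
print(tanimoto(aspirin_like, unrelated))       # no shared bits -> 0.0
```

    In practice the fingerprints would come from a cheminformatics toolkit such as RDKit; the pairwise scores then define the similarity network a user explores.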

    Classification of Protein-Binding Sites Using a Spherical Convolutional Neural Network

    The analysis and comparison of protein-binding sites aid various applications in the drug discovery process, e.g., hit finding, drug repurposing, and polypharmacology. Classification of binding sites has been a hot topic for the past 30 years, and many different methods have been published. The rapid development of machine learning algorithms, coupled with the large volume of publicly available protein-ligand 3D structures, makes it possible to apply deep learning techniques in binding site comparison. Our method uses a cutting-edge spherical convolutional neural network based on the DeepSphere architecture to learn global representations of protein-binding sites. The model was trained on TOUGH-C1 and TOUGH-M1 data and validated with the ProSPECCTs datasets. Our results show that our model can (1) perform well in protein-binding site similarity and classification tasks and (2) learn and separate the physicochemical properties of binding sites. Lastly, we tested the model on a set of kinases, where the results show that it is able to cluster the different kinase subfamilies effectively. This example demonstrates the method's promise for lead hopping within or outside a protein target, directly based on binding site information.
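    Once a network has mapped each binding site to a fixed-length embedding, comparison reduces to vector similarity. A toy sketch of ranking sites by cosine similarity, with invented 4-d vectors standing in for the model's learned representations:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical binding-site embeddings (not output of the actual model).
sites = {
    "kinase_A": [0.9, 0.1, 0.0, 0.2],
    "kinase_B": [0.8, 0.2, 0.1, 0.3],
    "protease": [0.0, 0.9, 0.8, 0.1],
}

query = sites["kinase_A"]
ranked = sorted(sites, key=lambda s: cosine(sites[s], query), reverse=True)
print(ranked)  # the two kinase sites rank above the protease site
```

    Clustering these similarities (e.g. by kinase subfamily) is then an ordinary vector-space operation, which is what makes the learned-embedding approach convenient for off-target search.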

    Computer Vision Approaches to Liquid-Phase Transmission Electron Microscopy

    Electron microscopy (EM) is a technique that exploits the interaction between electrons and matter to produce high-resolution images down to the atomic level. To avoid undesired scattering along the electron path, EM samples are conventionally imaged in the solid state under vacuum. Recently, this limitation has been overcome by liquid-phase electron microscopy (LP EM), a technique that enables the analysis of samples in their native liquid state. Paired with a direct-detection camera acquiring at high frame rates, LP EM allows tracking the motion of particles in liquid, as well as their temporal dynamics. In this research work, LP EM is adopted to image the dynamics of particles undergoing Brownian motion, exploiting their natural rotation to access all particle views and so reconstruct their 3D structure via tomographic techniques. Computer vision tools were designed around the limitations of LP EM to process the imaging results: deblurring and denoising approaches were adopted to improve image quality, and the processed LP EM images were then used to reconstruct 3D models of the imaged samples. This task was performed with two methods: Brownian tomography (BT) and Brownian particle analysis (BPA). The former tracks a single particle in time, capturing the evolution of its dynamics. The latter extends the single particle analysis (SPA) technique, which is conventionally paired with cryo-EM to reconstruct 3D density maps from thousands of EM images capturing hundreds of particles of the same species frozen on a grid. In contrast, BPA does not require image sequences containing thousands of particles; it instead follows individual particle views across consecutive frames.
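    The physics underlying the particle tracking above is ordinary diffusion: for a Brownian particle the mean squared displacement grows linearly in time, MSD = 4Dt in two dimensions. A minimal simulation recovering the diffusion coefficient from a synthetic track (parameters are arbitrary, not values from the LP EM experiments):

```python
import random

random.seed(0)
D, dt, steps = 0.5, 0.01, 20000
sigma = (2 * D * dt) ** 0.5          # per-axis step std for diffusion coeff D

# Simulate a 2D Brownian trajectory.
x = y = 0.0
traj = [(x, y)]
for _ in range(steps):
    x += random.gauss(0.0, sigma)
    y += random.gauss(0.0, sigma)
    traj.append((x, y))

# Estimate the MSD at one lag time from overlapping windows.
lag = 10
disps = [(traj[i + lag][0] - traj[i][0]) ** 2 +
         (traj[i + lag][1] - traj[i][1]) ** 2
         for i in range(len(traj) - lag)]
msd = sum(disps) / len(disps)

D_est = msd / (4 * lag * dt)          # invert MSD = 4 D t (2D)
print(round(D_est, 2))                # close to the true D = 0.5
```

    Real tracks from LP EM movies are noisier and shorter, which is why the deblurring and denoising steps described above matter before any such estimate is attempted.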

    Deep learning techniques for biomedical data processing

    Interest in Deep Learning (DL) has grown exponentially over the last ten years, producing a significant increase in both theoretical and applied studies. On the one hand, its versatility and ability to tackle complex tasks have led to the rapid and widespread diffusion of DL technologies. On the other hand, the dizzying increase in the availability of biomedical data has made classical analyses, carried out by human experts, progressively less feasible. Accordingly, the need for efficient and reliable automatic tools to support clinicians, at least in the most demanding tasks, has become increasingly pressing. In this survey, we give a broad overview of DL models and their applications to biomedical data processing, specifically medical image analysis, sequence processing (RNA and proteins) and graph modeling of molecular data interactions. First, the fundamental concepts of DL architectures are introduced, with particular reference to neural networks for structured data, convolutional neural networks, generative adversarial models, and Siamese architectures. Subsequently, their applicability to different types of biomedical data is shown, in areas ranging from diagnostics to the understanding of the transcription and translation of our genetic code, up to the discovery of new drugs. Finally, the prospects and future expectations of DL applications to biomedical data are discussed.
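    The convolutional layer central to several of the architectures surveyed reduces, in one dimension, to a sliding dot product. A minimal 'valid'-mode 1-D cross-correlation (the operation deep learning libraries actually implement under the name "convolution"):

```python
def conv1d_valid(signal, kernel):
    """Valid-mode 1-D cross-correlation of a signal with a kernel."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

# A difference kernel responds to changes in the input, e.g. an edge in an
# image row or a boundary in a biological sequence signal.
signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [1, -1]
print(conv1d_valid(signal, edge_kernel))  # [0, -1, 0, 0, 1, 0]
```

    Stacking many such filters, with learned weights instead of this hand-written one, is what lets a CNN build hierarchical features from raw biomedical images or sequences.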

    How to describe a cell: a path to automated versatile characterization of cells in imaging data

    A cell is the basic functional unit of life. Most multicellular organisms, including animals, are composed of a variety of different cell types that fulfil distinct roles. Within an organism, all cells share the same genome; however, their diverse genetic programs lead them to acquire different molecular and anatomical characteristics. Describing these characteristics is essential for understanding how cellular diversity emerged and how it contributes to the organism's function. Probing cellular appearance by microscopy is the original way of describing cell types and the main approach to characterising cellular morphology and position in the organism. Present cutting-edge microscopy techniques generate immense amounts of data, requiring efficient automated unbiased methods of analysis. Not only can such methods accelerate the process of scientific discovery, they should also facilitate large-scale, systematic, reproducible analysis. The necessity of processing big datasets has led to the development of intricate image analysis pipelines; however, these are mostly tailored to a particular dataset and a specific research question. In this thesis I address the problem of creating more general, fully automated ways of describing cells in different imaging modalities, with a specific focus on deep neural networks as a promising solution for extracting rich general-purpose features from the analysed data. I further target the problem of integrating multiple data modalities to generate a detailed description of cells at the whole-organism level. First, using two examples of cell analysis projects, I show how automated image analysis pipelines, and neural networks in particular, can assist in characterising cells in microscopy data. In the first project I analyse a movie of Drosophila embryo development to elucidate the difference in myosin patterns between two populations of cells with different shape fates. In the second project I develop a pipeline for automatic cell classification in a new imaging modality to show that the quality of the data is sufficient to tell apart cell types in a volume of mouse brain cortex. Next, I present an extensive collaborative effort aimed at generating a whole-body multimodal cell atlas of a three-segmented Platynereis dumerilii worm, combining high-resolution morphology and gene expression. To generate a multi-sided description of cells in the atlas, I create a pipeline for assigning coherent denoised gene expression profiles, obtained from spatial gene expression maps, to cells segmented in the EM volume. Finally, as the main project of this thesis, I focus on extracting comprehensive unbiased cell morphology features from an EM volume of Platynereis dumerilii. I design a fully unsupervised neural network pipeline for extracting rich morphological representations that enable grouping cells into morphological cell classes with characteristic gene expression. I further show how such descriptors could be used to explore the morphological diversity of cells, tissues and organs in the dataset.

    Machine Learning Methods for Medical and Biological Image Computing

    Medical and biological imaging technologies provide valuable visualization of the structure and function of an organ, from the level of individual molecules to the whole object. The brain is the most complex organ in the body, and it increasingly attracts intense research attention with the rapid development of medical and biological imaging technologies. The massive amount of high-dimensional brain imaging data being generated makes the design of computational methods for efficient analysis of those images highly demanded. Current computational methods using hand-crafted features do not scale with the increasing number of brain images, hindering the pace of scientific discovery in neuroscience. In this thesis, I propose computational methods using high-level features for the automated analysis of brain images at different levels. At the brain function level, I develop a deep-learning-based framework for completing and integrating multi-modality neuroimaging data, which increases diagnostic accuracy for Alzheimer's disease. At the cellular level, I propose three-dimensional convolutional neural networks (CNNs) for segmenting volumetric neuronal images, which improves the performance of digital reconstruction of neuron structures. I design a novel CNN architecture such that model training and test-image prediction can be implemented in an end-to-end manner. At the molecular level, I build a voxel CNN classifier to capture discriminative features of the input along three spatial dimensions, facilitating the identification of secondary structures of proteins from electron microscopy images. In order to classify genes specifically expressed in different brain cell types, I propose invariant image feature descriptors to capture local gene expression information from cellular-resolution in situ hybridization images. I build image-level representations by applying regularized learning and vector quantization to the generated image descriptors. The computational methods developed in this dissertation are evaluated on images from medical and biological experiments, in comparison with baseline methods. Experimental results demonstrate that the developed representations, formulations, and algorithms are effective and efficient in learning from brain imaging data.
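    The vector-quantization step mentioned above follows the classic bag-of-words recipe: assign each local descriptor to its nearest codeword and accumulate a histogram, turning a variable-size set of descriptors into one fixed-length image representation. A toy sketch with invented 2-d descriptors and codebook (not the thesis's actual features):

```python
def nearest(codebook, d):
    """Index of the codeword closest to descriptor d (squared Euclidean)."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], d)))

def bag_of_words(codebook, descriptors):
    """Histogram of codeword assignments: a fixed-length image representation."""
    hist = [0] * len(codebook)
    for d in descriptors:
        hist[nearest(codebook, d)] += 1
    return hist

# Hypothetical codebook (e.g. from k-means) and local image descriptors.
codebook = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.2), (0.9, 1.1), (0.2, 0.9), (1.2, 0.8), (0.0, 0.1)]
print(bag_of_words(codebook, descriptors))  # [2, 2, 1]
```

    Real descriptors (e.g. SIFT-like vectors from in situ hybridization images) are higher-dimensional, but the quantization step is exactly this nearest-codeword assignment.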

    Neural function approximation on graphs: shape modelling, graph discrimination & compression

    Graphs serve as a versatile mathematical abstraction of real-world phenomena in numerous scientific disciplines. This thesis belongs to the Geometric Deep Learning subject area, a family of learning paradigms that capitalise on the increasing volume of non-Euclidean data to solve real-world tasks in a data-driven manner. In particular, we focus on the topic of graph function approximation using neural networks, which lies at the heart of many relevant methods. In the first part of the thesis, we contribute to the understanding and design of Graph Neural Networks (GNNs). Initially, we investigate the problem of learning on signals supported on a fixed graph. We show that treating graph signals as general graph spaces is restrictive and that conventional GNNs have limited expressivity. Instead, we expose a more enlightening perspective by drawing parallels between graph signals and signals on Euclidean grids, such as images and audio. Accordingly, we propose a permutation-sensitive GNN based on an operator analogous to shifts in grids and instantiate it on 3D meshes for shape modelling (Spiral Convolutions). Next, we focus on learning on general graph spaces, and in particular on functions that are invariant to graph isomorphism. We identify a fundamental trade-off between invariance, expressivity and computational complexity, which we address with a symmetry-breaking mechanism based on substructure encodings (Graph Substructure Networks). Substructures are shown to be a powerful tool that provably improves expressivity while controlling computational complexity, and a useful inductive bias in network science and chemistry. In the second part of the thesis, we discuss the problem of graph compression, where we analyse the information-theoretic principles and the connections with graph generative models. We show that another inevitable trade-off surfaces, now between computational complexity and compression quality, due to graph isomorphism.
    We propose a substructure-based dictionary coder, Partition and Code (PnC), with theoretical guarantees, which can be adapted to different graph distributions by estimating its parameters from observations. Additionally, contrary to the majority of neural compressors, PnC is parameter- and sample-efficient and is therefore of wide practical relevance. Finally, within this framework, substructures are further illustrated as a decisive archetype for learning problems on graph spaces.
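    Substructure counts of the kind used as symmetry-breaking features can be computed directly for small patterns. A toy example: per-node triangle counts for an undirected graph stored as adjacency sets (an illustration of the general idea, not the GSN implementation):

```python
def triangle_counts(adj):
    """Number of triangles each node participates in; adj maps node -> set of neighbours."""
    counts = {}
    for v, nbrs in adj.items():
        t = 0
        for u in nbrs:
            # Common neighbours of v and u close a triangle; each triangle
            # at v is found twice (once per incident edge), hence // 2 below.
            t += len(nbrs & adj[u])
        counts[v] = t // 2
    return counts

# Small undirected graph: edges 0-1, 0-2, 0-3, 1-2, 2-3 (two triangles).
adj = {
    0: {1, 2, 3},
    1: {0, 2},
    2: {0, 1, 3},
    3: {0, 2},
}
print(triangle_counts(adj))  # {0: 2, 1: 1, 2: 2, 3: 1}
```

    Attaching such counts to node features is one concrete way a message-passing network can be made sensitive to structure it could not otherwise distinguish.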

    Development and application of deep learning and spatial statistics within 3D bone marrow imaging

    The bone marrow is a highly specialised organ, responsible for the formation of blood cells. Despite 50 years of research, the spatial organisation of the bone marrow remains an area full of controversy and contradiction. One reason for this is that imaging of bone marrow tissue is notoriously difficult; another is that efficient methodologies to fully extract and analyse large datasets remain the Achilles heel of imaging-based research. In this thesis I present a pipeline for generating 3D bone marrow images, followed by large-scale data extraction and spatial statistical analysis of the resulting data. Using these techniques, in the context of 3D imaging, I am able to identify and classify the locations of hundreds of thousands of cells within various bone marrow samples. I then introduce a series of statistical techniques tailored to spatial data, resulting in a 3D statistical map of the tissue from which multi-cellular interactions can be clearly understood. As an illustration of the power of this new approach, I apply the pipeline to diseased samples of bone marrow, with a particular focus on leukaemia and its interactions with CD8+ T cells. In so doing, I show that this novel pipeline can be used to unravel complex multi-cellular interactions and assist researchers in understanding the processes taking place within the bone marrow.
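    A basic building block of such spatial statistics is the nearest-neighbour distance of each cell in a 3D point pattern; its distribution (the G-function) distinguishes clustered from dispersed arrangements. A minimal sketch with invented coordinates, not real bone marrow data:

```python
import math

def nn_distances(points):
    """Nearest-neighbour Euclidean distance for each 3D point."""
    out = []
    for i, p in enumerate(points):
        d = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(d)
    return out

# Hypothetical cell centroids: three close together, one isolated.
cells = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (5, 5, 5)]
dists = nn_distances(cells)
print([round(d, 3) for d in dists])  # [1.0, 1.0, 2.0, 7.681]
```

    Comparing the empirical distribution of these distances against that of a random (Poisson) pattern is the standard way to test whether two cell types, e.g. leukaemic cells and CD8+ T cells, attract or avoid each other.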

    Features of structural binding motifs and their predictive power

    No full text

    Positive Definite Kernels in Machine Learning

    This survey is an introduction to positive definite kernels and the set of methods they have inspired in the machine learning literature, namely kernel methods. We first discuss some properties of positive definite kernels as well as reproducing kernel Hilbert spaces, the natural extension of the set of functions {k(x, ·), x ∈ X} associated with a kernel k defined on a space X. We discuss at length the construction of kernel functions that take advantage of well-known statistical models. We provide an overview of numerous data-analysis methods which take advantage of reproducing kernel Hilbert spaces, and discuss the idea of combining several kernels to improve the performance on certain tasks. We also provide a short cookbook of different kernels which are particularly useful for certain data types such as images, graphs or speech segments.
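    Positive definiteness of a kernel means every Gram matrix K it produces satisfies zᵀKz ≥ 0 for all z. A quick empirical sanity check for the Gaussian (RBF) kernel k(x, y) = exp(-||x - y||² / (2σ²)), sampling random vectors z (a necessary condition we can test numerically, not a proof):

```python
import math
import random

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2 * sigma ** 2))

random.seed(1)
X = [(random.random(), random.random()) for _ in range(6)]
K = [[rbf(x, y) for y in X] for x in X]          # Gram matrix on sample X

# Sample quadratic forms z^T K z; all must be (numerically) non-negative.
for _ in range(100):
    z = [random.gauss(0, 1) for _ in X]
    quad = sum(z[i] * K[i][j] * z[j] for i in range(6) for j in range(6))
    assert quad >= -1e-12
print("all sampled quadratic forms non-negative")
```

    The same check would fail for a non-positive-definite similarity function, which is precisely why kernel methods restrict themselves to this class: positive definiteness is what guarantees the reproducing kernel Hilbert space exists.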