Probabilistic analysis of the human transcriptome with side information
Understanding functional organization of genetic information is a major
challenge in modern biology. Following the initial publication of the human
genome sequence in 2001, advances in high-throughput measurement technologies
and efficient sharing of research material through community databases have
opened up new views to the study of living organisms and the structure of life.
In this thesis, novel computational strategies have been developed to
investigate a key functional layer of genetic information, the human
transcriptome, which regulates the function of living cells through protein
synthesis. The key contributions of the thesis are general exploratory tools
for high-throughput data analysis that have provided new insights into
cell-biological networks, cancer mechanisms and other aspects of genome
function.
A central challenge in functional genomics is that high-dimensional genomic
observations are associated with high levels of complex and largely unknown
sources of variation. By combining statistical evidence across multiple
measurement sources and the wealth of background information in genomic data
repositories, it has been possible to resolve some of the uncertainties associated
with individual observations and to identify functional mechanisms that could
not be detected based on individual measurement sources. Statistical learning
and probabilistic models provide a natural framework for such modeling tasks.
Open source implementations of the key methodological contributions have been
released to facilitate further adoption of the developed methods by the
research community. Comment: Doctoral thesis. 103 pages, 11 figures
Computational Approaches to Drug Profiling and Drug-Protein Interactions
Despite substantial increases in R&D spending within the pharmaceutical industry, de novo drug design has become a time-consuming endeavour. High attrition rates led to a
long period of stagnation in drug approvals. Due to the extreme costs associated with
introducing a drug to the market, locating and understanding the reasons for clinical failure
is key to future productivity. As part of this PhD, three main contributions were made in
this respect. First, the web platform LigNFam enables users to interactively explore
similarity relationships between 'drug-like' molecules and the proteins they bind. Secondly,
two deep-learning-based binding site comparison tools were developed, competing with
the state of the art on benchmark datasets. The models can predict off-target interactions and potential candidates for target-based drug repurposing. Finally, the
open-source ScaffoldGraph software was presented for the analysis of hierarchical scaffold
relationships and has already been used in multiple projects, including integration into a
virtual screening pipeline to increase the tractability of ultra-large screening experiments.
Together, and with existing tools, the contributions made will aid in the understanding of
drug-protein relationships, particularly in the fields of off-target prediction and drug
repurposing, helping to design better drugs faster.
Evolutionary Computation and QSAR Research
The successful high-throughput screening of molecule libraries for a specific biological property is one of the main improvements in drug discovery. Virtual molecular filtering and screening rely greatly on quantitative structure-activity relationship (QSAR) analysis, which builds a mathematical model that correlates the activity of a molecule with its molecular descriptors. QSAR models have the potential to reduce the costly failure of drug candidates in advanced (clinical) stages by filtering combinatorial libraries, eliminating candidates with predicted toxic effects or poor pharmacokinetic profiles, and reducing the number of experiments. To obtain a predictive and reliable QSAR model, scientists use methods from various fields such as molecular modeling, pattern recognition, machine learning and artificial intelligence. QSAR modeling relies on three main steps: encoding the molecular structure into molecular descriptors, selecting the variables relevant to the analyzed activity, and searching for the optimal mathematical model that correlates the molecular descriptors with a specific activity. Since a variety of techniques from statistics and artificial intelligence can aid the variable selection and model building steps, this review focuses on the evolutionary computation methods supporting these tasks. The review explains the basics of genetic algorithms and genetic programming as evolutionary computation approaches, the selection methods for high-dimensional data in QSAR, the methods used to build QSAR models, current evolutionary feature selection methods and their applications in QSAR, and future trends in joint or multi-task feature selection methods.
Funding: Instituto de Salud Carlos III, PIO52048; Instituto de Salud Carlos III, RD07/0067/0005; Ministerio de Industria, Comercio y Turismo, TSI-020110-2009-53; Galicia. Consellería de Economía e Industria, 10SIN105004P
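As an illustration of the variable-selection step named in the abstract above, the following is a minimal sketch of a genetic algorithm choosing descriptors. It is not taken from the review: the descriptor matrix is synthetic, the fitness score (absolute Pearson correlation between the activity and the sum of the selected descriptors, minus a per-descriptor penalty) is a toy stand-in for a real QSAR model, and all function names and parameter values are hypothetical.

```python
import math
import random


def make_toy_data(n_mol=40, n_desc=8, seed=0):
    """Synthetic descriptors; the 'activity' depends only on descriptors 0 and 3."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(n_desc)] for _ in range(n_mol)]
    y = [row[0] + row[3] for row in X]
    return X, y


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def fitness(mask, X, y, penalty=0.05):
    """Toy score (an assumption): |correlation| minus a penalty per selected descriptor."""
    chosen = [j for j, bit in enumerate(mask) if bit]
    if not chosen:
        return -1.0  # an empty descriptor set gives no model
    predictor = [sum(row[j] for j in chosen) for row in X]
    return abs(pearson(predictor, y)) - penalty * len(chosen)


def evolve(X, y, n_generations=40, pop_size=30, mutation_rate=0.1, seed=0):
    """Evolve bit masks over descriptors; returns the best mask in the final population."""
    rng = random.Random(seed)
    n_desc = len(X[0])

    def score(mask):
        return fitness(mask, X, y)

    population = [[rng.randint(0, 1) for _ in range(n_desc)] for _ in range(pop_size)]
    for _ in range(n_generations):
        next_gen = [max(population, key=score)[:]]  # elitism: keep the current best
        while len(next_gen) < pop_size:
            # Tournament selection of two parents
            p1 = max(rng.sample(population, 3), key=score)
            p2 = max(rng.sample(population, 3), key=score)
            cut = rng.randint(1, n_desc - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=score)


if __name__ == "__main__":
    X, y = make_toy_data()
    best = evolve(X, y)
    print("selected descriptors:", [j for j, bit in enumerate(best) if bit])
    print("fitness:", round(fitness(best, X, y), 3))
```

In a real QSAR workflow the fitness would typically be a cross-validated error of a fitted model rather than this correlation score; the bit-mask encoding, selection, crossover and mutation steps would stay the same.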
Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference
A biological system is a complex network of heterogeneous molecular entities and their interactions, which contribute to various biological characteristics of the system. Although biological networks provide both an elegant theoretical framework and a mathematical foundation to analyze, understand, and learn from complex biological systems, their reconstruction remains an important and unsolved problem. Current biological networks are noisy, sparse and incomplete, which limits the ability to create a holistic view of the biological reconstructions and thus to gain a system-level understanding of the biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advancements in high-throughput data generation and significant improvements in computational power have led to novel computational methods to predict missing interactions. However, these methods still suffer from several unresolved challenges. It is challenging to extract information about interactions and incorporate that information into the computational model. Furthermore, biological data are not only heterogeneous but also high-dimensional and sparse, which makes modeling from indirect measurements difficult. The heterogeneous nature and sparsity of biological data pose significant challenges to the design of deep neural network structures, which essentially rely on either empirical or heuristic model selection methods. These unscalable methods rely heavily on expertise and experimentation, which is time-consuming and error-prone, and they are prone to overfitting. Furthermore, complex deep networks tend to be poorly calibrated, with high confidence on incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges.
In Part I, we design novel neural network structures to learn representations for biological entities and further expand the model to integrate heterogeneous biological data for biological interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our novel probabilistic model selection approach enables the network structures to dynamically evolve to accommodate incrementally available data. In conclusion, we discuss the limitations of and future directions for the proposed work.
siRNA SCREEN FOR IDENTIFICATION OF HUMAN KINASES INVOLVED IN ASSEMBLY AND RELEASE OF HIV-1
The replication of the human immunodeficiency virus type 1 (HIV-1) is not yet fully understood. In particular, knowledge of the interactions between viral and host cell proteins and understanding of complete virus-host protein networks remain incomplete. An integral picture of the hijacked cellular machinery is essential for a better comprehension of the virus, and new tools are a prerequisite for obtaining it.
To create such a tool, a screening platform for host cell factors was established in this work. The screening assay serves as a powerful method to gain insights into virus-host interactions. It was specifically tailored to address the stage of assembly and release of viral particles during the replication cycle of HIV-1, and it was designed to be suitable for both RNAi and chemical compound screening. The first phase of this work comprised the setup and optimization of the assay, which was shown to be robust and reliable and to deliver reproducible results. As a subsequent step, a siRNA library targeting 724 human kinases and accessory proteins was examined. After the evaluation of the complete siRNA library in a primary screen, all primary hits were validated in a second reconfirmation screen using different siRNAs. The purpose of this two-step approach was to identify and exclude false positives.
In the end, 43 genes were reconfirmed to influence the assembly and release of HIV-1. Of those, 39 were host dependency factors and 4 were host restriction factors. Several of them had already been described in the literature as interacting with HIV-1, but various previously unknown host cell proteins were also identified in this work. A subsequent combinatory pathway analysis that included hits from other published screens identified several signaling pathways as important for HIV-1 assembly and release. The described single key proteins and their underlying protein networks provide a basis for the next steps toward understanding the virus and improving treatment in the future.
Integrating Functional Genomics with Systems Biology to Discover Drivers and Therapeutic Targets of Human Malignancies
Genome-wide RNAi screening has emerged as a powerful tool for loss-of-function studies that may lead to therapeutic target discovery for human malignancies in the era of personalized medicine. However, due to high false-positive and false-negative rates arising from noise of high-throughput measurements and off-target effects, powerful computational tools and additional knowledge are much needed to analyze and complement it. Availability of high-throughput genomic data including gene expression profiles, copy number variations from large-sampled primary patients and cell lines allows us to tackle underlying drivers causally associated with tumorigenesis or drug-resistance. In my dissertation, I have developed a framework to integrate functional RNAi screens with systems biology of cancer genomics to tailor potential therapeutics for reversal of drug-resistance or treatment of aggressive tumors. I developed a series of algorithms and tools to deconvolute, QC and post-analyze high-throughput shRNA screening data by next-generation sequencing technology (shSeq), particularly a novel Bayesian hierarchical modeling approach to integrate multiple shRNAs targeting the same gene, which outperforms existing methods. In parallel, I developed a systems biology algorithm, NetBID2, to infer disease drivers from high-throughput genomic data by reverse-engineering network and Bayesian inference, which is able to detect hidden drivers that traditional methods fail to find. Integrating NetBID2 with functional RNAi screens, I have identified known and novel driver-type therapeutic targets in various disease contexts. For example, I discovered that AKT1 is a driver for glucocorticoid (GC) resistance, a problem in the treatment of T-ALL. The inhibition of AKT1 was validated to reverse GC-resistance. Additionally, upon silencing predicted master regulators of GC resistance with shRNA screens, 13 out of 16 were validated to significantly overcome resistance. 
In breast cancer, I discovered that STAT3 is required for transformation of HER2+ breast cancer, an aggressive breast tumor subtype. The suppression of STAT3 was confirmed in vitro and in vivo to be an effective therapy for HER2+ breast cancer. Moreover, my analysis revealed that STAT3 silencing only works in ER- cases. Using my framework, I have also identified potential therapeutic targets for ABC or GCB-type DLBCL and subtype-based breast cancer that are currently being validated.
A Farewell to Flat Biology. Three-dimensional Cell Culture Models in Cancer Drug Target Identification and Validation
Cells of epithelial origin, e.g. from breast and prostate cancers, effectively differentiate into complex multicellular structures when cultured in three dimensions (3D) instead of on conventional two-dimensional (2D) adherent surfaces. The spectrum of different organotypic morphologies is highly dependent on the culture environment, which can be either non-adherent or scaffold-based. When embedded in physiological extracellular matrices (ECMs), such as laminin-rich basement membrane extracts, normal epithelial cells differentiate into acinar spheroids reminiscent of glandular ductal structures. Transformed cancer cells, in contrast, typically fail to undergo acinar morphogenic patterns, forming poorly differentiated or invasive multicellular structures. The 3D cancer spheroids are widely accepted to better recapitulate various tumorigenic processes and drug responses. So far, however, 3D models have been employed predominantly in academia, whereas the pharmaceutical industry has yet to adopt them more widely and routinely. This is mainly due to poor characterisation of cell models, a lack of standardised workflows and high-throughput cell culture platforms, and limited availability of proper readout and quantification tools. In this thesis, a complete workflow has been established, entailing well-characterised 3D cell culture models for prostate cancer, a standardised 3D cell culture routine based on a high-throughput-ready platform, automated image acquisition with concomitant morphometric image analysis, and data visualisation, in order to enable large-scale high-content screens. Our integrated suite of software and statistical analysis tools was optimised and validated using a comprehensive panel of prostate cancer cell lines and 3D models.
The tools quantify multiple key cancer-relevant morphological features, ranging from cancer cell invasion through multicellular differentiation to growth, and detect dynamic changes both in morphology and function, such as cell death and apoptosis, in response to experimental perturbations including RNA interference and small molecule inhibitors. Our panel of cell lines included many non-transformed and most currently available classic prostate cancer cell lines, which were characterised for their morphogenetic properties in 3D laminin-rich ECM. The phenotypes and gene expression profiles were evaluated concerning their relevance for pre-clinical drug discovery, disease modelling and basic research. In addition, a spontaneous model for invasive transformation was discovered, displaying a high degree of epithelial plasticity. This plasticity is mediated by an abundant bioactive serum lipid, lysophosphatidic acid (LPA), and its receptor LPAR1. The invasive transformation was caused by abrupt cytoskeletal rearrangement through impaired G protein alpha 12/13 and RhoA/ROCK, and mediated by upregulated adenylyl cyclase/cyclic AMP (cAMP)/protein kinase A, and Rac/PAK pathways. The spontaneous invasion model tangibly exemplifies the biological relevance of organotypic cell culture models. Overall, this thesis work underlines the power of novel morphometric screening tools in drug discovery. Transferred from Doria.