145 research outputs found

    Analysis of genomic data to derive biological conclusions on (1) transcriptional regulation in the human genome and (2) antibody resistance in hepatitis C virus

    High-throughput sequencing has become pervasive in all facets of genomic analysis. I developed computational methods to analyze high-throughput sequencing data and derive biological conclusions in two research areas: transcriptional regulation in mammals and the evolution of a virus under immune pressure. To investigate transcriptional regulation, I integrated data from multiple experiments performed by the ENCODE consortium. First, my analysis revealed that transcription factors (TFs) prefer to bind GC-rich, histone-depleted regions. By comparing in vivo and in vitro nucleosome dynamics, I observed that while histones have an innate preference for binding GC-rich DNA, TF binding overrides this preference and produces a negative correlation between GC content and histone enrichment. In the next project, I found that the binding events of multiple TFs co-occur at genomic regions enriched in activating histone marks that are typically associated with gene enhancers and promoters, suggesting that these regions may be enhancers or have TSS-distal transcription. Lastly, I used supervised machine-learning techniques, trained on histone enrichment signals and sequence features, to predict transcriptional enhancers for validation in mouse transgenic assays. In a post-clinical-trial exploratory analysis of hepatitis C virus (HCV), I traced the evolutionary path of the envelope proteins E1 and E2 in HCV-infected liver transplant patients in response to a novel antibody. I developed a systematic amino-acid-level analysis pipeline that quantifies differences in amino acid frequencies at each position between two time points. Applying this method across all positions in the E1/E2 region and comparing pre-liver-transplant and post-viral-rebound time points, I found that mutations at two positions emerged as key to antibody evasion. Both mutations, N415K/D and N417S, were in the epitope targeted by the antibody but, surprisingly, did not co-occur. In post-rebound viral genomes that contain the N417S mutation but retain the wild-type variant at 415, N-linked glycosylation of 415 is another possible escape mechanism. Using the same analysis pipeline, I also identified additional candidate escape mutations outside the epitope, which could be potential therapeutic targets.
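
    The per-position comparison of amino acid frequencies between two time points can be illustrated with a minimal sketch. It is not the pipeline from the thesis: the distance measure (total variation distance), the function names, and the four-residue toy alignments are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter


def residue_frequencies(sequences, position):
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(seq[position] for seq in sequences)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def frequency_shift(pre, post):
    """Total variation distance between the residue distributions of two
    time points, column by column (0 = identical, 1 = no shared residue).
    The choice of distance is an assumption, not the thesis's statistic."""
    length = len(pre[0])
    if any(len(s) != length for s in pre + post):
        raise ValueError("all sequences must be aligned to the same length")
    shifts = []
    for pos in range(length):
        f_pre = residue_frequencies(pre, pos)
        f_post = residue_frequencies(post, pos)
        residues = set(f_pre) | set(f_post)
        diff = sum(abs(f_pre.get(a, 0.0) - f_post.get(a, 0.0)) for a in residues)
        shifts.append(0.5 * diff)
    return shifts


# Hypothetical toy alignments (not real HCV data): four aligned positions.
pre = ["QNTN", "QNTN", "QNTN", "QNTN"]
post = ["QKTN", "QDTN", "QNTS", "QNTS"]
shifts = frequency_shift(pre, post)
print(shifts)
```

    In this toy input, only the second and fourth columns change between time points, so only those positions receive a non-zero shift. A real analysis would also need to handle gaps, sequencing depth, and statistical significance.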

    Large margin methods for partner specific prediction of interfaces in protein complexes

    2014 Spring. The study of protein interfaces and binding sites is an important research domain in bioinformatics. Information about the interfaces between proteins can be used not only to understand protein function but also directly in drug design and protein engineering. However, the experimental determination of protein interfaces is cumbersome, expensive and, in some cases, not possible with today's technology. As a consequence, the computational prediction of protein interfaces from sequence and structure has emerged as a very active research area. A number of machine-learning-based techniques have been proposed for this problem. However, the prediction accuracy of most such schemes is very low. In this dissertation we present large-margin classification approaches that have been designed to directly model different aspects of protein complex formation as well as the characteristics of available data. Most existing machine learning techniques for this task are partner-independent, i.e., they ignore the fact that the propensity of a protein to bind to another protein depends on characteristics of residues in both proteins. We have developed a pairwise support vector machine classifier called PAIRpred to predict protein interfaces in a partner-specific fashion. Due to its more detailed model of the problem, PAIRpred offers state-of-the-art accuracy in predicting both binding sites at the protein level and inter-protein residue contacts at the complex level. PAIRpred uses sequence and structure conservation, local structural similarity and surface geometry, residue solvent exposure, and template-based features derived from the unbound structures of the proteins forming a complex. Using transductive and semi-supervised learning models, we have investigated the impact of explicitly modeling the inter-dependencies between residues that the overall structure of a protein imposes during the formation of a complex. We also present a novel multiple instance learning scheme called MI-1 that explicitly models imprecision in sequence-level annotations of binding sites in proteins that bind calmodulin, achieving state-of-the-art prediction accuracy for this task.
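
    One common way to make a kernel classifier partner-specific is to define the kernel over pairs of residues, one from each protein. The sketch below shows a generic symmetric pairwise kernel built from an RBF base kernel. It is an illustrative assumption, not necessarily the kernel used in PAIRpred; the feature vectors, `gamma` value, and function names are hypothetical.

```python
import math


def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) base kernel between two residue feature vectors."""
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-gamma * squared_distance)


def pairwise_kernel(pair1, pair2, base=rbf):
    """Symmetric kernel between two residue pairs (a, b) and (c, d).

    Summing both ways of matching the residues makes the value independent
    of the order in which the two residues of a pair are listed.
    """
    (a, b), (c, d) = pair1, pair2
    return base(a, c) * base(b, d) + base(a, d) * base(b, c)


# Hypothetical two-dimensional residue features, for illustration only.
r1, r2 = (0.1, 0.9), (0.8, 0.2)
r3, r4 = (0.2, 0.7), (0.9, 0.1)

k_forward = pairwise_kernel((r1, r2), (r3, r4))
k_swapped = pairwise_kernel((r2, r1), (r3, r4))
print(k_forward, k_swapped)
```

    A kernel of this form can be passed to any support vector machine implementation that accepts precomputed kernel matrices, so the classifier scores a residue pair rather than a single residue.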

    Role of dietary ethiological factors in the molecular pathogenesis of liver cancer

    Ankara : The Department of Molecular Biology and Genetics and the Institute of Engineering and Science of Bilkent University, 2011. Thesis (Ph. D.) -- Bilkent University, 2011. Includes bibliographical references, leaves 121-136. Hepatocellular carcinoma (HCC) is the third leading cause of cancer deaths. Dietary factors play a crucial role in the molecular pathogenesis of liver cancer. Oxidative stress is usually associated with the malignancy and progression of HCC, since it is considered a common factor during inflammation following chronic viral infection. Chemical stress caused by aflatoxin exposure, metabolic stress produced by alcohol abuse, and selenium deficiency, a risk factor for HCC, are all associated with oxidative stress. Oxidative stress must be neutralized by an intact antioxidant defense mechanism; it is a major cause of genotoxicity, arising endogenously through metabolic stress and exogenously from chemical and physical carcinogens. Even though the contribution of dietary factors to HCC progression has been established, the underlying molecular mechanism is not fully understood. Cancer cells may respond to genotoxic stress by cryptically developing survival-advantage mechanisms. We therefore investigated this idea for dietary factors involved in liver cancer. In this work, we studied the role of selenium (Se) deficiency in the tumorigenesis of hepatocytes and the mechanism underlying the selection of the p53-249 mutation by aflatoxins in HCC. Aflatoxins are the most potent naturally occurring carcinogens and may play a causative role in 5-28% of hepatocellular carcinomas worldwide. Aflatoxins are activated in liver cells and induce principally G->T mutations, including a codon 249 (G->T) hotspot mutation of the TP53 gene that is specifically associated with aflatoxin-related hepatocellular carcinoma. However, our comparative analysis showed that R249S does not provide a survival advantage in the heterozygous state. Thus, the selection could occur at the mutation induction stage. The lack of p53 activation in aflatoxin B1-exposed HCC cells led us to test the DNA damage response after aflatoxin exposure. Surprisingly, the DNA damage checkpoint response to aflatoxins has not been studied thoroughly before. Although the DNA damage checkpoint response acts as an anti-tumor mechanism by protecting genome integrity against genotoxic agents, this highly critical aspect of aflatoxin carcinogenicity is poorly understood. Our findings provide evidence for the contribution of the ERK, p38MAPK and PI3K/Akt survival pathways under selenium supplementation in some HCC cell lines. Apart from the effect of selenium deficiency, our results shed light on aflatoxin carcinogenicity in vitro. Our study pointed to a negligible G1 and G2/M checkpoint response to aflatoxin B1-induced DNA damage. This defective response may account for most of the mutagenic and carcinogenic effects of aflatoxins. It may also be associated with the frequent induction of the TP53 hotspot mutation in aflatoxin-related human HCC. Yüzügüllü, Şehriban Özge Gürsoy. Ph.D.

    Computational Analysis of RNAi Screening Data to Identify Host Factors Involved in Viral Infection and to Characterize Protein-Protein Interactions

    The study of gene functions across a variety of treatments, cell lines and organisms has been facilitated by RNA interference (RNAi) technology, which tracks the phenotype of cells after particular genes are silenced. In this thesis, I describe two computational approaches developed to analyze the image data from two different RNAi screens. First, I developed an alternative approach to detect host factors (human proteins) that support virus growth and replication in cells infected with the hepatitis C virus (HCV). To identify the human proteins that are crucial for the efficiency of viral infection, several RNAi experiments on virus-infected cells have been conducted. However, the target lists from different laboratories show little overlap. This inconsistency might be caused not only by experimental discrepancies but also by data analysis possibilities that have not been fully explored. Observing only viral intensity readouts from the experiments might be insufficient. In this project, I describe our computational development as a new alternative approach to improve the reliability of host factor identification. Our approach is based on characterizing the clustering of infected cells. The idea is that viral infection spreads through cell-cell contacts, or is at least favored by the proximity of cells. Therefore, clustering of HCV-infected cells is observed as the infection spreads. We developed a clustering detection method based on a distance-based point pattern analysis (the K-function) to identify knockdown genes for which the clusters of HCV-infected cells were reduced. The approach separated positive and negative controls significantly, and the clustering score correlated well with intensity readouts from the experimental screens. Compared with another clustering approach, Quadrat analysis, the K-function method was superior. Statistical normalization approaches were used to identify protein targets from our clustering-based approach and the experimental screens. Integrating results from our clustering method, the intensity readout analysis and a secondary screen, we finally identified five promising host factors that are suitable candidate targets for drug therapy. Second, a machine-learning-based approach was developed to characterize protein-protein interactions (PPIs) in a signaling network. The characterization of each PPI is fundamental to our understanding of the complex signaling system of a human cell. Experiments for PPI identification, such as yeast two-hybrid and FRET analysis, are resource-intensive; therefore, computational approaches that analyze large-scale RNAi knockdown screens to infer functional similarities from the phenotypic similarities of the down-regulated proteins have become an important pursuit. However, these methods did not provide a more detailed characterization of the PPIs. In this project, I developed a new computational approach based on a machine learning technique that employs the mitotic phenotypes of an RNAi screen. It enables the identification of the nature of a PPI, i.e., whether it is activating or inhibiting. We established a systematic classification using Support Vector Machines (SVMs) based on the phenotypic descriptors and used it to classify interactions as activating or inhibiting signal transduction. The machines performed well when integrating different sets of published descriptors with our own descriptors, calculated from fractions of specific phenotypes, linear classification of phenotypes, and phenotypic distance to distinct proteins. A comprehensive model generated from the machines was used for further predictions. We investigated the nature of pairs of interacting proteins and generated a consistency score that enhanced the precision of the classification results. We predicted the activating or inhibiting nature of 214 PPIs in signaling pathways with high confidence and were able to identify a new subgroup of chemokine receptors. These findings might facilitate an enhanced understanding of the cellular mechanisms during inflammation and immunologic responses. In summary, two computational approaches were developed to analyze the image data of the different RNAi screens: 1) a clustering-based approach was used to identify the host factors that are crucial for HCV infection; and 2) a machine-learning-based approach with various descriptors was employed to characterize PPI activities. The results from the host factor analysis revealed novel target proteins that are involved in the spread of HCV. In addition, the results of the PPI characterization lead to a better understanding of the signaling pathways. The two large-scale RNAi data sets were successfully analyzed by our established approaches to obtain new insights into virus biology and cellular signaling.
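
    The distance-based point pattern analysis mentioned above (the K-function) can be sketched as follows. This is a naive estimator of Ripley's K without edge correction, written under the assumption that infected cells are represented by 2D centroid coordinates; it is not the implementation from the thesis, and the coordinates are made up.

```python
import math


def ripley_k(points, radius, area):
    """Naive Ripley's K estimate: area * (ordered pairs within `radius`)
    / (n * (n - 1)). No edge correction is applied, which a real analysis
    of microscopy images would normally need."""
    n = len(points)
    if n < 2:
        raise ValueError("at least two points are required")
    close_pairs = 0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                close_pairs += 1
    return area * close_pairs / (n * (n - 1))


# Hypothetical cell centroids in a unit-square field of view.
clustered = [(0.50, 0.50), (0.52, 0.50), (0.50, 0.52), (0.52, 0.52)]
dispersed = [(0.1, 0.1), (0.9, 0.1), (0.1, 0.9), (0.9, 0.9)]

radius = 0.1
k_clustered = ripley_k(clustered, radius, area=1.0)
k_dispersed = ripley_k(dispersed, radius, area=1.0)
k_random_expectation = math.pi * radius ** 2  # value expected under complete spatial randomness

print(k_clustered, k_dispersed, k_random_expectation)
```

    Values well above the random expectation indicate clustering at that radius, and values below it indicate dispersion. A knockdown that reduces clustering of infected cells would therefore lower the estimate relative to controls.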

    Metabolomis and lipidomic of both Zika and Dengue virus infectious processes; from mosquito to the patient  

    Advisor: Rodrigo Ramos Catharino. Thesis (doctorate) - Universidade Estadual de Campinas, Faculdade de Ciências Médicas. Abstract: Zika virus (ZIKV) is an arbovirus that plays an important role in the increase in births of microcephalic babies and, in adults, has been associated with Guillain-Barré syndrome, making it a major public health concern. Dengue virus (DENV) is endemic in Brazil and, like ZIKV, causes self-limiting infections in the vast majority of patients. It is of great public health importance because of the economic losses caused by decreased productivity in affected populations and, above all, because of the hemorrhagic form of the disease, which can lead to death. These two arboviruses have adapted to urban mosquitoes, facilitating severe epidemics of Dengue and Zika, since A. aegypti (the main vector) exhibits gonotrophic discordance, which makes it an excellent vector. In general, mosquito-borne diseases have traditionally been neglected, and important biomedical information is lacking, such as the mechanism of infection in the mosquito, the mechanism of viral infection in humans, and laboratory diagnosis. In parallel, metabolomics and lipidomics are strategic and revolutionary methodologies that help determine biomarkers important for infection control. Given the global public health emergency, the new "omics" make it possible to study the mechanisms of infection in the adult form of the vectors and in humans, and are very important for diagnosis. Given this context, this project aimed to study the two most important arboviruses currently in Brazil. For ZIKV, it proposed a search for biomarkers to identify and understand the mechanism of infection in the mosquito (Goal 1) and in humans (Goal 2), for better control of the disease, and the development of a diagnostic method for ZIKV through "omics" platforms (Goal 3). For DENV, the proposed goal was to search for biomarkers of the alterations caused by DENV in patients who presented with hemorrhagic Dengue (Goal 4). Doctorate. Medical Pathophysiology. Doctor of Sciences. 16/17066-2. CAPES. FAPES

    Machine-learning-based identification of factors that influence molecular virus-host interactions

    Viruses are the cause of many infectious diseases, including the pandemic diseases acquired immune deficiency syndrome (AIDS) and coronavirus disease 2019 (COVID-19). During the infection cycle, viruses invade host cells and trigger a series of virus-host interactions with different directionality. Some of these interactions disrupt host immune responses or promote the expression of viral proteins and the exploitation of the host system, and are thus considered ‘pro-viral’. Other interactions display ‘pro-host’ traits, principally the immune response, to control or inhibit viral replication. Concomitant pro-viral and pro-host molecular interactions on the same host molecule suggest more complex virus-host conflicts and genetic signatures that are crucial to host immunity. In this work, machine-learning-based prediction of virus-host interaction directionality was examined using data from human immunodeficiency virus type 1 (HIV-1) infection. Host immune responses to viral infections are mediated by interferons (IFNs) in the initial stage of the immune response to infection. IFNs induce the expression of many IFN-stimulated genes (ISGs), which make the host cell refractory to further infection. We propose that many features are associated with the up-regulation of human genes in the context of IFN-α stimulation, and that these features make ISGs predictable using machine-learning models. To overcome the interference of host immune responses and replicate successfully, viruses adopt multiple strategies to avoid detection by cellular sensors and to hijack the host's transcription or translation machinery. Here, the viral strategy of mimicking host-like short linear motifs (SLiMs) was investigated using the example of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The integration of in silico experiments and analyses in this thesis demonstrates an interactive and intimate relationship between viruses and their hosts. The findings contribute to the identification of host dependency and antiviral factors. They are of great importance not only to the ongoing COVID-19 pandemic but also to the understanding of future disease outbreaks.

    Bioinformatical approaches to ranking of anti-HIV combination therapies and planning of treatment schedules

    The human immunodeficiency virus (HIV) pandemic is one of the most serious health challenges humanity is facing today. Combination therapy comprising multiple antiretroviral drugs has resulted in a dramatic decline in HIV-related mortality in developed countries. However, the emergence of drug-resistant HIV variants during treatment remains an obstacle to lasting treatment success and seriously hampers the composition of new active regimens. In this thesis we use statistical learning to develop novel methods that rank combination therapies according to their chance of achieving treatment success. These methods depend on information regarding the treatment composition, the viral genotype, features of viral evolution, and the patient's therapy history. Moreover, we investigate different definitions of response to antiretroviral therapy and their impact on the prediction performance of our methods. We address the problem of extending purely data-driven approaches to support novel drugs with little available data. In addition, we explore the prospect of prediction systems that are centered on the patient's treatment history instead of the viral genotype. We present a framework for rapidly simulating resistance development during combination therapy that will eventually allow combination therapies to be applied in the best order. Finally, we analyze surface proteins of HIV regarding their susceptibility to neutralizing antibodies, with the aim of supporting HIV vaccine development.