15 research outputs found

    DATA DRIVEN APPROACHES TO IDENTIFY DETERMINANTS OF HEART DISEASES AND CANCER RESISTANCE

    Cancer and cardiovascular diseases are the leading causes of death worldwide. Caused by systemic genetic and molecular disruptions in cells, these disorders are the manifestation of a profound disturbance of normal cellular homeostasis. People suffering from or at high risk of these disorders need early diagnosis and personalized therapeutic intervention. Successful implementation of such clinical measures can significantly improve global health. However, the development of effective therapies is hindered by the challenge of identifying the genetic and molecular determinants of disease onset; where therapies already exist, the main challenge is to identify the molecular determinants that drive resistance to them. Owing to progress in sequencing technologies, access to large genome-wide biological datasets now extends far beyond a few experimental labs to the global research community. The unprecedented availability of data has revolutionized the capabilities of computational researchers, enabling them to address long-standing problems collaboratively and from many different perspectives. This thesis tackles these two major public health challenges using data-driven approaches. Numerous association studies have been proposed to identify genomic variants that determine disease. However, their clinical utility remains limited because they cannot distinguish causal variants from merely associated ones. In this thesis, we first propose a simple scheme that improves association studies in a supervised fashion and demonstrate its applicability in identifying genomic regulatory variants associated with hypertension. Next, we propose a coupled Bayesian regression approach, eQTeL, which leverages epigenetic data to estimate regulatory and gene interaction potential and identifies combinations of regulatory genomic variants that explain gene expression variance. 
On human heart data, eQTeL not only explains a significantly greater proportion of expression variance in samples, but also predicts gene expression more accurately than other methods. We demonstrate by simulation that eQTeL accurately detects causal regulatory SNPs, particularly those with small effect sizes. Using various functional data, we show that SNPs detected by eQTeL are enriched for allele-specific protein binding and histone modifications, potentially disrupt binding of core cardiac transcription factors, and are spatially proximal to their targets. eQTeL SNPs capture a substantial proportion of the genetic determinants of expression variance, and we estimate that 58% of these SNPs are putatively causal. Until now, the challenge of identifying molecular determinants of cancer resistance could only be addressed through labor-intensive and costly experimental studies, and for experimental drugs such studies are infeasible. Here we take a fundamentally different, data-driven approach to understand the evolving landscape of emerging resistance. We introduce a novel class of genetic interactions in cancer termed synthetic rescues (SR): functional interactions between two genes in which a change in the activity of one vulnerable gene (which may be the target of a cancer drug) is lethal, but a subsequent alteration in the activity of its partner rescuer gene restores cell viability. Next, we describe a comprehensive computational framework, termed INCISOR, for identifying the SRs underlying cancer resistance. Applying INCISOR to mine The Cancer Genome Atlas (TCGA), a large collection of cancer patient data, we identified the first pan-cancer SR networks, composed of interactions common to many cancer types. We experimentally test and validate a subset of these interactions involving the master regulator gene mTOR. We find that rescuer genes become increasingly activated as breast cancer progresses, testifying to pervasive ongoing rescue processes. 
We show that SRs can be used to successfully predict patients' survival and response to the majority of current cancer drugs and, importantly, to predict the emergence of drug resistance from the initial tumor biopsy. Our analysis suggests a potential new strategy for enhancing the effectiveness of existing cancer therapies by targeting their rescuer genes to counteract resistance. The thesis provides statistical frameworks that can harness ever-increasing high-throughput genomic data to address challenges in determining the molecular underpinnings of hypertension, cardiovascular disease and cancer resistance. We discover novel mechanistic insights at the molecular level that will advance early disease prevention and personalized therapeutics. Our analyses shed light on the fundamental biology of gene regulation and interaction, and open up exciting avenues for translational applications in risk prediction and therapeutics.

    Study of the insulin signaling pathway in Drosophila using a phosphoproteomic approach (Étude de la voie de signalisation de l’insuline chez la drosophile par une approche phosphoprotéomique)

    Phosphorylation is a reversible post-translational modification that modulates protein activity, can impart conformational changes, and can affect the translocation of protein substrates. Kinases and phosphatases are responsible for the dynamics of protein phosphorylation and act in a coordinated manner. 
Abnormal activation or misregulation of kinase activity can lead to the development of cancers and metabolic disorders. Signaling pathways associated with receptor tyrosine kinases (RTKs) are implicated in numerous diseases, and a better understanding of the mechanisms that regulate them is necessary to determine their activity and their anticipated effects on their substrates. In this context, the primary objective of this thesis is to study the phosphorylation events arising from the activation of the insulin receptor (InR) following stimulation of Drosophila S2 cells with insulin. The phosphorylation cascade triggered after InR activation is conserved in mammals. To study the phosphoproteome of Drosophila S2 cells, we enriched phosphopeptides on a titanium dioxide (TiO2) stationary phase prior to their separation by liquid chromatography (LC) and ion mobility (FAIMS) mass spectrometry (MS). Phosphopeptides were then analysed by tandem MS at high resolution. We first compared FAIMS with conventional LC-MS and observed a 50% increase in the number of identified phosphopeptides when using ion mobility. FAIMS separates phosphoisomers that are typically unresolved by LC, enabling high-confidence assignment of modification sites via distinct MS/MS spectra. This approach was used to profile phosphorylation changes between control and insulin-treated Drosophila cells and identified 32 phosphopeptides (out of 2,660 quantified) showing differential regulation. Interestingly, 50% of the regulated targets have a CK2 consensus site. These preliminary experiments were followed up by RNAi-mediated inhibition of CK2, which revealed that 6 phosphoproteins (CG30085, su(var)205, scny, protein CDV3 homolog, D1 and mu2) were positively modulated after insulin stimulation and negatively regulated after CK2 RNAi treatment. 
Using an in vitro kinase assay, we identified 29 direct CK2 targets, of which 15 correlated with results from the CK2 RNAi experiment. Specifically, we demonstrated that phosphorylation of su(var)205 (S15) is regulated by insulin and that su(var)205 is a direct CK2 target based on the RNAi and kinase assays. Our phosphoproteomics data also highlighted the presence of isomeric phosphopeptides, several of which could be distinguished using FAIMS. We developed two algorithms to determine the occurrence of phosphoisomers in large-scale studies. The first algorithm, based on differences in retention time between isomers, identified 64 candidates in mouse and rat phosphoproteome datasets, corresponding to less than 1% of all identified phosphopeptides. We also identified 117 isomer candidates in Drosophila using a targeted LC-MS/MS approach with inclusion lists. The second algorithm, based on the presence of characteristic fragment ions in MS/MS spectra of co-eluting or partially resolved species, identified 23 isomeric pairs. The ability to distinguish phosphoisomers in large-scale phosphoproteome datasets is important for correlating phosphorylation events on specific residues with biological activities.
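
    The retention-time criterion behind the first algorithm can be illustrated with a minimal sketch. This is not the thesis's script: the record layout (`sequence`, `site`, `rt_min`), the function name `find_rt_separated_isomers`, and the 1.0-minute threshold are all assumptions made for illustration. The sketch groups singly phosphorylated peptides by sequence and reports pairs that carry the phosphate on different residues and elute at clearly different times.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class PhosphoPeptide(NamedTuple):
    """One identified, singly phosphorylated peptide (hypothetical record layout)."""
    sequence: str   # unmodified amino-acid sequence
    site: int       # 1-based position of the phosphorylated residue
    rt_min: float   # chromatographic retention time in minutes


def find_rt_separated_isomers(peptides, min_rt_gap=1.0):
    """Return pairs that share a sequence but differ in site and retention time.

    min_rt_gap is an assumed threshold in minutes, not a value from the source.
    """
    by_sequence = defaultdict(list)
    for peptide in peptides:
        by_sequence[peptide.sequence].append(peptide)

    pairs = []
    for group in by_sequence.values():
        for a, b in combinations(group, 2):
            different_site = a.site != b.site
            resolved_by_lc = abs(a.rt_min - b.rt_min) >= min_rt_gap
            if different_site and resolved_by_lc:
                pairs.append((a, b))
    return pairs
```

    Pairs that share a sequence but fall below the threshold are the co-eluting cases, for which the abstract describes a second, fragment-ion-based algorithm.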

    Discovering properties of new DNA-binding activity of proteins

    Protein-DNA interactions are essential to the genetic activities of life, and the ability to predict and manipulate such interactions has applications in a wide range of fields. This thesis presents methods for modelling the properties of protein-DNA interactions. In particular, it investigates methods for visualising and predicting the specificity of DNA-binding Cys2His2 zinc finger interactions. Cys2His2 zinc finger proteins bind, via their individual fingers, to base-pair subsites on the target DNA. Four key residue positions on the α-helix of each zinc finger make sequence-specific non-covalent interactions with the DNA. Mutating these key residues generates combinatorial possibilities that could potentially bind to any DNA segment of interest. Many attempts have been made to predict the binding interaction using structural and chemical information, but with only limited success. The most important contribution of the thesis is that the developed model allows the binding properties of a given protein-DNA pairing to be visualised in relation to other protein-DNA combinations without having to explicitly model the specific protein molecule and DNA sequence physically. To demonstrate this, various databases were generated, including a synthetic database that contains all possible combinations of DNA-binding Cys2His2 zinc finger interactions. NeuroScale, a topographic visualisation technique, is used to represent the geometric structure of the protein-DNA interactions by measuring the dissimilarity between data points. To verify the effect of visualisation on understanding the binding properties of the DNA-binding Cys2His2 zinc finger interaction, various prediction models are constructed using both the original high-dimensional data and its representation in a low-dimensional feature space. 
Finally, novel data sets are studied through the selected visualisation models based on the experimental DNA-zinc finger protein database. The NeuroScale projections show that different dissimilarity representations give distinctive structural groupings that nonetheless cluster in biologically interesting ways. This method can be used to forecast the physicochemical properties of novel proteins, which may be beneficial for therapeutic purposes involving genome targeting in general.
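
    A topographic projection of this kind starts from a matrix of pairwise dissimilarities between data points. The sketch below shows only that input step, under loud assumptions: the abstract does not say which dissimilarity measures were used, so a normalised Hamming distance over four-letter key-residue strings stands in for them, and the function names and example strings are hypothetical. NeuroScale itself is not implemented here.

```python
def hamming_dissimilarity(a, b):
    """Fraction of positions at which two equal-length residue strings differ."""
    if len(a) != len(b):
        raise ValueError("residue strings must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dissimilarity_matrix(items):
    """Symmetric matrix of pairwise dissimilarities, with zeros on the diagonal."""
    return [[hamming_dissimilarity(a, b) for b in items] for a in items]


# Hypothetical four-residue strings, one letter per key helix position.
fingers = ["RDER", "RDHT", "QSNR"]
matrix = dissimilarity_matrix(fingers)
```

    Swapping in a different dissimilarity function changes the matrix and therefore the projection, which is consistent with the abstract's observation that different dissimilarity representations give different groupings.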

    AN INTEGRATIVE SYSTEMS BIOINFORMATICS APPROACH OF THE ENVIRONMENTAL, GENETIC AND MOLECULAR FACTORS REGULATING SLEEP

    Environmental changes and genetic variations are two important drivers of biological diversity. In complex traits, a multitude of genetic and environmental factors interact and combine in cryptic ways to direct phenotypic variation. Sleep is a classic illustration of a complex trait that is vital and heritable but still poorly understood. Many aspects of sleep, such as its timing, duration and quality, are regulated by the interaction of two processes: circadian oscillations and sleep homeostasis. In a study aimed at uncovering more clearly the molecular pathways that regulate the sleep homeostat through the ambiguous relationship between the sleep-wake cycle and metabolism, we built, assembled and analyzed an extensive multi-scale dataset using a systems genetics design. Machine learning algorithms and novel high-throughput sequencing technology make it possible to appraise more precisely and broadly the plethora of physiological and molecular phenotypes that contribute to sleep under disparate circumstances and genetic backgrounds, in order to build novel hypotheses based on data-driven discoveries. The dataset comprises 33 recombinant inbred lines (RIL) from the BXD panel that were interrogated under sleep deprivation and undisturbed conditions for 341 sleep-wake related physiological phenotypes, 124 blood plasma metabolites, and cortical and liver transcriptomes. Initial analyses revealed the pervasive effects of sleep deprivation and genetics at both the molecular and behavioral levels, and the complex interaction between genetic and environmental factors at all phenotypic layers. Then, two novel integrative methods were developed: the first to prioritize candidate genes within large associated genomic regions for physiological or metabolic phenotypes, and the second to visualize the meta-dimensionality of the molecular network using the deterministic structure of hive plots. 
Our findings led to the discovery of a bidirectional relationship between fatty acid turnover and sleep homeostasis, and also between brain slow-wave activity and ionotropic glutamate receptor transport. Using markup language and cloud-based technologies, we aimed to transform this rich, multidisciplinary dataset into an exploitable digital research object. The generation of dynamic analysis reports and workflow metadata promoted the reproducibility of this data object. In addition, tools were developed for the exploration and mining of the integrated data. The resulting database and associated web interface ensure the reusability of this dataset and its associated methodologies.

    Identification of Regulatory Binding Sites on mRNA Using in Vivo Derived Informations and SVMs


    NOTIFICATION !!!

    All the content of this special edition is retrieved from the conference proceedings published by the European Scientific Institute, ESI. http://eujournal.org/index.php/esj/pages/view/books The European Scientific Journal, ESJ, republishes the papers in a special edition after approval from the publisher.
