
    Relationship between Insertion/Deletion (Indel) Frequency of Proteins and Essentiality

    Background: In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large-scale in silico screening. This resulted in selective antibodies and small compounds that were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than in non-essential proteins.
    Results: We investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data are available. Using these data, we demonstrated with t-tests that the mean indel frequencies in essential proteins were greater than those of non-essential proteins in all three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operating Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs).
    Conclusion: Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and in those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.
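    The two statistics named in the Results can be illustrated in a few lines. The sketch below is a minimal example, assuming per-protein indel frequencies (here, simple counts) have already been derived from the pathogen-to-human alignments; the numbers are hypothetical, and the choice of Welch's unequal-variance t-test is an assumption, since the abstract does not say which t-test variant was used. The ROC AUC is computed in its Mann-Whitney form: the probability that a randomly chosen essential protein has more indels than a randomly chosen non-essential one, with ties counted as one half.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's two-sample t statistic (does not assume equal variances)."""
    se2 = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / sqrt(se2)


def roc_auc(pos, neg):
    """ROC AUC in its Mann-Whitney form: P(random positive > random negative),
    counting ties as one half."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-protein indel counts (not data from the study).
essential = [3, 1, 4, 2, 0, 5, 2, 3]
non_essential = [1, 0, 2, 1, 0, 3, 1, 0]

print(f"Welch t = {welch_t(essential, non_essential):.2f}")
print(f"ROC AUC = {roc_auc(essential, non_essential):.3f}")
```

    A clear difference in means can coexist with an AUC well short of 1, because the two distributions still overlap heavily; this is consistent with the abstract's finding that indel frequency alone is not an accurate essentiality marker.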

    Targeting Protein-Protein Interactions for Parasite Control

    Finding new drug targets for pathogenic infections would be of great utility for humanity, as there is a large need for new drugs to fight infections, given the developing resistance to, and side effects of, current treatments. Current drug targets for pathogen infections are single proteins. However, proteins rarely act in isolation, and the majority of biological processes occur via interactions with other proteins, so protein-protein interactions (PPIs) offer a realm of unexplored potential drug targets and are thought to be the next generation of drug targets. Parasitic worms were chosen for this study because they have deleterious effects on human health, livestock, and plants, costing society billions of dollars annually, and because many of their genomes have been sequenced. In this study, we present a computational approach that utilizes the whole genomes of 6 parasitic worm species, 1 free-living worm species, and 2 hosts. The proteins of these species were placed into orthologous groups, which were then binned into species-specific orthologous groups. Proteins that are essential and conserved among species spanning a phylum are of greatest value, as they provide foundations for developing broad-control strategies. Two PPI databases were used to find PPIs within the species-specific bins. PPIs involving unique helminth proteins, and helminth proteins with unique features relative to the host such as indels, were prioritized as drug targets. The PPIs were scored based on RNAi phenotype and homology to structures in the PDB (Protein Data Bank). EST data for the various life stages, GO annotation, and druggability were also taken into consideration. Several PPIs emerged from this study as potential drug targets. A few interactions were supported by co-localization of expression in M. incognita (a plant parasite) and B. malayi (a parasite of H. sapiens), which have extremely different modes of parasitism. As more pathogen genomes are sequenced and PPI databases are expanded, this methodology will become increasingly applicable.
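    The binning and prioritization steps can be pictured as set operations over orthologous groups. The following is a minimal sketch under stated assumptions: the species names, orthologous-group IDs, bin labels and interaction list are all hypothetical placeholders, and the real study additionally scored PPIs by RNAi phenotype, PDB homology, expression, GO annotation and druggability, none of which is modelled here.

```python
# Hypothetical placeholder species; the study used 6 parasitic worms,
# 1 free-living worm and 2 hosts, whose names are not reproduced here.
PARASITES = {f"parasite_{i}" for i in range(1, 7)}
FREE_LIVING = {"free_living_1"}
HOSTS = {"host_1", "host_2"}


def bin_group(members):
    """Assign one orthologous group to a bin from the species it contains."""
    species = set(members)
    if species & HOSTS:
        return "host-shared"      # has a host ortholog (an indel may still distinguish it)
    if PARASITES <= species:
        return "pan-parasite"     # present in every parasite, absent from hosts
    if species & PARASITES:
        return "parasite-subset"  # some parasites, no host
    return "free-living-only"


# Hypothetical orthologous groups (group ID -> species with a member protein).
groups = {
    "OG1": ["parasite_1", "parasite_2", "host_1"],
    "OG2": sorted(PARASITES | FREE_LIVING),
    "OG3": ["parasite_3", "parasite_4"],
    "OG4": ["free_living_1"],
}
bins = {og: bin_group(species) for og, species in groups.items()}

# Hypothetical PPIs, already mapped from proteins to their orthologous groups.
ppis = [("OG2", "OG3"), ("OG1", "OG4"), ("OG1", "OG2")]
PRIORITY = {"pan-parasite", "parasite-subset"}
candidates = [pair for pair in ppis if any(bins[og] in PRIORITY for og in pair)]

print(bins)
print(candidates)
```

    Requiring only one partner to fall in a parasite-specific bin is one possible design choice; a stricter filter could require both partners, at the cost of fewer candidates.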

    Integrative computational approaches to study protein-nucleic acid interactions

    Interactions between proteins and nucleic acid molecules are central to cellular regulation and homeostasis. To study them, I employ a wide range of computational analysis methods to integrate genomic data from many types of experiment. This thesis has three parts. In the first part, I explore the patterns of indels created by CRISPR-Cas9 genome editing. By thorough characterisation of the precision of editing at thousands of genomic target sites, we identify simple sequence rules that can help predict these outcomes. Furthermore, we examine the role of the structural chromatin context in fine-tuning Cas9-DNA interactions. In the second part, I explore methods to study protein-RNA interactions. I use comparative computational analyses to assess both the data quality of, and data analysis methods for, different crosslinking and immunoprecipitation (CLIP) technologies. I then develop new methods to analyse data generated by hybrid individual-nucleotide resolution CLIP (hiCLIP). By tailoring computational solutions to an understanding of experimental conditions, I improve the overall sensitivity of hiCLIP, and ultimately feed back into ongoing experimental development. In the third part, I focus on the Staufen family of double-stranded RNA binding proteins and use hiCLIP data to define transcriptome-wide atlases of RNA duplexes bound by these proteins, both in a cell line and in rat brain tissue. Through integration with other data sets, both publicly available and newly generated, I derive insights into their function in RNA metabolism, and into how these interactions change during the course of mammalian brain development, with putative roles in ribonucleoprotein complex formation. In summary, I present a range of tailored computational methods and analyses developed to understand interactions between proteins and nucleic acids, aiming to link these interactions to functional outcomes.
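    Indel outcomes at a CRISPR-Cas9 target site are often summarized by how many reads carry a frameshift (an indel whose length is not a multiple of 3) and by how dominant the single most frequent outcome is. The sketch below shows both summaries, assuming the signed indel length of each edited read has already been called (positive for insertions, negative for deletions). The read data are hypothetical, and treating the share of the most frequent outcome as a "precision" score is an assumption for illustration, not necessarily the measure used in the thesis.

```python
from collections import Counter


def summarize_outcomes(indel_lengths):
    """Summarize edited reads at one target site.

    indel_lengths: signed indel length per edited read
    (+n = n-bp insertion, -n = n-bp deletion).
    """
    counts = Counter(indel_lengths)
    total = sum(counts.values())
    # Python's % returns a non-negative result, so -1 % 3 == 2 and -3 % 3 == 0.
    frameshift = sum(n for size, n in counts.items() if size % 3 != 0)
    top_size, top_n = counts.most_common(1)[0]
    return {
        "frameshift_fraction": frameshift / total,
        "top_outcome": top_size,
        "top_outcome_fraction": top_n / total,  # assumed "precision" score
    }


# Hypothetical site: mostly +1 insertions, plus some deletions.
reads = [+1] * 60 + [-1] * 15 + [-3] * 15 + [-7] * 10
print(summarize_outcomes(reads))
```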

    Characterising two newly identified Arabidopsis thaliana SUMO proteases

    Food production is under escalating threat, particularly from abiotic and biotic stresses that deplete crop yields, so there is an increasing need to understand and manipulate plant stress signalling pathways to generate stress-resilient crops. Recently, the Small Ubiquitin-like Modifier (SUMO) post-translational modification (PTM) system has been shown to regulate a wide spectrum of plant adaptation processes. The research in this thesis reviews current knowledge of the SUMO pathway and investigates the SUMO proteases regulating deSUMOylation. Two proteases from a newly discovered class of SUMO proteases, the deSUMOylating Isopeptidases (DeSis), were extensively investigated in Arabidopsis thaliana (Arabidopsis). The At4g25660 (AT60) and At4g25680 (AT80) DeSi proteases displayed similar characteristics to one another and were both found to localise outside the nucleus, towards the plasma membrane. An in vitro deSUMOylation assay showed signs of SUMO protease activity for AT60. Although functional redundancy between the two DeSi proteases was speculated, the findings suggested that unequal redundancy was more likely, with AT80 being the more important. Double knockout (KO) AT60-AT80 mutants generated with the clustered regularly interspaced short palindromic repeats (CRISPR) system, together with single AT60 and AT80 overexpressing transgenic lines, were subjected to stress-response assays. AT60-AT80 KO mutants were hypersensitive to the stress-modulating phytohormone abscisic acid (ABA) and to the pathogen-response elicitor flg22. Overexpressing lines displayed either no difference or increased tolerance to the stress elicitors relative to wild-type (WT) plants. The findings provided evidence that the AT60 DeSi protease is implicated in negatively regulating ABA signalling and plant immune responses.
The AT80 protease was found to play a regulatory role in ABA and immune signalling responses, and may also be involved in pathogen-induced guard cell responses. This study provides evidence that the two DeSi proteases play a significant role in regulating stress-induced growth and defence responses in Arabidopsis.

    Pan-cancer analysis of post-translational modifications reveals shared patterns of protein regulation

    Post-translational modifications (PTMs) play key roles in regulating cell signaling and physiology in both normal and cancer cells. Advances in mass spectrometry enable high-throughput, accurate, and sensitive measurement of PTM levels to better understand their role, prevalence, and crosstalk. Here, we analyze the largest collection of proteogenomics data from 1,110 patients with PTM profiles across 11 cancer types (10 from the National Cancer Institute's Clinical Proteomic Tumor Analysis Consortium [CPTAC]). Our study reveals pan-cancer patterns of changes in protein acetylation and phosphorylation involved in hallmark cancer processes. These patterns revealed subsets of tumors from different cancer types, including those with dysregulated DNA repair driven by phosphorylation, altered metabolic regulation associated with immune response driven by acetylation, kinase specificity affected by crosstalk between acetylation and phosphorylation, and modified histone regulation. Overall, this resource highlights the rich biology governed by PTMs and exposes potential new therapeutic avenues.

    Implementing the CRISPR/Cas9 technology in Eucalyptus hairy roots and functional characterization of auxin-dependent transcription factors involved in wood formation

    Eucalyptus is the most planted hardwood worldwide, with many industrial end uses such as pulp and paper and emerging biofuel production. The analysis of the Eucalyptus grandis genome identified many candidate genes involved in wood formation, including key mediators of auxin signaling: Auxin/Indole-3-Acetic Acid (Aux/IAA) and Auxin Response Factor (ARF) proteins. The functional characterization of these candidate genes was hampered by the difficulty of generating stable transgenic Eucalyptus and of knocking out these genes. Taking advantage of the rapid and efficient hairy root transformation mediated by A. rhizogenes recently implemented by our team, the objectives of my work were to implement the powerful CRISPR/Cas9 gene editing tool and to use it to investigate the potential roles of three Eucalyptus auxin-dependent transcription factors (IAA9A, IAA20 and ARF5) in regulating wood formation. First, as a proof of concept for implementing CRISPR/Cas9, we targeted Cinnamoyl-CoA Reductase1 (CCR1), a key lignin biosynthetic gene whose down-regulation effects are well described in several plants. Almost all transgenic lines were edited, but the allele-editing rates and profiles varied greatly depending on the genes targeted. Most editing events generated truncated proteins. The prevalent editing types were small deletions, but large deletions were also observed. Using a combination of Fourier-transform infrared (FT-IR) spectroscopy and multivariate analysis (partial least squares discriminant analysis, PLS-DA), we showed that the CCR1-edited lines were clearly separated from the controls. The most discriminant wavenumbers were attributed to lignin.
Histochemical analyses further confirmed the decreased lignification and the presence of collapsed vessels in CCR1-edited lines, which are characteristic of CCR1 deficiency. Although the editing efficiency could be improved, the method described here is already a useful tool for functionally characterizing eucalypt genes. In the second part of my work, we used this genome editing method to knock out two Aux/IAAs (IAA9A and IAA20) and one Auxin Response Factor (ARF5) in order to gain more insight into the role of auxin in the regulation of wood formation in Eucalyptus. We generated transgenic Eucalyptus hairy roots to overexpress and to knock out these genes. Unfortunately, all the transgenic plants overexpressing IAA9A and IAA20 (under the control of the 35S promoter) died during the COVID-19 lockdown period, and only three IAA20-CRISPR lines survived. Therefore, we could only analyze CRISPR/Cas9-edited transgenic plants for two candidates (IAA9A and ARF5). Editing events were detected by subcloning and/or web-based tools (DSDecode and Synthego ICE). CRISPR/Cas9-generated IAA9A lines had a high knockout rate of 92.3%, with 58.3% biallelic mutations. In contrast, ARF5 lines had quite low editing rates (43%), showing monoallelic and chimeric mutations. In IAA9A-edited lines we observed precocious xylem development and increased xylem vessel diameters, while no obvious phenotype was detected in ARF5-edited lines. Finally, we screened a Eucalyptus developing-xylem yeast two-hybrid (Y2H) library to find potential partners of IAA9A and IAA20. For IAA9A, we found some potentially promising candidates, such as Histone Linker (EgH1.3) and CCoAOMT2, previously reported to be involved in xylem formation; for IAA20, the main interactor revealed was IAA9A, suggesting that IAA20 and IAA9A form dimers in developing xylem to regulate wood formation.
In addition, we used the yeast two-hybrid method to confirm protein-protein interactions of EgrIAA9A and EgrIAA20 with EgrARF5 and with other candidates preferentially expressed in Eucalyptus wood-forming tissue.
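    Knockout and biallelic percentages like those reported for the IAA9A lines come from classifying each transgenic line by its decoded alleles. A minimal sketch of that bookkeeping follows, assuming a diploid target, allele calls already produced (for example by subcloning or a decoding tool), and the rule that more than two distinct alleles indicates a chimeric line. The line IDs and alleles are hypothetical, and reporting the biallelic share over edited lines is an assumption, since the abstract does not state the denominator.

```python
from collections import Counter


def classify_line(alleles):
    """Classify one line from its decoded alleles ("wt" = unedited allele).

    Assumes a diploid target: more than two distinct alleles is read as chimerism.
    """
    if len(set(alleles)) > 2:
        return "chimeric"
    edited = [a for a in alleles if a != "wt"]
    if not edited:
        return "unedited"
    if len(edited) == len(alleles):
        return "biallelic"  # both alleles edited (same or different edits)
    return "monoallelic"


def summarize_lines(lines):
    """Editing rate over all lines and biallelic share over edited lines."""
    classes = Counter(classify_line(alleles) for alleles in lines.values())
    n_edited = len(lines) - classes["unedited"]
    return {
        "editing_rate": n_edited / len(lines),
        "biallelic_among_edited": classes["biallelic"] / n_edited if n_edited else 0.0,
        "classes": dict(classes),
    }


# Hypothetical lines and allele calls (signed indel lengths as labels).
lines = {
    "line_1": ["-1", "-1"],
    "line_2": ["-1", "+1"],
    "line_3": ["wt", "-2"],
    "line_4": ["wt", "wt"],
    "line_5": ["wt", "-1", "+1"],
}
print(summarize_lines(lines))
```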

    Karyotype diversification in colorectal cancer

    From the moment of conception, every human being is in the process of developing cancer. Whether or not you will ultimately be diagnosed with this disease is simply a matter of whether something else kills you first. Cancer is not one disease. Rather, the radical growth of malignant cells represents a phenotypical extreme of cells that escape homeostasis. There are astronomically many potential combinations of genetic, epigenetic and environmental influences that can push cells into malignancy. Therefore, not only is every type of cancer different, but every instance of cancer is its own unique disease. The standardized treatments given to patients with similar types of malignancy are simplified abstractions, born from our lack of understanding of optimal treatment. The development of personalized cancer treatment options is one of the defining goals of modern medicine. Fundamental research plays an important role in progressing toward this goal, since without a true understanding of cancer biology we cannot hope to develop truly personalized treatment protocols. This thesis describes technological advances aimed at investigating fundamental cancer biology using patient-derived organoids as a model system. Patient-derived organoid culture protocols allow in vitro culture of three-dimensional (3D) wild-type and malignant tissue structures. By allowing differentiated outgrowth in 3D space, organoid culture protocols more accurately recapitulate the tissue composition, density and cell division dynamics of human tumors in vivo. When combined with fluorescent live-cell imaging, patient-derived tumor organoids provide spectacular tools to support fundamental cancer research. One of the protocols we developed is called 3D Live-Seq, which combines imaging of tumor organoid outgrowth and single-cell sequencing of each imaged cell to reconstruct evolving tumor karyotypes (the number of chromosomes or subchromosomal fragments per cell) across consecutive cell generations.
By using 3D Live-Seq, we showed that advanced colorectal cancer cells continuously generate both singular and very complex chromosome errors during outgrowth of a tumor. Collectively, these cells represent a body of genetic diversity from which treatment-resistant or more aggressive subclones may emerge during disease progression.
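    Reconstructing a karyotype from single-cell sequencing usually starts by counting reads in fixed genomic bins and converting those counts to integer copy numbers. The sketch below is a deliberately naive version, not the 3D Live-Seq pipeline: it assumes bin counts are already corrected for GC content and mappability and that the cell's median bin sits at a baseline ploidy of 2, then scales and rounds. The counts are hypothetical. Comparing the calls of a mother cell and its daughter then flags bins whose copy number changed across a division.

```python
from statistics import median


def call_copy_numbers(bin_counts, baseline_ploidy=2):
    """Naive integer copy-number call per bin for one cell.

    Assumes counts are already bias-corrected and that the median bin
    reflects the baseline ploidy; practical callers typically add
    segmentation or an HMM on top of this scaling.
    """
    med = median(bin_counts)
    return [round(baseline_ploidy * c / med) for c in bin_counts]


def changed_bins(mother, daughter):
    """Indices of bins whose copy number differs between two cells."""
    return [i for i, (m, d) in enumerate(zip(mother, daughter)) if m != d]


# Hypothetical corrected read counts in 8 bins for a mother and a daughter cell.
mother = call_copy_numbers([100, 98, 102, 51, 49, 150, 148, 101])
daughter = call_copy_numbers([99, 101, 100, 50, 52, 100, 97, 103])
print(mother)
print(changed_bins(mother, daughter))
```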

    Statistics and Evolution of Functional Genomic Sequence

    In this thesis, three separate problems of genomics are addressed, utilizing methods related to the field of statistical mechanics. The goal of the project discussed in the first chapter is the elucidation of post-transcriptional gene regulation imposed by microRNAs, a recently discovered class of tiny non-coding RNAs. A probabilistic algorithm for the computational identification of genes regulated by microRNAs is introduced, which was developed based on experimental data and statistical analysis of whole-genome data. In particular, the application of this algorithm to multiple alignments of groups of related species allows for the specific and sensitive detection of genes targeted by microRNAs on a genome-wide level. Examination of clade-specific predictions and cross-clade comparison yields deeper insights into microRNA biology and first clues about the long-term evolution of microRNA regulation, which are discussed in detail. Modeling the evolutionary dynamics of microsatellites, an abundant class of repetitive sequence in eukaryotic genomes, was the objective of the second project and is discussed in chapter two. Inspired by the putative functionality of some of these elements and the difficulty of constructing correct sequence alignments that reflect the evolutionary relationships between microsatellites, a neutral model for microsatellite evolution is developed and tested in the fruit fly Drosophila melanogaster by comparing evolutionary rates predicted by the model to independent measurements of these rates from multiple alignments of three closely related Drosophila species. The model is applied separately to genomic sequence categories of different functional annotations in order to assess the varying influence of selective constraint among these categories.
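    Many microRNA target predictors build on the seed rule: a target mRNA contains a site complementary to nucleotides 2-8 of the microRNA. The sketch below shows only that first step, exact 7mer-m8 seed matching in a 3′ UTR; it is not the thesis's probabilistic, alignment-based algorithm. It assumes RNA-alphabet sequences written 5′ to 3′. The microRNA is human let-7a-5p; the UTR string is made up for illustration.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna, start=2, end=8):
    """Target-site sequence (5'->3') complementary to miRNA nt start..end (1-based).

    With the defaults this is the 7mer-m8 site for the seed at nucleotides 2-8.
    """
    seed = mirna[start - 1:end]
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_sites(utr, site):
    """0-based start positions of every exact (possibly overlapping) match."""
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # human let-7a-5p
utr = "AAACUACCUCAAGGCUACCUCUU"    # made-up 3' UTR fragment
site = seed_site(let7a)
print(site, find_sites(utr, site))
```

    A genome-wide predictor would go further, for example by weighting sites by their conservation across the aligned species, which is where a probabilistic model such as the one described above comes in.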
In the last chapter, a general population genetic model is introduced that allows for the determination of transcription factor binding site stability as a function of selection strength, mutation rate and effective population size at arbitrary values of these parameters. The analytical solution of this model gives the probability that a binding site is functional. The model is used to compute the population fraction of functional binding sites at fixed selection pressure across a variety of different taxa. The results lead to the conclusion that a decreasing effective population size, such as that observed at the evolutionary transition from prokaryotes to eukaryotes, could result in loss of binding site stability. An extension of our model is used to assess the compensatory effect of the emergence of multiple binding sites for the same transcription factor in maintaining the existing regulatory relationship.
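    The qualitative conclusion about population size can be reproduced with a much simpler two-state model than the one in the thesis. Assume a site is either functional or not; that gain and loss mutations arise at per-copy rates u_gain and u_loss; that mutations are rare enough for each to be fixed or lost before the next arises (the weak-mutation limit, which the thesis's model explicitly does not require); and that fixation follows Kimura's diploid formula with selection coefficient s and effective population size N. Balancing the gain and loss fluxes then gives the stationary probability that the site is functional, f = 1 / (1 + (u_loss / u_gain) * exp(-2s(2N - 1))). The parameter values below are hypothetical.

```python
import math


def fraction_functional(s, n_e, u_gain, u_loss):
    """Stationary probability that a binding site is functional in a
    two-state, weak-mutation model with Kimura fixation probabilities.

    f = 1 / (1 + (u_loss / u_gain) * exp(-2 s (2 N_e - 1)))
    """
    z = 2.0 * s * (2.0 * n_e - 1.0) - math.log(u_loss / u_gain)
    # Numerically stable logistic 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical values: loss mutations 1000x likelier than gains, s = 1e-5.
for n_e in (1e4, 1e5, 1e6):
    print(f"N_e = {n_e:.0e}: f = {fraction_functional(1e-5, n_e, 1e-9, 1e-6):.4f}")
```

    With these numbers f rises from about 0.0015 at N = 10^4 to nearly 1 at N = 10^6, mirroring the abstract's point that a smaller effective population size can destabilize binding sites at fixed selection pressure.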