202 research outputs found

    Relationship between Insertion/Deletion (Indel) Frequency of Proteins and Essentiality

    Background: In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may provide sufficient spatial differences between the pathogenic protein and human proteins to allow for selective targeting. In one example, an indel difference was targeted via large-scale in silico screening. This resulted in selective antibodies and small compounds which were capable of binding to the deletion-bearing essential pathogen protein without any cross-reactivity to the highly similar human protein. The objective of the current study was to investigate whether indels were found more frequently in essential than in non-essential proteins. Results: We have investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data is available. Using these data, we demonstrated with t-test calculations that the mean indel frequencies in essential proteins were greater than those in non-essential proteins in the three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by the Weibull distribution. However, Receiver Operator Characteristic (ROC) curves showed that indel frequencies alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae and observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs). Conclusion: Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes and were likely to occur more often in essential proteins and those that were highly connected, indicating a possible role of sequence insertions and deletions in the regulation and modification of protein-protein interactions. Such observations will provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.
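
The contrast reported in this abstract (a difference in mean indel frequency that a t-test detects, yet weak ROC discrimination) can be illustrated with a minimal sketch. The indel counts below are hypothetical toy values, not data from the study, and the choice of Welch's unequal-variance t statistic is an assumption, since the abstract does not say which t-test variant was used.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def roc_auc(pos, neg):
    """Area under the ROC curve, computed as the probability that a random
    positive scores higher than a random negative (ties count as one half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical indel counts per protein (illustration only).
essential = [3, 2, 4, 1, 3, 2]
non_essential = [2, 1, 3, 0, 2, 3]

t_stat = welch_t(essential, non_essential)  # positive: essential mean is higher
auc = roc_auc(essential, non_essential)     # well below 1: the groups overlap

print(f"t = {t_stat:.2f}, AUC = {auc:.2f}")
```

In this toy case the means differ in the expected direction, while the AUC stays far from 1 because the two distributions overlap heavily; that is the sense in which a group-level difference need not yield a usable per-protein classifier.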

    Persistence drives gene clustering in bacterial genomes

    Background: Gene clustering plays an important role in the organization of the bacterial chromosome and several mechanisms have been proposed to explain its extent. However, the controversies raised about the validity of each of these mechanisms remind us that the cause of this gene organization remains an open question. Models proposed to explain clustering did not take into account the function of the gene products nor the likely presence or absence of a given gene in a genome. However, genomes harbor two very different categories of genes: those present in a majority of organisms (persistent genes) and those present in very few organisms (rare genes). Results: We show that two classes of genes are significantly clustered in bacterial genomes: the highly persistent and the rare genes. The clustering of rare genes is readily explained by the selfish operon theory. Yet genes persistently present in bacterial genomes are also clustered, and we try to understand why. We propose a model accounting specifically for such clustering, and show that indispensability in a genome with frequent gene deletion and insertion leads to the transient clustering of these genes. The model describes how clusters are created via the gene flux that continuously introduces new genes while deleting others. We then test whether known selective processes, such as co-transcription, physical interaction or functional neighborhood, account for the stabilization of these clusters. Conclusion: We show that it is the strong selective pressure acting on the function of persistent genes, in bacterial genomes that are in a permanent state of gene flux while maintaining a fairly constant size, that drives the clustering of persistent genes. A further selective stabilization process might contribute to maintaining the clustering.

    The role of the human INO80 complex in telomere maintenance

    The ends of chromosomes contain telomeric repeats that prevent the DNA damage response from being activated in order to preserve genomic integrity. Telomerase functions to alleviate incomplete DNA replication at telomeres, and to repair those telomeres damaged by various means including oxidative damage. The lifespan of telomerase-negative somatic cells is normally restricted by gradual telomere shortening, which can lead to the activation of the p53 pathway, resulting in cellular growth arrest. Cancer cells often elongate their telomeres in order to acquire cellular immortality, predominantly by reactivating telomerase or by using recombination-based, alternative telomere lengthening methods. Previously in our lab, a genome-wide CRISPR screen was conducted in the pre-B cell line NALM-6 treated with a small molecule inhibitor of telomerase, BIBR1532. These previous results suggested that five subunits of the INO80 chromatin-remodeling complex, when independently deleted, reduced cellular proliferation in cells with BIBR1532-induced telomere shortening. My goal was to investigate this genetic interaction in order to understand the biological processes implicated in this synthetic lethal relationship. After the knockout of the genes encoding both the enzymatic subunit of human telomerase (hTERT) and specific subunits of the human INO80 complex, I found that the proliferative capacity of NALM-6 cells was reduced. This result indicates that the genetic interaction identified by CRISPR screening is in fact specific. In addition, after p53 stimulation with nutlin-3a treatment, expression levels of the p53 pathway component MDM2 were altered after the knockout of the genes encoding specific subunits of the human INO80 complex, NFRKB and UCHL5, individually. CDKN1A expression was also altered after nutlin-3a treatment and NFRKB knockout. Finally, the loss of telomerase (hTERT) alters the expression levels of the p53 pathway components CDKN1A, BAX and MDM2. In conclusion, the deletion of the genes encoding specific subunits of the INO80 complex, including NFRKB and UCHL5, is harmful to cells after hTERT knockout. The human INO80 complex may be involved in inhibiting the p53 pathway, in response to p53 activation by short telomeres or nutlin-3a treatment. Further investigation into this synthetic lethal relationship may shed light on new combinatorial therapeutics in cancer.

    IndelFR: a database of indels in protein structures and their flanking regions

    Insertions/deletions (indels) are among the most common types of protein sequence variation. Recent studies showed that indels can affect their flanking regions and that they are important for protein function and evolution. Here, we describe the Indel Flanking Region Database (IndelFR, http://indel.bioinfo.sdu.edu.cn), which provides sequence and structure information about indels and their flanking regions in known protein domains. The indels were obtained through the pairwise alignment of homologous structures in SCOP superfamilies. The IndelFR database contains 2 925 017 indels with flanking regions extracted from 373 402 structural alignment pairs of 12 573 non-redundant domains from 1053 superfamilies. IndelFR provides access to information about indels and their flanking regions, including amino acid sequences, lengths, locations, secondary structure constitutions, hydrophilicity/hydrophobicity, domain information, 3D structures and so on. IndelFR has already been used for molecular evolution studies and may help to promote future functional studies of indels and their flanking regions.

    Targeting Protein-Protein Interactions for Parasite Control

    Finding new drug targets for pathogenic infections would be of great utility for humanity, as there is a large need to develop new drugs to fight infections due to the developing resistance and side effects of current treatments. Current drug targets for pathogen infections involve only a single protein. However, proteins rarely act in isolation, and the majority of biological processes occur via interactions with other proteins, so protein-protein interactions (PPIs) offer a realm of unexplored potential drug targets and are thought to be the next generation of drug targets. Parasitic worms were chosen for this study because they have deleterious effects on human health, livestock, and plants, costing society billions of dollars annually, and because many sequenced genomes are available. In this study, we present a computational approach that utilizes whole genomes of 6 parasitic and 1 free-living worm species and 2 hosts. The species were placed in orthologous groups, then binned in species-specific orthologous groups. Proteins that are essential and conserved among species that span a phylum are of greatest value, as they provide foundations for developing broad-control strategies. Two PPI databases were used to find PPIs within the species-specific bins. PPIs with unique helminth proteins and helminth proteins with unique features relative to the host, such as indels, were prioritized as drug targets. The PPIs were scored based on RNAi phenotype and homology to the PDB (Protein Data Bank). EST data for the various life stages, GO annotation, and druggability were also taken into consideration. Several PPIs emerged from this study as potential drug targets. A few interactions were supported by co-localization of expression in M. incognita (plant parasite) and B. malayi (H. sapiens parasite), which have extremely different modes of parasitism. As more genomes of pathogens are sequenced and PPI databases expanded, this methodology will become increasingly applicable.

    Genome-Wide Influence of Indel Substitutions on Evolution of Bacteria of the PVC Superphylum, Revealed Using a Novel Computational Method

    Whole-genome scans for positive Darwinian selection are widely used to detect evolution of genome novelty. Most approaches are based on evaluation of the nonsynonymous to synonymous substitution rate ratio across evolutionary lineages. These methods are sensitive to saturation of synonymous sites and thus cannot be used to study evolution of distantly related organisms. In contrast, indels occur less frequently than amino acid replacements, accumulate more slowly, and can be employed to characterize evolution of diverged organisms. As indels are also subject to the forces of natural selection, they can generate functional changes through positive selection. Here, we present a new computational approach to detect selective constraints on indel substitutions at the whole-genome level for distantly related organisms. Our method is based on ancestral sequence reconstruction, takes into account the varying susceptibility of different types of secondary structure to indels, and according to simulation studies is conservative. We applied this newly developed framework to characterize the evolution of organisms of the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) bacterial superphylum. The superphylum contains organisms with unique cell biology, physiology, and diverse lifestyles. It includes bacteria with simple cell organization and more complex eukaryote-like compartmentalization. Lifestyles range from free-living organisms to obligate pathogens. In this study, we conducted a whole-genome level analysis of indel substitutions specific to evolutionary lineages of the PVC superphylum and found that indels evolved under positive selection on up to 12% of gene tree branches. We also analyzed possible functional consequences for several case studies of predicted indel events.

    Gene Essentiality Analyzed by In Vivo Transposon Mutagenesis and Machine Learning in a Stable Haploid Isolate of Candida albicans

    This work was supported by European Research Council Advanced Award 340087 (RAPLODAPT) to J.B., the Dahlem Centre of Plant Sciences (DCPS) of the Freie Universität Berlin (R.K.), Israel Science Foundation grant no. 715/18 (R.S.), the Wellcome Trust (grants 086827, 075470, 101873, and 200208) and the MRC Centre for Medical Mycology (N006364/1) (N.A.R.G.). Data availability: All of the code and required dependencies for analysis of the TnSeq data are available at https://github.com/berman-lab/transposon-pipeline. Library insertion sequences are available at NCBI under project PRJNA490565 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA490565). Datasets S1 through S9 are available at https://doi.org/10.6084/m9.figshare.c.4251182. Peer reviewed.

    Genomic analysis of a household tuberculosis transmission cluster over a ten-year period

    Final project of the Integrated Master's degree, Pharmaceutical Sciences, 2020, Universidade de Lisboa, Faculdade de Farmácia. Tuberculosis remains a serious public health problem even though it is a preventable and treatable disease. The World Health Organization estimates that 1.5 million people die from tuberculosis year after year, and roughly one quarter of the world's population is believed to be infected. Latvia is one of Europe's high-priority countries for tuberculosis control and has one of the highest rates of multi-drug resistant tuberculosis in the world, despite having a well-established control programme. Prevention, early detection and quick and effective response to outbreaks are essential elements to control the spread of tuberculosis. Over a period of ten years, seven samples were collected from a family of five people from Latvia. We performed genomic analysis of the seven isolates in order to unravel the chain of transmission, investigate the origin of two recurrent cases and reveal the possible existence of drug resistance. We prepared genomic libraries and sequenced the isolates using the Ion Proton platform. To analyze the genomic sequences, we carried out bioinformatic analysis using a pipeline for genome-wide variant detection that included alignment of the reads against the reference H37Rv genome, local indel realignment, variant calling and structural variant detection. Overall, 6 structural variants were found, and we detected 1029 high-quality SNPs, of which 9 were phylogenetically informative and 17 differentiated the isolates. Based on in silico spoligotyping, the isolates belonged to the T1 sub-family, and when using phylogenetically specific SNPs, the studied strains were determined to be part of the Haarlem sub-lineage. No robust polymorphisms in genes associated with drug resistance were found; therefore, the isolates were classified as susceptible to all anti-tuberculosis drugs. Two patients had recurrent cases that we defined as re-infections. We generated hypotheses in order to establish the routes of transmission, supported by the defined cut-offs in the number of SNPs and the data from the maximum likelihood phylogenetic trees. Although we used a high-resolution method, the WGS data was not enough to determine the direction of transmission within the cluster unambiguously. The molecular epidemiology data needed to be combined with classical epidemiology and clinical information to effectively investigate this household transmission cluster. Sponsored by the Latvian Biomedical Research and Study Centre.
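
The idea of supporting transmission hypotheses with SNP-count cut-offs can be sketched as follows. This is an illustration under stated assumptions only: the isolate names, variant calls and the cut-off value are hypothetical, and the abstract does not report the actual thresholds or pipeline details.

```python
from itertools import combinations


def snp_distance(a, b):
    """Number of SNP calls (position, alternate base) present in exactly one isolate."""
    return len(a ^ b)


# Hypothetical SNP calls relative to a reference genome (illustration only).
isolates = {
    "A": {(1200, "T"), (45000, "G")},
    "B": {(1200, "T"), (45000, "G"), (98000, "C")},
    "C": {(1200, "T"), (310000, "A"), (520000, "G"), (770000, "T")},
}

# Pairwise SNP distances between all isolates.
distances = {
    (x, y): snp_distance(isolates[x], isolates[y])
    for x, y in combinations(sorted(isolates), 2)
}

CUTOFF = 2  # hypothetical threshold for "compatible with recent transmission"
linked = [pair for pair, d in distances.items() if d <= CUTOFF]

print(distances)
print(linked)
```

A symmetric distance of this kind can suggest which isolates are closely linked, but it carries no information about who infected whom, which is consistent with the abstract's remark that WGS data alone did not resolve the direction of transmission.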

    Small variable segments constitute a major type of diversity of bacterial genomes at the species level.

    BACKGROUND: Analysis of large-scale diversity in bacterial genomes has mainly focused on elements such as pathogenicity islands, or more generally, genomic islands. These comprise numerous genes and confer important phenotypes, which are present or absent depending on strains. We report that despite this widely accepted notion, most diversity at the species level is composed of much smaller DNA segments, 20 to 500 bp in size, which we call microdiversity. RESULTS: We performed a systematic analysis of the variable segments detected by multiple whole genome alignments at the DNA level on three species for which the greatest number of genomes have been sequenced: Escherichia coli, Staphylococcus aureus, and Streptococcus pyogenes. Among the numerous sites of variability, 62 to 73% were loci of microdiversity, many of which were located within genes. They contribute to phenotypic variations, as 3 to 6% of all genes harbor microdiversity, and 1 to 9% of total genes are located downstream from a microdiversity locus. Microdiversity loci are particularly abundant in genes encoding membrane proteins. In-depth analysis of the E. coli alignments shows that most of the diversity does not correspond to known mobile or repeated elements, and it is likely that they were generated by illegitimate recombination. An intriguing class of microdiversity includes small blocks of highly diverged sequences, whose origin is discussed. CONCLUSIONS: This analysis uncovers the importance of this small-sized genome diversity, which we expect to be present in a wide range of bacteria, and possibly also in many eukaryotic genomes.