622 research outputs found

    A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data

    Motivation: A global map of transcription factor binding sites (TFBSs) is critical to understanding gene regulation and genome function. DNaseI digestion of chromatin coupled with massively parallel sequencing (digital genomic footprinting) enables the identification of protein-binding footprints with high resolution on a genome-wide scale. However, accurately inferring the locations of these footprints remains a challenging computational problem.

    Strategies for Computational Protein Design with Application to the Development of a Biomolecular Tool-kit for Single Molecule Protein Sequencing

    One of the key properties of proteins is that they exhibit remarkable affinities and specificities for small-molecule and peptide binding partners. To improve the success rate of rational, computational protein design and widen the scope of potential applications, it is useful to define generalized strategies and automated methodology to improve and/or alter the affinity and specificity of interactions. I have implemented several strategies for engineering protein-small molecule interactions, including improvement of substrate accessibility, stabilization of the bound state, truncation and surface engineering, and transplantation of residue-level, native (or native-like) interactions. Each strategy was applied to one or more model proteins, and the resulting changes in affinity, specificity, and activity were characterized experimentally. Finally, we designed a biomolecular tool-kit consisting of 17 engineered proteins for amino acid side-chain recognition and a single enzyme to catalyze the Edman degradation. We profiled the affinity and specificity of each protein, and implemented a computational framework that demonstrates its utility for amino acid calling in a single molecule protein sequencing assay.

    Discovery of directional and nondirectional pioneer transcription factors by modeling DNase profile magnitude and shape

    We describe protein interaction quantitation (PIQ), a computational method for modeling the magnitude and shape of genome-wide DNase I hypersensitivity profiles to identify transcription factor (TF) binding sites. Through the use of machine-learning techniques, PIQ identified binding sites for >700 TFs from one DNase I hypersensitivity analysis followed by sequencing (DNase-seq) experiment with accuracy comparable to that of chromatin immunoprecipitation followed by sequencing (ChIP-seq). We applied PIQ to analyze DNase-seq data from mouse embryonic stem cells differentiating into prepancreatic and intestinal endoderm. We identified 120 and experimentally validated eight 'pioneer' TF families that dynamically open chromatin. Four pioneer TF families only opened chromatin in one direction from their motifs. Furthermore, we identified 'settler' TFs whose genomic binding is principally governed by proximity to open chromatin. Our results support a model of hierarchical TF binding in which directional and nondirectional pioneer activity shapes the chromatin landscape for population by settler TFs. Funding: National Institutes of Health (U.S.) (Common Fund 5UL1DE019581); National Institutes of Health (U.S.) (Common Fund RL1DE019021); National Institutes of Health (U.S.) (Common Fund 5TL1EB008540); National Institutes of Health (U.S.) (Grant 1U01HG007037); National Institutes of Health (U.S.) (Grant 5P01NS055923)

    DeFCoM: analysis and modeling of transcription factor binding sites using a motif-centric genomic footprinter

    Identifying the locations of transcription factor binding sites is critical for understanding how gene transcription is regulated across different cell types and conditions. Chromatin accessibility experiments such as DNaseI sequencing (DNase-seq) and Assay for Transposase Accessible Chromatin sequencing (ATAC-seq) produce genome-wide data that include distinct "footprint" patterns at binding sites. Nearly all existing computational methods to detect footprints from these data assume that footprint signals are highly homogeneous across footprint sites. Additionally, no comprehensive and systematic comparison has been performed of how well footprinting methods identify which motif sites for a specific factor are bound. Using DNase-seq data from the ENCODE project, we show that a large degree of previously uncharacterized site-to-site variability exists in footprint signal across motif sites for a transcription factor. To model this heterogeneity in the data, we introduce a novel, supervised learning footprinter called DeFCoM (Detecting Footprints Containing Motifs). We compare DeFCoM to nine existing methods using evaluation sets from four human cell lines and eighteen transcription factors and show that DeFCoM outperforms current methods in determining bound and unbound motif sites. We also analyze the impact of several biological and technical factors on the quality of footprint predictions to highlight important considerations when conducting footprint analyses and assessing the performance of footprint prediction methods. Lastly, we show that DeFCoM can detect footprints using ATAC-seq data with accuracy similar to that obtained using DNase-seq data. Python code is available at https://bitbucket.org/bryancquach/defcom. Supplementary information is available at Bioinformatics online.

    Phylogenetic analysis and molecular characterization of human immunodeficiency virus in Serbia

    Human immunodeficiency virus (HIV) is a retrovirus and the causative agent of acquired immunodeficiency syndrome (AIDS). Since the beginning of the epidemic over 35 years ago, more than 78 million people have been infected and over 30 million have died. The high genetic variability and rapid evolution of HIV have been critical to its persistence and spread throughout the world. HIV comprises two distinct types, HIV-1 and HIV-2. HIV-1 has diversified extensively into numerous genetic forms, including four groups (M, N, O, P), of which group M is causing the pandemic of HIV infection and AIDS. Group M viruses are further classified into multiple phylogenetically distinct subtypes (A-D, F, G, H, J and K), sub-subtypes (A1, A2, F1 and F2) and numerous recombinant forms. The global distribution of HIV-1 is complex and dynamic, with regional epidemics representing only a subset of the global diversity. Molecular phylogenetic analysis, a method of reconstructing evolutionary relationships between nucleotide sequences, is one of the strategies for studying viral diversity and transmission dynamics. It is estimated that around half of HIV-infected people are undiagnosed, making identification of transmission networks important for targeted public health intervention programs. In this study, modern phylogenetic methods were applied to the analysis of HIV-1 sequences of isolates from Serbia in order to characterize molecular epidemiology and transmission dynamics, which is key to a better understanding of the characteristics of the current HIV-1 epidemic in Serbia...