42 research outputs found

    Prediction of protein-protein interactions from primary structure using a Random Forest classifier

    Interactions between proteins are fundamental to a broad spectrum of biological functions, including regulation of metabolic pathways, immunological recognition, DNA replication, progression through the cell cycle, and protein synthesis. Due to the growing disparity between the amount of sequenced genomic content and functional data, there exists a pressing need for tools and methods that enable prediction of phenotypic traits, at the molecular or organism level, from sequence alone. In this work we aggregated structural data from existing databases to construct a high-quality dataset of protein quaternary structures, which enabled us to use the Random Forest non-linear classifier to develop a method for predicting interacting residues from protein primary structure. Our results show that, although the Random Forest algorithm can accurately classify high-dimensional data, our knowledge of the structural factors that determine the specificity of protein-protein interactions is still incomplete, which puts an upper limit on the usefulness of the machine learning approach for predicting protein interactions at the level of single amino acids from sequence alone.

    genomation: a toolkit to summarize, annotate and visualize genomic intervals

    Summary: Biological insights can be obtained through computational integration of genomics data sets consisting of diverse types of information. The integration is often hampered by a large variety of existing file formats, often containing similar information, and the necessity to use complicated tools to achieve the desired results. We have built an R package, genomation, to expedite the extraction of biological information from high throughput data. The package works with a variety of genomic interval file types and enables easy summarization and annotation of high throughput data sets with given genomic annotations. Availability and implementation: The software is currently distributed under MIT artistic license and freely available at http://bioinformatics.mdc-berlin.de/genomation, and through the Bioconductor framework.

    Archaeal aminoacyl-tRNA synthetases interact with the ribosome to recycle tRNAs

    Aminoacyl-tRNA synthetases (aaRS) are essential enzymes catalyzing the formation of aminoacyl-tRNAs, the immediate precursors for encoded peptides in ribosomal protein synthesis. Previous studies have suggested a link between tRNA aminoacylation and high-molecular-weight cellular complexes such as the cytoskeleton or ribosomes. However, the structural basis of these interactions and potential mechanistic implications are not well understood. To biochemically characterize these interactions we have used a system of two interacting archaeal aaRSs: an atypical methanogenic-type seryl-tRNA synthetase and an archaeal ArgRS. More specifically, we have shown by thermophoresis and surface plasmon resonance that these two aaRSs bind to the large ribosomal subunit with micromolar affinities. We have identified the L7/L12 stalk and the proteins located near the stalk base as the main sites for aaRS binding. Finally, we have performed a bioinformatics analysis of synonymous codons in the Methanothermobacter thermautotrophicus genome that supports a mechanism in which the deacylated tRNAs may be recharged by aaRSs bound to the ribosome and reused at the next occurrence of a codon encoding the same amino acid. These results suggest a mechanism of tRNA recycling in which aaRSs associate with the L7/L12 stalk region to recapture the tRNAs released from the preceding ribosome in the polysome.

    PiGx: reproducible genomics analysis pipelines with GNU Guix

    In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations and for the wider dissemination of workflows. However, providing this type of reproducibility and traceability is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies with GNU Guix. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines called PiGx for the analysis of RNA sequencing, chromatin immunoprecipitation sequencing, bisulfite-treated DNA sequencing, and single-cell resolution RNA sequencing. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own datasets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. 
Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pigx. Document type: Article

    The male germ cell gene regulator CTCFL is functionally different from CTCF and binds CTCF-like consensus sites in a nucleosome composition-dependent manner.

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief, you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are. BACKGROUND: CTCF is a highly conserved and essential zinc finger protein expressed in virtually all cell types. In conjunction with cohesin, it organizes chromatin into loops, thereby regulating gene expression and epigenetic events. The function of CTCFL or BORIS, the testis-specific paralog of CTCF, is less clear. RESULTS: Using immunohistochemistry on testis sections and fluorescence-based microscopy on intact live seminiferous tubules, we show that CTCFL is only transiently present during spermatogenesis, prior to the onset of meiosis, when the protein co-localizes in nuclei with ubiquitously expressed CTCF. CTCFL distribution overlaps completely with that of Stra8, a retinoic acid-inducible protein essential for the propagation of meiosis. We find that absence of CTCFL in mice causes sub-fertility because of a partially penetrant testicular atrophy. CTCFL deficiency affects the expression of a number of testis-specific genes, including Gal3st1 and Prss50. Combined, these data indicate that CTCFL has a unique role in spermatogenesis. Genome-wide RNA expression studies in ES cells expressing a V5- and GFP-tagged form of CTCFL show that genes that are downregulated in CTCFL-deficient testis are upregulated in ES cells. These data indicate that CTCFL is a male germ cell gene regulator. Furthermore, genome-wide DNA-binding analysis shows that CTCFL binds a consensus sequence that is very similar to that of CTCF. However, only ~3,700 out of the ~5,700 CTCFL- and ~31,000 CTCF-binding sites overlap. CTCFL binds promoters with loosely assembled nucleosomes, whereas CTCF favors consensus sites surrounded by phased nucleosomes. Finally, an ES cell-based rescue assay shows that CTCFL is functionally different from CTCF. CONCLUSIONS: Our data suggest that nucleosome composition specifies the genome-wide binding of CTCFL and CTCF. We propose that the transient expression of CTCFL in spermatogonia and preleptotene spermatocytes serves to occupy a subset of promoters and maintain the expression of male germ cell genes.

    PHF3 regulates neuronal gene expression through the Pol II CTD reader domain SPOC

    The C-terminal domain (CTD) of the largest subunit of RNA polymerase II (Pol II) is a regulatory hub for transcription and RNA processing. Here, we identify PHD-finger protein 3 (PHF3) as a regulator of transcription and mRNA stability that docks onto Pol II CTD through its SPOC domain. We characterize SPOC as a CTD reader domain that preferentially binds two phosphorylated Serine-2 marks in adjacent CTD repeats. PHF3 drives liquid-liquid phase separation of phosphorylated Pol II, colocalizes with Pol II clusters and tracks with Pol II across the length of genes. PHF3 knock-out or SPOC deletion in human cells results in increased Pol II stalling, reduced elongation rate and an increase in mRNA stability, with marked derepression of neuronal genes. Key neuronal genes are aberrantly expressed in Phf3 knock-out mouse embryonic stem cells, resulting in impaired neuronal differentiation. Our data suggest that PHF3 acts as a prominent effector of neuronal gene regulation by bridging transcription with mRNA decay.

    Widespread activation of antisense transcription of the host genome during herpes simplex virus 1 infection

    Background: Herpesviruses can infect a wide range of animal species. Herpes simplex virus 1 (HSV-1) is one of the eight herpesviruses that can infect humans and is prevalent worldwide. Herpesviruses have evolved multiple ways to adapt the infected cells to their needs, but knowledge about these transcriptional and post-transcriptional modifications is sparse. Results: Here, we show that HSV-1 induces the expression of about 1000 antisense transcripts from the human host cell genome. A subset of these is also activated by the closely related varicella zoster virus. Antisense transcripts originate either at gene promoters or within the gene body, and they show different susceptibility to the inhibition of early and immediate early viral gene expression. Overexpression of the major viral transcription factor ICP4 is sufficient to turn on a subset of antisense transcripts. Histone marks around transcription start sites of HSV-1-induced and constitutively transcribed antisense transcripts are highly similar, indicating that the genetic loci are already poised to transcribe these novel RNAs. Furthermore, an antisense transcript overlapping with the BBC3 gene (also known as PUMA) transcriptionally silences this potent inducer of apoptosis in cis. Conclusions: We show for the first time that a virus induces widespread antisense transcription of the host cell genome. We provide evidence that HSV-1 uses this to downregulate a strong inducer of apoptosis. Our findings open new perspectives on global and specific alterations of host cell transcription by viruses.

    Evolution and function of rodent specific MT transposons

    MT retrotransposons are LTR retrotransposons found specifically in the rodent lineage. After SINE and LINE retrotransposons, MT elements are the third most abundant class of repetitive elements in rodent genomes. Unlike classical mammalian LTR retrotransposons, the internal sequence of MT elements lacks homology to the retroviral GAG, POL and PRO proteins; instead, it contains sets of regulatory sequences. These regulatory sequences enable MT elements to reprogram the transcriptome in an oocyte-specific manner and thereby contribute to the regulation of gene expression. MT elements are subdivided into five groups, MTA – MTE. An MTC element in intron 6 of the Dicer 1 gene creates an oocyte-specific, truncated Dicer isoform, DicerO. This truncated isoform has higher processivity for small interfering RNAs (siRNAs) than the somatic isoform, DicerS. Targeted excision of the MTC element from intron 6 of Dicer 1 causes meiotic spindle deformation and female sterility in mice, illustrating that this retroelement is an essential part of the mouse genome. In this work I describe the origin, spread, and exaptation of MT elements in the rodent lineage.