79 research outputs found
Refining intra-protein contact prediction by graph analysis
Background: Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results: We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred) was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion: Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.
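The two edge scores described in this abstract can be illustrated with a minimal Python sketch. It is not the GARP implementation: the Jaccard form of the mutual clustering coefficient, the window half-width `w`, and the function names `build_graph`, `jaccard_mcc` and `window_density` are all assumptions made for illustration, since the abstract does not specify them.

```python
from collections import defaultdict


def build_graph(contacts):
    """Turn predicted contacts (pairs of residue indices) into adjacency sets."""
    adj = defaultdict(set)
    for i, j in contacts:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    return dict(adj)


def jaccard_mcc(adj, u, v):
    """Jaccard-style mutual clustering coefficient of edge (u, v).

    Assumption: shared neighbours divided by all neighbours of u and v,
    not counting u and v themselves. Other definitions exist.
    """
    nu = adj.get(u, set()) - {v}
    nv = adj.get(v, set()) - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


def window_density(contacts, u, v, w=2):
    """Count other predicted contacts linking the sequence windows
    [u - w, u + w] and [v - w, v + w] (hypothetical window size w)."""
    count = 0
    for a, b in contacts:
        if {a, b} == {u, v}:
            continue  # skip the edge being scored
        forward = abs(a - u) <= w and abs(b - v) <= w
        backward = abs(a - v) <= w and abs(b - u) <= w
        if forward or backward:
            count += 1
    return count


# Toy predicted contacts; residue indices are made up for the example.
contacts = [(1, 10), (1, 11), (10, 11), (10, 20), (5, 30)]
adj = build_graph(contacts)
mcc_1_10 = jaccard_mcc(adj, 1, 10)
density_1_10 = window_density(contacts, 1, 10)
```

In this toy graph the edge (1, 10) shares neighbour 11 and is supported by the nearby contact (1, 11), whereas the isolated edge (5, 30) has no support from either score. A refinement step could then re-rank the basic predictions by some combination of the two values.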
One-Block CYRCA: an automated procedure for identifying multiple-block alignments from single block queries
One-Block CYRCA is an automated procedure for identifying multiple-block alignments from single block queries. It is based on the LAMA and CYRCA block-to-block alignment methods. The procedure identifies whether the query blocks can form new multiple-block alignments (block sets) with blocks from a database or join pre-existing database block sets. Using pre-computed LAMA block alignments and CYRCA sets from the Blocks database reduces the computation time. LAMA and CYRCA are highly sensitive and selective methods that can augment many other sequence analysis approaches.
ChiPPI: a novel method for mapping chimeric protein–protein interactions uncovers selection principles of protein fusion events in cancer
Fusion proteins, comprising peptides deriving from the translation of two parental genes, are produced in cancer by chromosomal aberrations. The expressed fusion protein incorporates domains of both parental proteins. Using a methodology that treats discrete protein domains as binding sites for specific domains of interacting proteins, we have cataloged the protein interaction networks for 11 528 cancer fusions (ChiTaRS-3.1). Here, we present our novel method, chimeric protein–protein interactions (ChiPPI), which uses domain–domain co-occurrence scores to identify preserved interactors of chimeric proteins. Mapping the influence of fusion proteins on cell metabolism and pathways reveals that ChiPPI networks often lose tumor suppressor proteins and gain oncoproteins. Furthermore, fusions often induce novel connections between non-interactors, skewing interaction networks and signaling pathways. We compared fusion protein PPI networks in leukemia/lymphoma, sarcoma and solid tumors, finding distinct enrichment patterns for each disease type. While certain pathways are enriched in all three diseases (Wnt, Notch and TGF β), there are distinct patterns for leukemia (EGFR signaling, DNA replication and CCKR signaling), for sarcoma (p53 pathway and CCKR signaling) and for solid tumors (FGFR and EGFR signaling). Thus, the ChiPPI method represents a comprehensive tool for studying the anomaly of skewed cellular networks produced by fusion proteins in cancer. This work is funded by the Project Retos BFU2015-71241-R of the Spanish Ministry of Economy, Industry and Competitiveness (MEIC), co-funded by European Regional Development Fund (ERDF) and by the Project PT13/0001/0030, Instituto de Salud Carlos III (ISCIII), Strategic Action in Health, co-funded by European Regional Development Fund (ERDF). The work of MFM is supported by the Israel Cancer Association (ICA) fund, the work of ST is supported by the VaTaT Postdoctoral Fellowship for excellent students [22351, 20027, 26912]. AV is supported by the Joint BSC-CRG-IRB Programme in Computational Biology. Funding for open access charge: ICA [e-cancer-diagnosis]. Peer Reviewed. Postprint (published version).
Computational analysis of sense-antisense chimeric transcripts reveals their potential regulatory features and the landscape of expression in human cells
Many human genes are transcribed from both strands and produce sense-antisense gene pairs. Sense-antisense (SAS) chimeric transcripts are produced upon the coalescing of exons/introns from both sense and antisense transcripts of the same gene. SAS chimeras were first reported in prostate cancer cells. Subsequently, numerous SAS chimeras have been reported in the ChiTaRS-2.1 database. However, the landscape of their expression in human cells and their functional aspects are still unknown. We found that longer palindromic sequences are a unique feature of SAS chimeras. Structural analysis indicates that these long palindromic sequences give rise to a long hairpin-like structure formed by many consecutive Watson-Crick base pairs, which possibly plays a role similar to that of double-stranded RNA (dsRNA), interfering with gene expression. RNA–RNA interaction analysis suggested that SAS chimeras could significantly interact with their parental mRNAs, indicating their potential regulatory features. Here, 267 SAS chimeras were mapped in RNA-seq data from 16 healthy human tissues, revealing their expression in normal cells. Evolutionary analysis suggested positive selection favoring sense-antisense fusions that significantly impacted the evolution of their function and structure. Overall, our study provides detailed insight into the expression landscape of SAS chimeras in human cells and identifies potential regulatory features. Israeli Council for Higher Education [PBC Fellowship for Outstanding Post-Doctoral Fellows, 2019-2021 to S.M.]; Israel Innovation Authority [66824, 2019–2021 to M.F-M.]; RSF [18-14-00240 to Y.A.M. (in part)]. Peer Reviewed. Postprint (published version).
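As a rough illustration of the palindrome feature mentioned in this abstract, the Python sketch below finds the longest stretch of an RNA sequence that equals its own reverse complement, which is the kind of region that can fold back into a perfect hairpin stem. It assumes "palindromic" means a reverse-complement palindrome with Watson-Crick pairs only; the abstract does not describe the authors' actual detection procedure, and the function name `longest_rc_palindrome` is hypothetical.

```python
# Watson-Crick partners for RNA bases (no G-U wobble pairs in this sketch).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def longest_rc_palindrome(rna):
    """Return the longest substring that equals its own reverse complement.

    Such substrings always have even length, so it is enough to expand
    outward from every gap between two adjacent bases.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    n = len(rna)
    best_start, best_len = 0, 0
    for center in range(1, n):
        left, right = center - 1, center
        # Grow the stem while the outer bases can pair with each other.
        while left >= 0 and right < n and COMPLEMENT.get(rna[left]) == rna[right]:
            left -= 1
            right += 1
        length = right - left - 1
        if length > best_len:
            best_start, best_len = left + 1, length
    return rna[best_start:best_start + best_len]


# Made-up sequences for the example.
full = longest_rc_palindrome("GGAAUUCC")
inner = longest_rc_palindrome("AAGAAUUCAA")
none = longest_rc_palindrome("AAAA")
```

The expand-around-centre search runs in quadratic time in the worst case, which is adequate for transcript-length sequences. A more realistic analysis would likely tolerate mismatches and loops, which this sketch deliberately ignores.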
Genes adopt non-optimal codon usage to generate cell cycle-dependent oscillations in protein levels
Most cell cycle-regulated genes adopt non-optimal codon usage, namely, their translation involves wobbly matching codons. Here, the authors show that tRNA expression is cyclic and that codon usage, therefore, can give rise to cell-cycle regulation of proteins.
tRNA methylation resolves codon usage bias at the limit of cell viability
Codon usage of each genome is closely correlated with the abundance of tRNA isoacceptors. How codon usage bias is resolved by tRNA post-transcriptional modifications is largely unknown. Here we demonstrate that the N1-methylation of guanosine at position 37 (m1G37) on the 3'-side of the anticodon, while not directly responsible for reading of codons, is a neutralizer that resolves differential decoding of proline codons. A genome-wide suppressor screen of a non-viable Escherichia coli strain, lacking m1G37, identifies proS suppressor mutations, indicating a coupling of methylation with tRNA prolyl-aminoacylation that sets the limit of cell viability. Using these suppressors, where prolyl-aminoacylation is decoupled from tRNA methylation, we show that m1G37 neutralizes differential translation of proline codons by the major isoacceptor. Lack of m1G37 inactivates this neutralization and exposes the need for a minor isoacceptor for cell viability. This work has medical implications for bacterial species that exclusively use the major isoacceptor for survival
Dynamic Proteomics: a database for dynamics and localizations of endogenous fluorescently-tagged proteins in living human cells
Recent advances allow tracking the levels and locations of a thousand proteins in individual living human cells over time using a library of annotated reporter cell clones (LARC). This library was created by Cohen et al. to study the proteome dynamics of a human lung carcinoma cell-line treated with an anti-cancer drug. Here, we report the Dynamic Proteomics database for the proteins studied by Cohen et al. Each cell-line clone in LARC has a protein tagged with yellow fluorescent protein, expressed from its endogenous chromosomal location, under its natural regulation. The Dynamic Proteomics interface facilitates searches for genes of interest, downloads of protein fluorescent movies and alignments of dynamics following drug addition. Each protein in the database is displayed with its annotation, cDNA sequence, fluorescent images and movies obtained by the time-lapse microscopy. The protein dynamics in the database represents a quantitative trace of the protein fluorescence levels in nucleus and cytoplasm produced by image analysis of movies over time. Furthermore, a sequence analysis provides a search and comparison of up to 50 input DNA sequences with all cDNAs in the library. The raw movies may be useful as a benchmark for developing image analysis tools for individual-cell dynamic-proteomics. The database is available at http://www.dynamicproteomics.net/
Novel domain combinations in proteins encoded by chimeric transcripts
Motivation: Chimeric RNA transcripts are generated by different mechanisms including pre-mRNA trans-splicing, chromosomal translocations and/or gene fusions. It was shown recently that at least some chimeric transcripts can be translated into functional chimeric proteins.
- …