3,914 research outputs found

    Consistent prediction of GO protein localization

    The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods for the automated GO-CC annotation of proteins suffer from inconsistency among individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein-coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions are amenable to expert analysis. Promising results were obtained on protein annotation data from five model organisms. Additionally, FGGA-CC+ was successfully validated on the annotation of a challenging subset of tandem duplicated genes in tomato, a non-model organism. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for meeting the huge demand for GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects.
    Affiliation: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
    Affiliation: Arce, Debora Pamela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; Argentina
    Affiliation: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina
    Affiliation: Bulacio, Pilar. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina
    Affiliation: Tapia, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin
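The consistency problem named in the FGGA-CC+ abstract above can be illustrated with a small sketch: when terms are scored independently, a child term may receive a higher score than its parent, which contradicts the ontology hierarchy. The code below is not the FGGA-CC+ method, whose internals the abstract does not describe. It is a simple max-propagation post-processing step, and the toy hierarchy, term names, and scores are all hypothetical.

```python
# Hypothetical toy fragment of a GO-CC-like hierarchy (child -> parents).
PARENTS = {
    "cell": [],
    "intracellular": ["cell"],
    "organelle": ["intracellular"],
    "nucleus": ["organelle"],
    "mitochondrion": ["organelle"],
}


def ancestors(term, parents=PARENTS):
    """Return every ancestor of `term` in the hierarchy."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def is_consistent(scores, parents=PARENTS):
    """True if no term scores higher than any of its direct parents."""
    return all(
        scores.get(parent, 0.0) >= score
        for term, score in scores.items()
        for parent in parents[term]
    )


def make_consistent(scores, parents=PARENTS):
    """Raise each ancestor's score to at least the score of its descendants."""
    fixed = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            fixed[anc] = max(fixed.get(anc, 0.0), score)
    return fixed


# Independent per-term scores (made up): "nucleus" outscores its parent.
raw = {
    "cell": 0.9,
    "intracellular": 0.8,
    "organelle": 0.3,
    "nucleus": 0.7,
    "mitochondrion": 0.1,
}
fixed = make_consistent(raw)
print(is_consistent(raw), is_consistent(fixed))
```

Max-propagation is only one of several possible repair strategies; it never lowers a score, so it favours recall over precision.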

    CLIP and complementary methods

    RNA molecules start assembling into ribonucleoprotein (RNP) complexes during transcription. Dynamic RNP assembly, largely directed by cis-acting elements on the RNA, coordinates all processes in which the RNA is involved. To identify the sites bound by a specific RNA-binding protein on endogenous RNAs, cross-linking and immunoprecipitation (CLIP) and complementary, proximity-based methods have been developed. In this Primer, we discuss the main variants of these protein-centric methods and the strategies for their optimization and quality assessment, as well as RNA-centric methods that identify the protein partners of a specific RNA. We summarize the main challenges of computational CLIP data analysis, how to handle various sources of background and how to identify functionally relevant binding regions. We outline the various applications of CLIP and available databases for data sharing. We discuss the prospect of integrating data obtained by CLIP with complementary methods to gain a comprehensive view of RNP assembly and remodelling, unravel the spatial and temporal dynamics of RNPs in specific cell types and subcellular compartments and understand how defects in RNPs can lead to disease. Finally, we present open questions in the field and give directions for further development and applications

    Determination of protein localization and RNA kinetics in human cells

    In this PhD dissertation we investigated the behaviour of human cells through space and time. High-quality datasets of subcellular regions in HEK293 cells were generated using BirA* proximity labelling activity, restricting its localization to cellular regions that are difficult to purify with traditional methods (i.e., the cytosol-facing sides of the endoplasmic reticulum, mitochondria, and plasma membranes). We then developed an approach to map the distribution of proteins actively binding to RNA, and named it f-XRNAX. We recovered background-corrected proteomes for nuclei, cytoplasm and membranes of HEK293 cells. Surprisingly, many non-canonical RBPs were identified in the membrane fraction, and their peptide profiles were enriched in regions with a high density of intrinsically disordered regions, indicating a possibly weak interaction with RNA mediated by these non-structural motifs. Lastly, we provided evidence of the differential binding to RNA of the same protein in different HEK293 compartments. In the second part of this thesis, we focused on the determination and quantification of newly transcribed RNAs at the single-cell level. The kinetics of RNA transcription, processing and degradation were until recently not measurable at the single-cell level. Thus, we have developed a novel approach (called SLAM-Drop-seq) by adapting the published SLAM-seq method to single cells. We used SLAM-Drop-seq to estimate time-dependent RNA kinetic rates of transcription and turnover for hundreds of oscillating transcripts during the cell cycle of HEK293 cells. We found that genes regulate their expression with different strategies and have specific modes to fine-tune their kinetic rates along the cell cycle

    Profiling Arabidopsis Arogenate Dehydratases: Dimerization and Subcellular Localization Patterns

    In Arabidopsis, a family of six arogenate dehydratases (ADTs) has been identified which catalyze the terminal step of phenylalanine biosynthesis. ADTs share considerable sequence similarity with bacterial prephenate dehydratases, which form homodimers. The protein-protein interaction profiles of Arabidopsis ADTs were characterized using Yeast-2-Hybrid and Bimolecular Fluorescence Complementation approaches. Results show that ADT1, but not ADT2, is able to form homo- and heterodimers with all other ADTs in yeast. In contrast, all six ADTs form all possible homo- and heterodimer combinations in planta, where they display two different subcellular localization patterns. Most ADT dimers localize to the chloroplast in a stromule-like pattern, but ADT5 dimers also localize in a nuclear-like pattern. Large-scale cDNA library screens also identified a number of other putative interactors, suggesting that ADTs may be part of a larger protein complex. This study is the first to characterize the protein-protein interaction profiles of plant ADTs

    The big and intricate dreams of little organelles: Embracing complexity in the study of membrane traffic

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/138421/1/tra12497_am.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/138421/2/tra12497-sup-0001-EditorialProcess.pdf
    https://deepblue.lib.umich.edu/bitstream/2027.42/138421/3/tra12497.pd

    Deep Learning for Genomics: A Concise Overview

    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect from it a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, and point out potential opportunities and obstacles for future genomics applications.
    Comment: Invited chapter for Springer Book: Handbook of Deep Learning Application

    An FPT Approach for Predicting Protein Localization from Yeast Genomic Data

    Accurately predicting the localization of proteins is of paramount importance in the quest to determine their respective functions within the cellular compartment. Because of the continuous and rapid progress in the fields of genomics and proteomics, more data are available now than ever before. Concurrently, data mining methods have been developed and refined in order to handle this experimental windfall, thus allowing the scientific community to quantitatively address long-standing questions such as that of protein localization. Here, we develop a frequent pattern tree (FPT) approach to generate a minimum set of rules (mFPT) for predicting protein localization. We acquire a series of rules according to the features of yeast genomic data. The mFPT prediction accuracy is benchmarked against other commonly used methods such as Bayesian networks and logistic regression under various statistical measures. Our results show that mFPT gave better performance than other approaches in predicting protein localization. Meanwhile, setting 0.65 as the minimum hit-rate, we obtained 138 proteins that mFPT predicted differently than the simple naive Bayesian method (SNB). In our analysis of these 138 proteins, we present novel predictions for the locations of 17 proteins, which currently do not have any defined localization. These predictions can serve as putative annotations and should provide preliminary clues for experimentalists. We also compared our predictions against the eukaryotic subcellular localization database and related predictions by others on protein localization. Our method is quite general and can thus be applied to discover the underlying rules for protein-protein interactions, genomic interactions, and structure-function relationships, as well as those of other fields of research

    Machine vision-assisted analysis of structure-localization relationships in a combinatorial library of prospective bioimaging probes

    With a combinatorial library of bioimaging probes, it is now possible to use machine vision to analyze the contribution of different building blocks of the molecules to their cell-associated visual signals. For this purpose, cell-permeant, fluorescent styryl molecules were synthesized by condensation of 168 aldehydes with 8 pyridinium/quinolinium building blocks. Images of cells incubated with fluorescent molecules were acquired with a high content screening instrument. Chemical and image feature analysis revealed how variation in one or the other building block of the styryl molecules led to variations in the molecules' visual signals. Across each pair of probes in the library, chemical similarity was significantly associated with spectral and total signal intensity similarity. However, chemical similarity was much less associated with similarity in subcellular probe fluorescence patterns. Quantitative analysis and visual inspection of pairs of images acquired from pairs of styryl isomers confirm that many closely related probes exhibit different subcellular localization patterns. Therefore, idiosyncratic interactions between styryl molecules and specific cellular components greatly contribute to the subcellular distribution of the styryl probes' fluorescence signal. These results demonstrate how machine vision and cheminformatics can be combined to analyze the targeting properties of bioimaging probes, using large image data sets acquired with automated screening systems. © 2009 International Society for Advancement of Cytometry
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/63004/1/20713_ftp.pd

    RNA syntax and semantics: investigating the transcriptome complexity

    The basic idea of this thesis is to reconstruct a heterogeneous network depicting lncRNA-protein interactions that would summarize what is currently known, allow the prediction of missing features and thus give a complete mechanistic understanding of the functions of lncRNAs through topological analysis of the network. Unfortunately, this approach raised several problems. Firstly, even though recent studies show that a growing number of lncRNAs play critical roles in complex cellular processes and that they are implicated in a wide range of human diseases, the fraction of annotated lncRNAs is still small. Secondly, as of today, most databases are highly inhomogeneous in terms of the type of information provided, and analytical and experimental approaches to investigate them have been hampered by the lack of comprehensive annotation. Thirdly, the standard bioinformatics solution for filling the gaps left by missing information is based on machine learning techniques, which usually lead to myriad problems related to data preprocessing and the format of the input dataset, both of which are often handled by trial and error. Finally, a challenging problem that arises in this domain is data visualization. A common strategy to overcome the problem is to construct interaction networks, whose analytical and visual inspection can offer important biological insights; however, one primary difficulty with this approach is developing an efficient and scalable algorithm that produces easily interpretable layouts for sparse graphs when the number of nodes is very large. The thesis takes a multidisciplinary approach to unravel the complexity of lncRNA regulatory networks and investigate their functions. The objective is to demonstrate the feasibility of using machine learning techniques as well as network analysis to find hidden patterns in the data and to predict new features

    Phenotypic Variation and Bistable Switching in Bacteria

    Microbial research generally focuses on clonal populations. However, bacterial cells with identical genotypes frequently display different phenotypes under identical conditions. This microbial cell individuality is receiving increasing attention in the literature because of its impact on cellular differentiation, survival under selective conditions, and the interaction of pathogens with their hosts. It is becoming clear that stochasticity in gene expression in conjunction with the architecture of the gene network that underlies the cellular processes can generate phenotypic variation. An important regulatory mechanism is the so-called positive feedback, in which a system reinforces its own response, for instance by stimulating the production of an activator. Bistability is an interesting and relevant phenomenon, in which two distinct subpopulations of cells showing discrete levels of gene expression coexist in a single culture. In this chapter, we address techniques and approaches used to establish phenotypic variation, and relate three well-characterized examples of bistability to the molecular mechanisms that govern these processes, with a focus on positive feedback.
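The link between positive feedback and bistability described in the abstract above can be illustrated with a minimal deterministic sketch, assuming a single self-activating gene with Hill-type feedback. The model and every parameter value here are hypothetical choices for illustration and are not taken from the chapter; stochastic gene expression, which the abstract also mentions, is deliberately left out.

```python
def rate(x, basal=0.05, beta=1.0, k=0.5, n=4, gamma=1.0):
    """dx/dt for a self-activating gene: basal production, Hill-type
    positive feedback, and first-order degradation (all values made up)."""
    feedback = beta * x**n / (k**n + x**n)
    return basal + feedback - gamma * x


def simulate(x0, dt=0.01, steps=5000):
    """Integrate dx/dt with the forward Euler method and return the final x."""
    x = x0
    for _ in range(steps):
        x += dt * rate(x)
    return x


# Two cells with identical parameters but different starting expression
# settle at different stable expression levels.
low = simulate(0.1)
high = simulate(0.8)
print(f"low state: {low:.3f}, high state: {high:.3f}")
```

With these parameters the system has two stable steady states separated by an unstable one, so the initial condition alone decides which state a cell ends up in; in a real population, noise in gene expression could move cells across that threshold.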