7 research outputs found

    Vitis OneGenE: a causality-based approach to generate gene networks in Vitis vinifera sheds light on the laccase and dirigent gene families

    The abundance of transcriptomic data and the development of causal inference methods have paved the way for gene network analyses in grapevine. Vitis OneGenE is a transcriptomic data mining tool that finds direct correlations between genes, thus producing association networks. As a proof of concept, the stilbene synthase gene regulatory network obtained with OneGenE has been compared with published co-expression analysis and experimental data, including cistrome data for MYB stilbenoid regulators. As a case study, the two secondary metabolism pathways of stilbenoids and lignin synthesis were explored. Several isoforms of laccase, peroxidase, and dirigent protein genes, putatively involved in the final oxidative oligomerization steps, were identified as specifically belonging to either one of these pathways. Manual curation of the predicted sequences exploiting the latest available genome assembly, and the integration of phylogenetic and OneGenE analyses, identified a group of laccases exclusively present in grapevine and related to stilbenoids. Here we show how network analysis by OneGenE can accelerate knowledge discovery by suggesting new candidates for functional characterization and application in breeding programs.
    Pilati, Stefania; Malacarne, Giulia; Navarro-Payá, David; Tomè, Gabriele; Riscica, Laura; Cavecchia, Valter; Matus, José Tomás; Moser, Claudio; Blanzieri, Enrico

    A COMPASS for VESPUCCI: a FAIR way to explore the grapevine transcriptomic landscape

    Successfully integrating transcriptomic experiments is a challenging task with the ultimate goal of analyzing gene expression data in the broader context of all available measurements, all from a single point of access. In its second major release, VESPUCCI, the integrated database of gene expression data for grapevine, has been updated to be FAIR-compliant, employing standards and open-source technologies. It includes all public grapevine gene expression experiments from both microarray and RNA-seq platforms. Transcriptomic data can be accessed in multiple ways through the newly developed COMPASS GraphQL interface, while the expression values are normalized using different methodologies to flexibly satisfy different analysis requirements. Sample annotations are manually curated and use standard formats and ontologies. The updated version of VESPUCCI makes it easy to query and analyze integrated grapevine gene expression (meta)data and can be seamlessly embedded in any analysis workflow or tool. VESPUCCI is freely accessible and offers several ways of interaction, depending on the specific goals and purposes and/or user expertise; an overview can be found at https://vespucci.readthedocs.io/.
    Moretto, M.; Sonego, P.; Pilati, S.; Matus, J.T.; Costantini, L.; Malacarne, G.; Engelen, K.

    NES2RA: a tool for grapevine transcriptomic data mining

    The development of “omics” technologies to study gene expression has shifted our perspective from the single gene to the gene network level. However, the complexity of the systems biology approach requires appropriate mathematical, computational and statistical tools to analyze data and extract information. Grapevine transcriptomic data are currently collected in two databases: the ViTis Co-expression DataBase (VTCdb, Wong et al., 2013), dedicated to data obtained with microarray technology, and the Vitis Expression Studies Platform Using COLOMBOS Compendia Instances (VESPUCCI, Moretto et al., 2016), which includes data from both microarray and RNA-seq experiments. Here, we present the application of the Network Expansion by Subsetting and Ranking Aggregation algorithm (NES2RA, Asnicar et al., 2016) to expand Local Gene Networks (LGNs) in grapevine using transcriptomic data stored in the VESPUCCI compendium. NES2RA is based on the PC-algorithm (Spirtes and Glymour, 1991), a Gaussian graphical model (GGM) that finds causal relationships from observational data. It relies on a systematic test for conditional independence to retain significant relations between pairs of genes. It starts from a fully connected network and removes the interaction between two genes whenever it finds a set of genes that, once conditioned on, makes the two genes independent (i.e., a separation set). Because of the computational power required by the NES2RA algorithm, it has been running as part of the gene@home project, a distributed computation project that relies on thousands of volunteers’ computers by means of TN-Grid, an infrastructure based on the BOINC system (Asnicar et al., 2015). NES2RA has been used to expand four LGNs related to the grapevine response to climate change (Malacarne et al., 2018).
The resulting expansion gene lists have been analyzed with statistical tools (gene annotation and functional category enrichment, to assess the functional coherence between LGNs and expansion gene lists, and promoter analysis, to test co-regulation among output genes) and compared with experimental results, when available, and with the literature. These analyses produced promising results in support of the meaningfulness of this approach. Moreover, the LGN expansions can be visualized as networks, giving the biologist prompt information about the significant relationships retained by NES2RA and highlighting positive or negative correlations within gene pairs. We are currently developing the NES2RA algorithm to make it available as a web tool to be used in real time, and we are exploring new applications.
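
The edge-removal procedure described in this abstract (start fully connected, delete an edge once a separation set is found) can be sketched as follows. This is a minimal illustration of a PC-style skeleton search, not the NES2RA or gene@home implementation: the function names (`partial_corr`, `is_independent`, `pc_skeleton`), the choice of a Fisher z-test on partial correlations, and the default `alpha` are all assumptions made for the example.

```python
import itertools
import math

import numpy as np


def partial_corr(corr, i, j, cond):
    """Partial correlation of variables i and j given the variables in cond."""
    idx = [i, j] + list(cond)
    precision = np.linalg.pinv(corr[np.ix_(idx, idx)])
    return -precision[0, 1] / math.sqrt(precision[0, 0] * precision[1, 1])


def is_independent(corr, n_samples, i, j, cond, alpha=0.01):
    """Fisher z-test for zero partial correlation (Gaussian assumption).

    Requires n_samples > len(cond) + 3.
    """
    r = partial_corr(corr, i, j, cond)
    r = min(max(r, -0.999999), 0.999999)  # keep the log below finite
    z = 0.5 * math.log((1 + r) / (1 - r))
    stat = math.sqrt(n_samples - len(cond) - 3) * abs(z)
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(stat / math.sqrt(2))))
    return p_value > alpha


def pc_skeleton(data, alpha=0.01):
    """Return the undirected edges kept and the separation sets found.

    data has one row per sample and one column per variable (gene).
    """
    n_samples, n_vars = data.shape
    corr = np.corrcoef(data, rowvar=False)
    # Start from a fully connected graph.
    adj = {v: set(range(n_vars)) - {v} for v in range(n_vars)}
    sepsets = {}
    size = 0
    # Grow the conditioning-set size while some node has enough neighbours.
    while any(len(adj[v]) - 1 >= size for v in adj):
        for i in range(n_vars):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                for cond in itertools.combinations(others, size):
                    if is_independent(corr, n_samples, i, j, cond, alpha):
                        # cond separates i and j: drop the edge.
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepsets[frozenset((i, j))] = set(cond)
                        break
        size += 1
    edges = {frozenset((i, j)) for i in adj for j in adj[i]}
    return edges, sepsets
```

On data generated from a chain such as X → Y → Z, a search of this kind is expected to keep the X–Y and Y–Z edges and to remove X–Z with Y as its separation set, which is the sense in which the retained edges are "direct" relations.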

    NES2RA: network expansion by stratified variable subsetting and ranking aggregation

    Gene network expansion is a task of the foremost importance in computational biology. It aims at finding new genes to expand a given known gene network. To this end, we developed gene@home, a BOINC-based project that finds candidate genes that expand known local gene networks using NESRA. In this paper, we present NES2RA, a novel approach that extends and improves NESRA by using a probability vector to model the confidence that each gene belongs to the local gene network. NES2RA adopts intensive variable-subsetting strategies, enabled by the computational power provided by gene@home volunteers. In particular, we use the skeleton procedure of the PC-algorithm to discover candidate causal relationships within each subset of variables. Finally, we use state-of-the-art aggregators to combine the results into a single ranked list of candidate genes. The resulting ranking guides the discovery of unknown relations between genes and a priori known local gene networks. Our experimental results show that NES2RA outperforms the PC-algorithm and its order-independent PC-stable version, ARACNE, and our previous approach, NESRA. In this paper we extensively discuss the computational aspects of the NES2RA approach, and we also present and validate expansions performed on the model plant Arabidopsis thaliana and the model bacterium Escherichia coli.
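
The final step described above, combining per-subset results into a single ranked candidate list, can be illustrated with a simple rank aggregator. The sketch below uses a Borda count, chosen only because it is easy to follow; the abstract does not say which aggregators NES2RA uses, and the function name `borda_aggregate` and the gene identifiers are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine ranked candidate lists into one list by Borda count.

    Each ranking is a list of gene identifiers, best first. A gene at
    position p (0-based) in a list of length n earns n - p points; a gene
    missing from a list earns nothing from that list.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, gene in enumerate(ranking):
            scores[gene] += n - position
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda gene: (-scores[gene], gene))


# Hypothetical candidate lists, e.g. one per variable subset.
runs = [
    ["g1", "g2", "g3"],
    ["g2", "g1", "g4"],
    ["g2", "g3", "g1"],
]
print(borda_aggregate(runs))  # ['g2', 'g1', 'g3', 'g4']
```

Here g2 scores 8 points, g1 scores 6, g3 scores 3 and g4 scores 1, so a gene that ranks consistently high across runs ends up ahead of one that appears only once.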

    Multi-target Prediction Methods for Bioinformatics: Approaches for Protein Function Prediction and Candidate Discovery for Gene Regulatory Network Expansion

    Biology has been experiencing a paradigm shift since the advent of next-generation sequencing technologies. The retrieved data far exceed the capacity of biologists to investigate all possibilities in the laboratory, hence predictive tools able to guide research are now a fundamental component of their workflow. Given the central role of proteins in living organisms, in this thesis we focus on their functional analysis and the intrinsic multi-target nature of this task. To this end, we propose different predictive methods, specifically developed to exploit side knowledge among target variables and examples. As a first contribution, we address the task of protein function prediction and, more generally, of hierarchical multilabel classification (HMC). We present Ocelot, a predictive pipeline for genome-wide protein characterization. It relies on a statistical relational learning tool, where the knowledge on the input examples is encoded by the combination of multiple kernel matrices, while relations among target variables are expressed as logical constraints. Both the mislabeling of examples and the infringement of logical rules are penalized by the loss function, but Ocelot does not enforce hierarchical consistency. To overcome this limitation, we present AWX, a neural network output layer that guarantees the formal consistency of HMC predictions. The second contribution is VSC, a binary classifier designed to incorporate the concepts of subsampling and locality in the definition of features to be used as the input of a perceptron. A locality-based confidence measure is used to weight the contribution of maximum-margin hyperplanes built by subsampling pairs of examples of opposite class. The rationale is that local methods can be exploited when a multi-target task is expected but not reflected in the annotation space. The third and last contribution comprises NES2RA and OneGenE, two approaches for finding candidates to expand known gene regulatory networks.
NES2RA adopts variable-subsetting strategies, enabled by volunteer distributed computing, and the PC algorithm to discover candidate causal relationships within each subset of variables. Then, ranking aggregators combine the partial results into a single ranked list of candidate genes. OneGenE overcomes the main limitation of NES2RA, i.e. latency, by precomputing candidate expansion lists for each transcript of an organism, which are then aggregated on demand.

    Finding functional interactions among grapevine genes using transcriptomic data and NES2RA algorithm

    More than two hundred transcriptomic studies are currently publicly available for grapevine. They have been collected, normalized and annotated in the Vitis Expression Studies Platform Using COLOMBOS Compendia Instances (VESPUCCI updated version, Moretto et al., in preparation). Mining all this information to extract novel findings, such as gene networks that control agronomically relevant traits, remains a challenge. In particular, climate change and the shift to more sustainable practices affect disease and yield in grape production, urging the scientific community to propose new strategies to cope with them. Systems biology approaches offer an opportunity to boost our knowledge of grapevine physiology. Gene networks are a convenient way of representing as graphs the functional interactions (edges) among the genes (nodes) of an organism. Gene networks can be co-expression networks, based on Pearson’s correlation, or association and regulatory networks, in which direct and possibly causal relationships are represented. We present the tool NES2RA (Network Expansion by Sub-Setting and Ranking Aggregation), based on the PC-algorithm (Spirtes and Glymour, 1991), which finds causal relationships from observational data. It performs a systematic test for conditional independence to retain significant relations between pairs of genes. It starts from a fully connected network and removes the interaction between two genes whenever it finds a set of genes that, once conditioned on, makes the two genes independent. Because of the computational power required by the NES2RA algorithm, it has been implemented on a distributed computation platform, as part of the gene@home project, which relies on thousands of volunteers’ computers by means of TN-Grid, an infrastructure based on the BOINC system (Asnicar et al., 2015).
To comply with the FAIR (Findable, Accessible, Interoperable and Reusable) requirements for the information produced by NES2RA, the expansion gene list of each single gene has been pre-computed and annotated and can be downloaded from our website (http://ibdm.disi.unitn.it/, in preparation). The user can use the lists as such or analyze them further, for example by aggregating them to reconstruct a gene network. A case study concerning the regulatory network and biosynthetic pathway of the grapevine leaf cuticle will be presented to show how this information can help the biologist in gene function discovery, candidate gene prioritization and the planning of functional studies in grapevine.

    Discovering causal relationships in grapevine expression data to expand gene networks: a case study: four networks related to climate change

    In recent years the scientific community has been heavily engaged in studying the grapevine response to climate change. The final goal is to identify key genetic traits to be used in grapevine breeding and to establish agronomic practices that improve climatic resilience. The increasing availability of transcriptomic studies, describing gene expression in many tissues and under many developmental or treatment conditions, has allowed the implementation of gene expression compendia, which contain a huge amount of information. The mining of transcriptomic data represents an effective approach to expand a known local gene network (LGN) by finding new related genes. We recently published a pipeline based on the iterative application of the PC-algorithm, named NES2RA, to expand gene networks in Escherichia coli and Arabidopsis thaliana. Here, we propose the application of this method to the grapevine transcriptomic compendium VESPUCCI, in order to expand four LGNs related to the grapevine response to climate change. Two networks are related to the secondary metabolic pathways for anthocyanin and stilbenoid synthesis, involved in the response to solar radiation, whereas the other two are signaling networks, related to the hormones abscisic acid and ethylene, possibly involved in the regulation of cell water balance and cuticle transpiration. The expansion networks produced by the NES2RA algorithm have been evaluated by comparison with experimental data and biological knowledge on the identified genes, showing fairly good consistency of the results. In addition, the algorithm was effective in retaining only the most significant interactions among the genes, providing a useful framework for experimental validation. The application of NES2RA to Vitis vinifera expression data by means of the BOINC-based implementation is available upon request ([email protected]).