12 research outputs found

    IWGSC Sequence Repository: Moving towards tools to facilitate data integration for the reference sequence of wheat

    URGI is a genomics and bioinformatics research unit at INRA (the French National Institute for Agricultural Research) dedicated to plants and crop parasites. We develop and maintain a genomic and genetic information system called GnpIS that manages multiple types of wheat data. Under the umbrella of the IWGSC (International Wheat Genome Sequencing Consortium), we have set up a Sequence Repository on the Wheat@URGI website to store, browse and BLAST the data generated by the wheat genome project: http://wheat-urgi.versailles.inra.fr/Seq-Repository. The repository holds the wheat physical maps, the chromosome survey sequence data for the individual chromosomes of bread wheat, and draft sequences for diploid and tetraploid wheats. It also provides browsable access to the BAC-based reference sequence for chromosome 3B, the first chromosome completed by the consortium. I will highlight the new features and data available in the Sequence Repository (e.g., new BLAST functionalities) and, in particular, present what we have done to address the needs and concerns raised during the IWGSC S&P workshop last year. In addition, I will open the discussion about future needs for tools that facilitate the integration of data to produce the reference sequence.

    Development of a knowledge graph framework to ease and empower translational approaches in plant research: a use-case on grain legumes

    While the continuing decline in genotyping and sequencing costs has largely benefited plant research, some key species for meeting the challenges of agriculture remain mostly understudied. As a result, heterogeneous datasets for different traits are available for a significant number of these species. As gene structures and functions are to some extent conserved through evolution, comparative genomics can be used to transfer available knowledge from one species to another. However, such a translational research approach is complex due to the multiplicity of data sources and the non-harmonized description of the data. Here, we provide two pipelines, referred to as the structural and functional pipelines, to create a framework for a NoSQL graph database (Neo4j) to integrate and query heterogeneous data from multiple species. We call this framework the Orthology-driven knowledge base framework for translational research (Ortho_KB). The structural pipeline builds bridges across species based on orthology. The functional pipeline integrates biological information, including QTL and RNA-sequencing datasets, and uses the backbone from the structural pipeline to connect orthologs in the database. Queries can be written in the Neo4j Cypher language and can, for instance, help identify genes controlling a common trait across species. To explore the possibilities offered by such a framework, we populated Ortho_KB to obtain OrthoLegKB, an instance dedicated to legumes. The proposed model was evaluated by studying the conservation of a flowering-promoting gene. Through a series of queries, we have demonstrated that our knowledge graph base provides an intuitive and powerful platform to support research and development programmes.
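The two-layer idea in this abstract (an orthology backbone plus trait evidence attached to genes) can be illustrated with a minimal in-memory sketch. This is plain Python, not Neo4j or Cypher, and it is not the Ortho_KB data model: the gene identifiers, orthogroup labels, species and traits below are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: every identifier below is invented for illustration.
SPECIES_OF = {
    "PsGene1": "pea",
    "MtGene1": "barrel medic",
    "GmGene1": "soybean",
    "GmGene2": "soybean",
}

# "Structural" layer: each gene is assigned to one orthogroup.
ORTHOGROUP_OF = {
    "PsGene1": "OG1",
    "MtGene1": "OG1",
    "GmGene1": "OG1",
    "GmGene2": "OG2",
}

# "Functional" layer: traits linked to a gene (for example through a QTL).
TRAITS_OF = {
    "PsGene1": {"flowering time"},
    "GmGene1": {"flowering time"},
    "GmGene2": {"seed weight"},
}


def shared_trait_orthogroups(trait):
    """Orthogroups in which genes from at least two species carry the trait."""
    hits = defaultdict(list)
    for gene, traits in TRAITS_OF.items():
        if trait in traits:
            hits[ORTHOGROUP_OF[gene]].append((SPECIES_OF[gene], gene))
    return {
        group: sorted(members)
        for group, members in hits.items()
        if len({species for species, _ in members}) >= 2
    }


def transfer_candidates(trait):
    """Genes with no direct evidence whose orthologs are linked to the trait."""
    supported_groups = {
        ORTHOGROUP_OF[gene]
        for gene, traits in TRAITS_OF.items()
        if trait in traits
    }
    return sorted(
        gene
        for gene, group in ORTHOGROUP_OF.items()
        if group in supported_groups and trait not in TRAITS_OF.get(gene, set())
    )
```

Here `shared_trait_orthogroups("flowering time")` groups the pea and soybean genes under the same orthogroup, and `transfer_candidates("flowering time")` lists the barrel medic gene as a candidate by orthology alone. In a graph database the same question would be expressed as a path query over gene, orthogroup and trait nodes.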

    Data integration in the agronomic domain : national and international data discovery system

    National audience. Current research in agronomy has produced a vast amount of genomic, genetic and phenomic data. To deal with the Volume, Variety and Velocity of those data, it is necessary first to narrow them down to candidate datasets through data discovery, then to integrate them through semantic web technologies. Data discovery is an approach that makes it easy to search for datasets based on keywords and metadata. The plant bioinformatics node of the Institut Français de Bioinformatique (IFB) gives access to several public information systems hosting domain-specific data. It is composed of five bioinformatics platforms: the South Green platform, the LIPM platform, the Roscoff platform ABiMS, the platform for Agroecosystems Arthropods and the URGI platform. The latter plays a key role in several national and international projects such as the Wheat Initiative. Those platforms integrate plant genomic, genetic and phenomic data, which they need to expose in data discovery and integration systems. The distributed data discovery system needs an ETL (Extraction, Transformation and Loading) based integration pipeline implemented on each platform. This ETL can either be developed from scratch or be based on existing technologies such as KarmaWeb, Talend or OpenRefine. The pipeline is being developed at URGI and will be deployed on all IFB plant nodes. The data discovery system is based on Solr (implemented in the Transplant portal, http://www.transplantdb.eu), which uses the Lucene search framework at its core for full-text indexing. Here, we will present the data discovery system architecture and the evaluation and comparison of the ETL solutions. Work funded by the IFB investment for the future infrastructure project, IFB_Plant node.
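The extract, transform, load and keyword-search steps described in this abstract can be sketched as follows. This is a from-scratch illustration in plain Python, not the URGI pipeline and not Solr or Lucene: the record fields and dataset identifiers are hypothetical, and a small inverted index stands in for the full-text index.

```python
import re
from collections import defaultdict

# Hypothetical source records; field names and values are invented.
RAW_RECORDS = [
    {"id": "ds1", "Name": " Wheat GWAS panel ", "desc": "Genotyping data for bread wheat"},
    {"id": "ds2", "Name": "Maize phenotypes", "desc": "Phenotyping trial, maize, 2013"},
]


def extract(records):
    """Extraction: read records from a source (here, an in-memory list)."""
    return list(records)


def transform(record):
    """Transformation: map source fields onto one harmonized schema."""
    return {
        "id": record["id"],
        "name": record["Name"].strip(),
        "description": record["desc"].strip(),
    }


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load(documents):
    """Loading: build an inverted index mapping each token to dataset ids."""
    index = defaultdict(set)
    for doc in documents:
        for token in tokenize(doc["name"] + " " + doc["description"]):
            index[token].add(doc["id"])
    return index


def search(index, query):
    """Return the ids of datasets that contain every keyword of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


INDEX = load(transform(record) for record in extract(RAW_RECORDS))
```

With this toy index, `search(INDEX, "wheat")` matches only the first dataset and `search(INDEX, "phenotyping maize")` only the second. A production search engine adds ranking, analyzers and persistence on top of the same basic idea.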

    GnpIS-Asso : A Generic Database for Managing and Exploiting Plant Genetic Association Studies Results Using High Throughput Genotyping and Phenotyping Data

    We will present a new functionality developed in the GnpIS information system to manage data from genome-wide association studies (GWAS). A query form is available from the http://urgi.versailles.inra.fr/gnpis portal and directly at https://urgi.versailles.inra.fr/association/association/viewer.do#form. The tool allows plant scientists and breeders to retrieve association values between traits and markers obtained in several association studies. Several filters are available to refine the query, such as the species, trait(s), marker(s) or panel of germplasms. In a second round of analysis, the tool also makes it possible to add a new set of filters on i) statistical values, ii) the treatment applied in the study, iii) the location and iv) the year of the phenotyping trial, or v) the statistical model chosen for the analysis. Results can be viewed graphically with dedicated plots (QQ plot, Manhattan plot) that are generated dynamically, and data can be exported to files to continue the analysis with external tools. After selecting the best markers associated with the trait of interest, the user can jump directly to the genome to find where a marker is located on the chromosomes and to identify much more easily which genes, other markers or features of interest are near it. The tool is already used to handle GWAS data (association, genotyping and phenotyping data) for two species, tomato and maize, in datasets that are referenced in two publications, and it will also manage wheat data in 2015.
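The filtering and Manhattan-plot steps described in this abstract can be illustrated with a short sketch. It is not the GnpIS-Asso implementation: the records, field names and threshold are hypothetical, and the function only computes the plotted coordinates (genomic position against -log10 of the p-value) rather than drawing anything.

```python
import math

# Hypothetical association records; markers, traits and values are invented.
ASSOCIATIONS = [
    {"marker": "M1", "trait": "fruit weight", "chromosome": 1, "position": 1200, "p_value": 1e-8},
    {"marker": "M2", "trait": "fruit weight", "chromosome": 2, "position": 500, "p_value": 0.03},
    {"marker": "M3", "trait": "flowering time", "chromosome": 1, "position": 3000, "p_value": 1e-6},
]


def filter_associations(records, trait=None, max_p_value=None):
    """Keep the records that match the trait and fall under the p-value cutoff."""
    selected = []
    for record in records:
        if trait is not None and record["trait"] != trait:
            continue
        if max_p_value is not None and record["p_value"] > max_p_value:
            continue
        selected.append(record)
    return selected


def manhattan_points(records):
    """Return (chromosome, position, -log10(p)) tuples in genomic order."""
    ordered = sorted(records, key=lambda r: (r["chromosome"], r["position"]))
    return [
        (r["chromosome"], r["position"], -math.log10(r["p_value"]))
        for r in ordered
    ]


# First round: filter by trait. Second round: add a statistical filter.
first_round = filter_associations(ASSOCIATIONS, trait="fruit weight")
second_round = filter_associations(first_round, max_p_value=1e-5)
```

The two calls mirror the two rounds of filtering: the first narrows by trait, the second keeps only markers under an assumed significance cutoff. The output of `manhattan_points` can be passed to any plotting library.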

    Reconciling the evolutionary origin of bread wheat (Triticum aestivum)

    In press. The origin of bread wheat (Triticum aestivum; AABBDD) has been a subject of controversy and of intense debate in the scientific community over the last few decades. In 2015, three articles published in New Phytologist discussed the origin of hexaploid bread wheat (AABBDD) from the diploid progenitors Triticum urartu (AA), a relative of Aegilops speltoides (BB) and Triticum tauschii (DD). Access to new genomic resources since 2013 has offered the opportunity to gain novel insights into the paleohistory of modern bread wheat, allowing characterization of its origin from its diploid progenitors at unprecedented resolution. We propose a reconciled evolutionary scenario for the modern bread wheat genome based on the complementary investigation of transposable element and mutation dynamics between diploid, tetraploid and hexaploid wheat. In this scenario, the structural asymmetry observed between the A, B and D subgenomes in hexaploid bread wheat derives from the cumulative effect of diploid progenitor divergence, the hybrid origin of the D subgenome, and subgenome partitioning following the polyploidization events.