43 research outputs found

Design, Implementation and Maintenance of a Model Organism Database for Arabidopsis thaliana

Author: Garcia-Hernandez Margarita
Huala Eva
Miller Neil
Rhee Seung Y.
Weems Danforth
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2004

The Arabidopsis Information Resource (TAIR) is a web-based community database for the model plant Arabidopsis thaliana. It provides an integrated view of genes, sequences, proteins, germplasms, clones, metabolic pathways, gene expression, ecotypes, polymorphisms, publications, maps and community information. TAIR is developed and maintained by collaboration between software developers and biologists. Biologists provide specification and use cases for the system, acquire, analyse and curate data, interact with users and test the software. Software developers design, implement and test the database and software. In this review, we briefly describe how TAIR was built and is being maintained.

Crossref

Directory of Open Access Journals

PubMed Central

Text mining in the biocuration workflow: applications for literature curation at WormBase, dictyBase and TAIR

Author: Basu Siddhartha
Berardini Tanya Z.
Chan Juancarlos
Chisholm Rex
Cooper Laurel
Dodson Robert
Fey Petra
Huala Eva
Li Donghui
Li Yuling
Muller Hans-Michael
Sternberg Paul W.
Van Auken Kimberly
Publication venue: Oxford University Press (OUP)
Publication date: 17/11/2012

WormBase, dictyBase and The Arabidopsis Information Resource (TAIR) are model organism databases containing information about Caenorhabditis elegans and other nematodes, the social amoeba Dictyostelium discoideum and related Dictyostelids and the flowering plant Arabidopsis thaliana, respectively. Each database curates multiple data types from the primary research literature. In this article, we describe the curation workflow at WormBase, with particular emphasis on our use of text-mining tools (BioCreative 2012, Workshop Track II). We then describe the application of a specific component of that workflow, Textpresso for Cellular Component Curation (CCC), to Gene Ontology (GO) curation at dictyBase and TAIR (BioCreative 2012, Workshop Track III). We find that, with organism-specific modifications, Textpresso can be used by dictyBase and TAIR to annotate gene products to GO's Cellular Component (CC) ontology.

Caltech Authors

Text mining for the biocuration workflow

Author: Arighi Cecilia
Burns Gully A. P. C
Chatr-Aryamontri Andrew
Cohen K. Bretonnel
Dowell Karen G.
Hirschman Lynette
Huala Eva
Krallinger Martin
Lourenço Anália
Nash Robert
Valencia Alfonso
Veuthey Anne-Lise
Wiegers Thomas
Winter Andrew G.
Wu Cathy H.
Publication venue: Oxford University Press
Publication date: 01/01/2012

Molecular biology has become heavily dependent on biological knowledge encoded in expert curated biological databases. As the volume of biological literature increases, biocurators need help in keeping up with the literature; (semi-) automated aids for biocuration would seem to be an ideal application for natural language processing and text mining. However, to date, there have been few documented successes for improving biocuration throughput using text mining. Our initial investigations took place for the workshop on ‘Text Mining for the BioCuration Workflow’ at the third International Biocuration Conference (Berlin, 2009). We interviewed biocurators to obtain workflows from eight biological databases. This initial study revealed high-level commonalities, including (i) selection of documents for curation; (ii) indexing of documents with biologically relevant entities (e.g. genes); and (iii) detailed curation of specific relations (e.g. interactions); however, the detailed workflows also showed many variabilities. Following the workshop, we conducted a survey of biocurators. The survey identified biocurator priorities, including the handling of full text indexed with biological entities and support for the identification and prioritization of documents for curation. It also indicated that two-thirds of the biocuration teams had experimented with text mining and almost half were using text mining at that time. Analysis of our interviews and survey provides a set of requirements for the integration of text mining into the biocuration workflow. These can guide the identification of common needs across curated databases and encourage joint experimentation involving biocurators, text mining developers and the larger biomedical research community.

Universidade do Minho: RepositoriUM

Crossref

PubMed Central

Finding Our Way through Phenotypes

Author: Anzaldo Salvatore S.
Ashburner Michael
Balhoff James P.
Blackburn David C.
Blake Judith A.
Burleigh J. Gordon
Chanet Bruno
Cooper Laurel D.
Courtot Mélanie
Csösz Sándor
Cui Hong
Deans Andrew R.
Huala Eva
Lewis Suzanna E.
Others
Smith Barry
Publication date: 01/01/2015

Despite a large and multifaceted effort to understand the vast landscape of phenotypic data, their current form inhibits productive data analysis. The lack of a community-wide, consensus-based, human- and machine-interpretable language for describing phenotypes and their genomic and environmental contexts is perhaps the most pressing scientific bottleneck to integration across many key fields in biology, including genomics, systems biology, development, medicine, evolution, ecology, and systematics. Here we survey the current phenomics landscape, including data resources and handling, and the progress that has been made to accurately capture relevant data descriptions for phenotypes. We present an example of the kind of integration across domains that computable phenotypes would enable, and we call upon the broader biology community, publishers, and relevant funding agencies to support efforts to surmount today's data barriers and facilitate analytical reproducibility.

PhilPapers

An ontology approach to comparative phenomics in plants

Author: Cannon Ethalinda K. S.
Cannon Steven B.
Cooper Laurel
Gardiner Jack
Gkoutos Georgios V.
Harper Lisa
He Mingze
Hoehndorf Robert
Huala Eva
Jaiswal Pankaj
Kalberer Scott R.
Lawrence Carolyn J.
Lloyd John P.
Meinke David
Menda Naama
Moore Laura
Nelson Rex T.
Oellrich Anika
Pujar Anuradha
Walls Ramona L.
Publication date: 01/01/2015

BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health.

Crossref

Aberystwyth Research Portal

University of Birmingham Research Portal

Springer - Publisher Connector

PubMed Central

The University of Arizona

Taking the Next Step: Building an Arabidopsis Information Portal

Author: Bastow Ruth
Beynon Jim
Brendel Volker
Dooley Rion
Friesner Joanna
Grotewold Erich
Huala Eva
International Arabidopsis Informatics Consortium
Loraine Ann
Meyers Blake
Pires J. Chris
Provart Nicholas
Stanzione Dan
Town Chris
Ware Doreen
Publication venue: American Society of Plant Biologists (ASPB)

The Arabidopsis Information Portal (AIP), a resource expected to provide access to all community data and combine outputs into a single user-friendly interface, has emerged from community discussions over the last 23 months. These discussions began during two closely linked workshops in early 2010 that established the International Arabidopsis Informatics Consortium (IAIC). The design of the AIP will provide core functionality while remaining flexible to encourage multiple contributors and constant innovation. An IAIC-hosted Design Workshop in December 2011 proposed a structure for the AIP to provide a framework for the minimal components of a functional community portal while retaining flexibility to rapidly extend the resource to other species. We now invite broader participation in the AIP development process so that the resource can be implemented in a timely manner.

ScholarsArchive@OSU

Arabidopsis bioinformatics resources: the current state, challenges, and priorities for the future

Author: Assmann Sarah
Axtell Michael
Berardini Tanya
Chen Sixue
Doherty Colleen
Friesner Joanna
Gehan Malia
Gregory Brian
Huala Eva
Jaiswal Pankaj
Slotkin R. Keith
Larson Stephen
Li Song
Loraine Ann
May Sean
Megraw Molly
Meyers Blake
Michael Todd
Pires J.
Provart Nicholas
Topp Christopher
Town Chris
Walley Justin
Wurtele Eve
Publication venue: Wiley
Publication date: 04/01/2019

Effective research, education, and outreach efforts by the Arabidopsis thaliana community, as well as other scientific communities that depend on Arabidopsis resources, depend vitally on easily available and publicly-shared resources. These resources include reference genome sequence data and an ever-increasing number of diverse data sets and data types. TAIR (The Arabidopsis Information Resource) and Araport (originally named the Arabidopsis Information Portal) are community informatics resources that provide tools, data, and applications to the more than 30,000 researchers worldwide that use in their work either Arabidopsis as a primary system of study or data derived from Arabidopsis. Four years after Araport’s establishment, the IAIC held another workshop to evaluate the current status of Arabidopsis Informatics and chart a course for future research and development. The workshop focused on several challenges, including the need for reliable and current annotation, community-defined common standards for data and metadata, and accessible and user-friendly repositories / tools / methods for data integration and visualization. Solutions envisioned included (1) a centralized annotation authority to coalesce annotation from new groups, establish a consistent naming scheme, distribute this format regularly and frequently, and encourage and enforce its adoption. (2) Standards for data and metadata formats, which are essential, but challenging when comparing across diverse genotypes and in areas with less-established standards (e.g. phenomics, metabolomics). Community-established guidelines need to be developed. (3) A searchable, central repository for analysis and visualization tools. Improved versioning and user access would make tools more accessible. Workshop participants proposed a “one-stop shop” website, an Arabidopsis “Super-Portal” to link tools, data resources, programmatic standards, and best practice descriptions for each data type. 
This must have community buy-in and participation in its establishment and development to encourage adoption.

Repository@Nottingham

The Plant Ontology as a Tool for Comparative Plant Anatomy and Genomic Analyses

Author: Arnaud Elizabeth
Athreya Balaji
Berardini Tanya Z.
Cooper Laurel
Elser Justin
Gandolfo Maria A.
Hiss Manuel
Huala Eva
Jaiswal Pankaj
Lang Daniel
Li Donghui
Menda Naama
Mungall Christopher J.
Preece Justin
Rensing Stefan
Reski Ralf
Schaeffer Mary
Shrestha Rosemary
Smith Barry
Stevenson Dennis W.
Walls Ramona L.
Yamazaki Yukiko
Publication venue: Oxford University Press on behalf of Japanese Society of Plant Physiologists

The Plant Ontology (PO; http://www.plantontology.org/) is a publicly available, collaborative effort to develop and maintain a controlled, structured vocabulary ('ontology') of terms to describe plant anatomy, morphology and the stages of plant development. The goals of the PO are to link (annotate) gene expression and phenotype data to plant structures and stages of plant development, using the data model adopted by the Gene Ontology. From its original design covering only rice, maize and Arabidopsis, the scope of the PO has been expanded to include all green plants. The PO was the first multispecies anatomy ontology developed for the annotation of genes and phenotypes. Also, to our knowledge, it was one of the first biological ontologies that provides translations (via synonyms) in non-English languages such as Japanese and Spanish. As of Release #18 (July 2012), there are about 2.2 million annotations linking PO terms to > 110,000 unique data objects representing genes or gene models, proteins, RNAs, germplasm and quantitative trait loci (QTLs) from 22 plant species. In this paper, we focus on the plant anatomical entity branch of the PO, describing the organizing principles, resources available to users and examples of how the PO is integrated into other plant genomics databases and web portals. We also provide two examples of comparative analyses, demonstrating how the ontology structure and PO-annotated data can be used to discover the patterns of expression of the LEAFY (LFY) and terpene synthase (TPS) gene homologs. Keywords: Plant anatomy, Terpene synthase, Bioinformatics, Comparative genomics, Genome annotation, Ontology

ScholarsArchive@OSU

An ontology approach to comparative phenomics in plants

Author: Cannon Ethalinda K. S.
Cannon Steven B.
Cooper Laurel
Gardiner Jack
Gkoutos Georgios V.
Harper Lisa
He Mingze
Hoehndorf Robert
Huala Eva
Jaiswal Pankaj
Kalberer Scott R.
Lawrence Carolyn J.
Lloyd John P.
Meinke David
Menda Naama
Moore Laura
Nelson Rex T.
Oellrich Anika
Pujar Anuradha
Walls Ramona L.
Publication venue: Springer Science and Business Media LLC

ScholarsArchive@OSU

BioCreative III interactive task: an overview

The BioCreative challenge evaluation is a community-wide effort for evaluating text mining and information extraction systems applied to the biological domain. The biocurator community, as an active user of biomedical literature, provides a diverse and engaged end user group for text mining tools. Earlier BioCreative challenges involved many text mining teams in developing basic capabilities relevant to biological curation, but they did not address the issues of system usage, insertion into the workflow and adoption by curators. Thus in BioCreative III (BC-III), the InterActive Task (IAT) was introduced to address the utility and usability of text mining tools for real-life biocuration tasks. To support the aims of the IAT in BC-III, involvement of both developers and end users was solicited, and the development of a user interface to address the tasks interactively was requested.

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

Springer

Springer - Publisher Connector

PubMed Central

ZORA

ART

NORA - Norwegian Open Research Archives