1,602 research outputs found
Enhanced Integrated Scoring for Cleaning Dirty Texts
An increasing number of approaches for ontology engineering from text are
gearing towards the use of online sources such as company intranet and the
World Wide Web. Despite such rise, not much work can be found in aspects of
preprocessing and cleaning dirty texts from online sources. This paper presents
an enhancement of an Integrated Scoring for Spelling error correction,
Abbreviation expansion and Case restoration (ISSAC). ISSAC is implemented as
part of a text preprocessing phase in an ontology engineering system. New
evaluations performed on the enhanced ISSAC using 700 chat records reveal an
improved accuracy of 98% as compared to 96.5% and 71% based on the use of only
basic ISSAC and of Aspell, respectively.Comment: More information is available at
http://explorer.csse.uwa.edu.au/reference
Multi-source and ontology-based retrieval engine for maize mutant phenotypes
Model Organism Databases, including the various plant genome databases, collect and enable access to massive amounts of heterogeneous information, including sequence data, gene product information, images of mutant phenotypes, etc, as well as textual descriptions of many of these entities. While a variety of basic browsing and search capabilities are available to allow researchers to query and peruse the names and attributes of phenotypic data, next-generation search mechanisms that allow querying and ranking of text descriptions are much less common. In addition, the plant community needs an innovative way to leverage the existing links in these databases to search groups of text descriptions simultaneously. Furthermore, though much time and effort have been afforded to the development of plant-related ontologies, the knowledge embedded in these ontologies remains largely unused in available plant search mechanisms. Addressing these issues, we have developed a unique search engine for mutant phenotypes from MaizeGDB. This advanced search mechanism integrates various text description sources in MaizeGDB to aid a user in retrieving desired mutant phenotype information. Currently, descriptions of mutant phenotypes, loci and gene products are utilized collectively for each search, though expansion of the search mechanism to include other sources is straightforward. The retrieval engine, to our knowledge, is the first engine to exploit the content and structure of available domain ontologies, currently the Plant and Gene Ontologies, to expand and enrich retrieval results in major plant genomic databases
An architecture for the autonomic curation of crowdsourced knowledge
Human knowledge curators are intrinsically better than their digital counterparts at providing relevant answers to queries. That is mainly due to the fact that an experienced biological brain will account for relevant community expertise as well as exploit the underlying connections between knowledge pieces when offering suggestions pertinent to a specific question, whereas most automated database managers will not. We address this problem by proposing an architecture for the autonomic curation of crowdsourced knowledge, that is underpinned by semantic technologies. The architecture is instantiated in the career data domain, thus yielding Aviator, a collaborative platform capable of producing complete, intuitive and relevant answers to career related queries, in a time effective manner. In addition to providing numeric and use case based evidence to support these research claims, this extended work also contains a detailed architectural analysis of Aviator to outline its suitability for automatically curating knowledge to a high standard of quality
Annotation-based storage and retrieval of models and simulation descriptions in computational biology
This work aimed at enhancing reuse of computational biology models by identifying and formalizing relevant meta-information. One type of meta-information investigated in this thesis is experiment-related meta-information attached to a model, which is necessary to accurately recreate simulations. The main results are: a detailed concept for model annotation, a proposed format for the encoding of simulation experiment setups, a storage solution for standardized model representations and the development of a retrieval concept.Die vorliegende Arbeit widmete sich der besseren Wiederverwendung biologischer Simulationsmodelle. Ziele waren die Identifikation und Formalisierung relevanter Modell-Meta-Informationen, sowie die Entwicklung geeigneter Modellspeicherungs- und Modellretrieval-Konzepte.
Wichtigste Ergebnisse der Arbeit sind ein detailliertes Modellannotationskonzept, ein Formatvorschlag für standardisierte Kodierung von Simulationsexperimenten in XML, eine Speicherlösung für Modellrepräsentationen sowie ein Retrieval-Konzept
Recommended from our members
HOLMES: A Hybrid Ontology-Learning Materials Engineering System
Designing and discovering novel materials is challenging problem in many domains such as fuel additives, composites, pharmaceuticals, and so on. At the core of all this are models that capture how the different domain-specific data, information, and knowledge regarding the structures and properties of the materials are related to one another. This dissertation explores the difficult task of developing an artificial intelligence-based knowledge modeling environment, called Hybrid Ontology-Learning Materials Engineering System (HOLMES) that can assist humans in populating a materials science and engineering ontology through automatic information extraction from journal article abstracts. While what we propose may be adapted for a generic materials engineering application, our focus in this thesis is on the needs of the pharmaceutical industry. We develop the Columbia Ontology for Pharmaceutical Engineering (COPE), which is a modification of the Purdue Ontology for Pharmaceutical Engineering. COPE serves as the basis for HOLMES.
The HOLMES framework starts with journal articles that are in the Portable Document Format (PDF) and ends with the assignment of the entries in the journal articles into ontologies. While this might seem to be a simple task of information extraction, to fully extract the information such that the ontology is filled as completely and correctly as possible is not easy when considering a fully developed ontology.
In the development of the information extraction tasks, we note that there are new problems that have not arisen in previous information extraction work in the literature. The first is the necessity to extract auxiliary information in the form of concepts such as actions, ideas, problem specifications, properties, etc. The second problem is in the existence of multiple labels for a single token due to the existence of the aforementioned concepts. These two problems are the focus of this dissertation.
In this work, the HOLMES framework is presented as a whole, describing our successful progress as well as unsolved problems, which might help future research on this topic. The ontology is then presented to help in the identification of the relevant information that needs to be retrieved. The annotations are next developed to create the data sets necessary for the machine learning algorithms to perform. Then, the current level of information extraction for these concepts is explored and expanded. This is done through the introduction of entity feature sets that are based on previously extracted entities from the entity recognition task. And finally, the new task of handling multiple labels for tagging a single entity is also explored by the use of multiple-label algorithms used primarily in image processing
- …