
    Data driven ontology evaluation

    The evaluation of ontologies is vital for the growth of the Semantic Web. We consider a number of problems in evaluating a knowledge artifact like an ontology. We propose in this paper that one approach to ontology evaluation should be corpus- or data-driven. A corpus is the most accessible form of knowledge, and its use allows a measure of the 'fit' between an ontology and a domain of knowledge to be derived. We consider a number of methods for measuring this 'fit', propose a measure to evaluate structural fit, and describe a probabilistic approach to identifying the best ontology
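    A minimal sketch of what a corpus-driven 'fit' measure could look like, purely for illustration: here 'fit' is approximated as the probability mass of a corpus covered by an ontology's concept labels. The function name and the scoring rule are hypothetical, not the paper's actual measures.

    ```python
    from collections import Counter

    def corpus_fit(ontology_terms, corpus_tokens):
        """Toy 0..1 'fit' score: fraction of corpus token mass whose
        tokens appear as ontology concept labels (illustrative only)."""
        freq = Counter(t.lower() for t in corpus_tokens)
        total = sum(freq.values())
        if not ontology_terms or total == 0:
            return 0.0
        covered = sum(freq[t.lower()] for t in ontology_terms)
        return covered / total

    tokens = "cell membrane surrounds the cell".split()
    print(corpus_fit({"cell", "membrane"}, tokens))  # 0.6 (3 of 5 tokens)
    ```

    Under this toy rule, an ontology whose vocabulary dominates the corpus scores near 1, which is one way to rank candidate ontologies against the same domain corpus.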

    Ontologies, taxonomies, thesauri: learning from texts

    The use of ontologies as representations of knowledge is widespread, but their construction, until recently, has been entirely manual. We argue in this paper for the use of text corpora and automated natural language processing methods for the construction of ontologies. We delineate the challenges and present criteria for the selection of appropriate methods. We distinguish three major steps in ontology building: associating terms, constructing hierarchies and labelling relations. A number of methods are presented for these purposes, but we conclude that the issue of data sparsity is still a major challenge. We argue for the use of resources external to the domain-specific corpus

    Knowledge acquisition for knowledge management: position paper

    With this paper, we propose a set of techniques to largely automate the process of knowledge acquisition (KA) by using technologies based on Information Extraction (IE), Information Retrieval and Natural Language Processing. We aim to reduce all the impeding factors mentioned above and thereby contribute to the wider utility of knowledge management tools. In particular, we intend to reduce the introspection of knowledge engineers or the extended elicitations of knowledge from experts through extensive textual analysis using a variety of methods and tools, as texts are widely available and in them, we believe, lies most of an organization's memory

    CGHub: Kick-starting the Worldwide Genome Web

    The University of California, Santa Cruz (UCSC) is under contract with the National Cancer Institute (NCI) to construct and operate the Cancer Genomics Hub (CGHub), a national-scale library and user portal for cancer genomics data. This contract covers growth of the library to 5 petabytes. The NCI programs that feed into the library currently produce about 20 terabytes of data each month. We discuss the receiver-driven file transfer mechanism Annai GeneTorrent (GT) for use with the library. Annai GT uses multiple TCP streams from multiple computers at the library site to parallelize genome downloads. We review our performance experience with the new transfer mechanism and also explain additions to the transfer protocol to support the security required in handling patient cancer genomics data
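    The general idea behind multi-stream transfers can be sketched as splitting a large object into byte ranges, fetching the ranges concurrently, and reassembling them in order. This is not the Annai GT protocol itself; `fetch_range` below is a stand-in for a real network call (e.g. an HTTP GET with a Range header), used here so the sketch runs without a server.

    ```python
    from concurrent.futures import ThreadPoolExecutor

    def fetch_range(blob, start, end):
        # Stand-in for a network request that returns bytes [start, end).
        return blob[start:end]

    def parallel_download(blob, chunk_size=4, streams=3):
        """Fetch byte ranges of `blob` on several worker threads,
        then reassemble them in the original order."""
        ranges = [(i, min(i + chunk_size, len(blob)))
                  for i in range(0, len(blob), chunk_size)]
        with ThreadPoolExecutor(max_workers=streams) as pool:
            parts = pool.map(lambda r: fetch_range(blob, *r), ranges)
        return b"".join(parts)

    data = bytes(range(10))
    assert parallel_download(data) == data
    ```

    Real multi-stream downloaders add retry, integrity checks, and per-stream flow control on top of this skeleton; GT additionally draws ranges from multiple server machines at once.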

    The ontology: Chimaera or Pegasus

    In the context of the needs of the Semantic Web and Knowledge Management, we consider what the requirements are of ontologies. The ontology as an artifact of knowledge representation is in danger of becoming a Chimera. We present a series of facts concerning the foundations on which automated ontology construction must build. We discuss a number of different functions that an ontology seeks to fulfill, and also a wish list of ideal functions. Our objective is to stimulate discussion as to the real requirements of ontology engineering and take the view that only a selective and restricted set of requirements will enable the beast to fly

    ENABLING EFFICIENT AND STREAMLINED ACCESS TO LARGE SCALE GENOMIC EXPRESSION AND SPLICING DATA

    As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. We focus primarily on nearly 20,000 RNA-sequencing studies in human and mouse, consisting of more than 750,000 sequencing runs, and the coverage summaries derived from their alignment to their respective genomes. In addition to the summarized RNA-seq derived data itself, we present tools (Snaptron, Monorail, Megadepth, and recount3) that downstream researchers can use both to process their own data into comparable summaries and to access and query our processed, publicly available data. Additionally, we present a related study of errors in the splicing of long-read transcriptomic alignments, including a comparison to the existing splicing summaries from short reads already described (LongTron)
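    A per-region coverage summary of the kind these tools compute can be illustrated in miniature: given per-base read depths, report a summary statistic (here, the mean) over annotated regions. The function name and the mean-only summary are illustrative assumptions, not the actual recount3/Megadepth interfaces.

    ```python
    def region_means(coverage, regions):
        """coverage: per-base read depths along a sequence.
        regions: list of half-open (start, end) intervals.
        Returns the mean depth within each region."""
        return [sum(coverage[s:e]) / (e - s) for s, e in regions]

    cov = [0, 2, 4, 4, 2, 0, 0, 8]
    print(region_means(cov, [(1, 5), (7, 8)]))  # [3.0, 8.0]
    ```

    The value of precomputing such summaries across hundreds of thousands of runs is that downstream queries become interval lookups rather than re-alignment of raw reads.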

    An incremental tri-partite approach to ontology learning

    In this paper we present a new approach to ontology learning. Its basis lies in a dynamic and iterative view of knowledge acquisition for ontologies. The Abraxas approach is founded on three resources, a set of texts, a set of learning patterns and a set of ontological triples, each of which must remain in equilibrium. As events occur which disturb this equilibrium various actions are triggered to re-establish a balance between the resources. Such events include acquisition of a further text from external resources such as the Web or the addition of ontological triples to the ontology. We develop the concept of a knowledge gap between the coverage of an ontology and the corpus of texts as a measure triggering actions. We present an overview of the algorithm and its functionalities
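    The knowledge-gap trigger described above can be pictured with a toy rule, assuming (hypothetically) that the gap is the set of salient corpus terms absent from the ontology; the Abraxas measure itself is richer. When the gap is non-empty, a learning action would fire to restore equilibrium.

    ```python
    from collections import Counter

    def knowledge_gap(corpus_tokens, ontology_terms, min_freq=2):
        """Toy gap: frequent corpus terms not yet in the ontology
        (illustrative stand-in for the paper's measure)."""
        freq = Counter(t.lower() for t in corpus_tokens)
        salient = {t for t, c in freq.items() if c >= min_freq}
        return salient - {t.lower() for t in ontology_terms}

    tokens = "gene gene protein protein pathway".split()
    print(sorted(knowledge_gap(tokens, {"gene"})))  # ['protein']
    ```

    In an iterative loop, each gap term could trigger either the acquisition of more texts mentioning it or the proposal of new ontological triples, which is the equilibrium-restoring behaviour the abstract describes.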

    Knowledge Representation with Ontologies: The Present and Future

    Recently, we have seen an explosion of interest in ontologies as artifacts to represent human knowledge and as critical components in knowledge management, the semantic Web, business-to-business applications, and several other application areas. Various research communities commonly assume that ontologies are the appropriate modeling structure for representing knowledge. However, little discussion has occurred regarding the actual range of knowledge an ontology can successfully represent

    Image annotation with Photocopain

    Photo annotation is a resource-intensive task, yet it is increasingly essential as image archives and personal photo collections grow in size. There is an inherent conflict in the process of describing and archiving personal experiences: casual users are generally unwilling to expend large amounts of effort on creating the annotations required to organise their collections so that they can make best use of them. This paper describes Photocopain, a semi-automatic image annotation system that combines information about the context in which a photograph was captured with information from other readily available sources in order to generate outline annotations for that photograph, which the user may further extend or amend