885 research outputs found

    Applying semantic web technologies to knowledge sharing in aerospace engineering

    Get PDF
    This paper details an integrated methodology to optimise Knowledge reuse and sharing, illustrated with a use case in the aeronautics domain. It uses Ontologies as a central modelling strategy for the Capture of Knowledge from legacy docu-ments via automated means, or directly in systems interfacing with Knowledge workers, via user-defined, web-based forms. The domain ontologies used for Knowledge Capture also guide the retrieval of the Knowledge extracted from the data using a Semantic Search System that provides support for multiple modalities during search. This approach has been applied and evaluated successfully within the aerospace domain, and is currently being extended for use in other domains on an increasingly large scale

    Tailored semantic annotation for semantic search

    Get PDF
    This paper presents a novel method for semantic annotation and search of a target corpus using several knowledge resources (KRs). This method relies on a formal statistical framework in which KR concepts and corpus documents are homogeneously represented using statistical language models. Under this framework, we can perform all the necessary operations for an efficient and effective semantic annotation of the corpus. Firstly, we propose a coarse tailoring of the KRs w.r.t the target corpus with the main goal of reducing the ambiguity of the annotations and their computational overhead. Then, we propose the generation of concept profiles, which allow measuring the semantic overlap of the KRs as well as performing a finer tailoring of them. Finally, we propose how to semantically represent documents and queries in terms of the KRs concepts and the statistical framework to perform semantic search. Experiments have been carried out with a corpus about web resources which includes several Life Sciences catalogs and Wikipedia pages related to web resources in general (e.g., databases, tools, services, etc.). Results demonstrate that the proposed method is more effective and efficient than state-of-the-art methods relying on either context-free annotation or keyword-based search.We thank anonymous reviewers for their very useful comments and suggestions. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO)

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    Get PDF
    While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontology from free-text documents. In this dissertation, I extended these methodologies into biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This previously unconducted review was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches that include three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method revealed an average of 21% and 11% new concept acceptance rates, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in the Journal of Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves

    From Texts to Prerequisites. Identifying and Annotating Propaedeutic Relations in Educational Textual Resources

    Get PDF
    openPrerequisite Relations (PRs) are dependency relations established between two distinct concepts expressing which piece(s) of information a student has to learn first in order to understand a certain target concept. Such relations are one of the most fundamental in Education, playing a crucial role not only for what concerns new knowledge acquisition, but also in the novel applications of Artificial Intelligence to distant and e-learning. Indeed, resources annotated with such information could be used to develop automatic systems able to acquire and organize the knowledge embodied in educational resources, possibly fostering educational applications personalized, e.g., on students' needs and prior knowledge. The present thesis discusses the issues and challenges of identifying PRs in educational textual materials with the purpose of building a shared understanding of the relation among the research community. To this aim, we present a methodology for dealing with prerequisite relations as established in educational textual resources which aims at providing a systematic approach for uncovering PRs in textual materials, both when manually annotating and automatically extracting the PRs. The fundamental principles of our methodology guided the development of a novel framework for PR identification which comprises three components, each tackling a different task: (i) an annotation protocol (PREAP), reporting the set of guidelines and recommendations for building PR-annotated resources; (ii) an annotation tool (PRET), supporting the creation of manually annotated datasets reflecting the principles of PREAP; (iii) an automatic PR learning method based on machine learning (PREL). The main novelty of our methodology and framework lies in the fact that we propose to uncover PRs from textual resources relying solely on the content of the instructional material: differently from other works, rather than creating de-contextualised PRs, we acknowledge the presence of a PR between two concepts only if emerging from the way they are presented in the text. By doing so, we anchor relations to the text while modelling the knowledge structure entailed in the resource. As an original contribution of this work, we explore whether linguistic complexity of the text influences the task of manual identification of PRs. To this aim, we investigate the interplay between text and content in educational texts through a crowd-sourcing experiment on concept sequencing. Our methodology values the content of educational materials as it incorporates the evidence acquired from such investigation which suggests that PR recognition is highly influenced by the way in which concepts are introduced in the resource and by the complexity of the texts. The thesis reports a case study dealing with every component of the PR framework which produced a novel manually-labelled PR-annotated dataset.openXXXIII CICLO - DIGITAL HUMANITIES. TECNOLOGIE DIGITALI, ARTI, LINGUE, CULTURE E COMUNICAZIONE - Lingue, culture e tecnologie digitaliAlzetta, Chiar
    • …
    corecore