
    Knowledge Sharing from Domain-specific Documents

    Recently, collaborative discussions based on participant-generated documents, e.g., customer questionnaires, aviation reports and medical records, have become necessary in various fields such as marketing, transport and medical treatment, in order to share knowledge that is crucial for maintaining safety, e.g., avoiding air-traffic accidents and medical malpractice. We introduce several natural language processing techniques for extracting information from such text data and verify their validity using aviation documents as an example. We automatically and statistically extract from the documents related words that have not only taxonomical relations, such as synonymy, but also thematic (non-taxonomical) relations, including causal and entailment relations. These related words are useful for sharing information among participants. Moreover, we acquire domain-specific terms and phrases from the documents in order to pick out and share important topics from such reports.
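One common statistical criterion for extracting related words from a document collection is pointwise mutual information (PMI) over co-occurrence counts. The abstract does not name its exact measure, so the sketch below is only illustrative; the aviation snippets and the `pmi_scores` helper are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences):
    """Score word pairs by pointwise mutual information (PMI).

    Words that co-occur in the same report more often than chance
    receive high scores; such pairs are candidates for taxonomical
    or thematic (e.g. causal) relations.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        words = set(sent.split())
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(sentences)
    scores = {}
    for pair, c in pair_counts.items():
        w1, w2 = tuple(pair)
        # PMI = log P(w1, w2) / (P(w1) P(w2)), estimated from sentence counts
        scores[pair] = math.log((c * n) / (word_counts[w1] * word_counts[w2]))
    return scores

reports = [
    "engine failure caused emergency landing",
    "engine failure during takeoff",
    "smooth landing after holding pattern",
]
scores = pmi_scores(reports)
# "engine" and "failure" always co-occur, so their PMI is positive
print(scores[frozenset({"engine", "failure"})] > 0)   # → True
```

A real system would of course lemmatize and filter stopwords before counting; this sketch shows only the scoring step.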

    Extraction of Word Set for Increasing Human-Computer Interaction in Information Retrieval

    We present a mechanism that provides word sets which can make human-computer interaction more active during information retrieval, using natural language processing technology and a mathematical measure for calculating the degree of inclusion. We show what type of words should be added to the current query, i.e., the keywords input so far, in order to make human-computer interaction more creative. We extract related word sets with taxonomical and non-taxonomical relations from documents by employing case-marking particles derived from syntactic analysis. We then verify which kind of related words is more useful as an additional term for retrieval support and makes human-computer interaction more fruitful.
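One plausible reading of a "degree of inclusion" measure is set inclusion between the context sets (e.g., documents or case frames) in which two words appear. The formulation |A ∩ B| / |A| and the data below are assumptions for illustration, not the paper's stated definition.

```python
def inclusion_degree(contexts_a, contexts_b):
    """Degree to which word A's contexts are included in word B's.

    A hypothetical formulation: |A ∩ B| / |A|. A value near 1.0
    suggests A occurs in a subset of B's contexts, hinting that
    B is the broader (hypernym-like) term and thus a useful
    addition to a query containing A.
    """
    if not contexts_a:
        return 0.0
    return len(contexts_a & contexts_b) / len(contexts_a)

# contexts: document identifiers in which each keyword appears (made up)
jet = {"d1", "d2"}
aircraft = {"d1", "d2", "d3", "d4"}
print(inclusion_degree(jet, aircraft))   # 1.0: "jet" contexts ⊆ "aircraft" contexts
print(inclusion_degree(aircraft, jet))   # 0.5
```

The asymmetry is the point: unlike a symmetric similarity, it distinguishes which of the two words is the more general one.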

    Ontology-based Why-Question Analysis Using Lexico-Syntactic Patterns

    This research focuses on developing a method to analyze why-questions. Previous research on why-question analysis has usually used morphological and syntactic approaches without considering the expected answer types, and has rarely involved a domain ontology to capture the semantics or conceptualization of the content. Consequently, semantic mismatches occurred, resulting in inappropriate answers. The proposed method considers the expected answer types and involves a domain ontology. It adapts a simple, bag-of-words-like model, using semantic entities (i.e., concepts/entities and relations) instead of words to represent a query. The method expands the question by adding the additional semantic entities obtained by executing the constructed SPARQL query of the why-question over the domain ontology. The major contribution of this research is an ontology-based why-question analysis method that considers the expected answer types. Experiments have been conducted to evaluate each phase of the proposed method; the results show good performance for all measures used (i.e., precision, recall, undergeneration, and overgeneration). Furthermore, comparison against two keyword-based baseline methods (i.e., term-based and phrase-based) shows that the proposed method obtains better results in terms of MRR and P@10.
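The expansion step can be illustrated with a toy example. The triples, entity names and `expand_query` helper below are hypothetical, standing in for the paper's actual SPARQL queries over a real domain ontology; only the shape of the idea (enrich the query with entities reachable via an answer-type-relevant relation) is taken from the abstract.

```python
# A toy domain ontology as (subject, relation, object) triples; the paper
# executes SPARQL over a real ontology, so this dict-free lookup is only
# an illustration of the expansion step.
TRIPLES = [
    ("Flood", "causedBy", "HeavyRain"),
    ("Flood", "causedBy", "PoorDrainage"),
    ("HeavyRain", "occursIn", "WetSeason"),
]

def expand_query(entities, relation="causedBy"):
    """Add entities reachable via the given relation to the query set,
    mimicking the effect of the constructed SPARQL query. For a
    why-question, the expected answer type suggests causal relations."""
    expanded = set(entities)
    for s, r, o in TRIPLES:
        if s in entities and r == relation:
            expanded.add(o)
    return expanded

# Why-question "Why did the flood happen?" → semantic entity {"Flood"}
print(sorted(expand_query({"Flood"})))
# → ['Flood', 'HeavyRain', 'PoorDrainage']
```

In SPARQL terms, the same lookup would be roughly `SELECT ?o WHERE { :Flood :causedBy ?o }` against the domain ontology.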

    Improving automation standards via semantic modelling: Application to ISA88

    Standardization is essential for automation. Extensibility, scalability, and reusability are important features for automation software that rely on efficient modelling of the addressed systems. The work presented here comes from the ongoing development of a methodology for semi-automatic ontology construction from technical documents. The main aim is to systematically check the consistency of technical documents and support its improvement. The formalization of conceptual models and the subsequent writing of technical standards are analyzed together, and guidelines are proposed for application to future technical standards. Three paradigms are discussed for the development of domain ontologies from technical documents, starting from the current state of the art, continuing with the intermediate method presented and used in this paper, and ending with the paradigm suggested for the future. The ISA88 Standard is taken as a representative case study. Linguistic techniques from the semi-automatic ontology construction methodology are applied to the ISA88 Standard, and different modelling and standardization aspects that are worth sharing with the automation community are addressed. This study discusses different paradigms for developing and sharing conceptual models for the subsequent development of automation software, and presents the systematic consistency checking method.
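One concrete check in the spirit of "systematic consistency checking" is flagging terms that a standard's clauses use but its definitions section never defines. The heuristic, term data and `undefined_terms` helper below are hypothetical illustrations, not the paper's actual method (the example terms are drawn from ISA88 vocabulary).

```python
def undefined_terms(defined, clauses):
    """Flag term candidates used in standard clauses but absent from the
    standard's definitions section — one simple consistency check."""
    used = set()
    for clause in clauses:
        words = clause.split()
        # crude candidate extraction: capitalized words not at sentence start
        used.update(w for w in words[1:] if w[0].isupper())
    return sorted(used - defined)

defined = {"Batch", "Recipe", "Unit"}
clauses = [
    "A Batch is produced by executing a Recipe",
    "Each Unit runs one Procedure at a time",   # "Procedure" never defined
]
print(undefined_terms(defined, clauses))   # → ['Procedure']
```

A real pipeline would use the linguistically extracted term list rather than a capitalization heuristic, but the set difference between "terms used" and "terms defined" is the core of the check.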

    Ontology Learning and Semantic Annotation: a Necessary Symbiosis

    Semantic annotation of text requires the dynamic merging of linguistically structured information and a "world model", usually represented as a domain-specific ontology. On the other hand, the process of engineering a domain ontology with a semi-automatic ontology learning system requires a considerable amount of semantically annotated documents. Facing this bootstrapping paradox requires an incremental process of annotation-acquisition-annotation, whereby domain-specific knowledge is acquired from linguistically annotated texts and then projected back onto the texts so that extra linguistic information can be annotated and further knowledge layers extracted. The presented methodology is a first step towards a full "virtuous" circle in which the semantic annotation platform and the evolving ontology interact in symbiosis. As a case study we have chosen the semantic annotation of product catalogues. We propose a hybrid approach, combining pattern matching techniques, which exploit the regular structure of product descriptions in catalogues, with natural language processing techniques, which are used to analyze the free natural language descriptions. The semantic annotation involves access to the ontology, semi-automatically bootstrapped with an ontology learning tool from annotated collections of catalogues.
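The hybrid approach can be sketched as: a regular expression captures the regular, structured part of a catalogue entry, and whatever remains is handed on to NLP analysis. The pattern, field names and example entry below are invented for illustration; the paper's actual catalogue formats will differ.

```python
import re

# Catalogue entries mix a regular code/size prefix with free text; the regex
# handles the regular part, and the remainder would go to an NLP pipeline
# (here it is simply returned as "free_text").
PATTERN = re.compile(
    r"^(?P<code>[A-Z]{2}\d{3})\s+"        # hypothetical product code
    r"(?P<size>\d+(?:\.\d+)?)\s*"         # numeric size
    r"(?P<unit>mm|cm|in)\b\s*"            # unit of measure
    r"(?P<rest>.*)$"                      # free-text description
)

def annotate(description):
    m = PATTERN.match(description)
    if not m:
        return {"free_text": description}   # fall through to NLP analysis
    d = m.groupdict()
    d["free_text"] = d.pop("rest")
    return d

print(annotate("SK102 35 mm stainless steel hex bolt"))
# → {'code': 'SK102', 'size': '35', 'unit': 'mm',
#    'free_text': 'stainless steel hex bolt'}
```

Entries that match contribute structured slots directly; the `free_text` part is where linguistic analysis (and ontology lookup) takes over.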

    Automatic enrichment of WordNet with common-sense knowledge


    Building a Portal for Scientific Collections at the University of Lisbon

    Master's thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2020. Scientific collections, bringing together a huge number and diversity of objects and the data associated with them, constitute a valuable historical, scientific and cultural heritage. These collections are generally under the responsibility of museums and their curators, so it is important that there is a platform on which those responsible can carry out management and maintenance operations. Given the diversity of the collections, these data, belonging to different scientific domains and with different properties, pose problems of integration, availability and maintenance, problems that are increasingly relevant in a data-centric world that relies on the analysis and sharing of data. This project, focused on this challenge, aimed to develop, for the Museu Nacional de História Natural e da Ciência da Universidade de Lisboa, a platform that aggregates the institution's very diverse collections, building on an open-source base platform called CollectiveAccess. In the course of the project, a generalized methodology applicable to any collection was developed, covering the processes from data acquisition, processing and correction to import and availability within the platform. Specific features were also developed and implemented to address particular characteristics of the different data sets, such as a hierarchical system for taxonomy-related data, a geographic data entry system using an external API, and search features developed to meet the requirements of each collection. These features and the overall performance of the system were evaluated through two usability questionnaires (System Usability Scale), via two different Google Forms, aimed at the two main types of users of the system: curators and the general public. In addition, comments and suggestions for improvements or new features were requested. The questionnaire results were satisfactory, obtaining classifications of A and B on the usability scale from the public and curator tests, respectively. The analysis of comments and suggestions also provided an idea of possible improvements and new features to implement.

    Ontology network analysis for safety learning in the railway domain

    Ontologies have been used in diverse areas such as Knowledge Management (KM), Artificial Intelligence (AI), Natural Language Processing (NLP) and the Semantic Web, as they allow software applications to integrate, query and reason about concepts and relations within a knowledge domain. For Big Data Risk Analysis (BDRA) in railways, ontologies are a key enabler for obtaining valuable safety insights from the large amount of data available from the railway. Traditionally, ontology building has been an entirely manual process requiring considerable human effort and development time. During the last decade, the information explosion due to the Internet and the need for large-scale methods to extract patterns in a systematic way have given rise to the research area of “ontology learning”. Despite recent research efforts, ontology learning systems still struggle to extract terms (words or multi-word expressions) from text-based data. This manuscript explores the benefits of visual analytics to support the construction of ontologies for a particular part of railway safety management: possessions. In railways, possession operations are the protection arrangements for engineering work that ensure track workers remain separated from moving trains. A network of terms from possession operations standards is represented to extract the ontology concepts that enable safety learning from events related to possession operations.
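The idea of representing a network of terms from the standards can be sketched as a co-occurrence graph in which highly connected terms are candidate ontology concepts. The sentences, term list and helper functions below are invented for illustration; the paper builds its network from the actual possession operations standards.

```python
from collections import defaultdict
from itertools import combinations

def term_network(sentences, terms):
    """Link two domain terms whenever they co-occur in a sentence of the
    standard; edge weights count co-occurrences."""
    edges = defaultdict(int)
    for sent in sentences:
        present = [t for t in terms if t in sent]
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Weighted degree per term; central terms are concept candidates."""
    deg = defaultdict(int)
    for (a, b), w in edges.items():
        deg[a] += w
        deg[b] += w
    return deg

standard = [
    "a possession must be protected by a marker board",
    "the marker board is placed before engineering work begins",
    "engineering work inside a possession requires a protection plan",
]
terms = ["possession", "marker board", "engineering work", "protection"]
deg = degree(term_network(standard, terms))
# "possession" and "engineering work" tie at degree 3; max returns the first
print(max(deg, key=deg.get))   # → possession
```

Visual analytics then comes in for inspecting such a graph interactively, rather than ranking by a single centrality number.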

    Where the bugs are: analyzing distributions of bacterial phyla by descriptor keyword search in the nucleotide database

    Background: The associations between bacteria and their environment underlie their preferential interactions with given physical or chemical conditions. Microbial ecology aims at extracting conserved patterns of occurrence of bacterial taxa in relation to defined habitats and contexts.
    Results: In the present report the NCBI nucleotide sequence database is used as a dataset to extract information on the distribution of each of the 24 phyla of the Bacteria superkingdom and of the Archaea. Over two and a half million records are filtered by their cross-association with each of 48 sets of keywords, defined to cover natural or artificial habitats, interactions with plant, animal or human hosts, and physical-chemical conditions. The results are processed to show: (a) how the different descriptors enrich or deplete the proportions at which the phyla occur in the total database; (b) in which order of abundance the different keywords score for each phylum (preferred habitats or conditions), and to what extent phyla are clustered around few descriptors (specific) or spread across many (cosmopolitan); (c) which keywords individuate the communities ranking highest for diversity and evenness.
    Conclusions: A number of cues emerge from the results, contributing to sharpen the picture of the functional and systematic diversity of prokaryotes. Suggestions are given for a future automated service dedicated to refining and updating this kind of analysis via public bioinformatic engines.
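Diversity and evenness rankings of this kind are commonly computed with the Shannon index and Pielou's evenness. The abstract does not name its exact measures, so the sketch below assumes that standard formulation, and the per-phylum record counts are invented.

```python
import math

def shannon_diversity(counts):
    """Shannon index H' = -Σ p_i ln p_i over per-phylum record counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c)

def evenness(counts):
    """Pielou's evenness J' = H' / ln S, where S is the number of
    phyla with at least one record."""
    s = sum(1 for c in counts if c)
    return shannon_diversity(counts) / math.log(s) if s > 1 else 0.0

# records retrieved for two habitat keywords, per phylum (hypothetical)
soil = [120, 80, 40, 10]   # several phyla, fairly balanced
gut = [240, 5, 3, 2]       # dominated by a single phylum
print(evenness(soil) > evenness(gut))   # → True
```

Ranking the 48 keyword communities by these two numbers is enough to answer point (c) above.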