
    AQUEOS: A system for question answering over semantic data

    This paper presents a methodology to automatically answer natural language questions by querying an underlying domain ontology. The methodology is a three-phase process: the ontological data is first read and indexed; the input question is then processed by means of lexical analysis and associated with a specific question type; and finally the corresponding SPARQL queries are generated and executed to return the answer to the original question. The process focuses on single-verb phrases in order to guarantee the highest level of precision in its answers, and deals with critical lexical aspects such as comparatives and superlatives by relying upon language-specific lexicons. Nonetheless, it can also handle more complex questions with multiple verbs, provided they meet certain criteria. The methodology has been implemented in a research prototype, which is currently being tested with questions in either English or Italian, and could be applied to a number of ontology-driven applications, including advanced help desk support systems, biomedical knowledge bases, and intelligent e-learning solutions.
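
The three-phase process described above can be sketched in miniature: classify the question via lexical patterns, then instantiate a SPARQL template for that question type. The question types, patterns, and templates below are hypothetical illustrations, not the actual AQUEOS rules.

```python
import re

# Hypothetical question types and SPARQL templates for illustration only.
QUESTION_TYPES = {
    r"^who\b": "person",
    r"^where\b": "place",
    r"^how many\b": "count",
}

TEMPLATES = {
    "person": 'SELECT ?s WHERE {{ ?s a :Person ; :name ?n . FILTER(CONTAINS(?n, "{kw}")) }}',
    "place": 'SELECT ?p WHERE {{ ?s :name "{kw}" ; :locatedIn ?p }}',
    "count": "SELECT (COUNT(?s) AS ?c) WHERE {{ ?s a :{kw} }}",
}

def classify(question: str) -> str:
    """Associate a question with a question type via lexical patterns."""
    q = question.lower().strip()
    for pattern, qtype in QUESTION_TYPES.items():
        if re.search(pattern, q):
            return qtype
    return "person"  # fallback type

def to_sparql(question: str, keyword: str) -> str:
    """Generate a SPARQL query for the question's type and keyword."""
    return TEMPLATES[classify(question)].format(kw=keyword)
```

In a real pipeline the keyword would come from the lexical-analysis phase rather than being passed in by hand.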

    Assessment of the E3C corpus for the recognition of disorders in clinical texts

    Disorder named entity recognition (DNER) is a fundamental task of biomedical natural language processing which has attracted considerable attention. The task consists of extracting named entities of disorders, such as diseases, symptoms, and pathological functions, from unstructured text. The European Clinical Case Corpus (E3C) is a freely available multilingual corpus (English, French, Italian, Spanish, and Basque) of semantically annotated clinical case texts. The entities of type disorder in the clinical cases are annotated at both mention and concept level. At mention level, the annotation identifies the entity text spans, for example, abdominal pain. At concept level, the entity text spans are associated with their concept identifiers in the Unified Medical Language System, for example, C0000737. This corpus can be exploited as a benchmark for training and assessing information extraction systems. In the present work, multiple experiments were conducted to test the suitability of the mention-level annotation of the E3C corpus for training DNER models. In these experiments, traditional machine learning models such as conditional random fields and more recent multilingual pre-trained models based on deep learning were compared with standard baselines. The multilingual pre-trained models were fine-tuned (i) on each language of the corpus to test per-language performance, (ii) on all languages to test multilingual learning, and (iii) on all languages except the target language to test cross-lingual transfer learning. Results show the appropriateness of the E3C corpus for training a system capable of mining disorder entities from clinical case texts. Researchers can use these results as baselines for this corpus to compare their own models. The implemented models have been made available through the European Language Grid platform for quick and easy access.
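
Mention-level annotation of the kind described above is typically encoded as token-level BIO tags when training DNER models. A minimal, generic sketch of turning predicted BIO tags back into entity text spans (not code from the E3C experiments):

```python
def bio_to_spans(tokens, tags):
    """Convert token-level BIO tags into (start, end, text) mention spans."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B"):
            if start is not None:  # close the previous mention
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O":
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
        # an "I" tag continues the current mention
    if start is not None:  # mention running to the end of the sentence
        spans.append((start, len(tokens), " ".join(tokens[start:])))
    return spans
```

For example, tagging "Patient reports abdominal pain today" with `["O", "O", "B-DISORDER", "I-DISORDER", "O"]` yields the single mention span "abdominal pain".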

    Maeo: An ontology for modeling agents, experts and expertise within an open online materials modeling marketplace

    This work describes the MAEO ontology, which models agents, experts, expertise, and knowledge providers in general within the context of an online, open marketplace for materials modeling and related collaboration activities. Such a marketplace is meant as a one-stop shop for enabling and accelerating materials modeling in industry, and is currently under development within the Horizon 2020 "MarketPlace" project. The MAEO ontology has been developed as the underlying basis for the online platform provided by the project, where users can search for experts with a given expertise, and experts can subscribe to be made visible and searchable by the users. As it stands, the MAEO ontology is part of a larger effort of ontological modeling for the materials modeling area having the EMMO (European Materials Modeling Ontology) as its root, and as such can be seen as an EMMO-based, application-level ontology. This work thus details the ontology's domain, purpose, and structure, and underlines its connection with external ontologies, while also providing a brief description of its usage and technical implementation.

    On the road to speed-reading and fast learning with CONCEPTUM

    This work introduces CONCEPTUM, an advanced knowledge discovery system for speed-reading natural language texts and enabling faster and more effective learning. CONCEPTUM offers a wide range of features, from language detection and conceptualization to semantic categorization, named entity recognition, and automatic ontology building, effectively turning an unstructured textual source into concepts, topics, relationships, and summaries that can be quickly and easily browsed and classified. The system does not require any training or configuration and at present can be applied as-is to general-purpose English and Italian texts, providing many kinds of users with a powerful means to significantly speed up and improve their learning and research activities. In this work, a challenging experiment in the field of Biochemistry is reported to highlight and discuss the critical issues that arise when the system is applied to a highly technical domain.
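
As a toy illustration of the language-detection step, one simple approach is to score a text against per-language stop-word lists; the tiny word lists below are illustrative only and unrelated to CONCEPTUM's actual implementation.

```python
# Minimal illustrative stop-word lists for English and Italian.
STOPWORDS = {
    "en": {"the", "and", "of", "is", "in", "to", "a"},
    "it": {"il", "la", "di", "e", "che", "un", "per"},
}

def detect_language(text: str) -> str:
    """Return the language whose stop words occur most often in the text."""
    words = text.lower().split()
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    return max(scores, key=scores.get)
```

Production systems would use larger lexicons or character n-gram models, but the principle of scoring against language-specific resources is the same.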

    A semantic knowledge discovery framework for detecting online terrorist networks

    This paper presents a knowledge discovery framework for detecting terrorist presence, in terms of potential suspects and networks, on the open and Deep Web. The framework combines information extraction methods and natural language processing techniques with semantic information derived from social network analysis, in order to automatically process online content from disparate sources and identify people and relationships that may be linked to terrorist activities. The framework has been developed within the context of the DANTE Horizon 2020 project, as part of a larger international effort to detect and analyze terrorist-related content from online sources and help international police organizations in their investigations into crime and terrorism.
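
The social-network-analysis component can be illustrated with a standard measure such as degree centrality, which ranks nodes of an interaction graph by how connected they are. This is a generic sketch with made-up node names, not the framework's actual analysis.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Degree of each node, normalized by the number of other nodes."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    return {node: d / (n - 1) for node, d in degree.items()}

def top_nodes(edges, k=2):
    """Return the k most central nodes of the interaction graph."""
    ranked = sorted(degree_centrality(edges).items(), key=lambda kv: -kv[1])
    return [node for node, _ in ranked[:k]]
```

Real investigations combine several centrality measures and entity-resolution steps, but ranking by connectivity is a common first pass.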

    SEMANTO: a graphical ontology management system for knowledge discovery

    This work describes a visual system for managing ontologies in the RDF formalism, providing features for creating, updating, and deleting elements and instances via a user-friendly graphical interface, along with a set of advanced operators that can be applied to them. These operators implement mechanisms for ontology instance matching and integration, ontology enrichment with semantically related concepts, and question answering in natural language, with the purpose of discovering knowledge from the underlying ontologies. SEMANTO can display and manage RDF ontologies via SPARQL endpoints, including user-defined ontologies and subsets of Linked Open Data. SEMANTO has been evaluated against ontological schemas and instances derived from a knowledge model for learning management systems and from a learning application for online dispute resolution.
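
One simple baseline for the instance-matching operator is label similarity: pair instances from two ontologies whose labels exceed a string-similarity threshold. The sketch below, using Python's standard difflib, is an illustrative baseline with an assumed threshold, not the matching algorithm SEMANTO uses.

```python
from difflib import SequenceMatcher

def match_instances(labels_a, labels_b, threshold=0.85):
    """Pair instance labels from two ontologies by string similarity."""
    matches = []
    for a in labels_a:
        for b in labels_b:
            ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if ratio >= threshold:
                matches.append((a, b, round(ratio, 2)))
    return matches
```

Label similarity alone produces false positives; practical matchers also compare property values and neighboring instances.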

    A visual ontology management system for handling, integrating and enriching semantic repositories

    This paper presents a prototype system for managing ontological data from RDF semantic repositories via an intuitive, graph-based visual interface. The core of the system provides basic editing functionality for the managed ontologies, while allowing more advanced operations to be plugged in and applied to them, including the execution of ontology integration algorithms and the enrichment of ontological knowledge bases via conceptualization mechanisms. The system can handle and visualize any ontology accessible from a SPARQL endpoint, and as such could be used to visualize portions of Linked Open Data repositories as well. The prototype has been applied to a case study revolving around a learning application for lawyers within the context of a larger software framework.
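
The plug-in architecture described here can be sketched as a registry that advanced operations register against and that the core applies on demand; the operator name and the dictionary-based ontology object below are hypothetical simplifications.

```python
# Registry of pluggable operations, keyed by name.
OPERATORS = {}

def operator(name):
    """Decorator registering an advanced operation under a name."""
    def register(fn):
        OPERATORS[name] = fn
        return fn
    return register

@operator("enrich")
def enrich(ontology):
    """Toy enrichment: append a (placeholder) semantically related concept."""
    ontology.setdefault("concepts", []).append("related-concept")
    return ontology

def apply_operator(name, ontology):
    """Core editor hook: look up a plugged-in operation and apply it."""
    return OPERATORS[name](ontology)
```

The decorator-based registry keeps the core editor unaware of individual operations, which is what makes them pluggable.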

    Semi-automatic generation of an Object-Oriented API framework over semantic repositories

    This paper presents a system able to generate an abstraction framework over an RDF-based semantic triplestore, offering Object-Oriented Application Programming Interfaces to external applications. The system requires only a well-defined RDF schema and minimal user supervision, and is able to produce all of the components of the API framework at their different layers, ranging from data source classes up to higher-level modules such as web service interfaces, in order to provide CRUD operations over the underlying semantic data. The system is sufficiently generic to accept any RDF repository with its schema as input, and can be easily configured to fine-tune the automatic generation of the API components to suit the needs of specific applications. The system has been deployed and tested on top of a large semantic repository featuring a schema where multiple real-world conceptualizations are defined, including one representing a learning model specifically designed for advanced e-learning management platforms.
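
The generation idea can be sketched by mapping a class of a simplified schema description to a Python data-access class with property accessors; the schema representation, naming scheme, and stubbed fetch method below are illustrative assumptions, not the system's actual output.

```python
def generate_class(name, properties):
    """Emit source code for a data-access class from a schema class name
    and its property names."""
    lines = [f"class {name}:"]
    lines.append("    def __init__(self, uri):")
    lines.append("        self.uri = uri")
    for prop in properties:
        lines.append(f"    def get_{prop}(self):")
        lines.append(f"        return self._fetch('{prop}')")
    lines.append("    def _fetch(self, prop):")
    lines.append("        return None  # would issue a SPARQL SELECT here")
    return "\n".join(lines)
```

The emitted source can then be compiled and used like any hand-written class; a real generator would back `_fetch` with queries against the triplestore.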

    RAN-Map: a system for automatically producing API layers from RDF schemas

    This work describes a system for the automatic generation of full-fledged API layers from RDF schemas, providing the whole set of Object-Oriented functionalities to retrieve, store, edit, and delete the corresponding data in a semantic triplestore. The layers the system can produce range from an underlying domain model, derived from the classes, data properties, and object properties of the input schema, to the related lower-level data source and access components, up to higher-level facades and web service interfaces, all of which are immediately operational and can be used out of the box for development purposes, either as stand-alone components or integrated into external applications. A user-friendly graphical interface allows for easy configuration and customization of the generation process to suit specific development needs. Once configured, the generation process executes almost instantaneously, producing a full set of API components in a matter of seconds and thus dramatically saving design and development time and effort. Experimentation of the system has been carried out within the context of an EU-funded research project featuring a large semantic schema, a significant portion of which represented a Learning Model specifically engineered to be used by a variety of e-learning solutions; nevertheless, the system is generic enough to be employed for a variety of applications relying upon semantic schemas and data.
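
A toy version of the generated CRUD layer can be sketched against an in-memory set of (subject, predicate, object) triples standing in for a real triplestore; the class and method names are illustrative, not the API the system emits.

```python
class TripleStore:
    """In-memory stand-in for a semantic triplestore, exposing the four
    CRUD operations over (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def create(self, s, p, o):
        self.triples.add((s, p, o))

    def read(self, s, p):
        return [o for s2, p2, o in self.triples if s2 == s and p2 == p]

    def update(self, s, p, old, new):
        self.triples.discard((s, p, old))
        self.triples.add((s, p, new))

    def delete(self, s, p, o):
        self.triples.discard((s, p, o))
```

In the generated framework these operations would translate to SPARQL INSERT, SELECT, and DELETE statements against the repository.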

    Experimentation of an automatic resolution method for protein abbreviations in full-text papers

    We report and comment on the experimental results of the PRAISED system, which implements an automatic method for discovering and resolving a wide range of protein name abbreviations in the full-text versions of scientific articles. This system has recently been proposed as part of a framework for creating and maintaining a publicly accessible abbreviation repository. The testing phase was carried out against the widely used Medstract Gold Standard Corpus and a relevant subset of real scientific papers extracted from the PubMed database. On the Medstract corpus, we obtained high scores in terms of recall, precision, and overall correctness. For the full-text papers, results inevitably varied, due to the complex and often chaotic nature of the domain at hand; even so, we observed encouraging levels of recall and extremely fast execution times. The major strength of the system lies in addressing the unstructured nature of scientific publications and saving time and effort by extracting protein-related information automatically, while keeping computational overhead to a minimum thanks to its light-weight approach. Copyright © 2011 ACM.
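
The abbreviation-discovery step can be illustrated with a Schwartz-Hearst-style baseline that matches "long form (SF)" patterns in text; the simplified heuristic below is a common generic approach, not the actual PRAISED method.

```python
import re

def resolve_abbreviations(text):
    """Find short-form/long-form pairs written as 'long form (SF)'."""
    pairs = {}
    for match in re.finditer(r"([\w\- ]+) \(([A-Z][A-Za-z]{1,9})\)", text):
        candidate, short = match.group(1), match.group(2)
        words = candidate.split()
        # take as many trailing words as the short form has characters
        long_form = " ".join(words[-len(short):])
        # crude sanity check: long form starts with the short form's initial
        if long_form.lower().startswith(short[0].lower()):
            pairs[short] = long_form
    return pairs
```

The full Schwartz-Hearst algorithm aligns the short form's characters right-to-left within the candidate long form, which handles cases this word-count heuristic misses.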