
    Local matching learning of large scale biomedical ontologies

    Large biomedical ontologies generally describe the same domain of interest, but use different modelling choices and vocabularies. Aligning these complex, heterogeneous ontologies is a tedious task: matching systems must deliver high-quality results while coping with the large size of these resources. Ontology matching systems must therefore solve two problems: (i) handling the large size of the ontologies, and (ii) automating the alignment process. Matching systems combine different types of matchers to address these problems. The main difficulties in aligning large biomedical ontologies are conceptual heterogeneity, the huge search space, and the reduced quality of the resulting alignments. Ontology alignment systems combine different matchers in order to reduce heterogeneity; this combination requires choosing which matchers to combine and how to weight them. Since different matchers handle different types of heterogeneity, matcher tuning should be automated by ontology alignment systems in order to obtain good matching quality.

    We propose an approach called "local matching learning" to address both the large size of ontologies and the automation problem. We divide a large alignment problem into a set of smaller local alignment problems, and each local alignment problem is aligned independently by a machine learning approach. The huge search space is thus reduced to a set of smaller local matching tasks, each of which can be aligned efficiently to obtain better matching quality. Our partitioning approach relies on a novel multi-cut strategy that generates partitions that are neither oversized nor isolated; as a consequence, we can overcome the problem of conceptual heterogeneity. The new partitioning algorithm is based on hierarchical agglomerative clustering (HAC) and generates a set of local matching tasks with a sufficient coverage ratio and no isolated partitions. Each local matching task is automatically aligned using machine learning techniques: a local classifier aligns a single local matching task, based on element-level and structure-level features. The class attribute of each local training set is labelled automatically using an external knowledge base. We apply feature selection to each local classifier in order to select the appropriate matchers for each local matching task. This approach reduces alignment complexity and increases overall precision compared with traditional learning methods. We show that our partitioning approach outperforms current approaches in terms of precision, coverage ratio, and absence of isolated partitions. We evaluated the local matching learning approach through various experiments based on the OAEI 2018 datasets. We conclude that it is beneficial to divide a large ontology alignment task into a set of local alignment tasks: the search space is reduced, which lowers the number of false negatives and false positives, and applying feature selection to each local classifier increases the recall of each local matching task.

    Although a considerable body of research work has addressed the problem of ontology matching, few studies have tackled the large ontologies used in the biomedical domain. We introduce a fully automated local matching learning approach that breaks down a large ontology matching task into a set of independent local sub-matching tasks. This approach integrates a novel partitioning algorithm as well as a set of matching learning techniques. The partitioning method is based on hierarchical clustering and does not generate isolated partitions. The matching learning approach employs different techniques: (i) local matching tasks are independently and automatically aligned using their local classifiers, which are based on local training sets built from element-level and structure-level features, (ii) resampling techniques are used to balance each local training set, and (iii) feature selection techniques are used to automatically select the appropriate tuning parameters for each local matching context. Our local matching learning approach generates a set of combined alignments from each local matching task, and experiments show that a multiple local classifier approach outperforms conventional, state-of-the-art approaches, which use a single classifier for the whole ontology matching task. In addition, focusing on context-aware local training sets based on local feature selection and resampling techniques significantly enhances the obtained results

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains

    Nowadays there is a tremendous amount of unstructured data, often represented by texts, which is created and stored in a variety of forms in many domains such as patients' health records, social network comments, scientific publications, and so on. This volume of data represents an invaluable source of knowledge, but unfortunately it is challenging for machines to mine. At the same time, novel tools as well as advanced methodologies have been introduced in several domains, improving the efficacy and the efficiency of data-based services. Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed the data into Machine Learning methodologies, and produce services or resources that facilitate the execution of certain tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through methodologies not previously experimented with. Furthermore, this thesis investigates the use of some state-of-the-art tools to move data from texts to graphs in order to represent the knowledge contained in scientific literature. Finally, the use of a Semantic Web ontology and novel heuristics to detect insights from biological data in the form of graphs is presented. The thesis contributes to the scientific literature in terms of both results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings
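
A minimal sketch of the "text to Semantic Web tools to Machine Learning features" step described above, using DBpedia Spotlight as one example of a Semantic Web annotation tool. The endpoint, confidence value, and sample sentence are assumptions, not the thesis's actual pipeline.

```python
# Annotate free text with DBpedia entities via the public Spotlight API
# (illustrative only; endpoint and parameters are assumptions).
import requests

def annotate(text: str):
    """Return DBpedia entity URIs mentioned in `text`."""
    resp = requests.get(
        "https://api.dbpedia-spotlight.org/en/annotate",
        params={"text": text, "confidence": 0.5},
        headers={"Accept": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return [r["@URI"] for r in resp.json().get("Resources", [])]

# The returned URIs can serve as features for a downstream classifier,
# or as nodes when building a knowledge graph from the literature.
print(annotate("Machine learning is widely used in healthcare records."))
```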

    BlogForever D2.6: Data Extraction Methodology

    This report outlines an inquiry into the area of web data extraction, conducted within the context of blog preservation. The report reviews theoretical advances and practical developments for implementing data extraction. The inquiry is extended through an experiment that demonstrates the effectiveness and feasibility of implementing some of the suggested approaches. More specifically, the report discusses an approach based on unsupervised machine learning that employs the RSS feeds and HTML representations of blogs. It outlines the possibilities of extracting semantics available in blogs and demonstrates the benefits of exploiting available standards such as microformats and microdata. The report proceeds to propose a methodology for extracting and processing blog data to further inform the design and development of the BlogForever platform
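
The fragment below sketches the core idea of the feed-guided extraction the report discusses: use an RSS item's summary to locate the corresponding content block in the blog's HTML. It is not the BlogForever implementation; the feed URL is hypothetical, and `feedparser` and `beautifulsoup4` are assumed to be installed.

```python
# Feed-guided blog content extraction: pick the HTML element whose text
# best overlaps the RSS summary (a simple unsupervised alignment heuristic).
import feedparser
import requests
from bs4 import BeautifulSoup

def extract_post_body(entry):
    """Find the HTML element whose text best overlaps the feed summary."""
    summary = BeautifulSoup(entry.summary, "html.parser").get_text()
    summary_words = set(summary.split())
    html = requests.get(entry.link, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    best, best_score = None, 0.0
    for tag in soup.find_all(["article", "div", "section"]):
        overlap = len(summary_words & set(tag.get_text().split()))
        score = overlap / (len(summary_words) or 1)
        if score > best_score:
            best, best_score = tag, score
    return best.get_text(strip=True) if best else None

feed = feedparser.parse("https://example.org/blog/rss")  # hypothetical feed URL
for entry in feed.entries[:3]:
    print(entry.title, "->", (extract_post_body(entry) or "")[:80])
```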

    The Computer Science Ontology: A Comprehensive Automatically-Generated Taxonomy of Research Areas

    Ontologies of research areas are important tools for characterising, exploring, and analysing the research landscape. Some fields of research are comprehensively described by large-scale taxonomies, e.g., MeSH in Biology and PhySH in Physics. Conversely, current Computer Science taxonomies are coarse-grained and tend to evolve slowly. For instance, the ACM classification scheme contains only about 2K research topics and the last version dates back to 2012. In this paper, we introduce the Computer Science Ontology (CSO), a large-scale, automatically generated ontology of research areas, which includes about 14K topics and 162K semantic relationships. It was created by applying the Klink-2 algorithm on a very large dataset of 16M scientific articles. CSO presents two main advantages over the alternatives: i) it includes a very large number of topics that do not appear in other classifications, and ii) it can be updated automatically by running Klink-2 on recent corpora of publications. CSO powers several tools adopted by the editorial team at Springer Nature and has been used to enable a variety of solutions, such as classifying research publications, detecting research communities, and predicting research trends. To facilitate the uptake of CSO, we have also released the CSO Classifier, a tool for automatically classifying research papers, and the CSO Portal, a web application that enables users to download, explore, and provide granular feedback on CSO. Users can use the portal to navigate and visualise sections of the ontology, rate topics and relationships, and suggest missing ones. The portal will support the publication of and access to regular new releases of CSO, with the aim of providing a comprehensive resource to the various research communities engaged with scholarly data
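
For readers who want to try the released CSO Classifier mentioned above, a minimal usage sketch follows. The parameter names and the `run` call follow the project's public documentation at the time of writing (pip package `cso-classifier`); treat them, and the sample paper, as assumptions to verify against the current release.

```python
# Classify a paper against CSO topics with the CSO Classifier
# (pip install cso-classifier; interface per the project's docs).
from cso_classifier import CSOClassifier

paper = {
    "title": "De-anonymizing Social Networks",
    "abstract": "We present a framework for analyzing privacy and "
                "anonymity in social networks...",
    "keywords": "privacy, social networks, anonymity",
}

cc = CSOClassifier(modules="both", enhancement="first")
result = cc.run(paper)  # CSO topics found syntactically and semantically
print(result)
```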

    A hybrid semantic approach to building dynamic maps of research communities

    In the last ten years, ontology-based recommender systems have been shown to be effective tools for predicting user preferences and suggesting items. There are, however, some issues associated with the ontologies adopted by these approaches, such as: 1) crafting them is not a cheap process, being time consuming and calling for specialist expertise; 2) they may not accurately represent the viewpoint of the targeted user community; 3) they tend to provide rather static models, which fail to keep track of evolving user perspectives. To address these issues, we propose Klink UM, an approach for extracting emergent semantics from user feedback, with the aim of tailoring the ontology to the users and improving recommendation accuracy. Klink UM uses statistical and machine learning techniques to find hierarchical and similarity relationships between keywords associated with rated items, and can be used for: 1) building a conceptual taxonomy from scratch, 2) enriching and correcting an existing ontology, and 3) providing a numerical estimate of the intensity of semantic relationships according to the users. The evaluation shows that Klink UM performs well with respect to handcrafted ontologies and can significantly increase the accuracy of suggestions in content-based recommender systems
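
As a toy illustration of inferring a hierarchical relationship between keywords from co-occurrence, the fragment below applies the classic subsumption heuristic: x is treated as broader than y when most items tagged y are also tagged x. Klink UM's actual statistical and machine learning techniques are considerably richer; the data and threshold here are illustrative assumptions.

```python
# Co-occurrence subsumption heuristic for keyword hierarchies (toy data).
items = {
    "item1": {"machine learning", "neural networks"},
    "item2": {"machine learning", "neural networks"},
    "item3": {"machine learning", "svm"},
    "item4": {"machine learning"},
}

def broader(x: str, y: str, threshold: float = 0.8) -> bool:
    """True when the share of y-tagged items also tagged x meets the threshold."""
    with_y = [tags for tags in items.values() if y in tags]
    if not with_y or x == y:
        return False
    both = sum(1 for tags in with_y if x in tags)
    return both / len(with_y) >= threshold

print(broader("machine learning", "neural networks"))  # True: ML subsumes NN here
```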