877 research outputs found

    Framework for Enhanced Ontology Alignment using BERT-Based

    This framework combines several approaches to improve ontology alignment by coupling data mining with BERT. Data mining techniques identify the instance attributes best suited for matching ontologies, and the framework aims to improve the precision and recall of current ontology matching techniques. Since the beginnings of knowledge integration, the main requirement for ontology alignment has been syntactic and structural matching. This article presents a new approach that employs data mining and BERT embeddings to produce broader, contextually aware ontology alignments. The proposed system exploits BERT's contextual representations and semantic understanding together with feature extraction and pattern recognition through data mining. The objective is to combine data-driven insights with the advantages of semantic representation to enhance the accuracy and efficiency of the ontology alignment process. An evaluation on annotated datasets, compared against traditional approaches, demonstrates that the proposed framework is effective and adaptable across several domains.
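The core idea of embedding-based label matching can be sketched in a few lines. This is a minimal illustration, not the paper's framework: the `embed` function below is a bag-of-words stand-in for a real BERT sentence encoder, and the `align` helper and its `threshold` parameter are hypothetical names introduced here.

```python
import math

def embed(text):
    # Stand-in for a BERT sentence embedding: a simple bag-of-words
    # count vector. A real system would pool token embeddings from a
    # model such as bert-base-uncased instead.
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a, b):
    # Cosine similarity between two sparse vectors (dicts).
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def align(source_labels, target_labels, threshold=0.3):
    # Keep the best-scoring target label for each source label,
    # discarding pairs that fall below the threshold.
    matches = []
    for s in source_labels:
        best = max(target_labels, key=lambda t: cosine(embed(s), embed(t)))
        score = cosine(embed(s), embed(best))
        if score >= threshold:
            matches.append((s, best, score))
    return matches
```

With a contextual encoder in place of the toy `embed`, near-synonymous labels such as "postal code" and "zip code" would score highly even without shared tokens.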

    Generating knowledge graphs by employing Natural Language Processing and Machine Learning techniques within the scholarly domain

    The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is that analysing the literature has become difficult due to the high volume of published papers, which demand manual effort for annotation and management. Novel technological infrastructures are needed to help researchers, research policy makers, and companies browse, analyse, and forecast scientific research time-efficiently. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers. As such, in this paper we present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications and integrates them in a large-scale knowledge graph. Within this research work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, ii) describe an approach for integrating entities and relationships generated by these tools, iii) show the advantage of such a hybrid system over alternative approaches, and iv) as a chosen use case, generate a scientific knowledge graph including 109,105 triples, extracted from 26,827 abstracts of papers within the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
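The integration step described above, turning extractor output into graph triples, can be illustrated with a toy sketch. The paper relies on several state-of-the-art NLP tools; here a single hand-written regular expression stands in for them, and `extract_triples` and `PATTERN` are names invented for this example.

```python
import re

# One hypothetical surface pattern standing in for real entity and
# relation extractors: "<subject> uses|extends|improves <object>".
PATTERN = re.compile(r"(\w[\w ]*?) (uses|extends|improves) (\w[\w ]*)", re.I)

def extract_triples(sentences):
    # Deduplicate into a set of (subject, relation, object) triples,
    # normalising case so the same fact from different abstracts merges.
    triples = set()
    for s in sentences:
        for subj, rel, obj in PATTERN.findall(s):
            triples.add((subj.strip().lower(), rel.lower(), obj.strip().lower()))
    return triples
```

A real pipeline would replace the regex with trained extractors and map the normalised strings to canonical entity identifiers before loading them into the graph.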

    Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion

    Given a small set of seed entities (e.g., ``USA'', ``Russia''), corpus-based set expansion is the task of inducing an extensive set of entities which share the same semantic class (Country in this example) from a given corpus. Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift, expanding the seed set freely and without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets closely related to the target set of the user's interest, and then performs multiple-set co-expansion, extracting discriminative features by comparing the target set with the auxiliary sets to form multiple cohesive sets that are distinctive from one another, thus resolving the semantic drift issue. In this paper we demonstrate that by generating auxiliary sets we can guide the expansion of the target set away from the ambiguous areas bordering the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
    Comment: WWW 2020
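The discriminative idea, scoring a candidate by its similarity to the target seeds minus its similarity to the auxiliary (negative) set, can be sketched with simple context-window profiles. This is an illustration of the general principle under invented names (`context_profile`, `score`), not Set-CoExpan's actual feature extraction.

```python
from collections import Counter

def context_profile(entity, corpus, window=2):
    # Collect words appearing within `window` tokens of the entity,
    # a crude proxy for distributional similarity features.
    profile = Counter()
    for sent in corpus:
        toks = sent.lower().split()
        for i, t in enumerate(toks):
            if t == entity:
                lo, hi = max(0, i - window), i + window + 1
                profile.update(toks[lo:i] + toks[i + 1:hi])
    return profile

def score(candidate, seeds, auxiliary, corpus):
    # Discriminative score: context overlap with the target seeds
    # minus context overlap with the auxiliary (negative) entities.
    cand = context_profile(candidate, corpus)
    def overlap(group):
        total = Counter()
        for e in group:
            total.update(context_profile(e, corpus))
        return sum((cand & total).values())
    return overlap(seeds) - overlap(auxiliary)
```

A candidate that shares contexts with the auxiliary set (e.g., a language name when expanding countries) is penalised, which is precisely how the negative sets curb semantic drift.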

    requirements and use cases

    In this report, we introduce our initial vision of the Corporate Semantic Web as the next step in the broad field of Semantic Web research. We identify requirements of the corporate environment and gaps in current approaches to the problems facing ontology engineering, semantic collaboration, and semantic search. Each of these pillars will yield innovative methods and tools during the project runtime until 2013. Corporate ontology engineering will facilitate agile ontology engineering to lower the costs of ontology development and, especially, maintenance. Corporate semantic collaboration focuses on the human-centered aspects of knowledge management in corporate contexts. Corporate semantic search sits at the highest application level of the three research areas, representing applications that work on and with appropriately represented and delivered background knowledge. We propose an initial layout for an integrative architecture of a Corporate Semantic Web built on these three core pillars.

    Large Language Models and Knowledge Graphs: Opportunities and Challenges

    Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we discuss some of the common debate points within the community on LLMs (parametric knowledge) and Knowledge Graphs (explicit knowledge) and speculate on the opportunities and visions that the renewed focus brings, as well as related research topics and challenges.
    Comment: 30 pages

    Automation of companies’ recruitment process: development of an algorithm capable of ranking CVs according to job offers

    Master's dissertation in Informatics Engineering. This document presents a thesis and describes the underlying work, developed over the second year of the Master's Degree in Informatics Engineering offered by the Departamento de Informática of Universidade do Minho and carried out at Syone SBS Software – Tecnologia e Serviços de Informática, S.A. In the past few years, some attempts to automatically screen CVs using Natural Language Processing have been made, not only to save recruiters' time but also to spare them the most tedious task of the recruitment process and, consequently, ease their work. However, most are still very primitive, misclassify many CVs, and need deeper study. The aim of this Master's project is therefore to develop an algorithm capable of automatically ranking candidates' CVs according to their similarity to the job offer they applied for. A general architecture was proposed in which CVs and job offers are preprocessed to obtain texts suitable for further processing. Two different approaches were then followed to find the similarity between the documents in question: the first resorted to several Machine Learning algorithms and similarity measures, while the second structured the initial documents to compare their respective information. Tests were then conducted to evaluate both approaches and enable a comparison between them. Finally, conclusions were drawn and reported in this dissertation.
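One common similarity measure for ranking CVs against a job offer is cosine similarity over TF-IDF vectors. The sketch below illustrates that general technique under assumed names (`tfidf_vectors`, `rank_cvs`); it is not the thesis's actual pipeline, which compares several algorithms and measures.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    # Build smoothed TF-IDF vectors for a small document collection.
    tokenized = [d.lower().split() for d in docs]
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))  # document frequency per term
    n = len(docs)
    vecs = []
    for toks in tokenized:
        tf = Counter(toks)
        vecs.append({t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf})
    return vecs

def rank_cvs(job_offer, cvs):
    # Rank CVs by cosine similarity to the job offer text.
    vecs = tfidf_vectors([job_offer] + cvs)
    job, cv_vecs = vecs[0], vecs[1:]
    def cos(a, b):
        dot = sum(a[t] * b.get(t, 0) for t in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0
    scored = sorted(((cos(job, v), cv) for v, cv in zip(cv_vecs, cvs)), reverse=True)
    return [cv for _, cv in scored]
```

In a production screener the raw texts would first go through the preprocessing the thesis describes (parsing the CV and job-offer documents into clean text) before vectorisation.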

    Social and Semantic Contexts in Tourist Mobile Applications

    The ongoing growth of the World Wide Web, along with the increased possibility of accessing information through a variety of mobile devices, has definitely changed the way users acquire, create, and personalize information, pushing innovative strategies for annotating and organizing it. In this scenario, Social Annotation Systems have quickly gained huge popularity, introducing millions of metadata items on different Web resources following a bottom-up approach and generating free and democratic mechanisms of classification, namely folksonomies. Moving away from hierarchical classification schemas, folksonomies also represent a meaningful means for identifying similarities among users, resources, and tags. At any rate, they suffer from several limitations, such as the lack of specialized tools to manage, modify, customize, and visualize them, as well as the lack of explicit semantics, making it difficult for users to benefit from them effectively. Despite the appealing promises of Semantic Web technologies, which were intended to explicitly formalize the knowledge within a particular domain in a top-down manner in order to perform intelligent integration and reasoning on it, they are still far from reaching their objectives, due to difficulties in knowledge acquisition and the annotation bottleneck. The main contribution of this dissertation consists in modeling a novel conceptual framework that exploits both social and semantic contextual dimensions, focusing on the domain of tourism and cultural heritage. The primary aim of our assessment is to evaluate overall user satisfaction and the perceived quality in use through two concrete case studies.
    Firstly, we concentrate on contextual information and navigation, and on an authoring tool; secondly, we provide a semantic mapping of the tags of the system folksonomy, contrasted and compared with the expert users' classification, building a bridge between social and semantic knowledge as both continually grow. The results of the performed user evaluations are promising, reporting a high level of agreement on the perceived quality in use of both applications and of the specific analyzed features, demonstrating that a social-semantic contextual model improves general user satisfaction.
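The tag-to-expert-classification mapping can be approximated by co-occurrence on shared resources: a free folksonomy tag is assigned to the expert category whose tags annotate the most of the same resources. This is a minimal sketch under invented names (`map_tags_to_categories`), not the dissertation's semantic mapping.

```python
from collections import defaultdict

def map_tags_to_categories(annotations, expert_categories):
    # annotations: iterable of (resource, tag) pairs from the folksonomy.
    # expert_categories: {category: set of expert-chosen tags}.
    # Each free tag maps to the category whose tags co-occur with it
    # on the most resources; unmatched tags are left unmapped.
    resources_by_tag = defaultdict(set)
    for resource, tag in annotations:
        resources_by_tag[tag].add(resource)
    mapping = {}
    for tag, resources in resources_by_tag.items():
        best, best_score = None, 0
        for cat, cat_tags in expert_categories.items():
            score = sum(
                len(resources & resources_by_tag.get(t, set()))
                for t in cat_tags if t != tag
            )
            if score > best_score:
                best, best_score = cat, score
        if best is not None:
            mapping[tag] = best
    return mapping
```

A real bridge between the folksonomy and the expert classification would also exploit tag semantics (synonyms, broader/narrower terms), not just co-occurrence counts.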