Named Entity Resolution in Personal Knowledge Graphs
Entity Resolution (ER) is the problem of determining when two entity references
(e.g. records or mentions) refer to the same underlying entity. The problem has
been studied for over 50 years, and has recently taken on new importance in an
era of large, heterogeneous 'knowledge graphs' published on the Web and used
widely in domains as wide-ranging as social media, e-commerce and search. This chapter
will discuss the specific problem of named ER in the context of personal
knowledge graphs (PKGs). We begin with a formal definition of the problem, and
the components necessary for doing high-quality and efficient ER. We also
discuss some challenges that are expected to arise for Web-scale data. Next, we
provide a brief literature review, with a special focus on how existing
techniques can potentially apply to PKGs. We conclude the chapter by covering
some applications, as well as promising directions for future research.
Comment: To appear as a book chapter by the same name in an upcoming (Oct.
2023) book 'Personal Knowledge Graphs (PKGs): Methodology, tools and
applications' edited by Tiwari et al.
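The chapter refers to the components needed for high-quality, efficient ER. A common decomposition in the ER literature is blocking (cheaply restricting which pairs get compared), pairwise matching, and clustering of matched pairs. The Python sketch below illustrates that pipeline on plain name strings. The token-based blocking key, the Jaccard similarity, the 0.5 threshold and the `resolve` helper are all illustrative assumptions. They are not the chapter's formal definition or any specific system.

```python
from collections import defaultdict
from itertools import combinations


def tokens(name: str) -> set[str]:
    """Lower-cased word tokens of a name (a deliberately naive normaliser)."""
    return set(name.lower().replace(".", " ").split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def resolve(records: dict[str, str], threshold: float = 0.5) -> list[set[str]]:
    """Group record ids whose names are judged to denote the same entity."""
    # 1. Blocking: index records by token so only records sharing a token
    #    become candidate pairs, instead of comparing all n*(n-1)/2 pairs.
    blocks: dict[str, set[str]] = defaultdict(set)
    for rid, name in records.items():
        for tok in tokens(name):
            blocks[tok].add(rid)
    candidates = set()
    for ids in blocks.values():
        candidates.update(combinations(sorted(ids), 2))

    # 2. Matching + 3. clustering: union-find merges every candidate pair whose
    #    similarity reaches the threshold, so matches are closed transitively.
    parent = {rid: rid for rid in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidates:
        if jaccard(tokens(records[a]), tokens(records[b])) >= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = defaultdict(set)
    for rid in records:
        clusters[find(rid)].add(rid)
    return list(clusters.values())


records = {"r1": "Barack Obama", "r2": "barack h. obama",
           "r3": "Michelle Obama", "r4": "Paris"}
print(sorted(sorted(c) for c in resolve(records)))
# -> [['r1', 'r2'], ['r3'], ['r4']]
```

Blocking on shared tokens keeps the number of comparisons small. However, a single very frequent token (here "obama") can still produce large blocks. Real systems tune the blocking key for exactly this reason.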
Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries
General language dictionaries show inconsistencies in uniformity and scientific rigour in their treatment of specialised lexicographic content. By analysing the presence and treatment of terms in general language dictionaries, we propose a more uniform and scientifically rigorous treatment of this content, taking into account the need to compile and align future lexical resources according to interoperable standards. We begin from the premise that lexical items, whether lexical units (words in general) or terminological units (terms, or words belonging to particular subject fields), must be treated differently, and we apply terminological methods to the terms recorded in dictionaries. Our approach assumes that terminology (in its dual dimension, both linguistic and conceptual) and lexicography, as interdisciplinary domains, can be complementary. Thus, we present theoretical objectives (improving the metalanguage and lexicographic description on the basis of terminological assumptions) and practical ones (consistent representation of lexicographic data). Both aim to facilitate the organisation, description and consistent modelling of lexicographic components, in particular the hierarchical organisation of domain labels, which mark specialised vocabulary. We also aim to facilitate the drafting of definitions, which can be written with greater scientific precision when terms are treated following a terminological approach. We analysed the dictionaries developed by three academic institutions: the Academia das Ciências de Lisboa, the Real Academia Española and the Académie Française, which represent a valuable legacy of the European academic lexicographic tradition.
The initial analysis includes an exhaustive survey and comparison of the domain labels used, a discussion of the choices made, and a comparative study of how terms are treated. We then developed a methodological proposal for the treatment of terms in general language dictionaries, illustrated with terms from two domains, GEOLOGY and FOOTBALL, taken from the 2001 edition of the dictionary of the Academia das Ciências de Lisboa. We revised the selected terms according to the terminological principles we advocate, producing revised or new specialised senses for the first digital edition of this dictionary. We represent and annotate the data using the TEI Lex-0 specifications, a TEI (Text Encoding Initiative) subset for encoding lexicographic data. We also highlight the importance of hierarchical domain labels, rather than a simple list of domains, since they benefit data organisation, correspondence between resources and possible future alignment of different lexicographic resources. Our investigation revealed the following: a) structural models of lexical resources are complex and contain information of diverse kinds; b) domain labels in general language dictionaries are flat, unbalanced, inconsistent and often outdated, and need to be organised hierarchically to structure specialised knowledge; c) the criteria adopted for marking terms and the formulae used in definitions are disparate; d) the treatment of terms is heterogeneous and inconsistently formulated, so terminological methods can help lexicographers draft definitions; e) the application of interdisciplinary terminological and lexicographic methods, and of standards, is advantageous because it allows the construction of structured, conceptually organised, linguistically accurate and interoperable lexical databases. In short, we seek to contribute to the urgent task of solving problems that affect the sharing, alignment and linking of lexicographic data.
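The abstract argues for hierarchical rather than flat domain labels. A minimal Python sketch shows why this helps. It assumes a hypothetical hierarchy in which GEOLOGY sits under EARTH SCIENCES and NATURAL SCIENCES, and FOOTBALL under SPORT. These parent labels and the sample entries are illustrative only, not the taxonomy proposed in the thesis. With parent links in place, a query for a broad domain also retrieves terms labelled with any of its subdomains, which a flat list of labels cannot do.

```python
# Hypothetical domain hierarchy: child label -> parent label (None = top level).
# Illustrative only; not the label taxonomy proposed in the thesis.
DOMAIN_PARENT = {
    "NATURAL SCIENCES": None,
    "EARTH SCIENCES": "NATURAL SCIENCES",
    "GEOLOGY": "EARTH SCIENCES",
    "SPORT": None,
    "FOOTBALL": "SPORT",
}


def ancestors(label: str) -> list[str]:
    """Broader domains above `label`, nearest first (empty for a top-level label)."""
    chain = []
    parent = DOMAIN_PARENT[label]
    while parent is not None:
        chain.append(parent)
        parent = DOMAIN_PARENT[parent]
    return chain


def terms_in_domain(entries: dict[str, str], domain: str) -> list[str]:
    """Terms labelled with `domain` itself or with any of its subdomains."""
    return sorted(term for term, label in entries.items()
                  if label == domain or domain in ancestors(label))


# Hypothetical dictionary entries: term -> domain label.
entries = {"basalt": "GEOLOGY", "magma": "GEOLOGY", "offside": "FOOTBALL"}
print(ancestors("GEOLOGY"))                          # ['EARTH SCIENCES', 'NATURAL SCIENCES']
print(terms_in_domain(entries, "NATURAL SCIENCES"))  # ['basalt', 'magma']
```

The same parent links could also serve as anchors when aligning the label sets of two dictionaries. Matching at a broader level (e.g. EARTH SCIENCES) is possible even when the finer labels differ.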
Knowledge Graphs Evolution and Preservation -- A Technical Report from ISWS 2019
One of the grand challenges discussed during the Dagstuhl Seminar "Knowledge
Graphs: New Directions for Knowledge Representation on the Semantic Web" and
described in its report is that of a: "Public FAIR Knowledge Graph of
Everything: We increasingly see the creation of knowledge graphs that capture
information about the entirety of a class of entities. [...] This grand
challenge extends this further by asking if we can create a knowledge graph of
"everything" ranging from common sense concepts to location based entities.
This knowledge graph should be "open to the public" in a FAIR manner
democratizing this mass amount of knowledge." Although linked open data (LOD)
is only one knowledge graph, it is the closest (and probably the only)
realisation of a public FAIR Knowledge Graph (KG) of everything. LOD therefore
provides a unique testbed for experimenting with and evaluating research
hypotheses on open and FAIR KGs. One of the most neglected FAIR issues
concerning KGs is their ongoing evolution and long-term preservation. We want
to investigate this problem, that is, to understand what preserving and
supporting the evolution of KGs means and how these problems can be addressed.
Clearly, the problem can be approached
from different perspectives and may require the development of different
approaches, including new theories, ontologies, metrics, strategies,
procedures, etc. This document reports on a collaborative effort by 9
teams of students, each guided by a senior researcher as their mentor,
attending the International Semantic Web Research School (ISWS 2019). Each team
provides a different perspective on the problem of knowledge graph evolution,
substantiated by a set of research questions that form the main subject of
their investigation. In addition, each team provides its working definition of
KG preservation and evolution.
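One concrete way to reason about KG evolution and preservation is to treat each published version as a set of (subject, predicate, object) triples. Under that model, you record only the delta between consecutive versions and replay the deltas to recover any past version. The Python sketch below assumes exactly that simplified model. The `ex:` identifiers, `kg_diff` and `VersionedKG` are hypothetical and do not reproduce any of the ISWS 2019 teams' approaches.

```python
Triple = tuple[str, str, str]  # (subject, predicate, object)


def kg_diff(old: set[Triple], new: set[Triple]) -> dict[str, set[Triple]]:
    """Triples added and removed between two versions of a KG."""
    return {"added": new - old, "removed": old - new}


class VersionedKG:
    """Keeps the initial snapshot plus one delta per later version."""

    def __init__(self, initial: set[Triple]) -> None:
        self._base = frozenset(initial)
        self._deltas: list[dict[str, set[Triple]]] = []
        self._latest = set(initial)

    def commit(self, snapshot: set[Triple]) -> None:
        """Record a new version by storing only its difference from the latest one."""
        self._deltas.append(kg_diff(self._latest, snapshot))
        self._latest = set(snapshot)

    def version(self, i: int) -> set[Triple]:
        """Rebuild version i (0 = initial) by replaying the first i deltas."""
        kg = set(self._base)
        for delta in self._deltas[:i]:
            kg -= delta["removed"]
            kg |= delta["added"]
        return kg


# Hypothetical two-version history: one fact changes between releases.
v1 = {("ex:alice", "ex:worksFor", "ex:orgA"), ("ex:alice", "ex:name", "Alice")}
v2 = {("ex:alice", "ex:worksFor", "ex:orgB"), ("ex:alice", "ex:name", "Alice")}
kg = VersionedKG(v1)
kg.commit(v2)
print(kg_diff(v1, v2)["added"])              # {('ex:alice', 'ex:worksFor', 'ex:orgB')}
print(kg.version(0) == v1, kg.version(1) == v2)  # True True
```

Storing deltas rather than full copies keeps every past version recoverable (preservation) and makes each change explicit (evolution). The trade-off is that old versions must be rebuilt by replay.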
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall, the “Cavallerizza Reale”. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC), which, after five years of activity, has clearly established itself as the premier national forum for research and development in Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.
Sustainable Human Resource Management
The concept of sustainability is important for companies of every size, from SMEs to multinationals operating worldwide. Several key factors that help a company achieve its sustainability objectives depend on human resource management. Sustainable human resource management is a typical cross-functional task that is becoming increasingly important at the strategic level of a company. Industry 4.0 technologies, the Internet of Things and competitive pressures, as signs of globalization, have led to significant changes in companies' organizational structures and human resource strategies. The growing importance of sophisticated human resource strategies in the life of companies, and the aim of finding optimal design and operation strategies for sustainable human resource management, motivated the launch of this book. It offers a selection of papers that explain the impact of smart human resource management on the economy. Authors from 14 countries contributed working examples and case studies resulting from their research in this field. The aim of this book is to help BSc, MSc and PhD students, as well as managers and researchers, to understand and appreciate the concept, design and implementation of sustainable human resource management solutions.
A context- and template-based data compression approach to improve resource-constrained IoT systems interoperability.
170 p. The goal of the Internet of Things (IoT) is to interconnect all kinds of things, from simple devices such as a light bulb or a thermostat to more complex and abstract elements such as a machine or a house. These devices or elements vary enormously, especially in their capabilities and in the type of technologies they use. This heterogeneity makes integration processes highly complex as far as interoperability is concerned. A common approach to addressing interoperability at the data-representation level in IoT systems is to structure data according to a standard data model, together with text-based data formats (e.g., XML). However, the devices typically used in IoT systems have limited capabilities and scarce processing and communication resources. Because of these limitations, text-based data formats cannot be integrated simply and efficiently into resource-constrained devices and networks. In this thesis, we present a novel data compression solution for text-based data formats, designed specifically with the constraints of resource-constrained devices and networks in mind. We call this solution Context- and Template-based Compression (CTC). CTC improves data-level interoperability in IoT systems while requiring very few resources in terms of communication bandwidth, memory size and processing power.
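The core idea behind template-based compression of text formats can be illustrated simply. If sender and receiver already share a document template, only the variable field values need to travel over the constrained link. The Python sketch below is a hypothetical simplification, not the CTC algorithm itself, which also exploits context information not modelled here. The XML template, the `{}` placeholder syntax and the 1-byte length-prefixed field encoding are assumptions made for illustration.

```python
import re

# Hypothetical template shared in advance by sender and receiver;
# each "{}" marks a variable field. The fixed text is never transmitted.
TEMPLATE = '<reading sensor="{}"><temp unit="C">{}</temp></reading>'
_FIXED = TEMPLATE.split("{}")
_PATTERN = re.compile("(.*?)".join(re.escape(p) for p in _FIXED), re.DOTALL)


def compress(doc: str) -> bytes:
    """Encode only the variable fields, each as a 1-byte length plus UTF-8 bytes."""
    m = _PATTERN.fullmatch(doc)
    if m is None:
        raise ValueError("document does not match the shared template")
    out = bytearray()
    for value in m.groups():
        raw = value.encode("utf-8")
        if len(raw) > 255:
            raise ValueError("field too long for a 1-byte length prefix")
        out.append(len(raw))
        out += raw
    return bytes(out)


def decompress(payload: bytes) -> str:
    """Rebuild the full document by splicing the decoded fields into the template."""
    values, i = [], 0
    while i < len(payload):
        n = payload[i]
        values.append(payload[i + 1:i + 1 + n].decode("utf-8"))
        i += 1 + n
    if len(values) != len(_FIXED) - 1:
        raise ValueError("payload does not match the shared template")
    pieces = [_FIXED[0]]
    for value, fixed in zip(values, _FIXED[1:]):
        pieces += [value, fixed]
    return "".join(pieces)


doc = '<reading sensor="s-17"><temp unit="C">21.5</temp></reading>'
payload = compress(doc)
print(len(payload), decompress(payload) == doc)  # 10 True
```

For this sample reading, only a 10-byte payload crosses the link instead of the full XML string. The cost is that both ends must agree on the template beforehand.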