CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Open Information Extraction (OpenIE) methods extract (noun phrase, relation
phrase, noun phrase) triples from text, resulting in the construction of large
Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in
such Open KBs are not canonicalized, leading to the storage of redundant and
ambiguous facts. Recent research has posed canonicalization of Open KBs as
clustering over manually-defined feature spaces. Manual feature engineering is
expensive and often sub-optimal. In order to overcome this challenge, we
propose Canonicalization using Embeddings and Side Information (CESI) - a novel
approach which performs canonicalization over learned embeddings of Open KBs.
CESI extends recent advances in KB embedding by incorporating relevant NP and
relation phrase side information in a principled manner. Through extensive
experiments on multiple real-world datasets, we demonstrate CESI's
effectiveness.
Comment: Accepted at WWW 201
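To make the setting concrete, here is a deliberately tiny sketch of canonicalization as clustering over phrase embeddings. This is not CESI itself (which jointly learns embeddings with side information); the embedding vectors and phrases below are invented, and the greedy threshold clustering is a stand-in for a proper clustering algorithm.

```python
# Toy sketch: noun phrases with similar embedding vectors are merged into one
# canonical cluster. The 3-d embeddings below are made up for illustration.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def canonicalize(embeddings, threshold=0.9):
    """Greedy single-pass clustering: each phrase joins the first cluster
    whose representative vector is similar enough, else starts a new one."""
    clusters = []  # list of (representative_vector, [member phrases])
    for phrase, vec in embeddings.items():
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append((vec, [phrase]))
    return [members for _, members in clusters]

# Hypothetical embeddings: two surface forms of the same entity point the
# same way, so they collapse into one canonical cluster.
embeddings = {
    "Obama":        (0.90, 0.10, 0.00),
    "Barack Obama": (0.88, 0.12, 0.01),
    "Honolulu":     (0.00, 0.20, 0.95),
}
print(canonicalize(embeddings))  # → [['Obama', 'Barack Obama'], ['Honolulu']]
```

The point of learning the embeddings (rather than engineering features) is that redundant surface forms end up close in vector space before any clustering happens.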
XML Matchers: approaches and challenges
Schema Matching, i.e. the process of discovering semantic correspondences
between concepts adopted in different data source schemas, has been a key topic
in Database and Artificial Intelligence research areas for many years. In the
past, it was largely investigated especially for classical database models
(e.g., E/R schemas, relational databases, etc.). However, in recent years,
the widespread adoption of XML in the most disparate application fields pushed
a growing number of researchers to design XML-specific Schema Matching
approaches, called XML Matchers, aiming at finding semantic matchings between
concepts defined in DTDs and XSDs. XML Matchers do not just take well-known
techniques originally designed for other data models and apply them to
DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical
structure of a DTD/XSD) to improve the performance of the Schema Matching
process. The design of XML Matchers is currently a well-established research
area. The main goal of this paper is to provide a detailed description and
classification of XML Matchers. We first describe to what extent the
specificities of DTDs/XSDs affect the Schema Matching task. Then we
introduce a template, called XML Matcher Template, that describes the main
components of an XML Matcher, their role and behavior. We illustrate how each
of these components has been implemented in some popular XML Matchers. We
consider our XML Matcher Template as the baseline for objectively comparing
approaches that, at first glance, might appear as unrelated. The introduction
of this template can be useful in the design of future XML Matchers. Finally,
we analyze commercial tools implementing XML Matchers and introduce two
challenging issues strictly related to this topic, namely XML source clustering
and uncertainty management in XML Matchers.
Comment: 34 pages, 8 tables, 7 figures
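As a minimal illustration of what a matcher in this family computes, the sketch below finds candidate correspondences between two schemas by comparing element names with a string similarity. This is a hypothetical, linguistic-only matcher in the spirit of the survey's XML Matcher Template; real XML Matchers additionally exploit the hierarchical structure of the DTD/XSD, which this sketch ignores, and the element names are invented.

```python
# Name-based matching: pair each element of schema A with its most similar
# element of schema B, keeping only pairs above a similarity threshold.
from difflib import SequenceMatcher

def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_schemas(elements_a, elements_b, threshold=0.7):
    """Return candidate correspondences as (name_a, name_b, score) triples."""
    matches = []
    for a in elements_a:
        best = max(elements_b, key=lambda b: name_similarity(a, b))
        score = name_similarity(a, best)
        if score >= threshold:
            matches.append((a, best, round(score, 2)))
    return matches

# Invented element names from two purchase-order schemas.
schema_a = ["customerName", "orderDate", "shipAddress"]
schema_b = ["CustName", "OrderDate", "BillingAddress"]
print(match_schemas(schema_a, schema_b))
```

Note that "shipAddress" finds no match above the threshold, which is the kind of borderline case where structural evidence from the DTD/XSD hierarchy would help a real matcher.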
Detecting Personal Life Events from Social Media
Social media has become a dominating force over the past 15 years, with the rise of sites such as Facebook, Instagram, and Twitter. Some of us have been with these sites since the start, posting all about our personal lives and building up a digital identity of ourselves.
But within this myriad of posts, what actually matters to us, and what do our digital identities tell people about ourselves? One way that we can start to filter through this data is to build classifiers that can identify posts about our personal life events, allowing us to start to self-reflect on what we share online.
This type of technology also has direct merits within marketing, allowing companies to target customers with better products. We also suggest that the techniques and methodologies built throughout this thesis have opportunities to support research within other areas such as cyberbullying and radicalisation detection.
The aim of this thesis is to build upon the under-researched area of life event detection, specifically targeting Twitter and Instagram. Our goal is to develop classifiers that identify a list of life events inspired by cognitive psychology, of which we target a total of seven within this thesis.
To achieve this we look to answer three research questions, one in each of our empirical chapters. In our first empirical chapter, we ask: what features would improve the classification of important life events? To answer this, we first extract a new dataset from Twitter targeting the following events: Getting Married, Having Children, Starting School, Falling in Love, and Death of a Parent. We look at three new feature sets: interactions, content, and semantic features, and compare against a current state-of-the-art technique.
In our second empirical chapter, we draw inspiration from cheminformatics and frequent sub-graph mining to ask: could the inclusion of semantic and syntactic patterns improve performance in our life event classifier? Here we expand our tweets into semantic networks, and consider two forms of syntactic relationships between tokens. We then mine for frequent sub-graphs amongst our tweet graphs, and use these as features in our classifier. Our results produce F1 scores of between 0.65 and 0.77, an improvement of between 0.01 and 0.04 over the current state of the art.
In our final empirical chapter, we look to answer our third research question: how can we detect important life events from other social media sites, such as Instagram? We ask this question as we believe Instagram to be a preferred environment for sharing personal life events. In this chapter, we extract a new dataset targeting the following events: Getting Married, Having Children, Starting School, Graduation, and Buying a House. Our results find that our methodology provides F1 scores between 0.78 and 0.82, an improvement in F1 score of between 0.01 and 0.04 over the current state of the art.
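As a toy illustration of casting life-event detection as text classification (and emphatically not the thesis's sub-graph-mining method), the sketch below scores a post against hand-picked keyword sets per event. The event labels mirror the thesis's targets, but the keyword lists are invented; a real classifier would learn its features.

```python
# Rule-based baseline: map a post to bag-of-words tokens and pick the event
# whose (invented) keyword set overlaps the tokens most.
EVENT_KEYWORDS = {
    "Getting Married": {"married", "wedding", "bride", "groom"},
    "Having Children": {"baby", "born", "pregnant", "newborn"},
    "Starting School": {"school", "classroom", "teacher", "term"},
}

def detect_event(post):
    """Return the best-matching life event, or None if nothing matches."""
    tokens = set(post.lower().split())
    scores = {event: len(tokens & kws) for event, kws in EVENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_event("So happy we just got married at our dream wedding venue"))
# → Getting Married
```

Baselines like this are easy to beat but useful for calibrating the F1 improvements the empirical chapters report.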
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.
Comment: In Proceedings of the Web Conference (WWW) 201
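The setting can be sketched in a few lines: enumerate the simple paths connecting two concepts in a small graph, then rank them with a naive heuristic (shorter first) of the kind the paper proposes to replace with learned quality scores. The graph edges below are invented, not drawn from ConceptNet.

```python
# Enumerate edge-paths between two concepts in a toy knowledge graph.
from collections import defaultdict

graph = defaultdict(list)
for a, rel, b in [
    ("dog", "IsA", "animal"),
    ("dog", "HasA", "tail"),
    ("animal", "CapableOf", "breathe"),
    ("tail", "PartOf", "animal"),
]:
    graph[a].append((rel, b))

def simple_paths(start, goal, max_len=3):
    """Depth-first enumeration of cycle-free edge-paths from start to goal."""
    results = []
    def dfs(node, path, visited):
        if node == goal and path:
            results.append(tuple(path))
            return
        if len(path) >= max_len:
            return
        for rel, nxt in graph[node]:
            if nxt not in visited:
                dfs(nxt, path + [(node, rel, nxt)], visited | {nxt})
    dfs(start, [], {start})
    return results

# Naive heuristic the paper improves on: shorter paths are "better".
paths = sorted(simple_paths("dog", "animal"), key=len)
for p in paths:
    print(p)
```

Even in this toy graph two qualitatively different paths connect "dog" and "animal"; deciding which ones are natural is exactly the judgment the crowdsourced assessments supply.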
Knowledge-based identification of music suited for places of interest
The final publication is available at Springer via http://dx.doi.org/10.1007/s40558-014-0004-x
Place is a notion closely linked with the wealth of human experience, and invested with values, attitudes, and cultural influences. In particular, many places are strongly related to music, which contributes to shaping the perception and meaning of a place. In this paper we propose a computational approach to identify musicians and music suited for a place of interest (POI). The approach is based on a knowledge-based framework built upon the DBpedia ontology, and on a graph-based algorithm that scores musicians with respect to their semantic relatedness with a POI and suggests the top-scoring ones. Through empirical experiments we show that users appreciate and judge the musician recommendations generated by the proposed approach as valuable, and perceive compositions of the suggested musicians as suited for the POIs.
This work was supported by the Spanish Government (TIN201128538C02) and the Regional Government of Madrid (S2009TIC1542).
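A hedged sketch of the general idea (not the paper's scoring algorithm): rank musicians for a POI by hop distance in a tiny DBpedia-like graph, treating closer nodes as more semantically related. All nodes and edges below are invented for illustration; the paper's framework uses a richer relatedness measure over the actual DBpedia ontology.

```python
# Rank musicians by breadth-first-search distance to a place-of-interest node.
from collections import deque

edges = [
    ("Sagrada_Familia", "Antoni_Gaudi"),
    ("Antoni_Gaudi", "Barcelona"),
    ("Barcelona", "Pau_Casals"),
    ("Pau_Casals", "Cello"),
    ("Cello", "Jacqueline_du_Pre"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def distance(src, dst):
    """Hop distance between two nodes via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")

# Musicians closer to the POI in the graph rank higher.
musicians = ["Jacqueline_du_Pre", "Pau_Casals"]
ranking = sorted(musicians, key=lambda m: distance("Sagrada_Familia", m))
print(ranking)  # → ['Pau_Casals', 'Jacqueline_du_Pre']
```

Plain hop distance is a crude proxy; the value of a knowledge-based framework is that edge semantics (birthplace, genre, era) can weight the relatedness score.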
An analysis and comparison of predominant word sense disambiguation algorithms
This thesis investigates research performed in the area of natural language processing. It is the aim of this research to compare a selection of predominant word sense disambiguation algorithms, and also to determine if they can be optimised by small changes to the parameters used by the algorithms. To perform this research, several word sense disambiguation algorithms will be implemented in Java and run on a range of test corpora. The algorithms will be judged on metrics such as speed and accuracy, along with any other results obtained; while an algorithm may be fast and accurate, there may be other factors making it less desirable. Finally, to demonstrate the purpose and usefulness of using better algorithms, the algorithms will be used in conjunction with a real-world application.

Five algorithms were used in this research: the standard Lesk algorithm, the simplified Lesk algorithm, a Lesk algorithm variant using hypernyms, a Lesk algorithm variant using synonyms, and a baseline performance algorithm. While the baseline algorithm should have been less accurate than the other algorithms, testing found that it could disambiguate words more accurately than any of the others, seemingly because the baseline makes use of statistical data in WordNet, the machine-readable dictionary used for testing, which the other algorithms were unable to use. However, with a few modifications, the simplified Lesk algorithm was able to reach performance just a few percent lower than that of the baseline algorithm.

It is the aim of this research to apply word sense disambiguation to automatic concept mapping, to determine if more accurate algorithms are able to display noticeably better results in a real-world application. It was found in testing that the overall accuracy of the algorithm had little effect on the quality of concept maps produced, which instead depended on the text being examined.
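The simplified Lesk algorithm compared above can be sketched in a few lines: choose the sense whose dictionary gloss shares the most words with the sentence context. The tiny two-sense inventory below is invented for illustration; the thesis draws senses and glosses from WordNet.

```python
# Simplified Lesk: disambiguate a word by gloss/context word overlap.
SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    }
}

def simplified_lesk(word, sentence):
    """Return the sense whose gloss overlaps the sentence context most."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I deposited money at the bank"))  # → bank.n.02
```

The variants examined in the thesis change what is overlapped: the standard Lesk compares glosses of all context words against each other, and the hypernym/synonym variants enlarge the gloss word set before computing the overlap.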
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals within the
period 1991-present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research.