CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information
Open Information Extraction (OpenIE) methods extract (noun phrase, relation
phrase, noun phrase) triples from text, resulting in the construction of large
Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in
such Open KBs are not canonicalized, leading to the storage of redundant and
ambiguous facts. Recent research has posed canonicalization of Open KBs as
clustering over manually-defined feature spaces. Manual feature engineering is
expensive and often sub-optimal. In order to overcome this challenge, we
propose Canonicalization using Embeddings and Side Information (CESI) - a novel
approach which performs canonicalization over learned embeddings of Open KBs.
CESI extends recent advances in KB embedding by incorporating relevant NP and
relation phrase side information in a principled manner. Through extensive
experiments on multiple real-world datasets, we demonstrate CESI's
effectiveness.
Comment: Accepted at WWW 201
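To make the setting concrete, here is a deliberately tiny sketch of canonicalization as clustering over phrase embeddings. This is not CESI itself (which jointly learns embeddings with side information); the embedding vectors and phrases below are invented, and the greedy threshold clustering is a stand-in for a proper clustering algorithm.

```python
# Toy sketch: noun phrases with similar embedding vectors are merged into one
# canonical cluster. The 3-d embeddings below are made up for illustration.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def canonicalize(embeddings, threshold=0.9):
    """Greedy single-pass clustering: each phrase joins the first cluster
    whose representative vector is similar enough, else starts a new one."""
    clusters = []  # list of (representative_vector, [member phrases])
    for phrase, vec in embeddings.items():
        for rep, members in clusters:
            if cosine(rep, vec) >= threshold:
                members.append(phrase)
                break
        else:
            clusters.append((vec, [phrase]))
    return [members for _, members in clusters]

# Hypothetical embeddings: two surface forms of the same entity point the
# same way, so they collapse into one canonical cluster.
embeddings = {
    "Obama":        (0.90, 0.10, 0.00),
    "Barack Obama": (0.88, 0.12, 0.01),
    "Honolulu":     (0.00, 0.20, 0.95),
}
print(canonicalize(embeddings))  # → [['Obama', 'Barack Obama'], ['Honolulu']]
```

The point of learning the embeddings (rather than engineering features) is that redundant surface forms end up close in vector space before any clustering happens.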
XML Matchers: approaches and challenges
Schema Matching, i.e. the process of discovering semantic correspondences
between concepts adopted in different data source schemas, has been a key topic
in Database and Artificial Intelligence research areas for many years. In the
past, it was largely investigated especially for classical database models
(e.g., E/R schemas, relational databases, etc.). However, in recent years,
the widespread adoption of XML in the most disparate application fields pushed
a growing number of researchers to design XML-specific Schema Matching
approaches, called XML Matchers, aiming at finding semantic matchings between
concepts defined in DTDs and XSDs. XML Matchers do not just take well-known
techniques originally designed for other data models and apply them to
DTDs/XSDs, but they exploit specific XML features (e.g., the hierarchical
structure of a DTD/XSD) to improve the performance of the Schema Matching
process. The design of XML Matchers is currently a well-established research
area. The main goal of this paper is to provide a detailed description and
classification of XML Matchers. We first describe to what extent the
specificities of DTDs/XSDs affect the Schema Matching task. Then we
introduce a template, called XML Matcher Template, that describes the main
components of an XML Matcher, their role and behavior. We illustrate how each
of these components has been implemented in some popular XML Matchers. We
consider our XML Matcher Template as the baseline for objectively comparing
approaches that, at first glance, might appear as unrelated. The introduction
of this template can be useful in the design of future XML Matchers. Finally,
we analyze commercial tools implementing XML Matchers and introduce two
challenging issues strictly related to this topic, namely XML source clustering
and uncertainty management in XML Matchers.
Comment: 34 pages, 8 tables, 7 figures
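As a minimal illustration of what a matcher in this family computes, the sketch below finds candidate correspondences between two schemas by comparing element names with a string similarity. This is a hypothetical, linguistic-only matcher in the spirit of the survey's XML Matcher Template; real XML Matchers additionally exploit the hierarchical structure of the DTD/XSD, which this sketch ignores, and the element names are invented.

```python
# Name-based matching: pair each element of schema A with its most similar
# element of schema B, keeping only pairs above a similarity threshold.
from difflib import SequenceMatcher

def name_similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_schemas(elements_a, elements_b, threshold=0.7):
    """Return candidate correspondences as (name_a, name_b, score) triples."""
    matches = []
    for a in elements_a:
        best = max(elements_b, key=lambda b: name_similarity(a, b))
        score = name_similarity(a, best)
        if score >= threshold:
            matches.append((a, best, round(score, 2)))
    return matches

# Invented element names from two purchase-order schemas.
schema_a = ["customerName", "orderDate", "shipAddress"]
schema_b = ["CustName", "OrderDate", "BillingAddress"]
print(match_schemas(schema_a, schema_b))
```

Note that "shipAddress" finds no match above the threshold, which is the kind of borderline case where structural evidence from the DTD/XSD hierarchy would help a real matcher.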
Detecting Personal Life Events from Social Media
Social media has become a dominating force over the past 15 years, with the rise of sites such as Facebook, Instagram, and Twitter. Some of us have been with these sites since the start, posting all about our personal lives and building up a digital identity of ourselves.
But within this myriad of posts, what actually matters to us, and what do our digital identities tell people about ourselves? One way that we can start to filter through this data is to build classifiers that can identify posts about our personal life events, allowing us to start to self-reflect on what we share online.
This type of technology also has direct merits within marketing, allowing companies to target customers with better products. We also suggest that the techniques and methodologies built throughout this thesis have opportunities to support research within other areas such as cyberbullying and radicalisation detection.
The aim of this thesis is to build upon the under-researched area of life event detection, specifically targeting Twitter and Instagram. Our goal is to develop classifiers that identify a list of life events inspired by cognitive psychology, of which we target a total of seven within this thesis.
To achieve this we look to answer three research questions, one in each of our empirical chapters. In our first empirical chapter, we ask: what features would improve the classification of important life events? To answer this, we first extract a new dataset from Twitter targeting the following events: Getting Married, Having Children, Starting School, Falling in Love, and Death of a Parent. We look at three new feature sets: interactions, content, and semantic features, and compare against a current state-of-the-art technique.
In our second empirical chapter, we draw inspiration from cheminformatics and frequent sub-graph mining to ask: could the inclusion of semantic and syntactic patterns improve performance in our life event classifier? Here we expand our tweets into semantic networks, and consider two forms of syntactic relationships between tokens. We then mine for frequent sub-graphs amongst our tweet graphs, and use these as features in our classifier. Our results produce F1 scores of between 0.65 and 0.77, an improvement of between 0.01 and 0.04 over the current state of the art.
In our final empirical chapter, we look to answer our third research question: how can we detect important life events from other social media sites, such as Instagram? We ask this question as we believe Instagram to be a preferred environment for sharing personal life events. In this chapter, we extract a new dataset targeting the following events: Getting Married, Having Children, Starting School, Graduation, and Buying a House. Our results find that our methodology provides F1 scores between 0.78 and 0.82, an improvement in F1 score of between 0.01 and 0.04 over the current state of the art.
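As a toy illustration of casting life-event detection as text classification (and emphatically not the thesis's sub-graph-mining method), the sketch below scores a post against hand-picked keyword sets per event. The event labels mirror the thesis's targets, but the keyword lists are invented; a real classifier would learn its features.

```python
# Rule-based baseline: map a post to bag-of-words tokens and pick the event
# whose (invented) keyword set overlaps the tokens most.
EVENT_KEYWORDS = {
    "Getting Married": {"married", "wedding", "bride", "groom"},
    "Having Children": {"baby", "born", "pregnant", "newborn"},
    "Starting School": {"school", "classroom", "teacher", "term"},
}

def detect_event(post):
    """Return the best-matching life event, or None if nothing matches."""
    tokens = set(post.lower().split())
    scores = {event: len(tokens & kws) for event, kws in EVENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

print(detect_event("So happy we just got married at our dream wedding venue"))
# → Getting Married
```

Baselines like this are easy to beat but useful for calibrating the F1 improvements the empirical chapters report.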
Predicting ConceptNet Path Quality Using Crowdsourced Assessments of Naturalness
In many applications, it is important to characterize the way in which two
concepts are semantically related. Knowledge graphs such as ConceptNet provide
a rich source of information for such characterizations by encoding relations
between concepts as edges in a graph. When two concepts are not directly
connected by an edge, their relationship can still be described in terms of the
paths that connect them. Unfortunately, many of these paths are uninformative
and noisy, which means that the success of applications that use such path
features crucially relies on their ability to select high-quality paths. In
existing applications, this path selection process is based on relatively
simple heuristics. In this paper we instead propose to learn to predict path
quality from crowdsourced human assessments. Since we are interested in a
generic task-independent notion of quality, we simply ask human participants to
rank paths according to their subjective assessment of the paths' naturalness,
without attempting to define naturalness or steering the participants towards
particular indicators of quality. We show that a neural network model trained
on these assessments is able to predict human judgments on unseen paths with
near optimal performance. Most notably, we find that the resulting path
selection method is substantially better than the current heuristic approaches
at identifying meaningful paths.
Comment: In Proceedings of the Web Conference (WWW) 201
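The setting can be sketched in a few lines: enumerate the simple paths connecting two concepts in a small graph, then rank them with a naive heuristic (shorter first) of the kind the paper proposes to replace with learned quality scores. The graph edges below are invented, not drawn from ConceptNet.

```python
# Enumerate edge-paths between two concepts in a toy knowledge graph.
from collections import defaultdict

graph = defaultdict(list)
for a, rel, b in [
    ("dog", "IsA", "animal"),
    ("dog", "HasA", "tail"),
    ("animal", "CapableOf", "breathe"),
    ("tail", "PartOf", "animal"),
]:
    graph[a].append((rel, b))

def simple_paths(start, goal, max_len=3):
    """Depth-first enumeration of cycle-free edge-paths from start to goal."""
    results = []
    def dfs(node, path, visited):
        if node == goal and path:
            results.append(tuple(path))
            return
        if len(path) >= max_len:
            return
        for rel, nxt in graph[node]:
            if nxt not in visited:
                dfs(nxt, path + [(node, rel, nxt)], visited | {nxt})
    dfs(start, [], {start})
    return results

# Naive heuristic the paper improves on: shorter paths are "better".
paths = sorted(simple_paths("dog", "animal"), key=len)
for p in paths:
    print(p)
```

Even in this toy graph two qualitatively different paths connect "dog" and "animal"; deciding which ones are natural is exactly the judgment the crowdsourced assessments supply.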
Knowledge-based identification of music suited for places of interest
The final publication is available at Springer via http://dx.doi.org/10.1007/s40558-014-0004-x
Place is a notion closely linked with the wealth of human experience, and invested with values, attitudes, and cultural influences. In particular, many places are strongly related to music, which contributes to shaping the perception and meaning of a place. In this paper we propose a computational approach to identify musicians and music suited for a place of interest (POI). The approach is based on a knowledge-based framework built upon the DBpedia ontology, and on a graph-based algorithm that scores musicians with respect to their semantic relatedness with a POI and suggests the top-scoring ones. Through empirical experiments we show that users appreciate and judge the musician recommendations generated by the proposed approach as valuable, and perceive compositions of the suggested musicians as suited for the POIs.
This work was supported by the Spanish Government (TIN201128538C02) and the Regional Government of Madrid (S2009TIC1542).
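A hedged sketch of the general idea (not the paper's scoring algorithm): rank musicians for a POI by hop distance in a tiny DBpedia-like graph, treating closer nodes as more semantically related. All nodes and edges below are invented for illustration; the paper's framework uses a richer relatedness measure over the actual DBpedia ontology.

```python
# Rank musicians by breadth-first-search distance to a place-of-interest node.
from collections import deque

edges = [
    ("Sagrada_Familia", "Antoni_Gaudi"),
    ("Antoni_Gaudi", "Barcelona"),
    ("Barcelona", "Pau_Casals"),
    ("Pau_Casals", "Cello"),
    ("Cello", "Jacqueline_du_Pre"),
]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

def distance(src, dst):
    """Hop distance between two nodes via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return float("inf")

# Musicians closer to the POI in the graph rank higher.
musicians = ["Jacqueline_du_Pre", "Pau_Casals"]
ranking = sorted(musicians, key=lambda m: distance("Sagrada_Familia", m))
print(ranking)  # → ['Pau_Casals', 'Jacqueline_du_Pre']
```

Plain hop distance is a crude proxy; the value of a knowledge-based framework is that edge semantics (birthplace, genre, era) can weight the relatedness score.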
An analysis and comparison of predominant word sense disambiguation algorithms
This thesis investigates research performed in the area of natural language processing. It is the aim of this research to compare a selection of predominant word sense disambiguation algorithms, and also to determine if they can be optimised by small changes to the parameters used by the algorithms. To perform this research, several word sense disambiguation algorithms will be implemented in Java and run on a range of test corpora. The algorithms will be judged on metrics such as speed and accuracy, along with any other results obtained; while an algorithm may be fast and accurate, there may be other factors making it less desirable. Finally, to demonstrate the purpose and usefulness of using better algorithms, the algorithms will be used in conjunction with a real-world application.

Five algorithms were used in this research: the standard Lesk algorithm, the simplified Lesk algorithm, a Lesk algorithm variant using hypernyms, a Lesk algorithm variant using synonyms, and a baseline performance algorithm. While the baseline algorithm should have been less accurate than the other algorithms, testing found that it could disambiguate words more accurately than any of the others, seemingly because the baseline makes use of statistical data in WordNet, the machine-readable dictionary used for testing, which the other algorithms were unable to use. However, with a few modifications, the simplified Lesk algorithm was able to reach performance just a few percent lower than that of the baseline algorithm.

It is the aim of this research to apply word sense disambiguation to automatic concept mapping, to determine if more accurate algorithms are able to display noticeably better results in a real-world application. It was found in testing that the overall accuracy of the algorithm had little effect on the quality of concept maps produced, which instead depended on the text being examined.
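The simplified Lesk algorithm compared above can be sketched in a few lines: choose the sense whose dictionary gloss shares the most words with the sentence context. The tiny two-sense inventory below is invented for illustration; the thesis draws senses and glosses from WordNet.

```python
# Simplified Lesk: disambiguate a word by gloss/context word overlap.
SENSES = {
    "bank": {
        "bank.n.01": "sloping land beside a body of water such as a river",
        "bank.n.02": "a financial institution that accepts deposits and lends money",
    }
}

def simplified_lesk(word, sentence):
    """Return the sense whose gloss overlaps the sentence context most."""
    context = set(sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(simplified_lesk("bank", "I deposited money at the bank"))  # → bank.n.02
```

The variants examined in the thesis change what is overlapped: the standard Lesk compares glosses of all context words against each other, and the hypernym/synonym variants enlarge the gloss word set before computing the overlap.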
Natural Language Processing in-and-for Design Research
We review the scholarly contributions that utilise Natural Language
Processing (NLP) methods to support the design process. Using a heuristic
approach, we collected 223 articles published in 32 journals within the
period 1991-present. We present state-of-the-art NLP in-and-for design research
by reviewing these articles according to the type of natural language text
sources: internal reports, design concepts, discourse transcripts, technical
publications, consumer opinions, and others. Upon summarizing and identifying
the gaps in these contributions, we utilise an existing design innovation
framework to identify the applications that are currently being supported by
NLP. We then propose a few methodological and theoretical directions for future
NLP in-and-for design research.