Search CORE

6,890 research outputs found

TiFi: Taxonomy Induction for Fictional Domains [Extended version]

Author: Chu C.
Razniewski S.
Weikum G.
Publication venue
Publication date: 01/01/2019
Field of study

Taxonomies are important building blocks of structured knowledge bases, and their construction from text sources and Wikipedia has received much attention. In this paper we focus on the construction of taxonomies for fictional domains, using noisy category systems from fan wikis or text extraction as input. Such fictional domains are archetypes of entity universes that are poorly covered by Wikipedia, such as also enterprise-specific knowledge bases or highly specialized verticals. Our fiction-targeted approach, called TiFi, consists of three phases: (i) category cleaning, by identifying candidate categories that truly represent classes in the domain of interest, (ii) edge cleaning, by selecting subcategory relationships that correspond to class subsumption, and (iii) top-level construction, by mapping classes onto a subset of high-level WordNet categories. A comprehensive evaluation shows that TiFi is able to construct taxonomies for a diverse range of fictional domains such as Lord of the Rings, The Simpsons or Greek Mythology with very high precision and that it outperforms state-of-the-art baselines for taxonomy induction by a substantial margin

MPG.PuRe

Web 2.0, language resources and standards to automatically build a multilingual named entity lexicon

Author: Ferrández Sergio
Monachini Monica
Muñoz Rafael
Toral Antonio
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 17/06/2011
Field of study

This paper proposes to advance in the current state-of-the-art of automatic Language Resource (LR) building by taking into consideration three elements: (i) the knowledge available in existing LRs, (ii) the vast amount of information available from the collaborative paradigm that has emerged from the Web 2.0 and (iii) the use of standards to improve interoperability. We present a case study in which a set of LRs for diﬀerent languages (WordNet for English and Spanish and Parole-Simple-Clips for Italian) are extended with Named Entities (NE) by exploiting Wikipedia and the aforementioned LRs. The practical result is a multilingual NE lexicon connected to these LRs and to two ontologies: SUMO and SIMPLE. Furthermore, the paper addresses an important problem which aﬀects the Computational Linguistics area in the present, interoperability, by making use of the ISO LMF standard to encode this lexicon. The diﬀerent steps of the procedure (mapping, disambiguation, extraction, NE identiﬁcation and postprocessing) are comprehensively explained and evaluated. The resulting resource contains 974,567, 137,583 and 125,806 NEs for English, Spanish and Italian respectively. Finally, in order to check the usefulness of the constructed resource, we apply it into a state-of-the-art Question Answering system and evaluate its impact; the NE lexicon improves the system’s accuracy by 28.1%. Compared to previous approaches to build NE repositories, the current proposal represents a step forward in terms of automation, language independence, amount of NEs acquired and richness of the information represented

DCU Online Research Access Service

Recommended from our members

Linking Data Across Universities: An Integrated Video Lectures Dataset

Author: d'Aquin Mathieu
Fernandez Miriam
Motta Enrico
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

This paper presents our work and experience interlinking educational information across universities through the use of Linked Data principles and technologies. More specifically this paper is focused on selecting, extracting, structuring and interlinking information of video lectures produced by 27 different educational institutions. For this purpose, selected information from several websites and YouTube channels have been scraped and structured according to well-known vocabularies, like FOAF 1, or the W3C Ontology for Media Resources 2. To integrate this information, the extracted videos have been categorized under a common classification space, the taxonomy defined by the Open Directory Project 3. An evaluation of this categorization process has been conducted obtaining a 98% degree of coverage and 89% degree of correctness. As a result of this process a new Linked Data dataset has been released containing more than 14,000 video lectures from 27 different institutions and categorized under a common classification scheme

Open Research Online (The Open University)

A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

Author: A. Ballatore
A. Buccella
A. Burton-Jones
A. Gangemi
A. Gore
A. Gómez-Pérez
A. Polleres
A. Schwering
A. Turner
B. Smith
C. Bizer
C. Jones
C. Keßler
C. Keßler
C. Manning
C.B. Jones
D. Buscaldi
D. Coleman
D. Nadeau
D. Strasunskas
D. Sui
F. Baader
F. Fonseca
F. Giunchiglia
F. Giunchiglia
F. Harvey
F.. Gey
F.J. Lopez-Pellicer
G. Bordogna
G. Fu
G. Tré De
G. Weikum
J. Giles
J. Goodwin
J. Howe
J. Leveling
K. Janowicz
K. Janowicz
L. Vaccari
L.L. Hill
M. Egenhofer
M. Goodchild
M. Goodchild
M. Grassi
M. Haklay
M. Haklay
M. Haklay
M. Kitsuregawa
M. Lutz
N. Choi
N. Guarino
N. Guarino
P. Burrough
P. Magnus
P. Roget
P. Singh
P.D. Smart
R. Fouad
R. Rada
S. Auer
S. Auer
S. Freitas
S. Hahmann
S. Overell
S. Schade
S. Staab
S. Vaid
S. Winter
T. Berners-Lee
T. Mandl
T. Mandl
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated a robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects

arXiv.org e-Print Archive

Crossref

TechMiner: Extracting Technologies from Academic Publications

Author: A Bandrowski
C Bizer
C Fellbaum
F Osborne
F Osborne
F Ronzano
K Scanning Douw
P Corbett
R Usbeck
S Peroni
T Groza
W Huang
Publication venue
Publication date: 01/01/2016
Field of study

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision

Crossref

Online Research @ Cardiff

Open Research Online (The Open University)

An effective, low-cost measure of semantic relatedness obtained from Wikipedia links

Author: Milne David N.
Witten Ian H.
Publication venue: AAAI Press
Publication date: 01/01/2008
Field of study

This paper describes a new technique for obtaining measures of semantic relatedness. Like other recent approaches, it uses Wikipedia to provide structured world knowledge about the terms of interest. Out approach is unique in that it does so using the hyperlink structure of Wikipedia rather than its category hierarchy or textual content. Evaluation with manually defined measures of semantic relatedness reveals this to be an effective compromise between the ease of computation of the former approach and the accuracy of the latter

CiteSeerX

Research Commons@Waikato

Extracting common sense knowledge from Wikipedia

Author: Halpin Harry
Klein Ewan
Suh Sangweon
Publication venue
Publication date: 01/01/2006
Field of study

Edinburgh Research Explorer