Use of Wikipedia Categories in Entity Ranking
Wikipedia is a useful source of knowledge that has many applications in
language processing and knowledge representation. The Wikipedia category graph
can be compared with the class hierarchy in an ontology; it has some
characteristics in common as well as some differences. In this paper, we
present our approach for answering entity ranking queries from Wikipedia.
In particular, we explore how to make use of Wikipedia categories to improve
entity ranking effectiveness. Our experiments show that using categories of
example entities works significantly better than using loosely defined target
categories
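The abstract does not spell out how the categories of example entities are used, so the following is only a minimal sketch under an assumed scoring rule: each candidate is scored by the fraction of its categories that also belong to at least one example entity. The function name, the scoring rule, and the toy data are all hypothetical and are not taken from the paper.

```python
def rank_by_example_categories(candidate_categories, example_categories):
    """Rank candidates by category overlap with the example entities.

    Assumed rule (not the paper's): score = share of a candidate's
    categories that appear among the categories of any example entity.
    """
    # Union of all categories attached to the example entities.
    target = set().union(*example_categories.values())

    def overlap(categories):
        categories = set(categories)
        return len(categories & target) / len(categories) if categories else 0.0

    scores = {entity: overlap(cats) for entity, cats in candidate_categories.items()}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda entity: (-scores[entity], entity))


# Hypothetical toy data.
examples = {"Vincent van Gogh": {"Dutch painters", "Post-Impressionist painters"}}
candidates = {
    "Rembrandt": {"Dutch painters", "Baroque painters"},
    "Amsterdam": {"Capitals in Europe"},
    "Paul Gauguin": {"Post-Impressionist painters"},
}
ranking = rank_by_example_categories(candidates, examples)
```

With this toy data, the candidate whose categories all match an example entity is ranked first and the candidate with no shared category is ranked last.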
On the Impact of Entity Linking in Microblog Real-Time Filtering
Microblogging is a model of content sharing in which the temporal locality of
posts with respect to important events, either of foreseeable or unforeseeable
nature, makes applications of real-time filtering of great practical
interest. We propose the use of Entity Linking (EL) in order to improve the
retrieval effectiveness, by enriching the representation of microblog posts and
filtering queries. EL is the process of recognizing in an unstructured text the
mention of relevant entities described in a knowledge base. EL of short pieces
of text is a difficult task, but it is also a scenario in which the information
EL adds to the text can have a substantial impact on the retrieval process. We
implement a state-of-the-art filtering method, based on the best systems from
the TREC Microblog track realtime adhoc retrieval and filtering tasks, and
extend it with a Wikipedia-based EL method. Results show that the use of EL
significantly improves over non-EL based versions of the filtering methods.
Comment: 6 pages, 1 figure, 1 table. SAC 2015, Salamanca, Spain - April 13 -
17, 201
A Survey of Location Prediction on Twitter
Locations, e.g., countries, states, cities, and points of interest, are
central to news, emergency events, and people's daily lives. Automatic
identification of locations associated with or mentioned in documents has been
explored for decades. As one of the most popular online social network
platforms, Twitter has attracted a large number of users who send millions of
tweets on a daily basis. Due to the world-wide coverage of its users and
real-time freshness of tweets, location prediction on Twitter has gained
significant attention in recent years. Research efforts are spent on dealing
with new challenges and opportunities brought by the noisy, short, and
context-rich nature of tweets. In this survey, we aim at offering an overall
picture of location prediction on Twitter. Specifically, we concentrate on the
prediction of user home locations, tweet locations, and mentioned locations. We
first define the three tasks and review the evaluation metrics. By summarizing
Twitter network, tweet content, and tweet context as potential inputs, we then
structurally highlight how the problems depend on these inputs. Each dependency
is illustrated by a comprehensive review of the corresponding strategies
adopted in state-of-the-art approaches. In addition, we also briefly review two
related problems, i.e., semantic location prediction and point-of-interest
recommendation. Finally, we list future research directions.
Comment: Accepted to TKDE. 30 pages, 1 figur
Entity-Linking via Graph-Distance Minimization
Entity-linking is a natural-language-processing task that consists in
identifying the entities mentioned in a piece of text, linking each to an
appropriate item in some knowledge base; when the knowledge base is Wikipedia,
the problem comes to be known as wikification (in this case, items are
Wikipedia articles). One instance of entity-linking can be formalized as an
optimization problem on the underlying concept graph, where the quantity to be
optimized is the average distance between chosen items. Inspired by this
application, we define a new graph problem which is a natural variant of the
Maximum Capacity Representative Set. We prove that our problem is NP-hard for
general graphs; nonetheless, under some restrictive assumptions, it turns out
to be solvable in linear time. For the general case, we propose two heuristics:
one tries to enforce the above assumptions and another one is based on the
notion of hitting distance; we show experimentally how these approaches perform
with respect to some baselines on a real-world dataset.
Comment: In Proceedings GRAPHITE 2014, arXiv:1407.7671. The second and third
authors were supported by the EU-FET grant NADINE (GA 288956
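The optimization view described in this abstract can be illustrated with a small brute-force sketch. It assumes an undirected, unweighted concept graph given as an adjacency dictionary, and it picks one candidate item per mention so that the sum of pairwise hop distances is minimal (for a fixed number of mentions this is equivalent to minimizing the average distance). This exhaustive search is exponential in the number of mentions and is not one of the two heuristics the authors propose; all names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations, product


def bfs_distances(graph, source):
    """Return hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def link_entities(graph, candidates):
    """Choose one candidate per mention, minimizing total pairwise distance."""
    nodes = {c for options in candidates.values() for c in options}
    dist = {n: bfs_distances(graph, n) for n in nodes}
    mentions = list(candidates)
    best_cost, best_choice = float("inf"), None
    # Enumerate every combination of one candidate per mention.
    for choice in product(*(candidates[m] for m in mentions)):
        cost = sum(
            dist[a].get(b, float("inf")) for a, b in combinations(choice, 2)
        )
        if best_choice is None or cost < best_cost:
            best_cost, best_choice = cost, choice
    return dict(zip(mentions, best_choice)), best_cost


# Hypothetical toy concept graph (symmetric adjacency lists).
graph = {
    "Paris_(city)": ["France"],
    "France": ["Paris_(city)", "Lyon"],
    "Lyon": ["France"],
    "Paris_(mythology)": ["Troy"],
    "Troy": ["Paris_(mythology)"],
}
candidates = {
    "Paris": ["Paris_(city)", "Paris_(mythology)"],
    "Lyon": ["Lyon"],
}
linking, cost = link_entities(graph, candidates)
```

In the toy graph the ambiguous mention "Paris" resolves to the city, because it lies two hops from "Lyon" while the mythological figure is unreachable from it.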
Using Wikipedia Categories and Links in Entity Ranking
This paper describes the participation of the INRIA group in the INEX 2007 XML entity ranking and ad hoc tracks. We developed a system for ranking Wikipedia entities in answer to a query. Our approach utilises the known categories, the link structure of Wikipedia, as well as the link co-occurrences with the examples (when provided) to improve the effectiveness of entity ranking. Our experiments on the training data set demonstrate that the use of categories and the link structure of Wikipedia, together with entity examples, can significantly improve entity retrieval effectiveness. We also use our system for the ad hoc tasks by inferring target categories from the title of the query. The results were worse than when using a full-text search engine, which confirms our hypothesis that ad hoc retrieval and entity retrieval are two different tasks
Knowledge Enabled Location Prediction of Twitter Users
As the popularity of online social networking sites such as Twitter and Facebook continues to rise, the volume of textual content generated on the web is increasing rapidly. The mining of user generated content in social media has proven effective in domains ranging from personalization and recommendation systems to crisis management. These applications stand to be further enhanced by incorporating information about the geo-position of social media users in their analysis. Due to privacy concerns, users are largely reluctant to share their location information. As a consequence, researchers have focused on automatic inference of location information from the contents of a user's tweets. Existing approaches are purely data-driven and require large training data sets of geotagged tweets. Furthermore, these approaches rely solely on social media features or probabilistic language models and fail to capture the underlying semantics of the tweets. In this thesis, we propose a novel knowledge based approach that does not require any training data. Our approach uses Wikipedia, a crowd sourced knowledge base, to extract entities that are relevant to a location. We refer to these entities as local entities. Additionally, we score the relevance of each local entity with respect to the city, using the Wikipedia Hyperlink Graph. We predict the most likely location of the user by matching the scored entities of a city and the entities mentioned by users in their tweets. We evaluate our approach on a publicly available data set consisting of 5119 Twitter users across the continental United States and show comparable accuracy to the state-of-the-art approaches. Our results demonstrate the ability to pinpoint the location of a Twitter user to a state and a city using Wikipedia, without needing to train a probabilistic model
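The matching step described in this abstract can be sketched as follows. The abstract does not say how scores are combined, so this sketch assumes a simple rule: each city accumulates the relevance score of every local entity the user mentions, weighted by how often it is mentioned, and the city with the highest total wins. The function name, the combination rule, and the scores are hypothetical; entity extraction and the hyperlink-graph scoring are taken as given.

```python
from collections import Counter


def predict_city(local_entities, mentioned_entities):
    """Return the best-matching city and the per-city totals.

    local_entities: city -> {entity: relevance score} (assumed precomputed).
    mentioned_entities: entities found in a user's tweets, with repeats.
    """
    counts = Counter(mentioned_entities)
    totals = {
        city: sum(score * counts[entity] for entity, score in scored.items())
        for city, scored in local_entities.items()
    }
    return max(totals, key=totals.get), totals


# Hypothetical scores, chosen only for illustration.
local_entities = {
    "Seattle": {"Space Needle": 0.5, "Pike Place Market": 0.25},
    "Boston": {"Fenway Park": 0.5, "Charles River": 0.25},
}
mentions = ["Space Needle", "Fenway Park", "Pike Place Market"]
city, totals = predict_city(local_entities, mentions)
```

No training data is involved in this sketch, which mirrors the knowledge-based framing of the abstract; only the assumed entity scores drive the prediction.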
Multiple Models for Recommending Temporal Aspects of Entities
Entity aspect recommendation is an emerging task in semantic search that
helps users discover serendipitous and prominent information with respect to an
entity, of which salience (e.g., popularity) is the most important factor in
previous work. However, entity aspects are temporally dynamic and often driven
by events happening over time. For such cases, aspect suggestion based solely
on salience features can give unsatisfactory results, for two reasons. First,
salience is often accumulated over a long time period and does not account for
recency. Second, many aspects related to an event entity are strongly
time-dependent. In this paper, we study the task of temporal aspect
recommendation for a given entity, which aims at recommending the most relevant
aspects and takes into account time in order to improve search experience. We
propose a novel event-centric ensemble ranking method that learns from multiple
time and type-dependent models and dynamically trades off salience and recency
characteristics. Through extensive experiments on real-world query logs, we
demonstrate that our method is robust and achieves better effectiveness than
competitive baselines.
Comment: In proceedings of the 15th Extended Semantic Web Conference (ESWC
2018