38,629 research outputs found
Temporal word embeddings for dynamic user profiling in Twitter
The research described in this paper focused on exploring
the domain of user profiling, a nascent and contentious technology which
has been steadily attracting increased interest from the research community as its potential for providing personalised digital services is realised.
An extensive review of related literature revealed that limited research
has been conducted into how temporal aspects of users can be captured
using user profiling techniques. This, coupled with the notable lack of
research into the use of word embedding techniques to capture temporal
variances in language, revealed an opportunity to extend the Random Indexing word embedding technique such that the interests of users could
be modelled based on their use of language. To achieve this, this work
concerned itself with extending an existing implementation of Temporal
Random Indexing to model Twitter users across multiple granularities of
time based on their use of language. The product of this is a novel technique for temporal user profiling, where a set of vectors is used to describe
the evolution of a Twitter user’s interests over time through their use of
language. The vectors produced were evaluated against a temporal implementation of another state-of-the-art word embedding technique, the
Word2Vec Dynamic Independent Skip-gram model, where it was found
that Temporal Random Indexing outperformed Word2Vec in the generation of temporal user profiles
Buzz monitoring in word space
This paper discusses the task of tracking mentions of some topically interesting textual entity from a continuously and dynamically changing flow of text, such as a news feed, the output from an Internet crawler or a similar text source - a task sometimes referred to as buzz monitoring. Standard approaches from the field of information access for identifying salient textual entities are reviewed, and it is argued that the dynamics of buzz monitoring calls for more accomplished analysis mechanisms than the typical text analysis tools provide today. The notion of word space is introduced, and it is argued that word spaces can be used to select the most salient markers for topicality, find associations those observations engender, and that they constitute an attractive foundation for building a representation well suited for the tracking and monitoring of mentions of the entity under consideration
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that an adaptive approaches are
best suited for tracking evolving stories on Twitter.Comment: 8 Pag
A Formal Framework for Linguistic Annotation
`Linguistic annotation' covers any descriptive or analytic notations applied
to raw language data. The basic data may be in the form of time functions --
audio, video and/or physiological recordings -- or it may be textual. The added
notations may include transcriptions of all sorts (from phonetic features to
discourse structures), part-of-speech and sense tagging, syntactic analysis,
`named entity' identification, co-reference annotation, and so on. While there
are several ongoing efforts to provide formats and tools for such annotations
and to publish annotated linguistic databases, the lack of widely accepted
standards is becoming a critical problem. Proposed standards, to the extent
they exist, have focussed on file formats. This paper focuses instead on the
logical structure of linguistic annotations. We survey a wide variety of
existing annotation formats and demonstrate a common conceptual core, the
annotation graph. This provides a formal framework for constructing,
maintaining and searching linguistic annotations, while remaining consistent
with many alternative data structures and file formats.Comment: 49 page
The DIGMAP geo-temporal web gazetteer service
This paper presents the DIGMAP geo-temporal Web gazetteer service, a system providing access to names of places, historical periods, and associated geo-temporal information. Within the DIGMAP project, this gazetteer serves as the unified repository of geographic and temporal information, assisting in the recognition and disambiguation of geo-temporal expressions over text, as well as in resource searching and indexing. We describe the data integration methodology, the handling of temporal information and some of the applications that use the gazetteer. Initial evaluation results show that the proposed system can adequately support several tasks related to geo-temporal information extraction and retrieval
A logic programming framework for modeling temporal objects
Published versio
- …