Extracting Temporal and Causal Relations between Events
Structured information resulting from temporal information processing is
crucial for a variety of natural language processing tasks, for instance to
generate timeline summarization of events from news documents, or to answer
temporal/causal-related questions about some events. In this thesis we present
a framework for an integrated temporal and causal relation extraction system.
We first develop a robust extraction component for each type of relation, i.e.
temporal order and causality. We then combine the two extraction components
into an integrated relation extraction system, CATENA---CAusal and Temporal
relation Extraction from NAtural language texts---, by utilizing the
presumption about event precedence in causality, that causing events must
happen BEFORE resulting events. Several resources and techniques to improve
our relation extraction systems are also discussed, including word embeddings
and training data expansion. Finally, we report our efforts to adapt
temporal information processing to languages other than English, namely
Italian and Indonesian.
Comment: PhD Thesis
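The precedence presumption described in this abstract can be illustrated with a minimal sketch. The abstract does not say how CATENA actually combines its two components, so everything here is an assumption made for illustration: the function name `apply_causal_precedence`, the event identifiers, the dictionary representation of links, and the choice to let a causal link override a conflicting temporal label.

```python
from typing import Dict, Set, Tuple

# A pair of event identifiers, e.g. ("e1", "e2"). Hypothetical representation.
Pair = Tuple[str, str]


def apply_causal_precedence(
    temporal: Dict[Pair, str], causal: Set[Pair]
) -> Dict[Pair, str]:
    """Return temporal links made consistent with the causal links.

    Assumption: a (cause, effect) link implies cause BEFORE effect, and
    that implication overrides any conflicting temporal label.
    """
    merged = dict(temporal)  # copy, so the input is left unchanged
    for cause, effect in causal:
        if (effect, cause) in merged:
            # The pair is stored in the opposite direction: effect AFTER cause.
            merged[(effect, cause)] = "AFTER"
        else:
            merged[(cause, effect)] = "BEFORE"
    return merged


# Hypothetical links: the first temporal label conflicts with a causal link.
temporal_links = {("e1", "e2"): "SIMULTANEOUS", ("e3", "e2"): "AFTER"}
causal_links = {("e1", "e2"), ("e2", "e4")}
merged_links = apply_causal_precedence(temporal_links, causal_links)
```

In this sketch the conflicting label on ("e1", "e2") is replaced by BEFORE, a new BEFORE link is added for ("e2", "e4"), and the unrelated link ("e3", "e2") is left untouched.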
The Development of a Temporal Information Dictionary for Social Media Analytics
Dictionaries were used to analyze text even before the emergence of social media and before their use for sentiment analysis there. While dictionaries have been used to understand the tonality of text, so far it has not been possible to automatically detect whether the tonality refers to the present, past, or future. In this research, we develop a dictionary containing time-indicating words in a wordlist (T-wordlist). To test how the dictionary performs, we apply our T-wordlist to different disaster-related social media datasets. Subsequently, we will validate the wordlist and results by a manual content analysis. So far, in this research-in-progress, we have been able to develop a first dictionary, and we will also provide some initial insight into the performance of our wordlist.
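A dictionary-based lookup of the kind described here might be sketched as follows. The abstract does not list the contents of the T-wordlist, so the word sets, the function names `count_time_words` and `dominant_time`, and the simple majority rule are all assumptions chosen for illustration, not the authors' dictionary or method.

```python
import re
from collections import Counter
from typing import Optional

# Hypothetical stand-in for a T-wordlist; the real entries are not given.
T_WORDLIST = {
    "past": {"yesterday", "ago", "was", "were", "previously"},
    "present": {"now", "today", "currently", "is", "are"},
    "future": {"tomorrow", "will", "soon", "upcoming"},
}


def count_time_words(text: str) -> Counter:
    """Count how many tokens of the text fall into each time category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter({label: 0 for label in T_WORDLIST})
    for token in tokens:
        for label, words in T_WORDLIST.items():
            if token in words:
                counts[label] += 1
    return counts


def dominant_time(text: str) -> Optional[str]:
    """Return the category with the most hits, or None if nothing matched."""
    label, hits = count_time_words(text).most_common(1)[0]
    return label if hits > 0 else None
```

For example, under these assumed word sets, `dominant_time("The storm will arrive tomorrow")` returns "future", while a text with no listed words returns None. A real wordlist would also need a rule for ties and for ambiguous words, which this sketch does not address.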
Italian Event Detection Goes Deep Learning
This paper reports on a set of experiments with different word embeddings to
initialize a state-of-the-art Bi-LSTM-CRF network for event detection and
classification in Italian, following the EVENTI evaluation exercise. The
network obtains a new state-of-the-art result, improving the F1 score by
1.3 points for detection and by 6.5 points for classification, using a
single step approach. The results also provide further evidence that embeddings
have a major impact on the performance of such architectures.
Comment: to appear at CLiC-it 201
Fast Data in the Era of Big Data: Twitter's Real-Time Related Query Suggestion Architecture
We present the architecture behind Twitter's real-time related query
suggestion and spelling correction service. Although these tasks have received
much attention in the web search literature, the Twitter context introduces a
real-time "twist": after significant breaking news events, we aim to provide
relevant results within minutes. This paper provides a case study illustrating
the challenges of real-time data processing in the era of "big data". We tell
the story of how our system was built twice: our first implementation was built
on a typical Hadoop-based analytics stack, but was later replaced because it
did not meet the latency requirements necessary to generate meaningful
real-time results. The second implementation, which is the system deployed in
production, is a custom in-memory processing engine specifically designed for
the task. This experience taught us that the current typical usage of Hadoop as
a "big data" platform, while great for experimentation, is not well suited to
low-latency processing, and points the way to future work on data analytics
platforms that can handle "big" as well as "fast" data.