55,775 research outputs found
Learning Correlations between Linguistic Indicators and Semantic Constraints: Reuse of Context-Dependent Descriptions of Entities
This paper presents the results of a study on the semantic constraints
imposed on lexical choice by certain contextual indicators. We show how such
indicators are computed and how correlations between them and the choice of a
noun phrase description of a named entity can be automatically established
using supervised learning. Based on this correlation, we have developed a
technique for automatic lexical choice of descriptions of entities in text
generation. We discuss the underlying relationship between the pragmatics of
choosing an appropriate description that serves a specific purpose in the
automatically generated text and the semantics of the description itself. We
present our work in the framework of the more general concept of reuse of
linguistic structures that are automatically extracted from large corpora. We
present a formal evaluation of our approach and we conclude with some thoughts
on potential applications of our method.Comment: 7 pages, uses colacl.sty and acl.bst, uses epsfig. To appear in the
Proceedings of the Joint 17th International Conference on Computational
Linguistics 36th Annual Meeting of the Association for Computational
Linguistics (COLING-ACL'98
Building a Generation Knowledge Source using Internet-Accessible Newswire
In this paper, we describe a method for automatic creation of a knowledge
source for text generation using information extraction over the Internet. We
present a prototype system called PROFILE which uses a client-server
architecture to extract noun-phrase descriptions of entities such as people,
places, and organizations. The system serves two purposes: as an information
extraction tool, it allows users to search for textual descriptions of
entities; as a utility to generate functional descriptions (FD), it is used in
a functional-unification based generation system. We present an evaluation of
the approach and its applications to natural language generation and
summarization.Comment: 8 pages, uses eps
Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human intervention such as manually labeling training data or creating gazetteers. Second, the system can handle more than the three classical named-entity types (person, location, and organization). We describe the system’s architecture and compare its performance with a supervised system. We experimentally evaluate the system on a standard corpus, with the three classical named-entity types, and also on a new corpus, with a new named-entity type (car brands)
News Cohesiveness: an Indicator of Systemic Risk in Financial Markets
Motivated by recent financial crises significant research efforts have been
put into studying contagion effects and herding behaviour in financial markets.
Much less has been said about influence of financial news on financial markets.
We propose a novel measure of collective behaviour in financial news on the
Web, News Cohesiveness Index (NCI), and show that it can be used as a systemic
risk indicator. We evaluate the NCI on financial documents from large Web news
sources on a daily basis from October 2011 to July 2013 and analyse the
interplay between financial markets and financially related news. We
hypothesized that strong cohesion in financial news reflects movements in the
financial markets. Cohesiveness is more general and robust measure of systemic
risk expressed in news, than measures based on simple occurrences of specific
terms. Our results indicate that cohesiveness in the financial news is highly
correlated with and driven by volatility on the financial markets
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally, using a supervised learning method, to rank these representations by
the associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201
- …