Using Global Constraints and Reranking to Improve Cognates Detection
Global constraints and reranking have not been used in cognates detection
research to date. We propose methods for using global constraints by performing
rescoring of the score matrices produced by state of the art cognates detection
systems. Using global constraints to perform rescoring is complementary to
state of the art methods for performing cognates detection and results in
significant performance improvements beyond current state of the art
performance on publicly available datasets with different language pairs and
various conditions such as different levels of baseline state of the art
performance and different data size conditions, including with more realistic
large data size conditions than have been evaluated in the past.
Comment: 10 pages, 6 figures, 6 tables; published in the Proceedings of the
55th Annual Meeting of the Association for Computational Linguistics, pages
1983-1992, Vancouver, Canada, July 201
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and
argumentative structure in documents. Most previous research has focused on
inducing and evaluating models from the English RST Discourse Treebank.
However, discourse treebanks for other languages exist, including Spanish,
German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same
underlying linguistic theory, but differ slightly in the way documents are
annotated. In this paper, we present (a) a new discourse parser which is
simpler than, yet competitive with, the state of the art for English
(significantly better on 2/3 metrics), (b) a harmonization of discourse treebanks across languages,
enabling us to present (c) what to the best of our knowledge are the first
experiments on cross-lingual discourse parsing.
Comment: To be published in EACL 2017, 13 pages
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We evaluated the system extensively on 60
GB of real-world Chinese news data through detailed pilot user experience
studies; our ideas are not language-dependent and can easily be extended to
other languages. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.
Comment: Accepted by CIKM 2017, 9 pages
Animacy in early New Zealand English
The literature suggests that animacy effects in present-day spoken New Zealand English (NZE) differ from animacy effects in other varieties of English. We seek to determine if such differences have a history in earlier NZE writing or not. We revisit two grammatical phenomena — progressives and genitives — that are well known to be sensitive to animacy effects, and we study these phenomena in corpora sampling 19th- and early 20th-century written NZE; for reference purposes, we also study parallel samples of 19th- and early 20th-century British English and American English. We indeed find significant regional differences between early New Zealand writing and the other varieties in terms of the effect that animacy has on the frequency and probabilities of grammatical phenomena
Beyond aspect: will be -ing and shall be -ing
This article discusses the synchronic status and diachronic development of will be -ing and shall be -ing (as in I’ll be leaving at noon). Although available since at least Middle English, the constructions did not establish a significant foothold in standard English until the twentieth century. Both types are also more prevalent in British English (BrE) than American English (AmE).
We argue that in present-day usage will/shall be -ing are aspectually underspecified: instances that clearly construe a situation as future-in-progress are in the minority. Similarly, although volition-neutrality has been identified as a key feature of will/shall be -ing, it is important to take account of other, generally richer meanings and associations, notably ‘future-as-matter-of-course’ (Leech 2004), ‘already-decided future’ (Huddleston & Pullum et al. 2002) and non-agentivity. Like volition-neutrality, these characteristics appear to be relevant not only in contemporary use, but also in their historical expansion. We show that the construction has evolved from progressive aspect towards more subjectivised evidential meaning