A Recurrent Neural Model with Attention for the Recognition of Chinese Implicit Discourse Relations
We introduce an attention-based Bi-LSTM for Chinese implicit discourse relations and demonstrate that modeling argument pairs as a joint sequence can outperform word order-agnostic approaches. Our model benefits from a partial sampling scheme and is conceptually simple, yet achieves state-of-the-art performance on the Chinese Discourse Treebank. We also visualize its attention activity to illustrate the model's ability to selectively focus on the relevant parts of an input sequence.
Comment: To appear at ACL 2017; code available at https://github.com/sronnqvist/discourse-ablst
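The attention step described above can be illustrated with a minimal sketch: scoring each Bi-LSTM hidden state, normalizing with a softmax, and forming a weighted context vector. Shapes, weights, and data here are made up for illustration; this is not the paper's implementation.

```python
import numpy as np

# Illustrative sketch of attention over Bi-LSTM hidden states
# (all values are random; not the paper's trained model).
rng = np.random.default_rng(0)
T, H = 6, 8                       # sequence length, hidden size (hypothetical)
states = rng.normal(size=(T, H))  # Bi-LSTM outputs for the joint argument pair
w = rng.normal(size=H)            # attention scoring vector

scores = states @ w                    # one scalar score per timestep
weights = np.exp(scores - scores.max())
weights /= weights.sum()               # softmax -> attention distribution
context = weights @ states             # weighted sum of hidden states

print(weights.round(3))  # the "attention activity" that can be visualized
```

The `weights` vector is what an attention visualization plots: one value per input position, summing to 1, indicating where the model focuses.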
Translation inference by concept propagation
This paper describes our contribution to the Third Shared Task on Translation Inference across Dictionaries (TIAD-2020). We describe an approach to translation inference based on symbolic methods, the propagation of concepts over a graph of interconnected dictionaries: Given a mapping from source language words to lexical concepts (e.g., synsets) as a seed, we use bilingual dictionaries to extrapolate a mapping of pivot and target language words to these lexical concepts. Translation inference is then performed by looking up the lexical concept(s) of a source language word and returning the target language word(s) for which these lexical concepts have the respective highest score. We present two instantiations of this system: one using WordNet synsets as concepts, and one using lexical entries (translations) as concepts. With a threshold of 0, the latter configuration ranks second among participating systems in terms of F1 score. We also describe additional evaluation experiments on Apertium data, a comparison with an earlier approach based on embedding projection, and an approach for constrained projection that outperforms the TIAD-2020 vanilla system by a large margin.
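The propagation-and-lookup scheme in this abstract can be sketched in a few lines: concepts flow from seed source words through bilingual dictionaries to pivot and target words, and translation picks the target words with the highest concept score. All data and names below are toy illustrations, not the TIAD-2020 implementation.

```python
from collections import defaultdict

# Hypothetical seed: source-language words mapped to lexical concepts (synsets).
seed = {"dog": {"synset:canine.n.01"}, "cat": {"synset:feline.n.01"}}

# Toy bilingual dictionaries: source->pivot, pivot->target translation pairs.
src_to_pivot = {"dog": ["chien"], "cat": ["chat"]}
pivot_to_tgt = {"chien": ["perro"], "chat": ["gato"]}

def propagate(mapping, dictionary):
    """Extrapolate concept scores across one bilingual dictionary."""
    out = defaultdict(lambda: defaultdict(float))
    for word, concepts in mapping.items():
        for translation in dictionary.get(word, []):
            for concept in concepts:
                # Repeated support from several words raises the score.
                out[translation][concept] += 1.0
    return out

pivot_concepts = propagate(seed, src_to_pivot)
tgt_concepts = propagate({w: set(c) for w, c in pivot_concepts.items()},
                         pivot_to_tgt)

def infer_translation(source_word):
    """Return target words sharing the highest-scoring source concept."""
    concepts = seed.get(source_word, set())
    candidates = {w: sum(s for c, s in cs.items() if c in concepts)
                  for w, cs in tgt_concepts.items()}
    best = max(candidates.values(), default=0.0)
    return sorted(w for w, s in candidates.items() if s == best and s > 0)

print(infer_translation("dog"))  # -> ['perro']
```

With real dictionaries, many pivot words share a concept, and the score aggregation (rather than a single chain) is what makes the propagation robust.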
CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation
The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL Merge, a practical tool for harmonizing TSV-related data models, as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity and sanity checks. Users can choose from several merging strategies, and either preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (lossless in terms of annotation granularity, destructive against a reference tokenization), or preserve tokenization clashes (lossless and non-destructive, but introducing empty tokens as placeholders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.
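The "minimal shared subtokens" strategy mentioned in this abstract can be sketched concisely: take the token boundaries of both concurrent tokenizations as character offsets, and cut the text at their union. This is a conceptual sketch of the idea, not the CoNLL Merge tool's API.

```python
def shared_subtokens(tok_a, tok_b):
    """Split two concurrent tokenizations of the same text into the
    minimal subtokens shared by both (lossless w.r.t. both analyses)."""
    text = "".join(tok_a)
    assert text == "".join(tok_b), "tokenizations must cover the same text"

    def boundaries(tokens):
        # Collect token end offsets as character positions.
        offs, pos = set(), 0
        for t in tokens:
            pos += len(t)
            offs.add(pos)
        return offs

    # Cutting at the union of both boundary sets respects both analyses.
    cuts = sorted(boundaries(tok_a) | boundaries(tok_b) | {0})
    return [text[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]

# Two tools tokenize "New York-based" differently:
a = ["New", "York-based"]
b = ["New", "York", "-", "based"]
print(shared_subtokens(a, b))  # -> ['New', 'York', '-', 'based']
```

Every token of either input tokenization is then recoverable as a contiguous run of subtokens, which is what makes the layer lossless while being destructive against either single reference tokenization.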
Resource-lean modeling of coherence in commonsense stories
We present a resource-lean neural recognizer for modeling coherence in commonsense stories. Our lightweight system is inspired by successful attempts at modeling discourse relations and stands out due to its simplicity and easy optimization compared to prior approaches to narrative script learning. We evaluate our approach on the Story Cloze Test, demonstrating an absolute improvement in accuracy of 4.7% over state-of-the-art implementations.
The ACoLi CoNLL libraries: beyond tab-separated values
We introduce the ACoLi CoNLL libraries, a set of Java archives to facilitate advanced manipulations of corpora annotated in TSV formats, including all members of the CoNLL format family. In particular, we provide means for (i) rule-based rewrite operations, (ii) visualization and manual annotation, (iii) merging CoNLL files, and (iv) database support. The ACoLi CoNLL libraries provide a command-line interface to these functionalities. The following aspects are technologically innovative and go beyond the state of the art: We support every OWPL (one word per line) corpus format with tab-separated columns, whereas most existing tools are specific to one particular CoNLL dialect. We employ established W3C standards for rule-based graph rewriting operations on CoNLL sentences. We provide means for the heuristic, but fully automated merging of CoNLL annotations of the same textual content, in particular for resolving conflicting tokenizations. We demonstrate the usefulness and practicability of our proposed CoNLL libraries on well-established data sets of the Universal Dependencies corpus and the Penn Treebank.
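The OWPL (one word per line) format family the abstract refers to has a simple shape: one tab-separated row per token, blank lines separating sentences, comment lines carrying metadata. A minimal reader for this shape can be sketched as follows; the column layout in the sample is hypothetical, and this is not the ACoLi libraries' Java API.

```python
def read_owpl(lines):
    """Yield sentences from an OWPL/TSV stream as lists of column rows;
    blank lines separate sentences, '#' lines carry metadata."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):   # comment/metadata line: skip here
            continue
        if not line:               # blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
        else:
            sentence.append(line.split("\t"))
    if sentence:                   # flush a final sentence without trailing blank
        yield sentence

# Hypothetical three-column sample (ID, FORM, UPOS-like tag):
sample = ["1\tThe\tDET", "2\tdog\tNOUN", "",
          "1\tIt\tPRON", "2\tbarks\tVERB", ""]
sents = list(read_owpl(sample))
print(len(sents), sents[0][1][1])  # -> 2 dog
```

Because the reader only assumes tab-separated columns and blank-line sentence breaks, it works for any CoNLL dialect regardless of how many columns a given dialect defines, which is the same dialect-agnostic property the abstract claims for the libraries.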
- …