Cross-Lingual Adaptation using Structural Correspondence Learning

Author: Prettenhofer Peter
Stein Benno
Publication date: 25/08/2010

Cross-lingual adaptation, a special case of domain adaptation, refers to the transfer of classification knowledge between two languages. In this article we describe an extension of Structural Correspondence Learning (SCL), a recently proposed algorithm for domain adaptation, for cross-lingual adaptation. The proposed method uses unlabeled documents from both languages, along with a word translation oracle, to induce cross-lingual feature correspondences. From these correspondences a cross-lingual representation is created that enables the transfer of classification knowledge from the source to the target language. The main advantages of this approach over other approaches are its resource efficiency and task specificity. We conduct experiments in the area of cross-language topic and sentiment classification involving English as source language and German, French, and Japanese as target languages. The results show a significant improvement of the proposed method over a machine translation baseline, reducing the relative error due to cross-lingual adaptation by an average of 30% (topic classification) and 59% (sentiment classification). We further report on empirical analyses that reveal insights into the use of unlabeled data, the sensitivity with respect to important hyperparameters, and the nature of the induced cross-lingual correspondences

arXiv.org e-Print Archive

CiteSeerX

The Argument Reasoning Comprehension Task: Identification and Reconstruction of Implicit Warrants

Author: Gurevych Iryna
Habernal Ivan
Stein Benno
Wachsmuth Henning
Publication date: 01/01/2018

Reasoning is a crucial part of natural language argumentation. To comprehend an argument, one must analyze its warrant, which explains why its claim follows from its premises. As arguments are highly contextualized, warrants are usually presupposed and left implicit. Thus, the comprehension does not only require language understanding and logic skills, but also depends on common sense. In this paper we develop a methodology for reconstructing warrants systematically. We operationalize it in a scalable crowdsourcing process, resulting in a freely licensed dataset with warrants for 2k authentic arguments from news comments. On this basis, we present a new challenging task, the argument reasoning comprehension task. Given an argument with a claim and a premise, the goal is to choose the correct implicit warrant from two options. Both warrants are plausible and lexically close, but lead to contradicting claims. A solution to this task will define a substantial step towards automatic warrant reconstruction. However, experiments with several neural attention and language models reveal that current approaches do not suffice. Comment: Accepted as NAACL 2018 Long Paper; see details on the front page

arXiv.org e-Print Archive

TUbiblio

Crossref

Demanded Abstract Interpretation

Author: Stein Benno
Publication venue: University of Colorado Boulder
Publication date: 01/04/2022

Formal static analysis is seeing increasingly widespread adoption as a tool for verification and bug-finding, but even with powerful cloud infrastructure it can take minutes or hours for a developer to get analysis results after a code change. This dissertation considers the problem of making expressive and sophisticated static analyzers interactive by providing analysis results to developers in as close to real time as possible. While existing techniques offer some demand-driven or incremental aspects for certain classes of analysis, the fundamental challenge addressed by this work is doing both for abstract interpretation in arbitrary domains. This dissertation presents a technique, demanded abstract interpretation, that lifts analysis computations to a dependency graph structure in which incremental program edits and demand-driven evaluation of abstract semantics can be handled uniformly. Demanded abstract interpretation draws inspiration from graph-based approaches to incremental computation, and is not only sound and terminating but also from-scratch consistent with underlying batch analyses. The approach is parametric in the choice of abstract domain, supporting a wide range of analysis problems and enabling the reuse of highly-tuned existing domain implementations in our demanded analysis framework without requiring any per-domain reasoning about incrementality or demand. The complex, cyclic, and unbounded dependency structures that arise when analyzing loops and recursive control flow in an infinite-height domain are a key challenge, which our approach handles by dynamically extending novel acyclic encodings of such analysis computation. This dissertation describes and formalizes demanded abstract interpretation techniques for both intraprocedural analysis and compositional interprocedural analysis. We also present promising experimental results in a prototype analysis implementation, and describe some extensions to the framework designed to confront practical resource constraints without sacrificing formal guarantees

CU Scholar Institutional Repository

Retrieval Models for Genre Classification

Author: Eissen Sven Meyer zu
Stein Benno
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2008

Genre provides a characterization of a document with respect to its form or functional trait. Genre is orthogonal to topic, rendering genre information a powerful filter technology for information seekers in digital libraries. However, an efficient means for genre classification is an open and controversially discussed issue. This paper gives an overview and presents new results related to automatic genre classification of text documents. We present a comprehensive survey which contrasts the genre retrieval models that have been developed for Web and non-Web corpora. With the concept of genre-specific core vocabularies the paper provides an original contribution related to computational aspects and classification performance of genre retrieval models: we show how such vocabularies are acquired automatically and introduce new concentration measures that quantify the vocabulary distribution in a sensible way. Based on these findings we construct lightweight genre retrieval models and evaluate their discriminative power and computational efficiency. The presented concepts go beyond the existing utilization of vocabulary-centered, genre-revealing features and open new possibilities for the construction of genre classifiers that operate in real-time

CiteSeerX

AIS Electronic Library (AISeL)

A keyquery-based classification system for CORE

Author: Gollub Tim
Hagen Matthias
Stein Benno
Völske Michael
Publication date: 26/04/2017

We apply keyquery-based taxonomy composition to compute a classification system for the CORE dataset, a shared crawl of about 850,000 scientific papers. Keyquery-based taxonomy composition can be understood as a two-phase hierarchical document clustering technique that utilizes search queries as cluster labels: In a first phase, the document collection is indexed by a reference search engine, and the documents are tagged with the search queries they are relevant for (their so-called keyqueries). In a second phase, a hierarchical clustering is formed from the keyqueries within an iterative process. We use the explicit topic model ESA as document retrieval model in order to index the CORE dataset in the reference search engine. Under the ESA retrieval model, documents are represented as vectors of similarities to Wikipedia articles, a methodology proven to be advantageous for text categorization tasks. Our paper presents the generated taxonomy and reports on quantitative properties such as document coverage and processing requirements

Online-Publikationssystem der Bauhaus-Universität Weimar

Digitale Bibliothek Thüringen

Paraphrase Acquisition from Image Captions

Author: Gohsen Marcel
Hagen Matthias
Potthast Martin
Stein Benno
Publication date: 15/02/2023

We propose to use image captions from the Web as a previously underutilized resource for paraphrases (i.e., texts with the same "message") and to create and analyze a corresponding dataset. When an image is reused on the Web, an original caption is often assigned. We hypothesize that different captions for the same image naturally form a set of mutual paraphrases. To demonstrate the suitability of this idea, we analyze captions in the English Wikipedia, where editors frequently relabel the same image for different articles. The paper introduces the underlying mining technology, the resulting Wikipedia-IPC dataset, and compares known paraphrase corpora with respect to their syntactic and semantic paraphrase similarity to our new resource. In this context, we introduce characteristic maps along the two similarity dimensions to identify the style of paraphrases coming from different sources. An annotation study demonstrates the high reliability of the algorithmically determined characteristic maps

arXiv.org e-Print Archive

Differential Bias: On the Perceptibility of Stance Imbalance in Argumentation

Author: Al-Khatib Khalid
Palomino Alonso
Potthast Martin
Stein Benno
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/11/2022

Most research on natural language processing treats bias as an absolute concept: Based on a (probably complex) algorithmic analysis, a sentence, an article, or a text is classified as biased or not. Given the fact that for humans the question of whether a text is biased can be difficult to answer or is answered contradictorily, we ask whether an "absolute bias classification" is a promising goal at all. We see the problem not in the complexity of interpreting language phenomena but in the diversity of sociocultural backgrounds of the readers, which cannot be handled uniformly: To decide whether a text has crossed the proverbial line between non-biased and biased is subjective. By asking "Is text X more [less, equally] biased than text Y?" we propose to analyze a simpler problem, which, by its construction, is rather independent of standpoints, views, or sociocultural aspects. In such a model, bias becomes a preference relation that induces a partial ordering from least biased to most biased texts without requiring a decision on where to draw the line. A prerequisite for this kind of bias model is the ability of humans to perceive relative bias differences in the first place. In our research, we selected a specific type of bias in argumentation, the stance bias, and designed a crowdsourcing study showing that differences in stance bias are perceptible when (light) support is provided through training or visual aid

Proceedings - University of Groningen