230 research outputs found
ANNIS: a linguistic database for exploring information structure
In this paper, we discuss the design and implementation of our first version of the database "ANNIS" (ANNotation of Information Structure). For research based on empirical data, ANNIS provides a uniform environment for storing this data together with its linguistic annotations. A central database promotes standardized annotation, which facilitates interpretation and comparison of the data. ANNIS is used through a standard web browser and offers tier-based visualization of data and annotations, as well as search facilities that allow for cross-level and cross-sentential queries. The paper motivates the design of the system, characterizes its user interface, and provides an initial technical evaluation of ANNIS with respect to data size and query processing.
A Model for Processing Illocutionary Structures and Argumentation in Debates
Automatically identifying transitions between locutions in dialogue
This paper contributes theoretical foundations for dialogical argument mining, as well as an initial software implementation for dialogue processing. Automatically identifying the structure of reasoning from natural language is extremely demanding. Our hypothesis is that the structure of dialogue can yield additional clues as to the argument structures that are created and co-created. Our work has been performed using the MM2012 corpus in OVA+.
Classifying Italian newspaper text: news or editorial?
We present a text classifier that can distinguish Italian news stories from editorials. Inspired by earlier work on English, we built a suitable train/test corpus and implemented a range of features, which can predict the distinction with an accuracy of 89.12%. As demonstrated by the earlier work, such a feature-based approach outperforms simple bag-of-words models when transferred to new domains. We argue that the technique can also be used to distinguish opinionated from non-opinionated text outside of the realm of newspapers.
Linguistic mechanisms of coherence in aphasic and non-aphasic discourse
Background: Coherence is the quality that distinguishes discourse from a random collection of sentences. People with aphasia have been reported to produce less-coherent discourse than non-language-impaired speakers. It is largely unclear how coherence is established in natural language and what leads to its impairment in aphasia. Aims: This paper presents a cross-methodological investigation on coherence in the discourse of Russian native speakers with and without aphasia. The purpose of this study was to examine the connection between language impairments in aphasia and different aspects of discourse coherence in order to determine the linguistic mechanisms that could be involved in establishing and maintaining it. Methods & Procedures: Coherence was operationalised as a combination of four aspects: informativeness, clarity, connectedness, and understandability. Twenty participants were asked to retell the content of a short movie. The retellings were annotated using Rhetorical Structure Theory (RST), a formalistic framework for discourse-structure analysis. Next, they were evaluated for coherence on a four-point scale by trained raters. The ratings were compared between groups. A classification analysis was performed to determine whether the ratings could be predicted based on the macrolinguistic variables collected from the RST annotations and several microlinguistic variables previously linked to coherence. Results: Retellings produced by speakers with aphasia received lower ratings than those of control participants on all aspects of coherence. The results indicate that different combinations of microlinguistic and discourse-structure variables play a role in establishing each of the coherence aspects. Conclusions: Our results provided supporting evidence on coherence impairment in aphasia. Perception of a discourse as more or less coherent was associated with both microlinguistic and macrolinguistic variables, with different combinations of variables relevant for each of the aspects. Furthermore, we found that discourse structure plays an important role, especially for understandability. We speculate that pragmatic knowledge shared by interlocutors may boost the coherence of aphasic discourse.
Primary and secondary discourse connectives: definitions and lexicons
Starting from the perspective that discourse structure arises from the presence of coherence relations, we provide a map of linguistic discourse structuring devices (DRDs), and focus on those for written text. We propose to structure these items by differentiating between primary and secondary connectives on the one hand, and free connecting phrases on the other. For the former, we propose that their behavior can be described by lexicons, and we show one concrete proposal that by now has been applied to three languages, with others being added in ongoing work. The lexical representations can be useful both for humans (theoretical investigations, transfer to other languages) and for machines (automatic discourse parsing and generation).
