Proceedings of the Conference on Natural Language Processing 2010
This book contains state-of-the-art contributions to the 10th
conference on Natural Language Processing, KONVENS 2010
(Konferenz zur Verarbeitung natürlicher Sprache), with a focus
on semantic processing.
The KONVENS in general aims at offering a broad perspective
on current research and developments within the interdisciplinary
field of natural language processing. The central theme
draws specific attention towards addressing linguistic aspects
of meaning, covering deep as well as shallow approaches to semantic
processing. The contributions address both knowledge-based
and data-driven methods for modelling and acquiring
semantic information, and discuss the role of semantic information
in applications of language technology.
The articles demonstrate the importance of semantic processing,
and present novel and creative approaches to natural
language processing in general. Some contributions put their
focus on developing and improving NLP systems for tasks like
Named Entity Recognition or Word Sense Disambiguation, or
focus on semantic knowledge acquisition and exploitation with
respect to collaboratively built resources, or harvesting semantic
information in virtual games. Others are set within the
context of real-world applications, such as Authoring Aids, Text
Summarisation and Information Retrieval. The collection highlights
the importance of semantic processing for different areas
and applications in Natural Language Processing, and provides
the reader with an overview of current research in this field.
Students' language in computer-assisted tutoring of mathematical proofs
Truth and proof are central to mathematics. Proving (or disproving) seemingly simple statements often turns out to be one of the hardest mathematical tasks. Yet, doing proofs is rarely taught in the classroom. Studies on cognitive difficulties in learning to do proofs have shown that pupils and students not only often do not understand or cannot apply basic formal reasoning techniques and do not know how to use formal mathematical language, but, at a far more fundamental level, they also do not understand what it means to prove a statement or even do not see the purpose of proof at all. Since insight into the importance of proof and doing proofs as such cannot be learnt other than by practice, learning support through individualised tutoring is in demand.
This volume presents a part of an interdisciplinary project, set at the intersection of pedagogical science, artificial intelligence, and (computational) linguistics, which investigated issues involved in provisioning computer-based tutoring of mathematical proofs through dialogue in natural language. The ultimate goal in this context, addressing the above-mentioned need for learning support, is to build intelligent automated tutoring systems for mathematical proofs. The research presented here has been focused on the language that students use while interacting with such a system: its linguistic properties and computational modelling. Contribution is made at three levels: first, an analysis of language phenomena found in students' input to a (simulated) proof tutoring system is conducted and the variety of students' verbalisations is quantitatively assessed; second, a general computational processing strategy for informal mathematical language and methods of modelling prominent language phenomena are proposed; and third, the prospects for natural language as an input modality for proof tutoring systems are evaluated based on collected corpora.
Concepts of tense
The dissertation examines the concepts of tense. There is not and cannot be one true concept for any linguistic phenomenon as there are no "true", language-independent linguistic phenomena. This means that studies employ concepts that differ from each other. However, the concepts should not differ from each other randomly; the concepts cannot be "right" or "wrong", but they can be more or less appropriate. Yet, it is not enough to just build or use an appropriate concept. It is also important to be explicit on the choices made to prevent further misunderstandings and to make the results of the study easier to understand and to compare; the results are always dependent on the theoretical background, yet the composition of concepts is too often too implicit.
The dissertation is metatheoretical in nature. I examine twelve existing concepts of tense, breaking their structure down into individual components, each of which may take several different values. I have compared this theoretical data with typological data on 193 tense markers from 62 languages (analysed in the same way) and evaluated how each component value affects the possible data, the analysis and the results of any given study (typological or other).
The objectives of the dissertation are to make past and future tense research more comparable, to examine how the choice of concept affects the data, the analysis and the results, to help in building appropriate concepts that best serve the research question and to highlight the importance of making concepts and their component values explicit. The work has been structured in such a way that the central ideas are easy to transfer to the study of other linguistic phenomena as well.
The results show that carefully considering the concept is indeed an essential part of any linguistic study: using different component values as a part of the concept results in different types of data that may be more or less suitable for a given purpose. These effects are individually illustrated with the typological data and the studies used as examples. The results also include a detailed list of components and their values relevant for tense as well as an analysis of their frequency, centrality and canonicity in regard to the concepts of tense. The typological data also serves as a typological study of tense in its own right. This means that in addition to addressing the main objectives the dissertation also provides answers to the questions "what is typically considered as tense in the literature" and "what tense markers are typically like". The dissertation also provides methodology for the systematic analysis of concepts in general.
Term-driven E-Commerce
This thesis addresses the textual dimension of e-commerce. Its fundamental hypothesis is that information and transactions in electronic commerce are bound to text. Wherever products and services are offered, requested, perceived and evaluated, natural-language expressions come into play. It follows, on the one hand, that capturing the variance of textual descriptions in e-commerce is important; on the other hand, the extensive textual resources generated in e-commerce interactions can be drawn upon to gain a better understanding of natural language.
Exocentric Noun Phrases in English
The term ‘exocentric noun phrase’ (ENP) refers to a noun phrase without a head noun. The category of ENPs contains a range of nominal constructions including phrasal ones (e.g. the rich, the dead, whose head nouns, denoting human referents, are missing) and clausal ones (e.g. I’ll eat what you give me, in which there seems to be a missing nominal antecedent). Although these constructions have been studied before, there has been very little comprehensive research on ENPs as a category. This thesis has two aims: first, it fully examines ENPs with the support of contemporary and historical corpus data; secondly, based on this direct syntactic examination of ENPs, it critically evaluates the possibility of a unified theory. The first aim is addressed in Chapters 3 to 8, in which I conduct systematic reviews of four representative kinds of ENPs in English, i.e. Generic Constructions (ENPs with a pattern of ‘determinative + adjective’ such as the rich or the sublime), referential metonymy (e.g. Shakespeare is on the bookshelf, where Shakespeare refers to his works), compound pronouns (indefinite pronouns with compounding morphology such as someone or anything) and free relatives (relative clauses without explicit antecedents, e.g. She is who I refer to). Syntactic explanations are proposed for each of these ENPs. The second aim is addressed in Chapter 9, based on the proposals of the previous chapters. I argue, contra Huddleston & Pullum et al. (2002) and Payne et al. (2007), that there cannot be a unified solution for all ENPs, including their ‘fusion of functions’ theory (FFT): although ENPs share a superficially similar syntactic structure characterised by the lack of head nouns, the forms of the missing head nouns and the mechanisms underlying the absence of these head nouns vary (historical ellipsis, compounding, conjunction of clauses, etc.).
As a result, each kind of ENP needs an individual, more specific account that takes into consideration its own syntactic behaviour and historical development.
Complement clauses and complementation systems: a cross-linguistic study of grammatical organization
The dissertation provides a cross-linguistic investigation into the grammatical structure of complement clauses and the organization of complementation systems. Based on a balanced sample of 100 widely dispersed languages, the major goals of the present work are to set the two landmark typological reference articles on complementation (Noonan 1985|2007, Dixon 2006) onto a broad empirical basis and to explore hitherto understudied phenomena in the constitution of complementation systems. In particular, the traditional focus on object complement clauses is shifted to complements in ‘subject’ function, and the dissertation is the first to analyse systematically the cross-linguistic productivity, morphosyntactic coding, syntagmatic arrangement and diachronic rise of complements in S- and A-function, as compared to their corresponding object clauses. On a methodological plane, it combines a multivariate approach to clause-linkage with recent statistical techniques of data mining (e.g. HCFA, cluster analyses, NeighborNet, MDS) in order to measure (dis)similarities in the cross-linguistic organization of complementation constructions. This comprises, for example, a precise gauging of the degree to which the internal structure of complements is ‘desententialized’ (Lehmann 1988) and made NP-like, of the ways in which this correlates with the possible external functions and positions of the complement in the main clause, and of the ways in which these distributional patterns in complementation systems reflect the historical origins and lexical diffusion of the relevant constructions. Above all, the dissertation problematizes the conceptual and terminological foundations for the typological study of complementation, which, despite decades of intensive research, remain challenging to establish in a cross-linguistically satisfactory way.
Subatomic quantification (Volume 6)
The goal of this book is to explore the relationship between the cognitive notion of parthood and various grammatical devices expressing this concept in natural language. The monograph investigates syntactic constructions and lexical categories, e.g., partitives, whole-adjectives, and multipliers, encoding different kinds of part-whole structures both in Slavic and non-Slavic languages. It is intended to inspire a radical rethinking of the ontology of models accounting for nominal semantics. Specifically, it provides novel evidence for a mereotopological approach to meaning, i.e., a theory of wholes that captures not only parthood but also topological relations holding between parts. This evidence comes from the phenomenon of subatomic quantification, i.e., quantification over parts of referents of concrete count nouns.
Cross-generational linguistic variation in the Canberra Vietnamese heritage language community: A corpus-centred investigation
This dissertation investigates cross-generational linguistic differences in the Canberra Vietnamese bilingual community, with a particular focus on Vietnamese as the heritage language. Specifically, it documents the vernacular and considers key aspects of this data from different theoretical perspectives. Its main contribution is an insight into a rarely studied heritage language variety in a contact community that has never been examined.
The dissertation consists of five core chapters, organised into two parts. In the first part (Chapters 2–3), I describe how I documented the vernacular and created the Canberra Vietnamese English Corpus (CanVEC), an original corpus compiled specifically for this study that is also the first to be freely available for research purposes. The corpus consists of over ten hours of spontaneous speech produced by 45 Vietnamese-English bilingual speakers across two generations living in Canberra. In the second part of the study (Chapters 4–6), I put the corpus to use and investigate aspects of the cross-generational differences in Vietnamese as the heritage language in this community.
In particular, I first probe the Vietnamese heritage language via its participation in the code-switching discourse (Chapter 4). In doing so, I focus on the applicability of the Matrix Language Framework (MLF) (Myers-Scotton, 1993, 2002) and its associated Matrix Language (ML) Turnover Hypothesis (Myers-Scotton, 1998) to the code-switching data in CanVEC. Since support for this prominent model has mainly come from language pairs that have different clausal word order or vastly different inventories of inflectional morphology, Vietnamese-English as a pair in which both languages are SVO and essentially isolating offers a tantalising testing ground for its application. Results show that the universal claims of this model do not hold so straightforwardly. CanVEC data challenges several assumptions of the MLF, with the model ultimately only being able to account for around half of the CanVEC code-switching data. I further demonstrate that even when the ML is putatively identifiable and a cross-generational ML ‘turnover’ is quantitatively observed, the predictions do not reflect the direction of structural influence that we see in CanVEC. The MLF approach therefore sheds only limited light on cross-generational language shift and variation in this community.
Given that null elements emerge as a distinct area of difficulty in Chapter 4, I take this aspect as the focal point for the next part of the investigation (Chapter 5), where I use the variationist approach (Labov, 1972 et seq.) to explore three cases where null and overt realisation alternates in Vietnamese: subjects, objects, and copulas. In doing so, I move away from the bilingual portion of CanVEC to examine the monolingual heritage Vietnamese subset directly. Results show that Vietnamese null subjects vary significantly across generations, while null objects and copulas remain stable in terms of use. As speakers also overwhelmingly prefer overt forms over null forms (∼70:30) across all three variables of interest, I next appeal to the generative interface-oriented approach (Sorace & Filiaci, 2006 et seq.) to examine the distribution of overt subjects, objects, and copulas (Chapter 6). These results converge with what was found for null forms: cross-generational effects were observed for pronominal subjects, but not for pronominal objects and copulas. This finding also supports the importance of a distinction drawn in previous works between internal (syntax-semantics) and external (syntax-discourse/pragmatics) interface phenomena, with the latter being seemingly more susceptible to change.
Ultimately, this dissertation highlights the empirical and theoretical value of studying rarely considered contact varieties, while deploying an integrated approach that acknowledges the multi-faceted complexity of the contact communities where these varieties are spoken.
Cambridge Trust International Scholarship
Iconicity in Language and Speech
This dissertation is concerned with the major theme of iconicity and its prevalence on different linguistic levels. Iconicity refers to a resemblance between the linguistic form and the meaning of a referent (cf. Perniss and Vigliocco, 2014). Just as a sculpture resembles an object or a model, so can the sound or shape of words resemble the thing they refer to. Previous theoretical approaches emphasize that arbitrariness of the linguistic sign is one of the main features of human language, and that iconicity may have played a role in language evolution but is negligible in contemporary language. In contrast, the main aim of this thesis is to explore the potential and the importance of iconicity in contemporary language. The individual chapters of the dissertation can be viewed as separate parts that, taken together, reveal the comprehensive spectrum of iconicity. Starting from the language evolutionary debate, the individual chapters address iconicity on different linguistic levels. I present experimental evidence on sound symbolism, using the example of German Pokémon names, on iconic prosody, and on iconic words, the so-called ideophones. The results of the individual investigations point to the widespread use of iconicity in contemporary German. Moreover, this dissertation deciphers the communicative potential of iconicity as a force that not only enabled the emergence of language, but also persists after millennia, unfolding again and again and encountering us every day in speech, writing, and gestures.