Gamifying Language Resource Acquisition
PhD Thesis
Natural Language Processing (NLP) is an important collection of methods for processing the vast
amounts of natural language text we continually produce. These methods make
use of supervised learning, an approach that learns from large amounts of annotated
data. As humans, we're able to provide information about text that such systems can learn from.
Historically, this annotation was carried out by small groups of experts, but this approach did not scale, which led
to various crowdsourcing approaches that used large pools of non-experts.
The traditional form of crowdsourcing was to pay users small amounts of money to complete
tasks. Over time, gamification approaches such as GWAPs (games with a purpose) showed various benefits
over these micro-payment methods, including cost savings, worker training
opportunities, increased worker engagement and the potential to far exceed the scale of paid crowdsourcing.
While GWAPs were successful in domains such as image labelling, they struggled in the domain
of text annotation, which wasn't such a natural fit. Despite many challenges, there were also
clearly many opportunities and benefits to applying this approach to text annotation, many of
which are demonstrated by Phrase Detectives. Based on lessons learned from Phrase Detectives
and investigations into other GWAPs, in this work we attempt to create full GWAPs for NLP
that retain the benefits of the methodology: training, high-quality output from
non-experts, and a truly game-like design that players are happy to play voluntarily.
Proceedings of the 2010 Annual Conference of the Gesellschaft für Semantik
Sinn & Bedeutung, the annual conference of the Gesellschaft für Semantik, aims to bring together both established researchers and new blood working on current issues in natural language semantics, pragmatics, the syntax-semantics interface and the philosophy of language, or carrying out psycholinguistic studies related to meaning.
Every year, the conference moves to a different location in Europe.
The 2010 conference, Sinn & Bedeutung 15, took place on September 9-11 at Saarland University, Saarbrücken, organized by the Department for German Studies.
Chinese whispers Chinese rooms: the poetry of John Ashbery and cognitive studies
This thesis examines the relationship of John Ashbery's poetry to developments in cognitive studies over the course of the last sixty years, particularly the science of linguistics as viewed from a Chomskyan perspective. The thesis is divided into four chapters, each of which positions a particular topic in cognitive studies as an organising principle for examining Ashbery's poetry. The first chapter concentrates on developments in syntactic theory in relation to Ashbery's experiments with poetic syntax. The second chapter examines the notions of 'intention' and 'intentionality' in Ashbery's writing from the perspective of cognitive 'theory of context' writing, particularly the work of Deirdre Wilson and Dan Sperber. The final two chapters use Ashbery's poetry as a means of entry into controversial areas in formal cognitive studies. The third chapter examines his poetry in relation to temporality, suggesting that Ashbery's experiments with time form 'theories of consciousness' as they consciously manipulate readerly consciousness and attention. The final chapter explores perception in relation to Ashbery's writing. The thesis argues that poetry can be conceived of as a less formalised method of cognitive study, and that poetic experiment can lead to significant reconceptualisations of cognitive notions which may help frame critical questions for more formal experiments in cognitive science and philosophy going forward. The thesis concludes with reflections on the wider implications for literary cognitive studies in general.
Cross-generational linguistic variation in the Canberra Vietnamese heritage language community: A corpus-centred investigation
This dissertation investigates cross-generational linguistic differences in the Canberra Vietnamese bilingual community, with a particular focus on Vietnamese as the heritage language. Specifically, it documents the vernacular and considers key aspects of this data from different theoretical perspectives. Its main contribution is an insight into a rarely studied heritage language variety in a contact community that has never been examined.
The dissertation consists of five core chapters, organised into two parts. In the first part (Chapters 2–3), I describe how I documented the vernacular and created the Canberra Vietnamese English Corpus (CanVEC), an original corpus compiled specifically for this study that is also the first to be freely available for research purposes. The corpus consists of over ten hours of spontaneous speech produced by 45 Vietnamese-English bilingual speakers across two generations living in Canberra. In the second part of the study (Chapters 4–6), I put the corpus to use and investigate aspects of the cross-generational differences in Vietnamese as the heritage language in this community.
In particular, I first probe the Vietnamese heritage language via its participation in code-switching discourse (Chapter 4). In doing so, I focus on the applicability of the Matrix Language Framework (MLF) (Myers-Scotton, 1993, 2002) and its associated Matrix Language (ML) Turnover Hypothesis (Myers-Scotton, 1998) to the code-switching data in CanVEC. Since support for this prominent model has mainly come from language pairs that have different clausal word orders or vastly different inventories of inflectional morphology, Vietnamese-English, a pair in which both languages are SVO and essentially isolating, offers a tantalising testing ground for its application. Results show that the universal claims of this model do not hold so straightforwardly. CanVEC data challenge several assumptions of the MLF, and the model is ultimately able to account for only around half of the CanVEC code-switching data. I further demonstrate that even when the ML is putatively identifiable and a cross-generational ML 'turnover' is quantitatively observed, the predictions do not reflect the direction of structural influence that we see in CanVEC. The MLF approach therefore sheds only limited light on cross-generational language shift and variation in this community.
Given that null elements emerge as a distinct area of difficulty in Chapter 4, I take this aspect as the focal point for the next part of the investigation (Chapter 5), where I use the variationist approach (Labov, 1972 et seq.) to explore three cases where null and overt realisation alternate in Vietnamese: subjects, objects, and copulas. In doing so, I move away from the bilingual portion of CanVEC to examine the monolingual heritage Vietnamese subset directly. Results show that Vietnamese null subjects vary significantly across generations, while null objects and copulas remain stable in terms of use. As speakers also overwhelmingly prefer overt forms over null forms (~70:30) across all three variables of interest, I next appeal to the generative interface-oriented approach (Sorace & Filiaci, 2006 et seq.) to examine the distribution of overt subjects, objects and copulas (Chapter 6). These results converge with what was found for null forms: cross-generational effects were observed for pronominal subjects, but not for pronominal objects and copulas. This finding also supports the importance of a distinction drawn in previous works between internal (syntax-semantics) and external (syntax-discourse/pragmatics) interface phenomena, with the latter seemingly more susceptible to change.
Ultimately, this dissertation highlights the empirical and theoretical value of studying rarely considered contact varieties, while deploying an integrated approach that acknowledges the multi-faceted complexity of the contact communities where these varieties are spoken.
Cambridge Trust International Scholarship
Cappadocian kinship
Cappadocian kinship systems are very interesting from a sociolinguistic and anthropological perspective because they mix inherited Greek and borrowed Turkish kinship terms. Precisely because the number of Turkish kinship terms differs from one variety to another, it is necessary to talk about Cappadocian kinship systems in the plural rather than about the Cappadocian kinship system in the singular. Although reference will be made to other Cappadocian varieties, this paper will focus on the kinship systems of Mišotika and Aksenitika, the two Central Cappadocian dialects still spoken today in several communities in Greece. Particular attention will be given to the use of borrowed Turkish kinship terms, which sometimes seem to coexist with their inherited Greek counterparts, e.g. mána vs. néne 'mother', ailfó/aelfó vs. γardáš 'brother', etc. In the final part of the paper, some kinship terms with obscure or hitherto unknown etymology will be discussed, e.g. káka 'grandmother', ižá 'aunt', lúva 'uncle (father's brother)', etc.
Proceedings of the VIIth GSCP International Conference
The 7th International Conference of the Gruppo di Studi sulla Comunicazione Parlata, dedicated to the memory of Claire Blanche-Benveniste, chose Speech and Corpora as its main theme. The 235 authors, from 21 countries and 95 institutions, contributed papers on many different languages. The 89 papers in this volume reflect the themes of the conference: the compilation and annotation of spoken corpora, together with the related technological fields; the relation between prosody and pragmatics; speech pathologies; and papers on phonetics, speech and linguistic analysis, pragmatics and sociolinguistics. Many papers are also dedicated to speech and second language studies. The online publication with FUP gives direct access to the sound and video files linked to the papers (when downloaded).