193 research outputs found
Corpus Linguistics and the Law: Extending the Field from a Statistical Perspective
During the last 5–10 years, corpus-linguistic applications have slowly become more widespread in matters of legal interpretation; specifically, we see more court cases in which corpus-linguistic data are brought to bear on the (original) ordinary/public meaning of expressions in legal texts (in briefs and judicial opinions), but also more academic research focusing on if/how corpus-linguistic methods can shed light on the plain/ordinary meaning of words in a legal text. While this development is welcome, it also comes with shortcomings/risks, some of which are now hotly debated in recent and forthcoming law review articles. In particular, there is a whole family of currently debated shortcomings/risks that is virtually exclusively due to the fact that several early adopters/promoters of corpus methods for legal applications have been massively simplifying the field of corpus linguistics to what they know and what seems convenient. This is not useful for several reasons, one of which is that it makes corpus-linguistic applications in the legal field more vulnerable to various lines of attack in the legal literature. More important, however, is that reductionist corpus-linguistic applications also undermine the strength of the cases that corpus linguistics can (help) make. In this paper, I will discuss a few applications that showcase the wider range of methods that proper/full-fledged corpus analysis has to offer: one case study on historical trends in corpus data based on frequencies augmented with required but never-used additional statistics such as dispersion and uncertainty/robustness estimates; the other involves applying semantic vector spaces and word embeddings to explore (heuristically) the scope of terms
Towards a dynamic behavioral profile : a diachronic study of polysemous 'sentir' in Spanish
This study examines the diachronic evolution of the polysemy of the Spanish verb sentir (“to feel”) by means of a corpus-based dynamic Behavioral Profile (BP) analysis. Methodologically, it presents the first application of the BP approach to historical data and proposes some methodological innovations not only within the current body of research in historical semantics, but also with regard to previous applications of the BP approach. First, whereas the majority of existing studies in quantitative historical semantics are largely based on observed frequencies or percentages of collocational co-occurrence, our study leverages more complex historical data that are based on the similarities of vectors. Second, this study also provides an extension of the methodological apparatus of the BP approach by complementing the traditional Hierarchical Agglomerative Cluster analysis (HAC) with a dynamic BP approach derived from Multidimensional Scaling maps (MDS). Theoretically, this methodology contributes to a comprehensive perspective on the process of Constructionalization and the nature of networks, which is illustrated on the basis of the development of the Discourse Marker (DM) lo siento (“I’m sorry”)
Syntax from and for discourse II: More on complex sentences as meso-constructions
This paper presents a direct continuation of preceding corpus-linguistic research on complex sentence constructions with temporal adverbial clauses in a cognitive and usage-based framework (Diessel 2008; Hampe 2015). Working towards a more systematic construction-based account of complex sentences with before-, after-, until- and once-clauses in spontaneously spoken English, Hampe (2015) hypothesised that the morpho-syntactic realisations of configurations with initial adverbial clauses systematically diverge from those of configurations with final ones as a reflection of the specific functionality of each and that usage properties that are found across instantiations with a coherent functional load are retained in the schematisations creating constructions. This paper employs a multinomial regression in order to test to which extent each of eight closely related complex-sentence constructions with either initial or final before-, after-, until- and once-clauses can be predicted from the realisation of a few key morpho-syntactic properties of the respective adverbial and matrix clauses involved. The results support an analysis of complex-sentence constructions as meso-constructions that are not only specific about the subordinator and the positioning of the adverbial clause, but also retain “traces” of characteristic usage properties
Ordinary Meaning and Corpus Linguistics
This Article discusses how corpus analysis, and similar empirically based methods of language study, can help inform judicial assessments about language meaning. We first briefly outline our view of legal language and interpretation in order to underscore the importance of the ordinary meaning doctrine, and thus the relevance of tools such as corpus analysis, to legal interpretation. Despite the heterogeneity of the judicial interpretive process, and the importance of the specific context relevant to the statute at issue, conventions of meaning that cut across contexts are a necessary aspect of legal interpretation. Because ordinary meaning must in some sense be generalizable across contexts, it would seem to be subject in some way to the empirical verification that corpus analysis can provide. We demonstrate the potential of corpus analysis through the study of two rather infamous cases in which the reviewing courts made various general claims about language meaning. In both cases, United States v. Costello and Smith v. United States, the courts made statements about language that are contradicted by corpus analysis. We also demonstrate the potential of corpus analysis through Hart’s no-vehicles-in-the-park hypothetical. A discussion of how to approach Hart’s hypothetical shows the potential but also the complexities of the kind of linguistic analyses required by such scenarios. Corpus linguistics can yield results that are relevant to legal interpretation, but performing the necessary analyses is complex and requires significant training in order to perform competently. We conclude that while it is doubtful that judges will themselves become proficient at corpus linguistics, they should be receptive to the expert testimony of corpus linguists in appropriate circumstances
Linguistic annotation in/for corpus linguistics
This article surveys linguistic annotation in corpora and corpus linguistics. We first define the concept of 'corpus' as a radial category and then, in Section 2, discuss a variety of kinds of information for which corpora are annotated and that are exploited in contemporary corpus linguistics. Section 3 then exemplifies many current formats of annotation with an eye to highlighting both the diversity of formats currently available and the emergence of XML annotation as, for now, the most widespread form of annotation. Section 4 summarizes and concludes with desiderata for future developments.
New Information in Naturalistic Data Is Also Signalled by Pitch Movement: An Analysis from Monolingual English/Spanish and Bilingual Spanish Speakers
In communication, speakers and listeners need ways to highlight certain information and relegate other information to the background. They also need to keep track of what information they (think they) have already communicated to the listener, and of the listeners' (supposed) knowledge of topics and referents. This knowledge and its layout in the utterance is commonly referred to as information structure, i.e., the degree to which propositions and referents are given or new. All languages have 'chosen' different ways to encode such information structure, for instance by modifying the pitch or intensity of the vocal signal or the order of words in a sentence. In this study, we assess whether the use of pitch to signal new information holds in typologically different languages such as English and Spanish by analyzing three population groups: monolingual California English speakers, bilingual speakers of English and Spanish from California (Chicano Spanish), and monolingual Mexican Spanish speakers from Mexico City. Our study goes beyond previous work in several respects. First, most current work is based on sentences just read or elicited in response to highly standardized and often somewhat artificial stimuli whose generalizability to more naturalistic settings may be questionable. We opted instead to use semidirected interviews whose more naturalistic setting provides data with a higher degree of authenticity. Second, in order to deal with the resulting higher degree of noise in the data as well as the inherent multifactoriality of the data, we are using state-of-the-art statistical methods to explore our data, namely generalized linear mixed-effects modeling, to accommodate speaker- and lexically-specific variability.
Despite the noisy data, we find that contour tones including H+L or L+H sequences signal new information, and that items encoding new information also exhibit proportionally longer stressed vowels than those encoding given information. We also find cross-dialectal variation between monolingual Mexican Spanish speakers on the one hand and monolingual English speakers and Chicanos on the other: Mexican Spanish speakers modify pitch contours less than monolingual English speakers, whereas the English patterns affect even the Spanish pronunciation of early bilinguals. Our findings, therefore, corroborate Gussenhoven's theory (2002) that some aspects of intonation are shared cross-linguistically (longer vowel length & higher pitch for new info), whereas others are encoded language-specifically and vary even across dialects (pitch excursion & the packaging of information structure)
EFL and/vs. ESL? A multi-level regression modeling perspective on bridging the paradigm gap
The study of learner language and of indigenized varieties are growing areas of English-language corpus-linguistic research, which are shaped by two current trends: First, the recognition that more rigorous methodological approaches are urgently needed: with few exceptions, existing work is based on over-/under-use frequency counts that fail to unveil complex non-native linguistic patterns; second, the collective effort to bridge an existing "paradigm gap" (Sridhar & Sridhar 1986) between EFL and ESL research. This paper contributes to these developments by offering a multifactorial analysis of seventeen lexical verbs in the dative alternation in speech and writing of German/French learners and Hong Kong/India/Singapore English speakers. We exemplify the advantages of hierarchical mixed-effects modeling, which allows us to control for speaker and verb-specific effects, but also for the hierarchical structure of the corpus data. Second, we address the theoretical question of whether EFL and ESL represent discrete English varieties or a continuum
Translation, interpreting, cognition: The way out of the box
Cognitive aspects of the translation process have become central in Translation and Interpreting Studies in recent years, further establishing the field of Cognitive Translatology. Empirical and interdisciplinary studies investigating translation and interpreting processes promise a hitherto unprecedented predictive and explanatory power. This collection contains such studies which observe behaviour during translation and interpreting. The contributions cover a vast area and investigate behaviour during translation and interpreting – with a focus on training of future professionals, on language processing more generally, on the role of technology in the practice of translation and interpreting, on translation of multimodal media texts, on aspects of ergonomics and usability, on emotions, self-concept and psychological factors, and finally also on revision and post-editing. For the present publication, we selected a number of contributions presented at the Second International Congress on Translation, Interpreting and Cognition hosted by the Tra&Co Lab at the Johannes Gutenberg University of Mainz