152 research outputs found

    Human Associations Help to Detect Conventionalized Multiword Expressions

    In this paper we show that human evidence about the conventionalization of a phrase can be obtained by asking native speakers what associations they have with the phrase and with its component words. If the component words of a phrase have each other as frequent associations, the phrase can be considered conventionalized. A second type of conventionalized phrase can be identified using two factors: low entropy of the phrase's associations and low intersection between the associations of the component words and those of the phrase. The association experiments were performed for the Russian language.
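The two criteria in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how entropy or intersection were computed, so Shannon entropy and Jaccard overlap are used here as stand-ins, and the function names, the `top_n` cutoff, and the toy English responses are all hypothetical.

```python
from collections import Counter
from math import log2


def association_entropy(responses):
    """Shannon entropy (in bits) of the distribution of association responses."""
    counts = Counter(responses)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def association_overlap(phrase_responses, component_responses):
    """Jaccard overlap between the response sets (an assumed measure of 'intersection')."""
    phrase_set = set(phrase_responses)
    component_set = set(component_responses)
    union = phrase_set | component_set
    return len(phrase_set & component_set) / len(union) if union else 0.0


def mutual_association(word_a, word_b, responses_a, responses_b, top_n=3):
    """True if each word is among the other's top_n most frequent associations."""
    top_a = [w for w, _ in Counter(responses_a).most_common(top_n)]
    top_b = [w for w, _ in Counter(responses_b).most_common(top_n)]
    return word_b in top_a and word_a in top_b


# Invented toy responses, not data from the paper.
iron = ["curtain", "curtain", "metal"]
curtain = ["iron", "window", "iron"]
print(mutual_association("iron", "curtain", iron, curtain))
print(association_entropy(["cold war", "cold war", "border"]))
```

Any thresholds for "low" entropy or "low" overlap would have to be chosen empirically; none are given in the abstract.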

    The Role of Formulaic Language in the Creation of Grammar

    Research on formulaic language has shown it to be a very diverse phenomenon in both the forms it takes and the functions it performs (e.g., Erman and Warren, 2000; Wray, 2002). Sinclair (1991) proposes that language as a system is organized according to two principles: 'the idiom principle', which includes the use of all multi-word prefabricated sequences, and 'the open choice principle', which covers word-for-word operations. Formulaic language is the embodiment of the idiom principle and constitutes the core of linguistic structure. Therefore, it must be subjected to scientific scrutiny from a variety of perspectives: typological, psycholinguistic, socio-pragmatic, and language acquisition. This dissertation reports on the percentage of formulaic sequences (prefabs) in spoken and written Russian; the distribution of prefab types across two spoken and four written genres; their interaction with non-prefabricated language; and the impact that prefabs have on the structure of a particular language type. Russian is typologically and structurally different from English. The main structural difference is that Russian has free word order, a wide inflectional system to code grammatical relations, and a satellite verb system. I hypothesize that these structural differences influence the quantity and the nature of formulaic sequences used in the language, the nature of the alternation of prefabricated and non-prefabricated strings, and the preference of speakers for one of the two aforementioned principles over the other. The method applied in the analysis of Russian prefabs was developed by Erman and Warren (2000) and was originally applied to English texts. This dissertation seeks to address the methodological issue of applying this method to typologically different languages. 
It has been argued (Garcia and Florimon van Putte, 1989) that the fixedness of English word order contributes to the co-occurrence of elements and the formation of formulaic sequences in English. On that view, formulaic language is a language-specific tendency pertaining to English, and not a universal mechanism for language storage, processing, production and use. The findings support usage-based approaches driven by forces resulting from frequency of use, discourse and communicative functions, grounded in the fine balance between the economy principle and the power of language creativity. The results of the study are used to draw implications for language processing and language modeling. As we continue to perfect the methods of identification, classification and analysis of formulaic sequences, we will be in a better position to describe not only the amount but also the nature of formulaic language, its interaction with non-formulas, and the impact this alternation has on linguistic structure as a whole. The current study investigates the nature of formulaic language in a free word order language. We seek to apply the method of identification, classification and analysis of prefabs, of their interaction with each other and with non-formulaic language, and of the estimation of choices made in producing spoken and written language. My dissertation results suggest that a free word order language uses at least as many prefabs as a fixed word order language. On average, in a free word order language like Russian, 65% of spoken and 58% of written language is composed of multiword formulaic sequences. The results strengthen the hypothesis that the idiom principle is a mechanism of global linguistic organization and processing. The proportion and distribution of prefabs are less affected by language type than by the spoken/written medium distinction and genre variation. 
In addition, the results show that prefabs are frozen structures not amenable to standard syntactic transformations even in a free word order language. The results support the dual system of language processing, i.e., holistic and analytic, present in a free word order language.

    Comparing two thesaurus representations for Russian

    © 2018 Global WordNet Association. All Rights Reserved. In this paper we present a new Russian wordnet, RuWordNet, which was obtained semi-automatically by transforming the existing Russian thesaurus RuThes. In the first step, the basic structure of wordnets was reproduced: the synset hierarchy for each part of speech and the basic set of relations between synsets (hyponym-hypernym, part-whole, antonyms). In the second step, we added causation, entailment and domain relations between synsets. In addition, derivation relations were established for single words, and the component structure was recorded for phrases included in RuWordNet. The described transformation procedure highlights the specific features of each type of thesaurus representation.

    Writing in the Disciplines and Within-discipline Variations: A Comparison of the Formulaic Profiles of the Medical Research Article and the Medical Case Report

    Research on formulaic language in academic writing has primarily investigated the use of single types of formulaic sequences in academic research articles in various disciplines. Studies in this line of research have revealed dramatic variations in the use of formulaic language across academic disciplines (e.g., Cortes, 2004; Hyland, 2008a; Jalali & Moini, 2014; Shahriari, 2017). However, there is evidence that discipline alone does not tell the whole story about linguistic variation (Gray, 2015). Different varieties of texts within one discipline may reflect different linguistic characteristics depending on specific communicative purposes (Biber & Conrad, 2009). It follows that the almost exclusive focus on the academic research article may “limit our knowledge of the discourse practices within discipline” (Gray, 2015, p. 19). Moreover, formulaic language encompasses different types of sequences (e.g., collocations, lexical bundles, frames, etc.), each of which reveals only a partial picture of formulaicity in discourse (Wray, 2005). Thus, studies that investigate the use of single types of formulaic sequences may provide only partial descriptions of the registers they investigate. Therefore, to better serve disciplinary writing instruction, there is a need for studies that provide more comprehensive descriptions of formulaic language in various registers within one discipline. The present dissertation takes a step in that direction by investigating within-discipline linguistic variation through the comparison of the formulaic profiles of two registers in the field of medicine: the medical research article (MRA) and the medical case report (MCR). Both registers have been reported in the medical literature to contribute to advancing research, clinical practice, and education in the field (e.g., Man et al., 2004; Rison et al., 2017). 
The study proposes a more comprehensive approach to the description of formulaic language and investigates the use of various formulaic sequences that have been described as accounting for the formulaicity of discourse. Such sequences include: (a) collocations, pairs of words that tend to co-occur; (b) multiword collocations, sequences of three or more words with strong mutual attraction (such sequences consist primarily of lexical words, most of which are technical terms); (c) lexical bundles, the most frequent sequences of three or more words in a register, described as the building blocks of academic writing (Cortes, 2013); and (d) frames, sequences of three or more items with one variable slot. Frames have been described as allowing writers to make more creative use of formulaic language (e.g., Biber, 2009; Gray & Biber, 2013). The analyses of the formulaic sequences in the two registers often revealed structural similarities but noticeable variations in the discourse functions of the sequences. Such variations reflect differences in the situational characteristics of the two registers, such as communicative purposes, the nature of data and evidence, and textual organization, to name but a few. The findings of the present study portray MRAs and MCRs as two distinct registers, thus highlighting the importance of investigating within-discipline variations to better serve disciplinary writing instruction.

    Formulaic language

    The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present the current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective

    Theories and methods

    The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present the current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective

    The Acquisition of Formulaic Sequences in High-Intermediate ESL Learners

    This study investigates the relative effectiveness of three types of form-focused instruction on the acquisition of English formulaic sequences (FSs), which learners of all proficiency levels seem to struggle with. Forty Mandarin-speaking graduate students were randomly assigned to 4 groups: 1 control group and 3 treatment groups. Over 2 weeks all groups received 3 reading comprehension lessons based on 3 reading passages with 10 target FSs in each. The control group received no instruction on FSs, while in the three treatment groups, after the reading comprehension activity, learners received one of three types of intervention: (i) Input Enhancement in combination with Explicit Instruction, (ii) Collaborative Gap-fill tasks, and (iii) Spot-the-Difference tasks. A Vocabulary Knowledge Scale test and an Awareness test were used as pre-tests, while immediate and delayed post-tests included a cued gap-fill test followed by a multiple-choice question test and the same Awareness test. Findings obtained from ANOVAs and Cohen's d effect size calculations showed that all three types of form-focused instruction benefited learners in acquiring higher levels of productive and receptive knowledge of new FSs. Form-focused instruction was particularly successful in helping learners produce the newly learnt FSs in a different context. Results also revealed that effective retention of the target FSs' form was associated with higher levels of productive knowledge. Furthermore, learners' engagement in understanding the meaning of new FSs in their context had a durable positive effect on their retention of the form and productive knowledge of these FSs. Direct instruction of new FSs' meaning helped learners retain meaning most efficiently, while explicit strategy teaching tended to enhance learners' ability to notice FSs in L2 input. 
Correlation analyses also suggested a complex interaction of factors related to the acquisition of FSs, as frequency, n-gram length and MI score could not, taken separately, fully account for learners' levels of success in acquiring new FSs receptively and productively.

    Code-switching in Greater Bilbao: A bilingual variety of colloquial Basque

    This doctoral dissertation examines the role of code-switching between Basque and Spanish linguistic elements in the metropolitan area of Greater Bilbao in the Basque Country. The study consists of four articles and a compilation article. The articles examine bilingual speech from different points of view: variation in grammatical code-switching patterns, the role of swearing, slang and code-switching in constructing an informal register of Basque, language ideologies that discourage and encourage code-switching, and conventionalization of semantic-pragmatic code-switching patterns. The Basque context of language revitalization has created new divisions between speakers, as the formerly unidirectional bilingualism has turned into a situation where great numbers of Spanish speakers are learning Basque in adult acquisition programs or in Basque-medium education. Basque is still, however, a minority language in the Greater Bilbao area, and the bilingual Basque speakers live scattered among the monolingual majority. The effect of these social structures on linguistic structures is examined in two sets of data that were collected for the purposes of this study. For the first set of data, 22 hours of naturally occurring peer-group conversations with 22 Basque-Spanish bilinguals were recorded, while the second set consists of 12 hours of metalinguistic conversations with 47 bilingual Basques. The speakers use their bilingual repertoire in numerous creative and dynamic ways. Yet some tendencies can be detected. Colloquial Basque in Bilbao is a bilingual speech style that always includes some code-switching to Spanish. There is considerable variation in individuals' code-switching patterns. 
Some of the informants, particularly L1 speakers of Basque, use very intensive and syntactically intrusive code-switching, whereas others, especially L2 speakers of Basque, only engage in syntactically peripheral code-switching, such as Spanish interjections, discourse markers and tags. The L2 speakers' purist tendencies seem to have two sources: firstly, the normative setting of acquisition, where language mixing is discouraged, and secondly, the general interpretation of new speakers' code-switching as a lack of proficiency in the minority language. Some Spanish elements have become conventionalized throughout the speech community as the default option. All informants use Spanish discourse markers, and swear words and colloquialisms are always introduced in Spanish in otherwise Basque speech. Spanish discourse markers seem to have been automatized as conversational routines, whereas Spanish swear words and colloquialisms have become conventionalized because of the domains they are associated with, and because of the lack of these stylistic categories in standard Basque.
Code-switching in Greater Bilbao: the colloquial language of the area includes a mixture of Basque and Spanish. Bilingual speech styles are often given nicknames such as Spanglish (Spanish + English) and portuñol (português + español), and they are readily regarded as poor language use that should be avoided. Mixed language irritates people and is feared to lead to the decay of the language. Mixing two languages in the same speech situation is, however, an essential part of a bilingual person's communicative competence. Compared with a monolingual speaker, a multilingual speaker has a far larger expressive repertoire at their disposal and, by drawing on it, has broader linguistic resources than a monolingual. In this doctoral study I examine the bilingual speech of the Greater Bilbao area in the Basque Country, which is popularly called euskañol. The name combines the words euskara, 'the Basque language', and español, 'the Spanish language'. 
When Spain moved from Franco's dictatorship to democracy around the turn of the 1980s, the language policy that had discriminated against minority languages changed. A vigorous language revitalization began in the Basque Country. Basque now has many speakers who speak it as a second language. The Basque standard language, euskara batua, has spread as a language of instruction across the levels of schooling, and it is used in many kinds of public contexts as well as in the media. The doctoral study shows, however, that the new standard language is regarded as formal and artificial, and that speakers often try to adapt it to informal situations precisely through code-switching, by mixing Basque and Spanish. Contrary to what is often assumed, code-switching is thus actually beneficial to the use of the minority language, because monolingual Basque is considered too formal for informal situations. In the Basque Country all Basque speakers are bilingual and also speak Spanish. The use of Spanish words and expressions in otherwise Basque speech is therefore no obstacle to comprehension. In this way colloquial Basque has in practice become a bilingual variety that always contains some code-switching to Spanish. For example, discourse markers, swearing and slang expressions are almost always said in Spanish in otherwise Basque speech. The study shows, however, that the amount and type of code-switching differ according to the speaker's background. Native speakers of Basque use code-switching abundantly and in varied ways. Basques who acquired the language at school, by contrast, use code-switching only a little, and even then it is usually limited to very common Spanish expressions, such as swear words. According to the study, there are two reasons for the differences in code-switching: 1) the classroom atmosphere strongly steers speakers toward monolingual language use, and 2) the code-switching of non-native speakers is often interpreted as a lack of language proficiency. 
Purist attitudes and norms of monolingualism thus limit the full exploitation of bilingualism.

    Syntactic and Semantic Patterns of Domain-specific Multiword Units in Marine Accident Investigation Reports

    The present study is a systematic corpus-based investigation of the domain-specific multiword units (henceforth MWUs) in marine accident investigation reports (henceforth MAIR), with a view to characterizing their most prominent syntactic, semantic and functional features. To achieve these principal objectives, the target MWUs were first identified by applying a new approach, which incorporates the notion of ‘meaning’ into statistical measures. This method maximizes the domain specificity of the extracted MWUs and provides valid data for the subsequent analysis. Using a three-dimensional analytical framework, this study has obtained the following findings. First, the domain-specific MWUs are largely composed of two-word sequences, while occurrences of 4- and 5-word MWUs are relatively rare. Among all the target MWUs, only 1.10% of the expressions occur very commonly within the genre (over 1,000 times). By contrast, the majority of the expressions (70.97%) occur with a frequency of less than 100. The skewed distribution indicates that the MAIR genre tends to employ a wide variety of domain-specific MWUs rather than repeating a small number of common expressions. Second, in terms of the syntactic features of the domain-specific MWUs, the NP structure is the most commonly employed grammatical type. The abundant use of this structure implies that the domain-specific meaning of the MAIR genre is largely carried in the nominal group. Apart from the NP structure, there is also a marked prevalence of VP structures among the domain-specific MWUs in the MAIR genre, and these MWUs show structural variation. Of all the VP-based patterns, the ‘verb phrase with active verb’ pattern stands out, since it incorporates a large number of action verbs, which are used to describe the actions done by people. 
The wide use of these phrases implies that the MAIR genre tends to highlight people's roles during the accidents, with particular attention to information about what or who caused or performed the activity. Similarly, PP structures were also frequently adopted by the domain-specific MWUs, especially the pattern beginning with the preposition of. This pattern was mostly used to specify possession. It can thus be inferred that the information provided in the MAIR genre tends to be concrete and specific. Third, a functional analysis of the target MWUs found that the primary function of the domain-specific MWUs is to express referential meanings and contribute to thematic development. Furthermore, due to their multifunctional nature, some referential MWUs also perform the functions of stance and discourse organizing. When expressing stance, most MWUs express impersonal epistemic stance, with the purpose of minimizing the imposition of the reporters' opinions. Other word sequences appear to be deontic in nature, as they are mainly realized by MWUs incorporating require or modal verbs. The primary function of these MWUs is to set out obligations and issue suggestions for the agents according to certain norms and regulations. When functioning as discourse organizers, the domain-specific MWUs usually adopt the pattern of ‘that-clause controlled by main verbs in active voice’ to introduce topics. By contrast, when used to elaborate topics, they tend to clarify logical relationships, especially the causative-resultative relation, rather than provide additional information. Fourth, the distinctive semantic features of the domain-specific MWUs are best reflected when these MWUs perform the functions of activity identification and specification. For instance, most domain-specific MWUs used for describing activities are of a general nature, but they convey specialized meaning in the MAIR genre. 
Similarly, when domain-specific MWUs are used to provide tangible or intangible frames for specifying certain attributes, their use in the MAIR genre deviates significantly from their use in the general English register. In all, by gaining insights into the salient features of the domain-specific MWUs in the MAIR genre, the present study may contribute to the following areas: the construction of an extraction method for domain-specific MWUs, the compilation of a maritime-specific MWU list, the teaching and learning of maritime English, especially maritime-specific MWUs, and the provision of a reference for writing MAIR to experts from non-native English-speaking countries.
Contents: Abstract; List of Tables; List of Figures.
Chapter 1 Introduction: 1.1. Background of this study; 1.2. Objectives of this study; 1.3. Significance of this study; 1.4. Terminological issues; 1.5. Organization of this dissertation.
Chapter 2 Theoretical background: 2.1. Understanding the notions of phraseology; 2.1.1. An overview of influential notions of phraseology; 2.1.2. Parameters of defining MWUs; 2.1.3. Operational definition of MWUs; 2.1.4. An overview of influential taxonomy of phraseology; 2.2. Theoretical discussion of MWUs; 2.2.1. Theoretical framework of this study; 2.2.2. Nature of multiword units; 2.2.3. Previous studies of phraseology.
Chapter 3 Analytical framework and research design: 3.1. Analytical framework; 3.1.1. Analytical framework for syntactic features of domain-specific MWUs; 3.1.2. Analytical framework for semantic features of domain-specific MWUs; 3.1.3. Analytical framework for functional features of domain-specific MWUs; 3.2. Research questions; 3.3. Corpora used in this study; 3.3.1. Corpus of Marine Accident Investigation Reports (COMAIR); 3.3.2. British National Corpus Baby (BNC Baby); 3.4. Tools and procedures for data analysis; 3.4.1. Tools for data processing; 3.4.2. Procedures for data analysis; 3.4.3. Inter-rater reliability; 3.5. Summary.
Chapter 4 Identification of domain-specific MWUs in the COMAIR: 4.1. Current approaches to MWU extraction; 4.2. My proposed approach to domain-specific MWU extraction; 4.3. The detailed process of domain-specific MWU extraction; 4.3.1. Step 1: N-gram retrieval; 4.3.2. Step 2: Keyword-gram extraction; 4.3.3. Step 3: Measuring the association strength of keyword-grams; 4.3.4. Step 4: Filtering out process; 4.3.5. Step 5: Domain-specific MWU identification.
Chapter 5 Frequency distributions and syntactic features of domain-specific MWUs: 5.1. Frequency distributions of domain-specific MWUs; 5.1.1. Frequency distributions of domain-specific MWUs in various lengths; 5.1.2. Overall frequency distribution across different frequency bands; 5.2. Syntactic features of domain-specific MWUs.
Chapter 6 Functional and semantic features of domain-specific MWUs: 6.1. Distributions across primary discourse functions; 6.2. Multiple functioning; 6.3. Stance MWUs; 6.3.1. Notion of stance MWUs; 6.3.2. Stance MWUs in COMAIR; 6.4. Discourse organizing MWUs; 6.4.1. Notion of discourse organizing MWUs; 6.4.2. Discourse organizing MWUs in COMAIR; 6.5. Referential MWUs; 6.5.1. Notion of referential MWUs; 6.5.2. Referential MWUs in COMAIR; 6.6. Summary.
Chapter 7 Conclusions and implications: 7.1. Summary of the major findings; 7.2. Implications of this study; 7.3. Limitations of this study.
References; Appendix.

    Modality, usage and diachrony: Constructional changes in the modal domain in American English

    The present work revisits changes in the modal domain in AmE. It is argued that the individual developments of the core modal verbs and selected semi-modals over the course of the 19th and 20th centuries have been heterogeneous in such a way that a unified treatment of either category as a whole is conceptually and methodologically highly questionable at best. The lack of any clear uniformity in their distributional behavior across time is particularly remarkable in the case of the modal verbs, as it stands in stark contrast to their morphosyntactic coherence. Furthermore, the case studies presented here reveal that even some modal verbs and their institutionalized contracted forms are on different paths in terms of their distribution as well as function, which casts serious doubt on whether lumping them together can be justified. Therefore, the case is made that, alongside formal and functional properties, distributional information (i.e. any aspects related to usage intensity) requires careful consideration and should be factored in when identifying more homogeneous sub-categories of modal expressions. Essentially, the goal is to identify conventionalized modal utterance types that may also correspond (more or less) to the mental representations speakers have abstracted.