Theories and methods
The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective.
Addressing the grammar needs of Chinese EAP students: an account of a CALL materials development project
This study investigated the grammar needs of Chinese EAP Foundation students and developed electronic self-access grammar materials for them. The research process consisted of three phases. In the first phase, a corpus-linguistics-based error analysis (EA) was conducted, in which 50 student essays were compiled and scrutinized for formal errors using a specially devised tagging system. The EA results, together with an examination of Foundation tutors’ perceptions of error frequency and gravity, led me to prioritise article errors for treatment. In the second phase, remedial materials were drafted based on the EA results and insights drawn from my investigations into four research areas (article pedagogy, SLA theory, grammar teaching approaches and CALL methodologies) and into existing grammar materials. In the third phase, the materials were refined and evaluated for their effectiveness as a means of improving the Chinese Foundation students’ use of the article.
Findings confirm the claim that L2 learner errors are systematic in nature and lend support to the value of Error Analysis. L1 transfer appears to be one of the main contributing factors in L2 errors. The salient errors identified in the Chinese Foundation corpus show that mismanagement of the article system is the most frequent cause of grammatical errors; Foundation tutors, however, perceive article errors to be neither frequent nor serious. An examination of existing materials reveals that the article is given low priority in ELT textbooks and that the treatments provided in pedagogical grammar books are inappropriate in terms of presentation, language and exercise types. The devised remedial materials employ both consciousness-raising activities and production exercises, using EAP language and authentic learner errors. Preliminary evaluation results suggest that the EA-informed customised materials have the potential to help learners perform better in proofreading article errors in academic texts.
Formulaic language
The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing and the very notion of formulaicity has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development and the objective of the current volume is to present current explorations in the field. Papers collected in the volume make numerous suggestions for further development of the field and they are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language either from a theoretical or a practical perspective.
Predicting ESL learners’ oral proficiency by measuring the collocations in their spontaneous speech
Collocation, known as words that commonly co-occur, is a major category of formulaic language. There is now general consensus among language researchers that collocation is essential to effective language use in real-world communication situations (Ellis, 2008; Nesselhauf, 2005; Schmitt, 2010; Wray, 2002). Although a number of contemporary speech-processing theories assume the importance of formulaic language to spontaneous speaking (Bygate, 1987; de Bot, 1992; Kormos, 2006; Levelt, 1999), none of them gives an adequate explanation of the role that collocation plays in speech communication. In the practices of L2 speaking assessment, a test taker’s collocational performance is usually not separately scored mainly because human raters can only focus on a limited range of speech characteristics (Luoma, 2004).
This paper argues for the centrality of collocation evaluation to communication-oriented L2 oral assessment. Based on a logical analysis of the conceptual connections among collocation, speech-processing theories, and rubrics for oral language assessment, the author formulated a new construct called Spoken Collocational Competence (SCC). In light of Skehan’s (1998, 2009) trade-off hypothesis, he developed a series of measures for SCC, namely Operational Collocational Performance Measures (OCPMs), to cover three dimensions of learner collocation performance in spontaneous speaking: collocation accuracy, collocation complexity, and collocation fluency. He then investigated the empirical performance of these measures with 2344 lexical collocations extracted from sixty adult English as a second language (ESL) learners’ oral assessment data collected in two distinctive contexts of language use: conversing with an interlocutor on daily-life topics (or the SPEAK exam) and giving an academic lecture (or the TEACH exam). Multiple regression and logistic regression were performed on criterion measures of these learners’ oral proficiency (i.e., human holistic scores and oral proficiency certification decisions) as a function of the OCPMs.
The study found that the participants generally achieved higher collocation accuracy and complexity in the TEACH exam than in the SPEAK exam. In addition, the OCPMs as a whole predicted the participants’ oral proficiency certification status (certified or uncertified) with high accuracy (Nagelkerke R² = .968). However, the predictive power of OCPMs for human holistic scores seemed to be higher in the SPEAK exam (adjusted R² = .678) than in the TEACH exam (adjusted R² = .573). These findings suggest that L2 learners’ collocational performance in free speech deserves examiners’ closer attention and that SCC may contribute to the construct of oral proficiency somewhat differently across speaking contexts. Implications for L2 speaking theory, automated speech evaluation, and teaching and learning of oral communication skills are discussed.
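For readers unfamiliar with the fit statistic reported for the logistic regression, the conventional textbook definition of Nagelkerke's R² is given below. It is not taken from the paper itself. The symbols are assumptions introduced here: L_0 is the likelihood of the intercept-only model, L_M is the likelihood of the fitted model, and n is the number of observations.

```latex
R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},
\qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - L_0^{2/n}}
```

The denominator rescales the Cox and Snell statistic so that its maximum attainable value is 1, which is why values close to 1 are possible for a well-separated binary outcome.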
The Influence of Features of Collocations on the Collocational Knowledge and Development of Kurdish High School Students: A Longitudinal Study
This study explored the influence of four features of collocations (frequency of occurrence, syntactic structure, semantic transparency, and congruency with L1) on the collocational knowledge and development of 252 Kurdish high school learners of English as a foreign language. The importance of collocations in learning English as a second or foreign language and the difficulties that challenge learners at different levels of language proficiency have been well established. However, few studies have adopted a longitudinal research design or a hybrid definition of collocations, incorporating both frequency-based and phraseological views. The present study took this approach to explore learners’ collocational knowledge and development and the influence of features of collocations on their collocational knowledge and development at the high school level of learning English as a foreign language. The study employed two tests: an appropriateness judgement test to measure learners’ receptive knowledge and a gap-filling test to measure their productive knowledge of collocations.
The data were collected in two waves, one at the beginning of the learners’ school year and the other at the end. Data analyses were conducted to determine the relationship between features of collocations and learners’ collocational knowledge and development. The results revealed frequency of occurrence as the most influential factor affecting learners’ knowledge and development. The syntactic structure of collocations came second, whereas congruency with L1 occupied the third position. Semantic transparency seemed to have the least influence on their collocational knowledge and development. Gender appeared as an influential factor in the individual tests; however, its influence was not significant in terms of overall knowledge development. In general, the results indicated that learners’ productive collocational knowledge lagged behind their receptive knowledge. Receptive and productive collocational knowledge also did not increase at the same rate over the study period: while learners’ receptive collocational knowledge did not increase, their productive knowledge increased significantly over the school year. The results also revealed that grammatical collocations were less challenging than lexical collocations at this level of language learning. Finally, based on the study results, some pedagogical implications and suggestions for further studies are presented.
Kurdistan Regional Government (KRG
Language and Linguistics in a Complex World Data, Interdisciplinarity, Transfer, and the Next Generation. ICAME41 Extended Book of Abstracts
This is a collection of papers, work-in-progress reports, and other contributions that were part of the ICAME41 digital conference.
The Near-Synonymous Classifiers in Mandarin Chinese: Etymology, Modern Usage, And Possible Problems in L2 Classroom
Many Chinese classifiers are near-synonymous: they can be used with the same head nouns without changing the meaning of the sentence. In other words, such classifiers can be used interchangeably or almost interchangeably. This poses a challenge for Chinese language learners, especially those whose native language lacks such a grammatical category. Another complication arises from the ambiguous English translations of many classifiers.
In this paper we investigate the collocation behavior of near-synonymous Chinese classifiers, focusing on their semantic nuances and interchangeability. We analyze 6 pairs of classifiers drawn from the HSK exam glossary: 栋 and 幢, 匹 and 头, 批 and 派, 颗 and 粒, 辆 and 台, and 根 and 支. The dataset for this study encompasses 1200 samples (100 per variable) and 416 distinct head nouns.
Through a corpus-based approach we analyze the collocation behavior of each classifier on its own and as part of its pair. The results show that not all pairs exhibit complete interchangeability. The collocation behavior of 批 and 派 differs significantly: 批 primarily quantifies batches with a 'first' connotation, while 派 is used more in artistic expressions. The interchangeability of 栋 and 幢 varies with context. 幢 emerges as the least frequent morpheme in the corpus, emphasizing its specific contextual usage. While both are used in address lines, 栋 predominantly quantifies standalone buildings, whereas 幢 is more aligned with larger architectural complexes. The analysis of 匹 and 头 highlights their distinctiveness, with 匹 counting horses and wolves and 头 being more versatile with various animals. 颗 and 粒 appear partially interchangeable, particularly with 珠-related head nouns and items associated with plants, fruits, and trees. The research also underscores that 辆 is primarily linked to car-related nouns, while 台 is more versatile, serving as a classifier for machines and electronic devices, including computers, printers, phones, and cameras. 根 and 支 only overlap in the head noun 笔, and their roles diverge, with 根 being a versatile classifier and 支 also appearing as part of medical terms.
Translationese indicators for human translation quality estimation (based on English-to-Russian translation of mass-media texts)
A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy.
Human translation quality estimation is a relatively new and challenging area of research,
because human translation quality is notoriously more subtle and subjective than that of machine
translation, which attracts much more attention and effort from the research community. At
the same time, human translation is routinely assessed by education and certification institutions,
as well as at translation competitions. Do the quality labels and scores generated
from real-life quality judgments align well with objective properties of translations? This
thesis puts this question to a test using machine learning methods.
Conceptually, this research is built around a hypothesis that linguistic properties characteristic
of translations, as a specific form of communication, can correlate with translation
quality. This assumption is often made in translation studies but has never been put to
a rigorous empirical test. Exploring translationese features in a quality estimation task
can help identify quality-related trends in translational behaviour and provide data-driven
insights into professionalism to improve training. Using translationese for quality estimation
fits well with the concept of quality in translation studies, because it is essentially a
document-level property. Linguistically-motivated translationese features are also more interpretable
than popular distributed representations and can explain linguistic differences
between quality categories in human translation.
We investigated (i) an extended set of Universal Dependencies-based morphosyntactic
features as well as two lexical feature sets capturing (ii) collocational properties of translations,
and (iii) ratios of vocabulary items in various frequency bands along with entropy
scores from n-gram models. To compare the performance of our feature sets in translationese
classifications and in quality estimation tasks against other representations, the
experiments were also run on tf-idf features, QuEst++ features and on contextualised
embeddings from a range of pre-trained language models, including the state-of-the-art
multilingual solution for machine translation quality estimation. Our major focus was on
document-level prediction; however, where the labels and features allowed, the experiments
were extended to the sentence level.
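As a rough illustration of what the lexical features in (iii) might look like, the following minimal sketch computes frequency-band ratios and a per-token entropy score. It is not the thesis's implementation. The function names, the add-one smoothing, the bigram order, and the band edges are all assumptions made for this example.

```python
import math
from collections import Counter


def frequency_band_ratios(tokens, rank, edges=(1000, 5000)):
    """Share of tokens in each frequency band of a reference word list.

    `rank` maps a word to its frequency rank (1 = most frequent) in a
    hypothetical reference list. The last band collects every remaining
    word, including words missing from the list.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = [0] * (len(edges) + 1)
    for tok in tokens:
        r = rank.get(tok)
        band = len(edges)  # default: the lowest-frequency band
        if r is not None:
            for i, edge in enumerate(edges):
                if r <= edge:
                    band = i
                    break
        counts[band] += 1
    return [c / len(tokens) for c in counts]


def bigram_cross_entropy(train_tokens, test_tokens):
    """Per-bigram cross-entropy (in bits) of test_tokens under a bigram
    model estimated from train_tokens with add-one smoothing (assumed)."""
    if len(test_tokens) < 2:
        raise ValueError("test_tokens needs at least two tokens")
    vocab_size = len(set(train_tokens) | set(test_tokens))
    contexts = Counter(train_tokens[:-1])
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    pairs = list(zip(test_tokens, test_tokens[1:]))
    total = 0.0
    for prev, cur in pairs:
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
    return total / len(pairs)
```

In a setup like this, each document would contribute one vector of band ratios plus one entropy score, and those values would feed a classifier or regressor alongside the other feature sets.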
The corpus used in this research includes English-to-Russian parallel subcorpora of student
and professional translations of mass-media texts, and a register-comparable corpus of
non-translations in the target language. Quality labels for various subsets of student translations
come from a number of real-life settings: translation competitions, graded student
translations, error annotations and direct assessment. We overview approaches to benchmarking
quality in translation and provide a detailed description of our own annotation
experiments.
Of the three proposed translationese feature sets, morphosyntactic features returned
the best results on all tasks. In many settings they were secondary only to contextualised
embeddings. At the same time, performance on various representations was contingent
on the type of quality captured by quality labels/scores. Using the outcomes of machine
learning experiments and feature analysis, we established that translationese properties of
translations were not equally reflected by various labels and scores. For example, professionalism
was much less related to translationese than expected. Labels from document-level
holistic assessment demonstrated maximum support for our hypothesis: lower-ranking
translations clearly exhibited more translationese. They bore more traces of mechanical
translational behaviours associated with following source language patterns whenever possible,
which led to the inflated frequencies of analytical passives, modal predicates, verbal
forms, especially copula verbs and verbs in the finite form. As expected, lower-ranking
translations were more repetitive and had longer, more complex sentences. Higher-ranking
translations were indicative of greater skill in recognising and counteracting translationese
tendencies. For document-level holistic labels as an approach to capture quality, translationese
indicators might provide a valuable contribution to an effective quality estimation
pipeline.
However, error-based scores, and especially scores from sentence-level direct assessment,
proved to be much less correlated with translationese and fluency issues in general. This was
confirmed by relatively low regression results across all representations that had access only
to the target language side of the dataset, by feature analysis and by correlation between
error-based scores and scores from direct assessment.