5,848 research outputs found
Training and Scaling Preference Functions for Disambiguation
We present an automatic method for weighting the contributions of preference
functions used in disambiguation. Initial scaling factors are derived as the
solution to a least-squares minimization problem, and improvements are then
made by hill-climbing. The method is applied to disambiguating sentences in the
ATIS (Air Travel Information System) corpus, and the performance of the
resulting scaling factors is compared with hand-tuned factors. We then focus on
one class of preference function, those based on semantic lexical collocations.
Experimental results are presented showing that such functions vary
considerably in selecting correct analyses. In particular we define a function
that performs significantly better than ones based on mutual information and
likelihood ratios of lexical associations.
Comment: To appear in Computational Linguistics (probably volume 20, December
94). LaTeX, 21 page
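The two-stage procedure described in this abstract (least-squares initial scaling factors, then hill-climbing) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy scores, the 1/0 regression targets, the step size, and all function names.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares_weights(rows, targets):
    """Initial scaling factors from the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)


def num_correct(weights, sentences):
    """Count sentences whose top-scoring analysis is the gold one."""
    total = 0
    for candidates, gold in sentences:
        scores = [sum(w * f for w, f in zip(weights, c)) for c in candidates]
        if scores.index(max(scores)) == gold:
            total += 1
    return total


def hill_climb(weights, sentences, step=0.1, rounds=20):
    """Nudge one factor at a time, keeping only strict improvements."""
    best = list(weights)
    best_score = num_correct(best, sentences)
    for _ in range(rounds):
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                score = num_correct(trial, sentences)
                if score > best_score:
                    best, best_score, improved = trial, score, True
        if not improved:
            break
    return best, best_score


# Hypothetical data: (candidate analyses, index of the correct analysis).
# Each analysis is a vector of two preference-function scores.
sentences = [
    ([[0.9, 0.2], [0.4, 0.8]], 0),
    ([[0.3, 0.7], [0.6, 0.1]], 0),
    ([[0.5, 0.5], [0.2, 0.9]], 1),
]
rows = [c for candidates, _ in sentences for c in candidates]
# Assumed regression target: 1 for the correct analysis, 0 otherwise.
targets = [1.0 if i == gold else 0.0
           for candidates, gold in sentences
           for i in range(len(candidates))]

initial = least_squares_weights(rows, targets)
tuned, tuned_score = hill_climb(initial, sentences)
```

The least-squares step optimizes a smooth proxy, whereas the hill-climbing step optimizes the selection accuracy directly, which is why the second stage can still improve on the first.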
Intersections and differentiations: a corpus-assisted discourse study of gender representations in the British press before, during and after the London Olympics 2012
This study examines the impact of a global sports event on gender representations in media reporting. Whereas previous research on gender, sport and media has been mainly concerned with sports events in the North American or Australian context, this study investigates the British media reporting before, during and after the London Olympics 2012. Our study follows the approach of Corpus-Assisted Discourse Studies (CADS) and uses both quantitative and qualitative research procedures. The results reveal more balanced gender representations during the London Olympics in that the ‘regular’ biased associations were suppressed in favour of positive references to female achievements. However, little carry-through of the ‘gains’ was noted. Also, this study shows that the positive associations intersected with national sentiments and were used to celebrate the nation-state. At the same time, some subtle resistance was observed to accepting as ‘truly’ British the non-white athletes and those not born in Britain
Word-to-Word Models of Translational Equivalence
Parallel texts (bitexts) have properties that distinguish them from other
kinds of parallel data. First, most words translate to only one other word.
Second, bitext correspondence is noisy. This article presents methods for
biasing statistical translation models to reflect these properties. Analysis of
the expected behavior of these biases in the presence of sparse data predicts
that they will result in more accurate models. The prediction is confirmed by
evaluation with respect to a gold standard -- translation models that are
biased in this fashion are significantly more accurate than a baseline
knowledge-poor model. This article also shows how a statistical translation
model can take advantage of various kinds of pre-existing knowledge that might
be available about particular language pairs. Even the simplest kinds of
language-specific knowledge, such as the distinction between content words and
function words, are shown to reliably boost translation model performance on
some tasks. Statistical models that are informed by pre-existing knowledge
about the model domain combine the best of both the rationalist and empiricist
traditions
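The first property named in this abstract, that most words translate to only one other word, can be illustrated with a toy sketch. This is a simple greedy heuristic over raw co-occurrence counts, written only for illustration; it is not the statistical model the article presents, and the bitext and function names are hypothetical.

```python
from collections import Counter


def cooccurrence_counts(bitext):
    """Count how often each (source word, target word) pair shares a segment pair."""
    counts = Counter()
    for src, tgt in bitext:
        for s in set(src.split()):
            for t in set(tgt.split()):
                counts[(s, t)] += 1
    return counts


def one_to_one_links(counts):
    """Greedily link the most frequent pairs, using each word at most once."""
    links = {}
    used_targets = set()
    # Highest count first; ties are broken alphabetically for determinism.
    for (s, t), _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if s not in links and t not in used_targets:
            links[s] = t
            used_targets.add(t)
    return links


# Hypothetical three-segment English-French bitext.
bitext = [
    ("the house", "la maison"),
    ("the car", "la voiture"),
    ("red car", "voiture rouge"),
]
links = one_to_one_links(cooccurrence_counts(bitext))
```

Without the one-to-one constraint, a frequent word such as "la" would compete as a translation for every source word it co-occurs with; the constraint removes it from consideration once it has been linked.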
What effect does short term Study Abroad (SA) have on learners’ vocabulary knowledge?
This thesis describes a study which tracks longitudinal changes in vocabulary knowledge during a short-term Study Abroad (SA) experience. A test of productive vocabulary knowledge, Lex30 (Meara & Fitzpatrick, 2000), requiring the production of word association responses, is used to elicit vocabulary from 38 Japanese L1 learners of English at four test times at equal intervals before and after an SA experience. The study starts by investigating whether there are changes in both the total number of words and in the number of less frequently occurring words produced by SA participants. Three additional ways of measuring the development of lexical knowledge over time are then proposed. The first examines changes in the ability of participants of different proficiency levels in producing collocates in response to Lex30 cue words. The second tracks changes in spelling accuracy to measure if improvements take place over time. The third analysis uses an online measuring instrument (Wmatrix; Rayson, 2009) to explore if there are any changes in the mastery of specific semantic domains. The results show that there is significant growth in the productive use of less frequent vocabulary knowledge during the SA period. There is also an increase in collocation production with lower proficiency participants and evidence of some improvement in the way certain vocabulary items are spelled. The tendency for SA learners to produce more words from semantic groups related to SA experiences is also demonstrated. Post-SA tests show that while some knowledge attrition occurs it does not decline to pre-SA levels. The study shows how short-term SA programmes can be evaluated using a word association test, contributing to a better understanding of how vocabulary develops during intensive language learning experiences. It also demonstrates the gradual shift of productive vocabulary knowledge from partial word knowledge to a more complete state of productive mastery
"Immerse yourself in the traditions of the simple way of life": analysing English translations of Italian agriturismi websites
This article aims to analyze and compare the Italian and the British languages of tourism, and the language used by translators in their translations of tourist websites into English.
In particular, we will focus on mistranslations of collocations. The tools used for analysis are two sets of corpora: a comparable corpus made up of original Italian agriturismi websites and original British farmhouse holiday websites, and a parallel corpus made up of original Italian agriturismi websites and their translations into English. The theoretical framework adopted is the one proposed by Sinclair in his description of the phraseological approach to language. The results of the analysis show the importance of studying collocations across cultures and the strict relationship between language, culture, and promotional strategies
“Do I speak better?” A longitudinal study of lexical chunking in the spoken language of two Japanese students
The prominence of lexical chunks or prefabricated language has grown over recent years; however, there have been few longitudinal case studies exploring changes in non-native speaker (NNS) speech and little work done involving NNSs in identifying chunks in their own speech. This study attempts to track changes in two intermediate-level Japanese students' spoken usage of lexical chunks over a period of five months in the UK. Each NNS was recorded three times in conversational long turns at two-month intervals.
Twelve native speakers (NSs) were asked to order transcripts of each student's speech by perceived fluency level and three also underlined the lexical chunks; however, there was little coherence amongst NSs in these tasks. Identification of chunks using Wordsmith software suggests an overall rise in the percentage of talk within chunks and a reduction in ill-formed chunks over the five months.
Following some awareness-raising training on identifying lexical chunks, the Japanese students themselves were asked to identify chunks within their own transcripts. Despite the difficulty of the task, they were able to do this and additionally offered insights into which chunks were common for them. These insights included an awareness of typical Japanese phrases and how they felt their speech had changed overall. A further recording and transcribing cycle suggests that this training resulted in some short-term uptake as the percentage of chunks used increased after the lessons. Both students found it highly motivating to record and analyse transcripts of their talk as they could see progress in their own spoken language development
Assessing English language learners' collocation knowledge: A systematic review of receptive and productive measurements
Since collocation knowledge is integral to second language vocabulary depth, it necessitates a careful examination of various measurement approaches. To this end, the current paper provides an overview and evaluation of extant collocation measurements used in empirical studies on L2 English (N = 153) published between 1980 and 2023 indexed in the SSCI, SCIE, AHCI, SCOPUS, and ERIC databases. Six instruments, seven item formats, and three other assessment tools were identified and reviewed for the assessment of receptive and productive collocation knowledge. The review focused on the collocation knowledge measured by each tool, the instrument and/or item format employed, item design, reported reliability, and potential drawbacks of employing each instrument and item format in research or practice. The review proposes several theoretical and practical considerations for future assessments of and research on English collocation knowledge.
Differential rotation decay in the radiative envelopes of CP stars
Stars of spectral classes A and late B are almost entirely radiative. CP
stars are a slowly rotating subgroup of these stars. It is possible that they
possessed long-lived accretion disks in their T Tauri phase. Magnetic coupling
of disk and star leads to rotational braking at the surface of the star.
Microscopic viscosities are extremely small and will not be able to reduce the
rotation rate of the core of the star. We investigate the question whether
magneto-rotational instability can provide turbulent angular momentum
transport. We illuminate the question whether or not differential rotation is
present in CP stars. Numerical MHD simulations of thick stellar shells are
performed. An initial differential rotation law is subject to the influence of
a magnetic field. The configuration indeed gives rise to magneto-rotational
instability. The emerging flows and magnetic fields efficiently transport
angular momentum outwards. Weak dependence on the magnetic Prandtl number
(~0.01 in stars) is found from the simulations. Since the estimated time-scale
of decay of differential rotation is 10^7-10^8 yr and comparable to the
life-time of A stars, we find the braking of the core to be an ongoing process
in many CP stars. The evolution of the surface rotation of CP stars with age
will be an observational challenge and of much value for verifying the
simulations.
Comment: 8 pages, 11 figures; submitted to Astron. & Astrophy
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach
Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist “approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [. . . ], their own idiolect” (Coulthard, 2004: 31). However, given the difficulty in empirically substantiating a theory of idiolect, there is growing concern in the field that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite offers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ‘n-gram textbites’, small textual segments that characterise that author’s writing, providing DNA-like chunks of identifying material
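The idea of two-to-six-word n-grams that recur across one author's texts can be sketched in a few lines. This is only an illustration under simple assumptions (whitespace tokenization, lowercasing, and a hypothetical threshold of two texts); it is not the procedure used in the study, and the sample texts are invented.

```python
from collections import Counter


def word_ngrams(text, n_min=2, n_max=6):
    """Return the set of word n-grams (n_min to n_max words) found in a text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams


def recurring_ngrams(texts, min_texts=2):
    """Keep the n-grams that appear in at least min_texts different texts."""
    counts = Counter()
    for text in texts:
        # Each text contributes a set, so an n-gram counts once per text.
        counts.update(word_ngrams(text))
    return {gram for gram, c in counts.items() if c >= min_texts}


# Hypothetical texts attributed to one author.
author_texts = [
    "please find attached the report",
    "please find attached my notes",
]
shared = recurring_ngrams(author_texts)
```

Counting each n-gram once per text, not once per occurrence, keeps a single repetitive document from dominating the candidate set.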