8 research outputs found

    Empirical methods for the study of denotation in nominalizations in Spanish

    This article deals with deverbal nominalizations in Spanish; concretely, we focus on the denotative distinction between event and result nominalizations. The goal of this work is twofold: first, to detect the most relevant features for this denotative distinction; and, second, to build an automatic classification system of deverbal nominalizations according to their denotation. We have based our study on theoretical hypotheses dealing with this semantic distinction and we have analyzed them empirically by means of Machine Learning techniques, which are the basis of the ADN-Classifier. This is the first tool that aims to automatically classify deverbal nominalizations into event, result, or underspecified denotation types in Spanish. The ADN-Classifier has helped us to quantitatively evaluate the validity of our claims regarding deverbal nominalizations. We set up a series of experiments in order to test the ADN-Classifier with different models and in different realistic scenarios depending on the knowledge resources and natural language processors available. The ADN-Classifier achieved good results (87.20% accuracy).

    (In)definiteness Spread in Semitic Construct State: Does it Really Exist?

    To argue against a long-established assumption is no easy task. In this article, I argue against one of those assumptions, namely (in)definiteness spread in Semitic Construct State (CS). I argue that CSs are of two types: either definite or indefinite. The former refers to those CSs where the head N is syntactically definite, in the sense of having the definite article al-/ha- (the, Arabic/Hebrew), and the latter to those not having it. Three tenets constitute the crux of this paper: i) the controversy (in)definiteness spread gives rise to among Semitic scholars, ii) there does exist good evidence that the head N of CSs can take the definite article in Arabic and Hebrew, and iii) in Arabic the absence of the indefinite article (marker) -n on the head N has presumably to do with what I call VCR (= the Vowel Contextualization Rule), like several other similar phonological phenomena in the language. As for the specificity/uniqueness denoted by the head N of a CS in some contexts, I propose that such specificity has nothing to do with definiteness spread, but rather it may be linked to a Universal Grammar (UG) principle, which correlates specificity/uniqueness with possessivization cross-linguistically, or to a UG parameter in the case of Semitic CSs.

    Proceedings

    Proceedings of the Ninth International Workshop on Treebanks and Linguistic Theories. Editors: Markus Dickinson, Kaili Müürisep and Marco Passarotti. NEALT Proceedings Series, Vol. 9 (2010), 268 pages. © 2010 The editors and contributors. Published by Northern European Association for Language Technology (NEALT) http://omilia.uio.no/nealt . Electronically published at Tartu University Library (Estonia) http://hdl.handle.net/10062/15891

    Translationese indicators for human translation quality estimation (based on English-to-Russian translation of mass-media texts)

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Human translation quality estimation is a relatively new and challenging area of research, because human translation quality is notoriously more subtle and subjective than machine translation, which attracts much more attention and effort of the research community. At the same time, human translation is routinely assessed by education and certification institutions, as well as at translation competitions. Do the quality labels and scores generated from real-life quality judgments align well with objective properties of translations? This thesis puts this question to a test using machine learning methods. Conceptually, this research is built around a hypothesis that linguistic properties characteristic of translations, as a specific form of communication, can correlate with translation quality. This assumption is often made in translation studies but has never been put to a rigorous empirical test. Exploring translationese features in a quality estimation task can help identify quality-related trends in translational behaviour and provide data-driven insights into professionalism to improve training. Using translationese for quality estimation fits well with the concept of quality in translation studies, because it is essentially a document-level property. Linguistically-motivated translationese features are also more interpretable than popular distributed representations and can explain linguistic differences between quality categories in human translation. We investigated (i) an extended set of Universal Dependencies-based morphosyntactic features as well as two lexical feature sets capturing (ii) collocational properties of translations, and (iii) ratios of vocabulary items in various frequency bands along with entropy scores from n-gram models.
To compare the performance of our feature sets in translationese classifications and in quality estimation tasks against other representations, the experiments were also run on tf-idf features, QuEst++ features and on contextualised embeddings from a range of pre-trained language models, including the state-of-the-art multilingual solution for machine translation quality estimation. Our major focus was on document-level prediction; however, where the labels and features allowed, the experiments were extended to the sentence level. The corpus used in this research includes English-to-Russian parallel subcorpora of student and professional translations of mass-media texts, and a register-comparable corpus of non-translations in the target language. Quality labels for various subsets of student translations come from a number of real-life settings: translation competitions, graded student translations, error annotations and direct assessment. We overview approaches to benchmarking quality in translation and provide a detailed description of our own annotation experiments. Of the three proposed translationese feature sets, morphosyntactic features returned the best results on all tasks. In many settings they were second only to contextualised embeddings. At the same time, performance on various representations was contingent on the type of quality captured by quality labels/scores. Using the outcomes of machine learning experiments and feature analysis, we established that translationese properties of translations were not equally reflected by various labels and scores. For example, professionalism was much less related to translationese than expected. Labels from document-level holistic assessment demonstrated maximum support for our hypothesis: lower-ranking translations clearly exhibited more translationese.
They bore more traces of mechanical translational behaviours associated with following source language patterns whenever possible, which led to inflated frequencies of analytical passives, modal predicates, verbal forms, especially copula verbs and verbs in the finite form. As expected, lower-ranking translations were more repetitive and had longer, more complex sentences. Higher-ranking translations were indicative of greater skill in recognising and counteracting translationese tendencies. For document-level holistic labels as an approach to capture quality, translationese indicators might provide a valuable contribution to an effective quality estimation pipeline. However, error-based scores, and especially scores from sentence-level direct assessment, proved to be much less correlated with translationese and fluency issues in general. This was confirmed by relatively low regression results across all representations that had access only to the target language side of the dataset, by feature analysis and by correlation between error-based scores and scores from direct assessment.

    Investigation of the Distributions, Derivation, and Generalizations in Arabic Plural System

    The Arabic plural system poses a challenge to current morphological accounts since the regular sound plural that is formed by suffixation contrasts with irregular broken plurals formed by internally modifying the singular stem. Although aspects of the Arabic plural system have been widely studied since the early ages of Arab grammarians (Abu Al-Saud 1971; Yaaqub 2004), there are several issues that remain unresolved and warrant further investigation. This dissertation uses a combination of statistical, qualitative, and computational approaches to provide a comprehensive account of several outstanding problems in Arabic nominal plurals. Theoretical investigations of Arabic nominal plurals have led to conflicting results about the status of Arabic nominal ablaut as a minority default system (McCarthy & Prince 1990; Boudelaa & Gaskell 2002). Apart from this, little has been said about other aspects of the distribution of Arabic plurals, namely the interplay between regularity of the plural and its frequency in actual language use (Bybee 2001). This dissertation takes a usage-based approach to revisit the question of the status of Arabic as a minority-default system and to examine the interplay between the productivity and the frequency in actual language use of plural types. The results from the statistical distribution of sound and broken plurals are in line with the claim that Arabic pluralization is a minority default system. The results are also consistent with the prediction made by the usage-based model that low type frequency compensates for weak lexical strength by high token frequency. The dissertation also investigates the role of singular stem weight on plural derivation.
Numerous attempts have been made to model Arabic broken plurals, which fall into three main groups according to their specific morphological approach: Generative Morphophonology (Brame 1970; Levy 1971), Root-&-Pattern Morphology (McCarthy 1979; Hammond 1988), and Prosodic Morphology (McCarthy and Prince 1990). However, there has not been any investigation of the influence of the additive weight of singular stems on the derivation of plural forms. Results from qualitative and computational analyses provide evidence for the role of simple additive weight on the mapping of singular input stems to plural outputs in Arabic broken plurals. Stem weight does not completely determine the plural pattern. Rather, its role can be viewed as a (quasi) well-formedness condition on plural templates based on input forms. The dissertation examines the types of information that are relevant to the Arabic plural system by performing a computational analysis on singular-plural pairs collected from a comprehensive corpus. The performance of multiple K-Nearest Neighbor (KNN) classifiers that use different combinations of factors to select plural patterns is compared to determine the importance of each factor. The results show that the CV template, vowel melody and semantic qualities of the singular all contribute to determining the shape of the plural template, though to varying degrees. The syllabic shape of the singular forms of Arabic nouns is the major factor in predicting their plural forms, followed by the vowel melody and the semantic features. PHD, Linguistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/168038/1/fahadal_1.pd

    SinSpeC 01

    Volume 1 of the Working Papers of the SFB732 "Incremental Specification in Context" (SinSpeC)

    The boy from Bundaberg : studies in Melanesian linguistics in honour of Tom Dutton


    The Restriction on Predicative Codas in Existential There-Clauses. Theoretical and Empirical Perspectives

    This thesis investigates the so-called Predicate Restriction (PR) in English existential sentences. The general consensus in the literature is that only stage-level predicates (SLP, i.e. predicates denoting temporary properties, like 'sick') can appear in the coda of an existential sentence, while individual-level predicates (ILP, i.e. predicates that denote permanent properties, like 'tall') are excluded. After a discussion of the various theoretical approaches to the PR (syntactic, semantic, as well as pragmatic), this thesis pursues two empirical studies, using both corpus data from the BNC and a judgment study. Both studies confirm the theoretical preference for SLP (in a general sense), but also show that the distinction between predicates should not be reduced to a binary SLP–ILP choice. Predicates should rather be analysed in a more fine-grained system with multiple factors, as done in Jäger (2001).