46,949 research outputs found

    A Corpus-Based Investigation of Definite Description Use

    Full text link
    We present the results of a study of definite descriptions use in written texts aimed at assessing the feasibility of annotating corpora with information about definite description interpretation. We ran two experiments, in which subjects were asked to classify the uses of definite descriptions in a corpus of 33 newspaper articles, containing a total of 1412 definite descriptions. We measured the agreement among annotators about the classes assigned to definite descriptions, as well as the agreement about the antecedent assigned to those definites that the annotators classified as being related to an antecedent in the text. The most interesting result of this study from a corpus annotation perspective was the rather low agreement (K=0.63) that we obtained using versions of Hawkins' and Prince's classification schemes; better results (K=0.76) were obtained using the simplified scheme proposed by Fraurud that includes only two classes, first-mention and subsequent-mention. The agreement about antecedents was also not complete. These findings raise questions concerning the strategy of evaluating systems for definite description interpretation by comparing their results with a standardized annotation. From a linguistic point of view, the most interesting observations were the great number of discourse-new definites in our corpus (in one of our experiments, about 50% of the definites in the collection were classified as discourse-new, 30% as anaphoric, and 18% as associative/bridging) and the presence of definites which did not seem to require a complete disambiguation.Comment: 47 pages, uses fullname.sty and palatino.st

    Testing the Limits of Anaphoric Distance in Classical Arabic: a Corpus-Based Study

    Get PDF
    One of the central aims in research on anaphora is to discover the factors that determine the choice of referential expressions in discourse. Ariel (1988; 2001) offers an Accessibility Scale where referential expressions, including demonstratives, are categorized according to the values of anaphoric (i.e. textual) distance that each of these has in relation to its antecedent. The aim of this paper is to test Ariel’s (1988; 1990; 2001) claim that the choice to use proximal or distal anaphors is mainly determined by anaphoric distance. This claim is investigated in relation to singular demonstratives in a corpus of Classical Arabic (CA) prose texts by using word count to measure anaphoric distance. Results indicate that anaphoric distance cannot be taken as a consistent or reliable determinant of how anaphors are used in CA, and so Ariel’s claim is not supported by the results of this study. This also indicates that the universality of anaphoric distance, as a criterion of accessibility, is defied

    Text as scene: discourse deixis and bridging relations

    Get PDF
    En este artículo se presenta un nuevo marco, “el texto como escena”, que establece las bases para la anotación de dos relaciones de correferencia: la deixis discursiva y las relaciones de bridging. La incorporación de lo que llamamos escenas textuales y contextuales proporciona unas directrices de anotación más flexibles, que diferencian claramente entre tipos de categorías generales. Un marco como éste, capaz de tratar la deixis discursiva y las relaciones de bridging desde una perspectiva común, tiene como objetivo mejorar el bajo grado de acuerdo entre anotadores obtenido por esquemas de anotación anteriores, que son incapaces de captar las referencias vagas inherentes a estos dos tipos de relaciones. Las directrices aquí presentadas completan el esquema de anotación diseñado para enriquecer el corpus español CESS-ECE con información correferencial y así construir el corpus CESS-Ancora.This paper presents a new framework, “text as scene”, which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes provides more flexible annotation guidelines, broad type categories being clearly differentiated. Such a framework that is capable of dealing with discourse deixis and bridging relations from a common perspective aims at improving the poor reliability scores obtained by previous annotation schemes, which fail to capture the vague references inherent in both these links. The guidelines presented here complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus building the CESS-Ancora corpus.This paper has been supported by the FPU grant (AP2006-00994) from the Spanish Ministry of Education and Science. It is based on work supported by the CESS-ECE (HUM2004-21127), Lang2World (TIN2006- 15265-C06-06), and Praxem (HUM2006- 27378-E) projects

    Qualities, objects, sorts, and other treasures : gold digging in English and Arabic

    Get PDF
    In the present monograph, we will deal with questions of lexical typology in the nominal domain. By the term "lexical typology in the nominal domain", we refer to crosslinguistic regularities in the interaction between (a) those areas of the lexicon whose elements are capable of being used in the construction of "referring phrases" or "terms" and (b) the grammatical patterns in which these elements are involved. In the traditional analyses of a language such as English, such phrases are called "nominal phrases". In the study of the lexical aspects of the relevant domain, however, we will not confine ourselves to the investigation of "nouns" and "pronouns" but intend to take into consideration all those parts of speech which systematically alternate with nouns, either as heads or as modifiers of nominal phrases. In particular, this holds true for adjectives both in English and in other Standard European Languages. It is well known that adjectives are often difficult to distinguish from nouns, or that elements with an overt adjectival marker are used interchangeably with nouns, especially in particular semantic fields such as those denoting MATERIALS or NATlONALlTIES. That is, throughout this work the expression "lexical typology in the nominal domain" should not be interpreted as "a typology of nouns", but, rather, as the cross-linguistic investigation of lexical areas constitutive for "referring phrases" irrespective of how the parts-of-speech system in a specific language is defined

    Multiple analytical perspectives of the Eleme Anterior-Perfective

    Get PDF
    There is increasing recognition in typology that linguistic categories are language-specific and not universal, increasing the need for explicitness in language descriptions. In light of this development, I argue in this paper that preexisting labels and descriptions for a set of subject-marking TAM prefixes in Eleme do not adequately characterise the distribution and use of these forms, which is conditioned by the complex interaction of person and number features, Aktionsart, epistemic modality and information structure. In response to the challenges raised by these data, I argue that when multiple analytical perspectives are required to understand the function of a grammatical form, fine-grained quantitative analyses with description give a complex but useful basis on which to compare languages

    Comparing knowledge sources for nominal anaphora resolution

    Get PDF
    We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora by means of shallow lexico-semantic patterns. As corpora we use the British National Corpus (BNC), as well as the Web, which has not been previously used for this task. Our results show that (a) the knowledge encoded in WordNet is often insufficient, especially for anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite NP coreference, the Web-based method yields results comparable to those obtained using WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the dataset; (d) in both case studies, the BNC-based method is worse than the other methods because of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge gap often encountered in anaphora resolution, and handled examples with context-dependent relations between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems

    Presentational/Existential Structures in Spoken versus Written German: Es Gibt and SEIN

    Get PDF
    This article presents a synchronic, corpus-based examination of spoken German with regard to the distribution and function of presentational/ existential es gibt NP and a range of SEIN NP structures such as da SEIN , locative SEIN , es SEIN , and zero-locative SEIN . In particular, the use of da SEIN has been neglected in previous research. While es gibt is equally frequent in the spoken and written data, SEIN structures are typical of spoken German only, with da SEIN being the most frequent. The article concentrates on clauses with indefinite NPs, while the presentation of events with da and wider da-usage in spoken German are also considered

    Typological parameters of genericity

    Get PDF
    Different languages employ different morphosyntactic devices for expressing genericity. And, of course, they also make use of different morphosyntactic and semantic or pragmatic cues which may contribute to the interpretation of a sentence as generic rather than episodic. [...] We will advance the strong hypo thesis that it is a fundamental property of lexical elements in natural language that they are neutral with respect to different modes of reference or non-reference. That is, we reject the idea that a certain use of a lexical element, e.g. a use which allows reference to particular spatio-temporally bounded objects in the world, should be linguistically prior to all other possible uses, e.g. to generic and non-specific uses. From this it follows that we do not consider generic uses as derived from non-generic uses as it is occasionally assumed in the literature. Rather, we regard these two possibilities of use as equivalent alternative uses of lexical elements. The typological differences to be noted therefore concern the formal and semantic relationship of generic and non-generic uses to each other; they do not pertain to the question of whether lexical elements are predetermined for one of these two uses. Even supposing we found a language where generic uses are always zero-marked and identical to lexical sterns, we would still not assume that lexical elements in this language primarily have a generic use from which the non-generic uses are derived. (Incidentally, none of the languages examined, not even Vietnamese, meets this criterion.
    corecore