46,949 research outputs found
A Corpus-Based Investigation of Definite Description Use
We present the results of a study of definite descriptions use in written
texts aimed at assessing the feasibility of annotating corpora with information
about definite description interpretation. We ran two experiments, in which
subjects were asked to classify the uses of definite descriptions in a corpus
of 33 newspaper articles, containing a total of 1412 definite descriptions. We
measured the agreement among annotators about the classes assigned to definite
descriptions, as well as the agreement about the antecedent assigned to those
definites that the annotators classified as being related to an antecedent in
the text. The most interesting result of this study from a corpus annotation
perspective was the rather low agreement (K=0.63) that we obtained using
versions of Hawkins' and Prince's classification schemes; better results
(K=0.76) were obtained using the simplified scheme proposed by Fraurud that
includes only two classes, first-mention and subsequent-mention. The agreement
about antecedents was also not complete. These findings raise questions
concerning the strategy of evaluating systems for definite description
interpretation by comparing their results with a standardized annotation. From
a linguistic point of view, the most interesting observations were the great
number of discourse-new definites in our corpus (in one of our experiments,
about 50% of the definites in the collection were classified as discourse-new,
30% as anaphoric, and 18% as associative/bridging) and the presence of
definites which did not seem to require a complete disambiguation.Comment: 47 pages, uses fullname.sty and palatino.st
Testing the Limits of Anaphoric Distance in Classical Arabic: a Corpus-Based Study
One of the central aims in research on anaphora is to discover the factors that determine the choice of referential expressions in discourse. Ariel (1988; 2001) offers an Accessibility Scale where referential expressions, including demonstratives, are categorized according to the values of anaphoric (i.e. textual) distance that each of these has in relation to its antecedent. The aim of this paper is to test Ariel’s (1988; 1990; 2001) claim that the choice to use proximal or distal anaphors is mainly determined by anaphoric distance. This claim is investigated in relation to singular demonstratives in a corpus of Classical Arabic (CA) prose texts by using word count to measure anaphoric distance. Results indicate that anaphoric distance cannot be taken as a consistent or reliable determinant of how anaphors are used in CA, and so Ariel’s claim is not supported by the results of this study. This also indicates that the universality of anaphoric distance, as a criterion of accessibility, is defied
Text as scene: discourse deixis and bridging relations
En este artículo se presenta un nuevo marco, “el texto como escena”, que establece
las bases para la anotación de dos relaciones de correferencia: la deixis discursiva y las
relaciones de bridging. La incorporación de lo que llamamos escenas textuales y contextuales
proporciona unas directrices de anotación más flexibles, que diferencian claramente entre tipos
de categorías generales. Un marco como éste, capaz de tratar la deixis discursiva y las
relaciones de bridging desde una perspectiva común, tiene como objetivo mejorar el bajo grado
de acuerdo entre anotadores obtenido por esquemas de anotación anteriores, que son incapaces
de captar las referencias vagas inherentes a estos dos tipos de relaciones. Las directrices aquí
presentadas completan el esquema de anotación diseñado para enriquecer el corpus español
CESS-ECE con información correferencial y así construir el corpus CESS-Ancora.This paper presents a new framework, “text as scene”, which lays the foundations for
the annotation of two coreferential links: discourse deixis and bridging relations. The
incorporation of what we call textual and contextual scenes provides more flexible annotation
guidelines, broad type categories being clearly differentiated. Such a framework that is capable
of dealing with discourse deixis and bridging relations from a common perspective aims at
improving the poor reliability scores obtained by previous annotation schemes, which fail to
capture the vague references inherent in both these links. The guidelines presented here
complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with
coreference information, thus building the CESS-Ancora corpus.This paper has been supported by the FPU
grant (AP2006-00994) from the Spanish
Ministry of Education and Science. It is based
on work supported by the CESS-ECE
(HUM2004-21127), Lang2World (TIN2006-
15265-C06-06), and Praxem (HUM2006-
27378-E) projects
Qualities, objects, sorts, and other treasures : gold digging in English and Arabic
In the present monograph, we will deal with questions of lexical typology in the nominal domain. By the term "lexical typology in the nominal domain", we refer to crosslinguistic regularities in the interaction between (a) those areas of the lexicon whose elements are capable of being used in the construction of "referring phrases" or "terms" and (b) the grammatical patterns in which these elements are involved. In the traditional analyses of a language such as English, such phrases are called "nominal phrases". In the study of the lexical aspects of the relevant domain, however, we will not confine ourselves to the investigation of "nouns" and "pronouns" but intend to take into consideration all those parts of speech which systematically alternate with nouns, either as heads or as modifiers of nominal phrases. In particular, this holds true for adjectives both in English and in other Standard European Languages. It is well known that adjectives are often difficult to distinguish from nouns, or that elements with an overt adjectival marker are used interchangeably with nouns, especially in particular semantic fields such as those denoting MATERIALS or NATlONALlTIES. That is, throughout this work the expression "lexical typology in the nominal domain" should not be interpreted as "a typology of nouns", but, rather, as the cross-linguistic investigation of lexical areas constitutive for "referring phrases" irrespective of how the parts-of-speech system in a specific language is defined
Multiple analytical perspectives of the Eleme Anterior-Perfective
There is increasing recognition in typology that linguistic categories are
language-specific and not universal, increasing the need for explicitness in
language descriptions. In light of this development, I argue in this paper that preexisting
labels and descriptions for a set of subject-marking TAM prefixes in
Eleme do not adequately characterise the distribution and use of these forms,
which is conditioned by the complex interaction of person and number features,
Aktionsart, epistemic modality and information structure. In response to the
challenges raised by these data, I argue that when multiple analytical perspectives
are required to understand the function of a grammatical form, fine-grained
quantitative analyses with description give a complex but useful basis on which to
compare languages
Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems
Presentational/Existential Structures in Spoken versus Written German: Es Gibt and SEIN
This article presents a synchronic, corpus-based examination of spoken German with regard to the distribution and function of presentational/ existential es gibt NP and a range of SEIN NP structures such as da SEIN , locative SEIN , es SEIN , and zero-locative SEIN . In particular, the use of da SEIN has been neglected in previous research. While es gibt is equally frequent in the spoken and written data, SEIN structures are typical of spoken German only, with da SEIN being the most frequent. The article concentrates on clauses with indefinite NPs, while the presentation of events with da and wider da-usage in spoken German are also considered
Typological parameters of genericity
Different languages employ different morphosyntactic devices for expressing genericity. And, of course, they also make use of different morphosyntactic and semantic or pragmatic cues which may contribute to the interpretation of a sentence as generic rather than episodic. [...] We will advance the strong hypo thesis that it is a fundamental property of lexical elements in natural language that they are neutral with respect to different modes of reference or non-reference. That is, we reject the idea that a certain use of a lexical element, e.g. a use which allows reference to particular spatio-temporally bounded objects in the world, should be linguistically prior to all other possible uses, e.g. to generic and non-specific uses. From this it follows that we do not consider generic uses as derived from non-generic uses as it is occasionally assumed in the literature. Rather, we regard these two possibilities of use as equivalent alternative uses of lexical elements. The typological differences to be noted therefore concern the formal and semantic relationship of generic and non-generic uses to each other; they do not pertain to the question of whether lexical elements are predetermined for one of these two uses. Even supposing we found a language where generic uses are always zero-marked and identical to lexical sterns, we would still not assume that lexical elements in this language primarily have a generic use from which the non-generic uses are derived. (Incidentally, none of the languages examined, not even Vietnamese, meets this criterion.
- …