
    DANSK and DaCy 2.6.0: Domain Generalization of Danish Named Entity Recognition

    Named entity recognition is one of the cornerstones of Danish NLP, essential for language technology applications within both industry and research. However, Danish NER is inhibited by a lack of available datasets. As a consequence, no current models are capable of fine-grained named entity recognition, nor have they been evaluated for potential generalizability issues across datasets and domains. To alleviate these limitations, this paper introduces: 1) DANSK: a named entity dataset providing for high-granularity tagging as well as within-domain evaluation of models across a diverse set of domains; 2) DaCy 2.6.0, which includes three generalizable models with fine-grained annotation; and 3) an evaluation of current state-of-the-art models' ability to generalize across domains. The evaluation of existing and new models revealed notable performance discrepancies across domains, which should be addressed within the field. Shortcomings of the annotation quality of the dataset and its impact on model training and evaluation are also discussed. Despite these limitations, we advocate for the use of the new dataset DANSK alongside further work on generalizability within Danish NER.

    Direct Causation: A New Approach to an Old Question

    Causative constructions come in lexical and periphrastic variants, exemplified in English by “Sam killed Lee” and “Sam caused Lee to die”. While use of the former, the lexical causative, entails the truth of the latter, an entailment in the other direction does not hold. The source of this asymmetry is commonly ascribed to the lexical causative having an additional prerequisite of “direct causation”, such that the causative relation holds between a contiguous cause and effect (Fodor 1970, Katz 1970). However, this explanation encounters both empirical and theoretical problems (Neeleman & van de Koot 2012). To explain the source of the directness inferences (as well as other longstanding puzzles), we propose a formal analysis based on the framework of Structural Equation Models (SEMs) (Pearl 2000), which provides the necessary background for licensing causal inferences. Specifically, we provide a formalization of a “sufficient set of conditions” within a model and demonstrate its role in the selectional parameters of causative descriptions. We argue that “causal sufficiency” is not a property of singular conditions, but rather of sets of conditions, which are individually necessary but only sufficient when taken together (a view originally motivated in the philosophical literature by Mackie 1965). We further introduce the notion of a “completion event” of a sufficient set, which is critical to explain the particular inferential profile of lexical causatives.

    Multilingual Sentiment Normalization for Scandinavian Languages

    In this paper, we address the challenge of multilingual sentiment analysis using a traditional lexicon and rule-based sentiment instrument that is tailored to capture sentiment patterns in a particular language. Focusing on a case study of three closely related Scandinavian languages (Danish, Norwegian, and Swedish) and using three tailored versions of VADER, we measure the relative degree of variation in valence using the OPUS corpus. We found that scores for Swedish are systematically skewed lower than Danish for translational pairs, and that scores for Norwegian are skewed higher than both other languages. We use a neural network to optimize the fit of Norwegian and Swedish, respectively, to Danish as the reference (target) language.

    Speaker Attitude and Sexual Orientation Affect Phonetic Imitation

    Numerous studies have documented the phenomenon of phonetic convergence: the process by which speakers alter their productions to become more similar on some phonetic or acoustic dimension to those of their interlocutor. Though social factors have been suggested as a motivator for imitation, few studies have established a tight connection between these extralinguistic factors and a speaker’s likelihood to imitate. The present study explores the effects of perceived sexual orientation and speaker attitude toward the interlocutor on the likelihood of imitation for extended VOT. Experimental results show that the extent of phonetic convergence (and divergence) depends on the perceived sexual orientation of the talker as well as on whether the speaker is positively disposed to the interlocutor.

    The Danish Gigaword Project

    Danish is a North Germanic/Scandinavian language spoken primarily in Denmark, a country with a tradition of technological and scientific innovation. However, from a technological perspective, the Danish language has received relatively little attention and, as a result, Danish language technology is hard to develop, in part due to a lack of large or broad-coverage Danish corpora. This paper describes the Danish Gigaword project, which aims to construct a freely available one-billion-word corpus of Danish text that represents the breadth of the written language.

    The Middle Construction in Mandarin Chinese

    The middle is an unaccusative construction which expresses a modal generalization over events (Keyser and Roeper 1984). Although the middle is not homogeneous cross-linguistically (Ting 2006), manifestations of the middle have been observed in most Indo-European languages. In this thesis, I will develop criteria for middles based on cross-linguistic generalizations and argue for the existence of a middle construction in Chinese. Chinese has a class of so-called 'notional passives,' unaccusative sentences which display active morphology but receive passive interpretation. I will provide evidence that the notional passive is distinct both structurally and semantically from the canonical Chinese passive and demonstrate the inadequacy of the topic-comment account of such constructions proposed by Li and Thompson (1981). My account of the middle will crucially define it as a resultative form in Chinese, appearing exclusively with Resultative Verb Compounds (RVCs). I will adopt Cheng and Huang's (1994) classification of RVCs into four verbal subcategories (unergative, transitive, ergative, and causative) and consider the syntactic and semantic properties of the resultative middle based on the argument structure of its component predicates. Using data, I will analyze whether these Chinese middle verbs pattern in a predictable, cross-linguistically consistent way, considering syntactic distribution, aspectual composition, and semantic constraints on middle formation.