
    The Parallel Meaning Bank: Towards a Multilingual Corpus of Translations Annotated with Compositional Meaning Representations

    The Parallel Meaning Bank is a corpus of translations annotated with shared, formal meaning representations, comprising over 11 million words divided over four languages (English, German, Italian, and Dutch). Our approach is based on cross-lingual projection: automatically produced (and manually corrected) semantic annotations for English sentences are mapped onto their word-aligned translations, under the assumption that the translations are meaning-preserving. The semantic annotation consists of five main steps: (i) segmentation of the text into sentences and lexical items; (ii) syntactic parsing with Combinatory Categorial Grammar; (iii) universal semantic tagging; (iv) symbolization; and (v) compositional semantic analysis based on Discourse Representation Theory. These steps are performed using statistical models trained in a semi-supervised manner. The annotation models employed are all language-neutral. Our first results are promising. Comment: To appear at EACL 201
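    The five annotation steps above can be sketched as a chain of functions. This is a purely illustrative toy, assuming hypothetical function names and trivial placeholder models; the PMB's actual components are statistical and far richer.

    ```python
    # Hypothetical sketch of the five-step PMB annotation pipeline.
    # All names and data shapes are illustrative, not the PMB's actual API.

    def segment(text):
        # Step (i): split raw text into sentences and lexical items (tokens).
        sentences = [s.strip() for s in text.split(".") if s.strip()]
        return [{"sentence": s, "tokens": s.split()} for s in sentences]

    def ccg_parse(tokens):
        # Step (ii): assign CCG categories; here a trivial placeholder lexicon.
        return [(t, "N" if t[0].isupper() else "S\\N") for t in tokens]

    def semtag(tokens):
        # Step (iii): universal semantic tagging, e.g. NAM (name), CON (concept).
        return [(t, "NAM" if t[0].isupper() else "CON") for t in tokens]

    def symbolize(tagged):
        # Step (iv): map word forms to non-logical symbols (lowercased here).
        return [(t, t.lower()) for t, _ in tagged]

    def drs_compose(symbols):
        # Step (v): compose a (toy) DRS: discourse referents plus conditions
        # built from the symbols. A real system would be guided by the parse.
        refs = [f"x{i}" for i in range(len(symbols))]
        conds = [f"{sym}({r})" for r, (_, sym) in zip(refs, symbols)]
        return {"referents": refs, "conditions": conds}

    def annotate(text):
        out = []
        for unit in segment(text):
            parse = ccg_parse(unit["tokens"])   # computed, ignored by the toy composer
            tagged = semtag(unit["tokens"])
            symbols = symbolize(tagged)
            out.append(drs_compose(symbols))
        return out
    ```

    For example, `annotate("Tom sleeps.")` yields one DRS with referents `x0, x1` and conditions `tom(x0)` and `sleeps(x1)`.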

    PersoNER: Persian named-entity recognition

    Named-Entity Recognition (NER) is still a challenging task for languages with low digital resources. The main difficulties arise from the scarcity of annotated corpora and the consequently problematic training of an effective NER pipeline. To bridge this gap, in this paper we target the Persian language, which is spoken by a population of over a hundred million people worldwide. We first present and provide ArmanPersoNERCorpus, the first manually annotated Persian NER corpus. Then, we introduce PersoNER, an NER pipeline for Persian that leverages a word embedding and a sequential max-margin classifier. The experimental results show that the proposed approach achieves interesting MUC7 and CoNLL scores while outperforming two alternatives based on a CRF and a recurrent neural network.
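    A max-margin classifier over word embeddings, as used in the pipeline above, can be sketched with a simple hinge-loss update. This is a minimal sketch under assumed details (class names, tagset, learning rate are invented); PersoNER's actual features and training procedure differ.

    ```python
    # Illustrative max-margin (hinge-loss) token classifier over embedding
    # vectors; not PersoNER's actual implementation.
    import numpy as np

    class MaxMarginTagger:
        def __init__(self, dim, labels, lr=0.1):
            self.labels = labels
            self.W = np.zeros((len(labels), dim))  # one weight row per label
            self.lr = lr

        def scores(self, x):
            return self.W @ x

        def predict(self, x):
            return self.labels[int(np.argmax(self.scores(x)))]

        def update(self, x, gold):
            # Perceptron-style margin update: penalize the highest-scoring
            # wrong label unless the gold label wins by a margin of 1.
            g = self.labels.index(gold)
            s = self.scores(x)
            s_wrong = s.copy()
            s_wrong[g] = -np.inf
            j = int(np.argmax(s_wrong))
            if s[g] - s[j] < 1.0:       # hinge loss is active
                self.W[g] += self.lr * x
                self.W[j] -= self.lr * x
    ```

    Trained on a handful of (embedding, label) pairs, the tagger separates the classes once the margin condition is satisfied.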

    Learning Social Relation Traits from Face Images

    Social relation defines the association, e.g., warmth, friendliness, and dominance, between two or more people. Motivated by psychological studies, we investigate whether such fine-grained and high-level relation traits can be characterised and quantified from face images in the wild. To address this challenging problem, we propose a deep model that learns a rich face representation capturing gender, expression, head pose, and age-related attributes, and then performs pairwise-face reasoning for relation prediction. To learn from heterogeneous attribute sources, we formulate a new network architecture with a bridging layer that leverages the inherent correspondences among these datasets; it can also cope with missing target attribute labels. Extensive experiments show that our approach is effective for fine-grained social relation learning in images and videos. Comment: To appear in International Conference on Computer Vision (ICCV) 201
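    One common way to cope with missing attribute labels across heterogeneous datasets, as the abstract describes, is to mask the per-task loss so only labelled tasks contribute. The helper below is an assumed illustration of that general idea, not the paper's bridging-layer implementation.

    ```python
    # Illustrative masked multi-task loss (hypothetical helper, not the
    # paper's code): only tasks with labels contribute to the objective.
    import numpy as np

    def masked_multitask_loss(preds, targets, mask):
        """preds, targets: (n_tasks,) arrays; mask[i] = 1 if task i is
        labelled for this example. Squared error averaged over the
        labelled tasks only, so unlabelled attributes produce no gradient."""
        err = (preds - targets) ** 2 * mask
        return float(err.sum() / max(mask.sum(), 1))
    ```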

    How compatible are our discourse annotation frameworks? Insights from mapping RST-DT and PDTB annotations

    Discourse-annotated corpora are an important resource for the community, but they are often annotated according to different frameworks. This makes joint usage of the annotations difficult, preventing researchers from searching the corpora in a unified way, or from using all annotated data jointly to train computational systems. Several theoretical proposals have recently been made for mapping the relational labels of different frameworks to each other, but these proposals have so far not been validated against existing annotations. The two largest discourse-relation-annotated resources, the Penn Discourse Treebank and the Rhetorical Structure Theory Discourse Treebank, have however been annotated on the same texts, allowing for a direct comparison of the annotation layers. We propose a method for automatically aligning the discourse segments, and then evaluate existing mapping proposals by comparing the empirically observed mappings against the proposed ones. Our analysis highlights the influence of segmentation on subsequent discourse relation labelling, and shows that while agreement between frameworks is reasonable for explicit relations, agreement on implicit relations is low. We identify several sources of systematic discrepancies between the two annotation schemes and discuss the consequences for future annotation and for usage of the existing resources.
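    Aligning discourse segments from two frameworks annotated on the same text can be sketched as matching token spans by overlap. This is a hypothetical sketch (span representation, Jaccard criterion, and threshold are assumptions), not the paper's actual alignment method.

    ```python
    # Illustrative span-overlap alignment between two segmentations of the
    # same text. Each segment is a (start, end) token span, half-open.
    def align_segments(spans_a, spans_b, threshold=0.5):
        def jaccard(a, b):
            inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
            union = (a[1] - a[0]) + (b[1] - b[0]) - inter
            return inter / union if union else 0.0

        pairs = []
        for i, a in enumerate(spans_a):
            # Greedily match each segment in A to its best-overlapping
            # segment in B, keeping only sufficiently strong matches.
            best = max(range(len(spans_b)), key=lambda j: jaccard(a, spans_b[j]))
            if jaccard(a, spans_b[best]) >= threshold:
                pairs.append((i, best))
        return pairs
    ```

    Aligned segment pairs can then be used to compare the relation labels the two frameworks assign to the "same" segment.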

    Domain transfer for deep natural language generation from abstract meaning representations

    Stochastic natural language generation systems that are trained from labelled datasets are often domain-specific in their annotation and in their mapping from semantic input representations to lexical-syntactic outputs. As a result, learnt models fail to generalize across domains, heavily restricting their usability beyond single applications. In this article, we focus on the problem of domain adaptation for natural language generation. We show how linguistic knowledge from a source domain, for which labelled data is available, can be adapted to a target domain by reusing training data across domains. As the key to this, we propose to employ abstract meaning representations as a common semantic representation across domains. We model natural language generation as a long short-term memory recurrent neural network encoder-decoder, in which one recurrent neural network learns a latent representation of a semantic input, and a second recurrent neural network learns to decode it to a sequence of words. We show that the learnt representations can be transferred across domains and can be leveraged effectively to improve training on new unseen domains. Experiments in three different domains and with six datasets demonstrate that the lexical-syntactic constructions learnt in one domain can be transferred to new domains and achieve up to 75-100% of the performance of in-domain training, based on objective metrics such as BLEU and semantic error rate as well as a subjective human rating study. Training a policy from prior knowledge from a different domain is consistently better than pure in-domain training by up to 10%.
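    The data-reuse step described above, pooling AMR/text pairs from a labelled source domain with the smaller target-domain data before training the encoder-decoder, can be sketched as a small helper. The function name, pair format, and weighting scheme are assumptions for illustration, not the article's code.

    ```python
    # Hypothetical cross-domain data pooling for AMR-to-text generation.
    def build_training_pool(source_pairs, target_pairs, source_weight=1.0):
        """Each pair is (amr_graph, sentence). Source-domain examples may be
        down-weighted so the target domain dominates fine-tuning; the pooled
        triples are (amr, sentence, weight) for a weighted training loss."""
        pool = [(amr, sent, source_weight) for amr, sent in source_pairs]
        pool += [(amr, sent, 1.0) for amr, sent in target_pairs]
        return pool
    ```

    Because both domains share the same abstract meaning representation, the pooled examples are directly comparable inputs for a single encoder-decoder.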

    Multimodality and superdiversity: evidence for a research agenda

    In recent years, social science research in superdiversity has questioned notions such as multiculturalism and pluralism, which hinge on and de facto reproduce ideological constructs such as separate and clearly identifiable national cultures and ethnic identities; research in language and superdiversity, in translanguaging, polylanguaging and metrolingualism, has analogously questioned concepts such as multi- and bi-lingualism, which hinge on ideological constructs such as national languages, mother tongue and native speaker proficiency. Research in multimodality has questioned the centrality of language in everyday communication as well as its paradigmatic role in the understanding of communicative practices. While the multimodality of communication is generally acknowledged in work on language and superdiversity, the potential of a social semiotic multimodal approach for understanding communication in superdiversity has not yet been adequately explored and developed, and neither has the concept of superdiversity been addressed in multimodal research. The present paper aims to start filling this gap. By discussing sign-making practices in the superdiverse context of Leeds Kirkgate Market (UK), it maps the potential of an ethnographic social semiotics for the study of communication in superdiversity and sketches an agenda for research on multimodality and superdiversity, identifying a series of working hypotheses, research questions, areas of investigation, and domains and fields of enquiry.

    Faking and Conspiring about COVID-19: A Discursive Approach

    In the more general climate of post-truth - a social trend reflecting a disregard for reliable ways of knowing what is true, mostly enacted through the massive use of misinformation and rhetoric appealing to emotions - an alarming "infodemic" accompanied the COVID-19 pandemic, affecting healthy attitudes and behaviors and further lessening trust in science, institutions, and traditional media. Its two main representative items, fake and conspiracy news, have been widely analyzed in psycho-social research, even if scholars have mostly focused on the cognitive and social dimensions of those items and devoted less attention to their discursive construction. In addition, these works did not directly compare and differentiate fake and conspiracy pathways. In order to address this gap and promote a wider understanding of these matters, a qualitative investigation of an Italian sample of 112 fake and conspiracy news articles, mostly spread during the first two COVID-19 "waves" (from March 2020 to January 2021), was carried out. Our sample gathered news specifically coming from social media posts, which represent easy and fast channels for viral content diffusion. We analyzed the selected texts by means of the Diatextual Analysis and Discursive Action Model frameworks, aiming to (a) offer an in-depth, fine-grained analysis of the psycholinguistic and argumentative features of fake and conspiracy news, and (b) differentiate them in line with Aristotle's classical rhetorical stances of logos, ethos, and pathos, thus bridging traditional and current lines of thinking. Even though they may share common roots in the post-truth climate, fake and conspiracy news engage in different rhetorical patterns, since they present different stakes and construct specific epistemic pathways. Implications for health and digital literacy are discussed.