    Seminar Users in the Arabic Twitter Sphere

    We introduce the notion of "seminar users", who are social media users engaged in propaganda in support of a political entity. We develop a framework that can identify such users with 84.4% precision and 76.1% recall. While our dataset is from the Arab region, omitting language-specific features has only a minor impact on classification performance, and thus, our approach could work for detecting seminar users in other parts of the world and in other languages. We further explored a controversial political topic to observe the prevalence and potential potency of such users. In our case study, we found that 25% of the users engaged in the topic are in fact seminar users and their tweets make up nearly a third of the on-topic tweets. Moreover, they are often successful in affecting mainstream discourse with coordinated hashtag campaigns. Comment: to appear in SocInfo 201

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Multilingual Models for Compositional Distributed Semantics

    We present a novel technique for learning semantic representations, which extends the distributional hypothesis to multilingual data and joint-space embeddings. Our models leverage parallel data and learn to strongly align the embeddings of semantically equivalent sentences, while maintaining sufficient distance between those of dissimilar sentences. The models do not rely on word alignments or any syntactic information and are successfully applied to a number of diverse languages. We extend our approach to learn semantic representations at the document level, too. We evaluate these models on two cross-lingual document classification tasks, outperforming the prior state of the art. Through qualitative analysis and the study of pivoting effects we demonstrate that our representations are semantically plausible and can capture semantic relationships across languages without parallel data. Comment: Proceedings of ACL 2014 (Long papers)

    Interpreting political discourse at the Pan African Parliament into Arabic

    In partial fulfilment of Masters in Translation (Option: Interpreting) at the University of the Witwatersrand, August 2015. This study analyses the interpreting of political discourse at the Pan African Parliament (PAP) into Arabic, with a special focus on conflict resolution in the context of PAP debates on conflict situations in Africa and with special reference to the Libyan conflict. The debate I examine was held within a broader context of international dominant discourses and competing narratives of conflicts and foreign interventions, dominated by the United Nations Security Council’s (UNSC) resolutions on the Libyan conflict. But its battleground includes a broader African context, characterized by a certain degree of resistance as reflected in the discourse of conflict resolution that is informed by the immediate context of the PAP – an organ of the African Union (AU) which seeks to assist in conflict resolution and promote democracy and human rights throughout the continent in actualization of its motto: One Africa, One Voice. My analysis focuses on Critical Discourse Analysis (CDA) and I investigate in particular the conditions of reproduction of dominant discourses, narratives and framings in order to ascertain how they influence debates in the PAP and also what influence they have on interpreting strategies. The aim is to identify the degree to which these elements influence an interpreter’s role and performance and, in turn, how the interpreter then influences the course of a discussion. Based on this, the analysis seeks to determine certain variables, such as discursive and linguistic patterns, discursive moves, style, argumentation, ideologically and politically charged expressions, etc., so as to trace the influence of dominant discourses and competing narratives in the context of interpreting, and specifically how it relates to the political discourse on the Libyan conflict. 
In investigating interpreting strategies, the analysis also aims at pinpointing elements such as omissions, shifts, repetitions and the occurrence of certain discursive or linguistic elements that would demonstrate the scope of such influences. The analysis furthermore highlights some difficulties and recommends some points for future research. Key words: CDA, interpreter’s role, interpreting strategies, narratives, framing, cognitive load, structure of discourse, linguistic features, discursive strategies, conflict discourse, Africa, conflict resolution

    Learning causality for Arabic - proclitics

    Prefixed particles are a prevalent linguistic device for expressing causation in the Arabic language. However, such particles are complicated and highly ambiguous, as they imply different meanings according to their position in the text. This ambiguity underscores the need for a large-scale annotated corpus that contains instances of these particles. In this paper, we present the process of building our corpus, which includes a collection of annotated sentences, each containing an instance of a candidate causal particle. We use the corpus to construct and optimize predictive models for the task of causation recognition. The performance of the best models is significantly better than the baselines. Arabic is a less-resourced language, and we hope this work will help in building better Information Extraction systems.

    Linking discourse marker inventories

    The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as "and", "because", "but", "though" or "thereafter" are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, allow techniques for translation inference to be explored for this particular group of lexical resources, which was previously largely neglected in the context of Linguistic Linked (Open) Data.