11,620 research outputs found
Seminar Users in the Arabic Twitter Sphere
We introduce the notion of "seminar users", who are social media users
engaged in propaganda in support of a political entity. We develop a framework
that can identify such users with 84.4% precision and 76.1% recall. While our
dataset is from the Arab region, omitting language-specific features has only a
minor impact on classification performance, and thus, our approach could work
for detecting seminar users in other parts of the world and in other languages.
We further explored a controversial political topic to observe the prevalence
and potential potency of such users. In our case study, we found that 25% of
the users engaged in the topic are in fact seminar users and their tweets make
nearly a third of the on-topic tweets. Moreover, they are often successful in
affecting mainstream discourse with coordinated hashtag campaigns.
Comment: to appear in SocInfo 201
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.
Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201
Multilingual Models for Compositional Distributed Semantics
We present a novel technique for learning semantic representations, which
extends the distributional hypothesis to multilingual data and joint-space
embeddings. Our models leverage parallel data and learn to strongly align the
embeddings of semantically equivalent sentences, while maintaining sufficient
distance between those of dissimilar sentences. The models do not rely on word
alignments or any syntactic information and are successfully applied to a
number of diverse languages. We extend our approach to learn semantic
representations at the document level, too. We evaluate these models on two
cross-lingual document classification tasks, outperforming the prior state of
the art. Through qualitative analysis and the study of pivoting effects we
demonstrate that our representations are semantically plausible and can capture
semantic relationships across languages without parallel data.
Comment: Proceedings of ACL 2014 (Long papers)
Interpreting political discourse at the Pan African Parliament into Arabic
Submitted in partial fulfilment of Masters in Translation (Option: Interpreting) at the University of the Witwatersrand, August 2015.
This study analyses the interpreting of political discourse at the Pan African Parliament (PAP)
into Arabic, with a special focus on conflict resolution in the context of PAP debates on conflict
situations in Africa and with special reference to the Libyan conflict. The debate I examine was
held within a broader context of international dominant discourses and competing narratives of
conflicts and foreign interventions, dominated by the United Nations Security Council’s (UNSC)
resolutions on the Libyan conflict. But its battleground includes a broader African context,
characterized by a certain degree of resistance as reflected in the discourse of conflict resolution
that is informed by the immediate context of the PAP – an organ of the African Union (AU)
which seeks to assist in conflict resolution and promote democracy and human rights throughout
the continent in actualization of its motto: One Africa, One Voice.
My analysis focuses on Critical Discourse Analysis (CDA) and I investigate in particular the
conditions of reproduction of dominant discourses, narratives and framings in order to ascertain
how they influence debates in the PAP and also what influence they have on interpreting
strategies. The aim is to identify the degree to which these elements influence an interpreter’s
role and performance and, in turn, how the interpreter then influences the course of a discussion.
Based on this, the analysis seeks to determine certain variables, such as discursive and linguistic
patterns, discursive moves, style, argumentation, ideologically and politically charged
expressions, etc., so as to trace the influence of dominant discourses and competing narratives in
the context of interpreting, and specifically how it relates to the political discourse on the Libyan
conflict.
In investigating interpreting strategies, the analysis also aims at pinpointing elements such as
omissions, shifts, repetitions and the occurrence of certain discursive or linguistic elements that
would demonstrate the scope of such influences. The analysis furthermore highlights some difficulties and recommends some points for future research.
Key words: CDA, interpreter’s role, interpreting strategies, narratives, framing, cognitive load, structure of discourse, linguistic features, discursive strategies, conflict discourse, Africa, conflict resolution
Learning causality for Arabic - proclitics
The use of prefixed particles is a prevalent linguistic form for expressing causation in Arabic. However, such particles are complicated and highly ambiguous, as they imply different meanings according to their position in the text. This ambiguity underscores the need for a large-scale annotated corpus that contains instances of these particles. In this paper, we present the process of building our corpus, which includes a collection of annotated sentences, each containing an instance of a candidate causal particle. We use the corpus to construct and optimize predictive models for the task of causation recognition. The performance of the best models is significantly better than the baselines. Arabic is a less-resourced language, and we hope this work will help in building better Information Extraction systems.
Linking discourse marker inventories
The paper describes the first comprehensive edition of machine-readable discourse marker lexicons. Discourse markers such as and, because, but, though or thereafter are essential communicative signals in human conversation, as they indicate how an utterance relates to its communicative context. As much of this information is implicit or expressed differently in different languages, discourse parsing, context-adequate natural language generation and machine translation are considered particularly challenging aspects of Natural Language Processing. Providing this data in machine-readable, standard-compliant form will thus facilitate such technical tasks and, moreover, allow techniques for translation inference to be explored for this particular group of lexical resources, which was previously largely neglected in the context of Linguistic Linked (Open) Data.