Team QCRI-MIT at SemEval-2019 Task 4: Propaganda Analysis Meets Hyperpartisan News Detection
In this paper, we describe our submission to SemEval-2019 Task 4 on
Hyperpartisan News Detection. Our system relies on a variety of engineered
features originally used to detect propaganda. This is based on the assumption
that biased messages are propagandistic in the sense that they promote a
particular political cause or viewpoint. We trained a logistic regression model
with features ranging from simple bag-of-words to vocabulary richness and text
readability features. Our system achieved 72.9% accuracy on the test data that
is annotated manually and 60.8% on the test data that is annotated with distant
supervision. Additional experiments showed that significant performance
improvements can be achieved with better feature pre-processing.
Comment: Hyperpartisanship, propaganda, news media, fake news, SemEval-201
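As a rough illustration of the kinds of features the abstract names, the sketch below computes bag-of-words counts and one simple vocabulary richness measure. The tokenizer, the function names, and the choice of type-token ratio are assumptions made here for illustration; the abstract does not say which richness or readability measures the system actually used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Return a token -> count mapping for the text."""
    return Counter(tokenize(text))


def type_token_ratio(text):
    """Distinct tokens divided by total tokens; 0.0 for empty input."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

Features like these would typically be combined into one vector per article before being passed to a classifier such as logistic regression.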
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach
Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist "approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [...], their own idiolect" (Coulthard, 2004: 31). However, given the difficulty in empirically substantiating a theory of idiolect, there is growing concern in the field that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite offers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify "n-gram textbites", small textual segments that characterise that author's writing, providing DNA-like chunks of identifying material?
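The idea of two-to-six-word n-grams that recur across an author's texts can be sketched as follows. This is a minimal illustration, not the method of the work above: whitespace tokenization, the function names, and the "appears in at least two texts" threshold are all assumptions introduced here.

```python
from collections import Counter


def word_ngrams(text, n_min=2, n_max=6):
    """Return all word n-grams of length n_min..n_max as tuples, shortest first."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(tuple(tokens[i:i + n]))
    return grams


def recurring_ngrams(texts, min_texts=2):
    """Return the n-grams that occur in at least min_texts of the given texts."""
    doc_freq = Counter()
    for text in texts:
        # Count each n-gram once per text, so repetition inside one text does not inflate it.
        doc_freq.update(set(word_ngrams(text)))
    return {gram for gram, count in doc_freq.items() if count >= min_texts}
```

For example, given the two texts "please find attached the report" and "please find attached my notes", the recurring n-grams are ("please", "find"), ("find", "attached") and ("please", "find", "attached").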
An End-to-End Conversational Style Matching Agent
We present an end-to-end voice-based conversational agent that is able to
engage in naturalistic multi-turn dialogue and align with the interlocutor's
conversational style. The system uses a series of deep neural network
components for speech recognition, dialogue generation, prosodic analysis and
speech synthesis to generate language and prosodic expression with qualities
that match those of the user. We conducted a user study (N=30) in which
participants talked with the agent for 15 to 20 minutes, resulting in over 8
hours of natural interaction data. Users with high consideration conversational
styles reported the agent to be more trustworthy when it matched their
conversational style, whereas users with high involvement conversational
styles were indifferent. Finally, we provide design guidelines for multi-turn
dialogue interactions using conversational style adaptation.
Two-layer classification and distinguished representations of users and documents for grouping and authorship identification
Most studies on authorship identification reported a drop in the identification result when the number of authors exceeds 20-25. In this paper, we introduce a new user representation to address this problem and split classification across two layers. There are at least 3 novelties in this paper. First, the two-layer approach allows applying authorship identification to a larger number of authors (tested over 100 authors), and it is extendable. The authors are divided into groups that each contain a smaller number of authors. Given an anonymous document, the primary layer detects the group to which the document belongs. Then, the secondary layer determines the particular author inside the selected group. In order to extract the groups linking similar authors, clustering is applied over users rather than documents. Hence, the second novelty of this paper is introducing a new user representation that is different from the document representation. Without the proposed user representation, clustering over documents would result in the documents of an author being distributed over several clusters, instead of a single cluster membership for each author. Third, the extracted clusters are descriptive of their users and meaningful, as the dimensions have psychological backgrounds. For authorship identification, the documents are labelled with the extracted groups and fed into a machine learning algorithm to build classification models that predict the group and author of a given document. The results show that the documents are highly correlated with the extracted corresponding groups, and the proposed model can be accurately trained to determine the group and the author identity.
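The two-layer routing described above (pick a group first, then an author inside that group) can be sketched as follows. This is a minimal illustration under stated assumptions: documents are already numeric feature vectors, the author-to-group mapping `group_of` is supplied rather than obtained by clustering a user representation, and a nearest-centroid rule stands in for whatever classifiers the paper trained. All names are hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest(vector, centroids):
    """Return the label whose centroid is closest to vector (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def train(docs_by_author, group_of):
    """Build one centroid per group (primary layer) and one per author (secondary layer)."""
    author_centroids = {a: centroid(vecs) for a, vecs in docs_by_author.items()}
    docs_by_group = {}
    for author, vecs in docs_by_author.items():
        docs_by_group.setdefault(group_of[author], []).extend(vecs)
    group_centroids = {g: centroid(vecs) for g, vecs in docs_by_group.items()}
    return group_centroids, author_centroids


def predict(vector, group_centroids, author_centroids, group_of):
    """Primary layer selects the group; secondary layer selects an author within it."""
    group = nearest(vector, group_centroids)
    candidates = {a: c for a, c in author_centroids.items() if group_of[a] == group}
    return group, nearest(vector, candidates)
```

Because the secondary layer only compares authors within the selected group, each decision involves far fewer candidates than a single flat classifier over all authors would.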
Big words, small phrases: Mismatches between pause units and the polysynthetic word in Dalabon
This article uses instrumental data from natural speech to examine the phenomenon of pause placement within the verbal word in Dalabon, a polysynthetic Australian language of Arnhem Land. Though the phenomenon is incipient and in two sample texts occurs in only around 4% of verbs, there are clear possibilities for interrupting the grammatical word by pause after the pronominal prefix and some associated material at the left edge, though these within-word pauses are significantly shorter, on average, than those between words. Within-word pause placement is not random, but is restricted to certain affix boundaries; it requires that the paused-after material be at least dimoraic, and that the remaining material in the verbal word be at least disyllabic. Bininj Gun-wok, another polysynthetic language closely related to Dalabon, does not allow pauses to interrupt the verbal word, and the Dalabon development appears to be tied up with certain morphological innovations that have increased the proportion of closed syllables in the pronominal prefix zone of the verb. Though only incipient and not yet phonologized, pause placement in Dalabon verbs suggests a phonology-driven route by which polysynthetic languages may ultimately become less morphologically complex by fracturing into smaller units.