455 research outputs found

    Resistance in public disputes dataset

    This is the dataset for the paper 'Resistance in public disputes'. It includes eight clips taken from public disputes found on social media (e.g. YouTube, Twitter). Where possible, the full recording, an extended version, and the clipped extract used in the paper have all been provided, and each is labelled as such. Each clip corresponds to the similarly named transcript. Links to the original recordings can be found in the transcripts, but these may stop working over time.

    Automatic Knowledge Acquisition from Social Media

    First Research Meeting of Fiscal Year 2014 [Thursday, 26 June 2014], report abstract

    Characterising User Content on a Multi-lingual Social Network

    Social media has been on the vanguard of political information diffusion in the 21st century. Most studies that look into disinformation, political influence and fake-news focus on mainstream social media platforms. This has inevitably made English an important factor in our current understanding of political activity on social media. As a result, there has only been a limited number of studies into a large portion of the world, including the largest, multilingual and multicultural democracy: India. In this paper we present our characterisation of a multilingual social network in India called ShareChat. We collect an exhaustive dataset across 72 weeks before and during the Indian general elections of 2019, across 14 languages. We investigate the cross lingual dynamics by clustering visually similar images together, and exploring how they move across language barriers. We find that Telugu, Malayalam, Tamil and Kannada languages tend to be dominant in soliciting political images (often referred to as memes), and posts from Hindi have the largest cross-lingual diffusion across ShareChat (as well as images containing text in English). In the case of images containing text that cross language barriers, we see that language translation is used to widen the accessibility. That said, we find cases where the same image is associated with very different text (and therefore meanings). This initial characterisation paves the way for more advanced pipelines to understand the dynamics of fake and political content in a multi-lingual and non-textual setting.

    DiGreC (Diachrony of Greek Case) treebank

    A morphosyntactically and semantically annotated corpus of selected sentences from texts throughout the history of the Greek language, produced by the AHRC-funded project “Investigating Variation and Change: Case in Diachrony”.

    Investigating redundancy in emoji use: study on a Twitter-based corpus

    In this paper we present an annotated corpus created with the aim of analyzing the informative behaviour of emoji, an issue of importance for sentiment analysis and natural language processing. The corpus consists of 2475 tweets, each containing at least one emoji and each annotated with one of three possible classes: Redundant, Non Redundant, and Non Redundant + POS. We explain how the corpus was collected and describe the annotation procedure and the interface developed for the task. We provide an analysis of the corpus, considering also possible predictive features, discuss the problematic aspects of the annotation, and suggest future improvements.