Evaluating prose style transfer with the Bible
In the prose style transfer task a system, provided with text input and a
target prose style, produces output which preserves the meaning of the input
text but alters the style. These systems require parallel data for evaluation
of results and usually make use of parallel data for training. Currently, there
are few publicly available corpora for this task. In this work, we identify a
high-quality source of aligned, stylistically distinct text in different
versions of the Bible. We provide a standardized split, into training,
development and testing data, of the public domain versions in our corpus. This
corpus is highly parallel since many Bible versions are included. Sentences are
aligned due to the presence of chapter and verse numbers within all versions of
the text. In addition to the corpus, we present the results, as measured by the
BLEU and PINC metrics, of several models trained on our data which can serve as
baselines for future research. While we present this corpus as a style
transfer resource, we believe it is of unmatched quality and may be useful for
other natural language tasks as well.
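The alignment described above follows directly from shared chapter-and-verse references. A minimal sketch of the idea, using hypothetical toy data (the `kjv`/`bbe` dicts and `align_versions` helper are our own illustration, not the paper's code):

```python
# Build parallel sentence pairs from two Bible versions keyed by
# (book, chapter, verse) references: verse IDs present in both
# versions yield one aligned (source, target) pair each.

def align_versions(version_a, version_b):
    """Return (source, target) text pairs for verse IDs shared by both versions."""
    shared = sorted(set(version_a) & set(version_b))
    return [(version_a[ref], version_b[ref]) for ref in shared]

# Hypothetical toy data: two versions with one overlapping verse.
kjv = {("John", 3, 16): "For God so loved the world..."}
bbe = {("John", 3, 16): "For God had such love for the world...",
       ("John", 3, 17): "God did not send his Son..."}

pairs = align_versions(kjv, bbe)  # only John 3:16 appears in both
```

Because every version carries the same reference scheme, this intersection step scales to the many-version, highly parallel setting the abstract describes without any sentence-alignment heuristics.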
Cover Text Steganography: N-gram and Entropy-based Approach
Steganography is an ancient technique for hiding a secret message within ordinary-looking messages or objects (e.g., images), also known as cover messages. Among the various techniques, hiding text data in a plain text file is a challenging task due to the lack of redundant information. This paper proposes two new approaches to embedding a secret message in a cover text document: n-gram and entropy-metric-based generation of stego text. We provide examples of encoding secret messages in a cover text document, followed by an initial evaluation of how closely the stego texts resemble plain texts. Furthermore, we discuss several related works as well as our future work plan.
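The general idea behind n-gram-based stego text generation can be illustrated with a toy bigram model (our own sketch of the principle, not the paper's method): wherever the model offers two or more plausible next words, the next secret bit selects which one to emit, and a receiver holding the same model recovers the bits.

```python
# Hypothetical bigram model: context word -> sorted candidate next words.
BIGRAMS = {
    "the": ["cat", "dog"],
    "cat": ["sat"],
    "dog": ["sat"],
    "sat": ["quietly", "there"],
}

def encode(bits, start="the", length=3):
    """Generate stego text; each >=2-way branch consumes one secret bit."""
    words, i = [start], 0
    for _ in range(length):
        cands = BIGRAMS[words[-1]]
        if len(cands) >= 2 and i < len(bits):
            words.append(cands[bits[i]])  # the bit picks the branch
            i += 1
        else:
            words.append(cands[0])        # forced choice: no bit embedded
    return " ".join(words)

def decode(text):
    """Recover the bits by replaying the branch decisions."""
    words, bits = text.split(), []
    for prev, cur in zip(words, words[1:]):
        cands = BIGRAMS[prev]
        if len(cands) >= 2:
            bits.append(cands.index(cur))
    return bits

stego = encode([1, 0])   # "the dog sat quietly"
assert decode(stego) == [1, 0]
```

The capacity of such a scheme depends on how often the model branches, which is exactly why low-redundancy plain text is a hard cover medium.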
Finding Eyewitness Tweets During Crises
Disaster response agencies have started to incorporate social media as a
source of fast-breaking information to understand the needs of people affected
by the many crises that occur around the world. These agencies look for tweets
from within the region affected by the crisis to get the latest updates of the
status of the affected region. However, only 1% of all tweets are geotagged with
explicit location information. First responders lose valuable information
because they cannot assess the origin of many of the tweets they collect. In
this work we seek to identify non-geotagged tweets that originate from within
the crisis region. Towards this, we address three questions: (1) is there a
difference between the language of tweets originating within a crisis region
and tweets originating outside the region, (2) what are the linguistic patterns
that can be used to differentiate within-region and outside-region tweets, and
(3) for non-geotagged tweets, can we automatically identify those originating
within the crisis region in real time?
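The idea behind questions (1) and (2) can be sketched as comparing a tweet's words against within-region and outside-region word distributions. This is a rough stdlib illustration with hypothetical toy counts, not the paper's actual features or models:

```python
import math
from collections import Counter

# Hypothetical unigram counts from labeled tweets.
within = Counter("power is out on my street water rising here".split())
outside = Counter("praying for everyone affected by the storm news".split())

def log_likelihood(words, counts):
    """Add-one-smoothed unigram log-likelihood of words under counts."""
    total = sum(counts.values())
    vocab = len(set(within) | set(outside))
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)

def classify(tweet):
    """Label a non-geotagged tweet by the higher-likelihood word distribution."""
    words = tweet.lower().split()
    return ("within-region"
            if log_likelihood(words, within) >= log_likelihood(words, outside)
            else "outside-region")

print(classify("water is rising on my street"))  # prints "within-region"
```

Intuitively, eyewitness tweets tend to use concrete, locally grounded language ("water rising here"), while outside-region tweets skew toward sympathy and news-sharing vocabulary; a classifier of this shape exploits exactly that difference.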
On Explaining Multimodal Hateful Meme Detection Models
Hateful meme detection is a new multimodal task that has gained significant
traction in academic and industry research communities. Recently, researchers
have applied pre-trained visual-linguistic models to perform the multimodal
classification task, and some of these solutions have yielded promising
results. However, what these visual-linguistic models learn for the hateful
meme classification task remains unclear. For instance, it is unclear if these
models are able to capture the derogatory or slur references in the
modalities (i.e., image and text) of hateful memes. To fill this research gap,
this paper proposes three research questions to improve our understanding of these
visual-linguistic models performing the hateful meme classification task. We
found that the image modality contributes more to the hateful meme
classification task, and the visual-linguistic models are able to perform
visual-text slurs grounding to a certain extent. Our error analysis also shows
that the visual-linguistic models have acquired biases, which resulted in
false-positive predictions.