Evaluating prose style transfer with the Bible
In the prose style transfer task a system, provided with text input and a
target prose style, produces output which preserves the meaning of the input
text but alters the style. These systems require parallel data for evaluation
of results and usually make use of parallel data for training. Currently, there
are few publicly available corpora for this task. In this work, we identify a
high-quality source of aligned, stylistically distinct text in different
versions of the Bible. We provide a standardized split, into training,
development and testing data, of the public domain versions in our corpus. This
corpus is highly parallel since many Bible versions are included. Sentences are
aligned due to the presence of chapter and verse numbers within all versions of
the text. In addition to the corpus, we present the results, as measured by the
BLEU and PINC metrics, of several models trained on our data which can serve as
baselines for future research. While we present this corpus as a style
transfer resource, we believe it is of unmatched quality and may be useful for
other natural language tasks as well.
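The alignment described above follows directly from shared chapter-and-verse references. A minimal sketch of the idea, using hypothetical toy data (the `kjv`/`bbe` dicts and `align_versions` helper are our own illustration, not the paper's code):

```python
# Build parallel sentence pairs from two Bible versions keyed by
# (book, chapter, verse) references: verse IDs present in both
# versions yield one aligned (source, target) pair each.

def align_versions(version_a, version_b):
    """Return (source, target) text pairs for verse IDs shared by both versions."""
    shared = sorted(set(version_a) & set(version_b))
    return [(version_a[ref], version_b[ref]) for ref in shared]

# Hypothetical toy data: two versions with one overlapping verse.
kjv = {("John", 3, 16): "For God so loved the world..."}
bbe = {("John", 3, 16): "For God had such love for the world...",
       ("John", 3, 17): "God did not send his Son..."}

pairs = align_versions(kjv, bbe)  # only John 3:16 appears in both
```

Because every version carries the same reference scheme, this intersection step scales to the many-version, highly parallel setting the abstract describes without any sentence-alignment heuristics.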
Cover Text Steganography: N-gram and Entropy-based Approach
Steganography is an ancient technique for hiding a secret message within ordinary-looking messages or objects (e.g., images), also known as cover messages. Among the various techniques, hiding text data in a plain text file is a challenging task due to the lack of redundant information. This paper proposes two new approaches to embedding a secret message in a cover text document: n-gram and entropy-metric-based generation of stego text. We provide examples of encoding secret messages in a cover text document, followed by an initial evaluation of how closely the stego texts resemble plain texts. Furthermore, we discuss several related works as well as our future work plan.
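The general idea behind n-gram-based stego text generation can be illustrated with a toy bigram model (our own sketch of the principle, not the paper's method): wherever the model offers two or more plausible next words, the next secret bit selects which one to emit, and a receiver holding the same model recovers the bits.

```python
# Hypothetical bigram model: context word -> sorted candidate next words.
BIGRAMS = {
    "the": ["cat", "dog"],
    "cat": ["sat"],
    "dog": ["sat"],
    "sat": ["quietly", "there"],
}

def encode(bits, start="the", length=3):
    """Generate stego text; each >=2-way branch consumes one secret bit."""
    words, i = [start], 0
    for _ in range(length):
        cands = BIGRAMS[words[-1]]
        if len(cands) >= 2 and i < len(bits):
            words.append(cands[bits[i]])  # the bit picks the branch
            i += 1
        else:
            words.append(cands[0])        # forced choice: no bit embedded
    return " ".join(words)

def decode(text):
    """Recover the bits by replaying the branch decisions."""
    words, bits = text.split(), []
    for prev, cur in zip(words, words[1:]):
        cands = BIGRAMS[prev]
        if len(cands) >= 2:
            bits.append(cands.index(cur))
    return bits

stego = encode([1, 0])   # "the dog sat quietly"
assert decode(stego) == [1, 0]
```

The capacity of such a scheme depends on how often the model branches, which is exactly why low-redundancy plain text is a hard cover medium.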
Finding Eyewitness Tweets During Crises
Disaster response agencies have started to incorporate social media as a
source of fast-breaking information to understand the needs of people affected
by the many crises that occur around the world. These agencies look for tweets
from within the region affected by the crisis to get the latest updates of the
status of the affected region. However, only 1% of all tweets are geotagged with
explicit location information. First responders lose valuable information
because they cannot assess the origin of many of the tweets they collect. In
this work we seek to identify non-geotagged tweets that originate from within
the crisis region. Towards this, we address three questions: (1) is there a
difference between the language of tweets originating within a crisis region
and tweets originating outside the region, (2) what are the linguistic patterns
that can be used to differentiate within-region and outside-region tweets, and
(3) for non-geotagged tweets, can we automatically identify those originating
within the crisis region in real time?
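The idea behind questions (1) and (2) can be sketched as comparing a tweet's words against within-region and outside-region word distributions. This is a rough stdlib illustration with hypothetical toy counts, not the paper's actual features or models:

```python
import math
from collections import Counter

# Hypothetical unigram counts from labeled tweets.
within = Counter("power is out on my street water rising here".split())
outside = Counter("praying for everyone affected by the storm news".split())

def log_likelihood(words, counts):
    """Add-one-smoothed unigram log-likelihood of words under counts."""
    total = sum(counts.values())
    vocab = len(set(within) | set(outside))
    return sum(math.log((counts[w] + 1) / (total + vocab)) for w in words)

def classify(tweet):
    """Label a non-geotagged tweet by the higher-likelihood word distribution."""
    words = tweet.lower().split()
    return ("within-region"
            if log_likelihood(words, within) >= log_likelihood(words, outside)
            else "outside-region")

print(classify("water is rising on my street"))  # prints "within-region"
```

Intuitively, eyewitness tweets tend to use concrete, locally grounded language ("water rising here"), while outside-region tweets skew toward sympathy and news-sharing vocabulary; a classifier of this shape exploits exactly that difference.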
On Explaining Multimodal Hateful Meme Detection Models
Hateful meme detection is a new multimodal task that has gained significant
traction in academic and industry research communities. Recently, researchers
have applied pre-trained visual-linguistic models to perform the multimodal
classification task, and some of these solutions have yielded promising
results. However, what these visual-linguistic models learn for the hateful
meme classification task remains unclear. For instance, it is unclear if these
models are able to capture the derogatory or slur references in the
modalities (i.e., image and text) of hateful memes. To fill this research gap,
this paper proposes three research questions to improve our understanding of these
visual-linguistic models performing the hateful meme classification task. We
found that the image modality contributes more to the hateful meme
classification task, and the visual-linguistic models are able to perform
visual-text slurs grounding to a certain extent. Our error analysis also shows
that the visual-linguistic models have acquired biases, which resulted in
false-positive predictions.