Detecting Check-Worthy Claims in Political Debates, Speeches, and Interviews Using Audio Data
A large portion of society united around the same vision and ideas carries
enormous energy. That is precisely what political figures would like to
accumulate for their cause. With this goal in mind, they can sometimes resort
to distorting or hiding the truth, unintentionally or on purpose, which opens
the door for misinformation and disinformation. Tools for automatic detection
of check-worthy claims would be of great help to moderators of debates,
journalists, and fact-checking organizations. While previous work on detecting
check-worthy claims has focused on text, here we explore the utility of the
audio signal as an additional information source. We create a new multimodal
dataset (text and audio in English) containing 48 hours of speech. Our
evaluation results show that the audio modality together with text yields
improvements over text alone in the case of multiple speakers. Moreover, an
audio-only model could outperform a text-only one for a single speaker.
Comment: check-worthy claims, fake news, political debates, audi
M4: Multi-generator, Multi-domain, and Multi-lingual Black-Box Machine-Generated Text Detection
Large language models (LLMs) have demonstrated remarkable capability to
generate fluent responses to a wide variety of user queries, but this has also
resulted in concerns regarding the potential misuse of such texts in
journalistic, educational, and academic contexts. In this work, we aim to develop
automatic systems to identify machine-generated text and to detect potential
misuse. We first introduce a large-scale benchmark, M4, which is a
multi-generator, multi-domain, and multi-lingual corpus for machine-generated
text detection. Using the dataset, we experiment with a number of methods and
we show that it is challenging for detectors to generalize well on unseen
examples if they are either from different domains or are generated by
different large language models. In such cases, detectors tend to misclassify
machine-generated text as human-written. These results show that the problem is
far from solved and there is a lot of room for improvement. We believe that our
dataset M4, which covers different generators, domains and languages, will
enable future research towards more robust approaches for this pressing
societal problem. The M4 dataset is available at
https://github.com/mbzuai-nlp/M4.
Comment: 11 page
US News and Social Media Framing around Vaping
In this paper, we investigate how vaping is framed differently (2008-2021)
between US news and social media. We analyze 15,711 news articles and 1,231,379
Facebook posts about vaping to study the differences in framing between media
varieties. We use word embeddings to provide two-dimensional visualizations of
the semantic changes around vaping for news and for social media. We detail
that news media framing of vaping shifted over time in line with emergent
regulatory trends, such as flavored vaping bans, with little discussion around
vaping as a smoking cessation tool. We found that social media discussions were
far more varied, with transitions toward vaping both as a public health harm
and as a smoking cessation tool. Our cloze test, dynamic topic model, and
question answering showed similar patterns, where social media, but not news
media, characterizes vaping as a combustible cigarette substitute. We use n-grams
to detail that social media data first centered on vaping as a smoking
cessation tool, and in 2019 moved toward narratives around vaping regulation,
similar to news media frames. Overall, social media tracks the evolution of
vaping as a social practice, while news media reflects more risk-based
concerns. A strength of our work is how the different techniques we have
applied validate each other. Stakeholders may utilize our findings to intervene
around the framing of vaping, and may design communications campaigns that
improve the way society sees vaping, thus possibly aiding smoking cessation
and reducing youth vaping.
Using the Web as an Implicit Training Set: Application to Noun Compound Syntax and Semantics
Copyright © 2007, by the author(s)
CompSysTech’2001 – Bulgarian Computer Science Conference – 21-22.06.2001, Sofia, Bulgaria Investigating the Degree of Adequacy of the Relations in the Concept Structure of Students using the Method of Latent Semantic Analysis
Abstract. Research on the effects of study is hindered by the limitations of the techniques and methods for registering, measuring, and assessing the knowledge actually formed, that is, information represented in memory with the appropriate correlations among its units. The problem is addressed by using latent semantic analysis to compare and assess scientific texts and the knowledge expressed by students in the form of free verbal statements. Key words: latent semantic analysis; notional structures; content analysis