Maximizing Neutrality in News Ordering
The detection of fake news has received increasing attention over the past
few years, but there are more subtle ways of deceiving one's audience. In
addition to the content of news stories, their presentation can also be made
misleading or biased. In this work, we study the impact of the ordering of news
stories on audience perception. We introduce the problems of detecting
cherry-picked news orderings and maximizing neutrality in news orderings. We
prove hardness results and present several algorithms for approximately solving
these problems. Furthermore, we provide extensive experimental results and
present evidence of potential cherry-picking in the real world. Comment: 14 pages, 13 figures, accepted to KDD '2
Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021
Automatic detection of fake news is a highly important task in the
contemporary world. This study reports on the second shared task,
UrduFake@FIRE2021, on fake news detection in Urdu. The goal of the
shared task is to motivate the community to come up with efficient methods for
solving this vital problem, particularly for the Urdu language. The task is
posed as a binary classification problem to label a given news article as a
real or a fake news article. The organizers provide a dataset comprising news
in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and
(v) Business, split into training and testing sets. The training set contains
1,300 annotated news articles (750 real, 550 fake), while the testing set
contains 300 articles (200 real, 100 fake). A total of 34 teams from seven
countries (China, Egypt, Israel, India, Mexico, Pakistan, and the UAE)
registered to participate in the UrduFake@FIRE2021 shared task. Out of those,
18 teams submitted experimental results, and 11 of those submitted technical
reports, substantially more than in the 2020 UrduFake shared task, when only
six teams did so. The
technical reports submitted by the participants demonstrated different data
representation techniques ranging from count-based BoW features to word vector
embeddings as well as the use of numerous machine learning algorithms ranging
from traditional SVM to various neural network architectures including
Transformers such as BERT and RoBERTa. In this year's competition, the
best-performing system obtained an F1-macro score of 0.679, lower than the
previous year's best result of 0.907. While the training sets from the past
and current years overlap to a large extent, the testing set provided this
year is completely different.
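As a rough illustration of the binary task posed in this shared task, the following is a minimal bag-of-words Naive Bayes sketch in Python. The example articles and labels are invented placeholders, not the UrduFake@FIRE2021 data, and the actual submitted systems used far richer features and models (TF-IDF, word embeddings, SVMs, Transformers).

```python
# Minimal bag-of-words Naive Bayes sketch of a binary fake-news classifier.
# Texts below are illustrative placeholders, not the shared-task data.
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns per-label word counts and label counts."""
    counts = defaultdict(Counter)
    label_totals = Counter()
    for text, label in docs:
        counts[label].update(text.lower().split())
        label_totals[label] += 1
    return counts, label_totals

def predict(text, counts, label_totals):
    """Pick the label maximizing log prior + smoothed log likelihoods."""
    vocab = {w for c in counts.values() for w in c}
    total_docs = sum(label_totals.values())
    best, best_score = None, float("-inf")
    for label, words in counts.items():
        score = math.log(label_totals[label] / total_docs)
        denom = sum(words.values()) + len(vocab)  # add-one smoothing
        for w in text.lower().split():
            score += math.log((words[w] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    ("government confirms new health policy", "real"),
    ("official sports results announced today", "real"),
    ("miracle cure doctors hate revealed", "fake"),
    ("shocking secret celebrity scandal exposed", "fake"),
]
counts, label_totals = train(docs)
print(predict("doctors reveal miracle secret", counts, label_totals))  # fake
```

The add-one smoothing keeps unseen words from zeroing out a label's likelihood, which matters when, as here, the test vocabulary differs from the training vocabulary.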
Assessing the Use of Smartphones in Agriculture
Smartphones are an as-yet-untapped resource for agriculture. They are ubiquitous across the globe yet have not previously been tested as a tool for farmers. Imaging methods such as unmanned aerial vehicles (UAVs) and satellite imaging have been well explored and employed in various aspects of agriculture; however, such methods can be cost-prohibitive and at the mercy of another company or agency. If smartphones could be shown to capture color in a way that relates quantifiably to data measured by laboratory-grade equipment, they could prove extremely valuable to farmers. Replacing expensive, specialized technology with a device already sitting in people's pockets would benefit farmers around the world. Given this idea, three experiments were designed to assess the color capabilities of smartphone cameras in relation to agricultural applications.

The first experiment assessed the capability of smartphone cameras to identify the presence of cyanobacteria in a given water sample based on measurements of color and transmission spectra. These data were then related to color captured by four smartphones. Additionally, the measurements were used to create a preliminary customized Color Checker(TM)-inspired chart for use in identifying cyanobacteria. Current techniques employed by the state of New York for identifying cyanobacteria in water are cumbersome, involving week-long testing in government labs; this project is an attempt to simplify the process by using image capture with smartphones.

The second assessment was similar to the first, with tomatoes in place of cyanobacteria. Five smartphone devices were used to image tomatoes at different stages of ripeness. A relationship was found between the hue angles taken from the smartphone images and those measured by a spectroradiometer. A tomato Color Checker(TM) was created using the spectroradiometer measurements. The chart is intended for use in camera calibration for future imaging of tomatoes.

The final assessment was an online experiment wherein participants were asked to choose, from an array generated from images of tomatoes, the color that best represents the color of the tomato. This was a first step toward understanding which characteristics people use to categorize a crop as ripe and how those characteristics are rendered by smartphone imaging.
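The hue-angle comparison described above can be sketched in a few lines of Python. This uses the standard-library HSV hue as a stand-in; the study's actual measurements would involve device calibration and spectroradiometer data, and the RGB values below are illustrative, not measured.

```python
# Sketch: converting an 8-bit RGB pixel (as a smartphone camera records it)
# to a hue angle in degrees. Values are illustrative tomato-like colors.
import colorsys

def hue_angle(r, g, b):
    """Hue angle in degrees (0-360) for 8-bit RGB values."""
    h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0

# A ripening tomato shifts from green (hue near 120) toward red (hue near 0).
print(hue_angle(60, 160, 60))   # ~120 (green, unripe)
print(hue_angle(200, 40, 30))   # near 0 (red, ripe)
```

Tracking this single angle as fruit ripens is what makes a hue-based comparison against laboratory color measurements plausible, since hue is largely separable from the brightness differences between devices.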
3D seismic attribute-assisted analysis of microseismic events in the Marcellus Shale
Microseismic monitoring is often used during oil and gas exploitation to track seismicity that may be triggered by hydraulic fracturing, a common practice in the Appalachian Basin. Anthropogenically induced minor upward fracture growth is not uncommon in the Marcellus Shale; however, in the area of study, significant microseismic activity was registered above the target zone. To ascertain whether out-of-zone growth might have been predictable and to identify which areas are more likely to experience brittle failure first, 3D seismic and microseismic data were analyzed with a focus on better understanding variations in the acoustic properties associated with unconventional naturally fractured reservoirs.

Ant Tracking was used to identify areas of increased local seismic discontinuity, as these areas are generally more intensely deformed and may represent zones of increased fracture intensity. Ant Tracking results reveal that discontinuities in the Marcellus are oriented approximately N52E and N41W; the discontinuities do not coincide with the N25E-trending folds apparent in the 3D seismic, but tend to follow deeper structural trends instead. These discontinuity orientations are interpreted to result from continued movement on deeper faults throughout the Paleozoic; these faults possibly acted as seed points for fractures farther upsection and potentially led to the formation of the large N25E-trending imbricate backthrusts seen in the 3D seismic.

The reservoir's response to hydraulic fracturing also provided insights into local stress anisotropy and into the optimal well and stage spacing needed to maximize drainage area and locate additional wells during the field development phase. Microseismic, well, and pump data used to gauge the reservoir's response to a hydraulic fracture treatment indicated that the number of stages, lateral length, total proppant volume, and fracture energy heavily influence how a well produces. SHmax in the area is oriented at ~N96E, and microseismic event swarms generally trend N56E. Microseismic activity that forms at acute angles to SHmax is interpreted to result from shearing on pre-existing fractures. Ideally, this study will fit into a larger framework of previous case studies that can be used to better understand shale gas reservoirs and make hydrocarbon extraction safer, more efficient, and more predictable.
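The acute-angle relationship stated above can be checked directly from the two azimuths given in the abstract: N96E for SHmax and N56E for the swarm trend. A small helper makes the arithmetic explicit (azimuths in degrees east of north; strikes are undirected lines, so the difference is folded into 0-90 degrees).

```python
# Acute angle between two strike azimuths (undirected lines, not vectors).
def acute_angle(az1, az2):
    """Smallest angle, in degrees, between two azimuths given in degrees."""
    d = abs(az1 - az2) % 180
    return min(d, 180 - d)

# SHmax ~N96E vs. microseismic swarm trend N56E from the study area.
print(acute_angle(96, 56))  # 40 degrees
```

A 40-degree angle to SHmax is well away from the tensile-opening direction, consistent with the interpretation of shear on pre-existing fractures rather than new fractures opening parallel to SHmax.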
Can LLM-Generated Misinformation Be Detected?
The advent of Large Language Models (LLMs) has made a transformative impact.
However, the potential that LLMs such as ChatGPT can be exploited to generate
misinformation has posed a serious concern to online safety and public trust. A
fundamental research question is: will LLM-generated misinformation cause more
harm than human-written misinformation? We propose to tackle this question from
the perspective of detection difficulty. We first build a taxonomy of
LLM-generated misinformation, then categorize and validate the potential
real-world methods for generating misinformation with LLMs. Through
extensive empirical investigation, we discover that LLM-generated
misinformation can be harder to detect for humans and detectors compared to
human-written misinformation with the same semantics, which suggests it can
have more deceptive styles and potentially cause more harm. We also discuss the
implications of our discovery on combating misinformation in the age of LLMs
and the countermeasures. Comment: The code, dataset, and more resources on LLMs and misinformation will
be released on the project website: https://llm-misinformation.github.io
A Survey on Automated Fact-Checking
Fact-checking has become increasingly important due to the speed with which both information and misinformation can spread in the modern media ecosystem. Therefore, researchers have been exploring how fact-checking can be automated, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the veracity of claims. In this paper, we survey automated fact-checking stemming from natural language processing and discuss its connections to related tasks and disciplines. In this process, we present an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts. Finally, we highlight challenges for future research.
A Longitudinal Analysis of the Effects of News Media Messages on Health Behaviors
Two primary research hypotheses were tested concerning aggregate effects of news media on aggregated health behaviors over time for four health behaviors: marijuana use, seatbelt use, beef consumption, and fruit consumption. Several measures of seatbelt use and fruit consumption were used. The first primary hypothesis sought to establish any evidence of news media impact on behavior, and tested for effects using two different operationalizations of media coverage. The first operationalization distinguished between PRO and CON coverage: PRO coverage consisted of stories emphasizing positive aspects of performing the healthy behavior, while CON coverage consisted of stories emphasizing negative aspects of performing the healthy behavior. The second operationalization measured any media stories containing references to performing the behavior (the general behavioral media measure, or GBM).

The second hypothesis proposed that media messages emphasizing the positive (PRO) and negative (CON) aspects of performing the healthy behavior would be more strongly associated with behavior change than the more general behavioral media coverage measure (GBM) (Hypothesis 2A). It was further proposed that if there were very low levels of CON media, the PRO measure should still offer greater prediction than the general measure (Hypothesis 2B). Two methods, distributed lagged regression analysis and ideodynamic models, were used to test the hypotheses.

In sum, there was substantial support for Research Hypothesis 1, that trends in media coverage could explain a significant portion of the variation in trends in behavioral outcomes. Considering any measure of media coverage, any measure of behavior, and any method of analysis, there was at least one significant media/behavior association for each behavior. The conviction with which claims of causal inference could be made varied.

There was less convincing evidence supporting the second set of research hypotheses, that PRO/CON (or PRO-only, in the absence of CON) coverage would better predict behavior change than the GBM measure. These hypotheses could only be considered if there was evidence of an association between media coverage and the behavioral measure. Of the five significant media/behavior relationships, four provided support, in varying degrees, for the superiority of the more refined media measure(s).
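The distributed-lag idea used in this analysis can be sketched in miniature: regress a behavioral series on media coverage from an earlier period. The series below are synthetic placeholders and the fit is plain least squares for a single lag, not the full lag structure or the ideodynamic models used in the study.

```python
# Toy distributed-lag sketch: OLS of behavior[t] on media[t - lag].
def lagged_regression(media, behavior, lag):
    """Return (slope, intercept) from simple least squares at one lag."""
    x = media[: len(media) - lag]   # predictor, shifted back by `lag`
    y = behavior[lag:]              # outcome, aligned with lagged predictor
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic series where behavior tracks media coverage one period later.
media = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
behavior = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]  # 2 * media[t-1]
slope, intercept = lagged_regression(media, behavior, lag=1)
print(slope, intercept)  # 2.0 0.0
```

In the actual analyses, multiple lags would enter the regression jointly, and significance of the lagged media coefficients is what grounds a media/behavior association claim.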
Combating Misinformation in the Age of LLMs: Opportunities and Challenges
Misinformation such as fake news and rumors is a serious threat to
information ecosystems and public trust. The emergence of Large Language Models
(LLMs) has great potential to reshape the landscape of combating
misinformation. Generally, LLMs can be a double-edged sword in the fight. On
the one hand, LLMs bring promising opportunities for combating misinformation
due to their profound world knowledge and strong reasoning abilities. Thus, one
emergent question is: how to utilize LLMs to combat misinformation? On the
other hand, the critical challenge is that LLMs can be easily leveraged to
generate deceptive misinformation at scale. Then, another important question
is: how to combat LLM-generated misinformation? In this paper, we first
systematically review the history of combating misinformation before the advent
of LLMs. Then we illustrate the current efforts and present an outlook for
these two fundamental questions respectively. The goal of this survey paper is
to facilitate the progress of utilizing LLMs for fighting misinformation and
call for interdisciplinary efforts from different stakeholders for combating
LLM-generated misinformation. Comment: 9 pages for the main paper, 35 pages including 656 references; more
resources on "LLMs Meet Misinformation" are on the website:
https://llm-misinformation.github.io