39 research outputs found

    Maximizing Neutrality in News Ordering

    The detection of fake news has received increasing attention over the past few years, but there are more subtle ways of deceiving one's audience. In addition to the content of news stories, their presentation can also be made misleading or biased. In this work, we study the impact of the ordering of news stories on audience perception. We introduce the problems of detecting cherry-picked news orderings and maximizing neutrality in news orderings. We prove hardness results and present several algorithms for approximately solving these problems. Furthermore, we provide extensive experimental results and present evidence of potential cherry-picking in the real world. Comment: 14 pages, 13 figures, accepted to KDD '2
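
    The abstract does not state the objective being optimized, so the sketch below is only a toy illustration of what "neutrality of an ordering" could mean, not the paper's formulation: stories carry hypothetical sentiment scores, an ordering is penalized by its worst prefix bias, and a greedy heuristic keeps the running sentiment near zero.

        # Toy illustration only (assumed scoring, NOT the paper's definitions): score an
        # ordering of sentiment-scored stories by its largest absolute running-mean
        # sentiment over any prefix, and greedily keep the running sum near zero.
        def prefix_bias(ordering):
            """Largest |running mean sentiment| over all prefixes (lower = more neutral)."""
            total, worst = 0.0, 0.0
            for i, s in enumerate(ordering, start=1):
                total += s
                worst = max(worst, abs(total / i))
            return worst

        def greedy_neutral_order(scores):
            """Greedy heuristic: repeatedly append the story that minimizes |running sum|."""
            remaining, ordering, total = list(scores), [], 0.0
            while remaining:
                nxt = min(remaining, key=lambda s: abs(total + s))
                remaining.remove(nxt)
                ordering.append(nxt)
                total += nxt
            return ordering

        scores = [0.9, -0.7, 0.4, -0.5, 0.8, -0.9]          # hypothetical sentiment scores
        print(prefix_bias(sorted(scores)))                  # a deliberately skewed ordering
        print(prefix_bias(greedy_neutral_order(scores)))    # greedy ordering is less biased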

    Overview of the Shared Task on Fake News Detection in Urdu at FIRE 2021

    Automatic detection of fake news is a highly important task in the contemporary world. This study reports on the second shared task, UrduFake@FIRE2021, on fake news detection in Urdu. The goal of the shared task is to motivate the community to develop efficient methods for solving this vital problem, particularly for the Urdu language. The task is posed as a binary classification problem: label a given news article as real or fake. The organizers provide a dataset comprising news in five domains: (i) Health, (ii) Sports, (iii) Showbiz, (iv) Technology, and (v) Business, split into training and testing sets. The training set contains 1,300 annotated news articles (750 real, 550 fake), while the testing set contains 300 news articles (200 real, 100 fake). 34 teams from 7 different countries (China, Egypt, Israel, India, Mexico, Pakistan, and UAE) registered to participate in the UrduFake@FIRE2021 shared task. Of those, 18 teams submitted experimental results and 11 submitted technical reports, substantially more than in the 2020 UrduFake shared task, when only 6 teams submitted technical reports. The submitted technical reports covered data representation techniques ranging from count-based BoW features to word vector embeddings, as well as machine learning algorithms ranging from traditional SVMs to various neural network architectures, including Transformers such as BERT and RoBERTa. In this year's competition, the best-performing system obtained an F1-macro score of 0.679, which is lower than the previous year's best result of 0.907. Admittedly, while the training sets of the past and current years overlap to a large extent, the testing set provided this year is completely different.
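
    As a concrete reference point for the kinds of systems described above, the following is a minimal count-based BoW plus linear-SVM baseline in scikit-learn; the toy articles and labels are placeholders, not items from the UrduFake dataset, and no participating team's system is reproduced here.

        # Minimal BoW + linear SVM baseline (placeholder data, not the UrduFake corpus).
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.metrics import f1_score
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        train_texts = ["example real news text", "example fake news text"]   # placeholders
        train_labels = [0, 1]                                                # 0 = real, 1 = fake
        test_texts, test_labels = ["another example news text"], [0]

        model = make_pipeline(CountVectorizer(ngram_range=(1, 2)), LinearSVC())
        model.fit(train_texts, train_labels)
        pred = model.predict(test_texts)
        print("F1-macro:", f1_score(test_labels, pred, average="macro"))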

    Assessing the Use of Smartphones in Agriculture

    Smartphones are an as-yet untapped resource for agriculture: they are ubiquitous across the globe, yet they have not previously been tested as a tool available to farmers. Imaging methods such as unmanned aerial vehicles (UAVs) and satellite imaging have been well explored and employed in various aspects of agriculture; however, such methods can be cost-prohibitive and leave farmers at the mercy of another company or agency. If smartphones could be shown to capture color in a way that relates quantifiably to data measured by laboratory-grade equipment, they could prove extremely valuable to farmers. Replacing expensive, specialized technology with a device already sitting in people’s pockets would benefit farmers around the world. With this in mind, three experiments were designed to assess the color capabilities of smartphone cameras for agricultural applications. The first experiment assessed the capability of smartphone cameras to identify the presence of cyanobacteria in a given water sample based on measurements of color and transmission spectra. These data were then related to color captured by four smartphones. Additionally, the measurements were used to create a preliminary customized Color Checker(TM)-inspired chart for use in identifying cyanobacteria. Current techniques employed by the state of New York for identifying cyanobacteria in water are cumbersome, involving week-long testing in government labs. This project is an attempt to simplify the process by using image capture with smartphones. The second assessment was similar to the first, with tomatoes in place of cyanobacteria. Five smartphone devices were used to image tomatoes at different stages of ripeness. A relationship was found between the hue angles derived from the smartphone images and those measured by a spectroradiometer. A tomato Color Checker(TM) was created using the spectroradiometer measurements; the chart is intended for use in camera calibration for future imaging of tomatoes. The final assessment was an online experiment, wherein participants were asked to choose, from an array generated from images of tomatoes, the color that best represents the color of the tomato. This was a first step toward understanding which characteristics people use to categorize a crop as ripe and how those characteristics are rendered by smartphone imaging.
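
    The hue angle referred to above is, in colorimetric terms, the CIELAB quantity h_ab = atan2(b*, a*). The sketch below computes it from an 8-bit sRGB pixel using the standard sRGB/D65 conversion; the example pixel value is a made-up tomato red, not a measurement from the study, and no per-device calibration is applied.

        # CIELAB hue angle h_ab from an 8-bit sRGB colour (standard sRGB/D65 formulas).
        # Example value is hypothetical; no camera-specific calibration is applied.
        import math

        def srgb_to_hue_angle(r8, g8, b8):
            def lin(c):                                   # sRGB gamma expansion
                c /= 255.0
                return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
            r, g, b = lin(r8), lin(g8), lin(b8)
            # linear sRGB -> CIE XYZ (D65)
            x = 0.4124 * r + 0.3576 * g + 0.1805 * b
            y = 0.2126 * r + 0.7152 * g + 0.0722 * b
            z = 0.0193 * r + 0.1192 * g + 0.9505 * b
            # XYZ -> CIELAB (D65 white point), then hue angle from a*, b*
            def f(t):
                return t ** (1 / 3) if t > (6 / 29) ** 3 else t / (3 * (6 / 29) ** 2) + 4 / 29
            fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
            a_star, b_star = 500.0 * (fx - fy), 200.0 * (fy - fz)
            return math.degrees(math.atan2(b_star, a_star)) % 360.0

        print(round(srgb_to_hue_angle(200, 60, 50), 1))   # hypothetical tomato-red pixel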

    3D seismic attribute-assisted analysis of microseismic events in the Marcellus Shale

    Microseismic monitoring is often used during oil and gas exploitation to monitor seismicity that may be triggered by hydraulic fracturing, a common practice in the Appalachian Basin. Anthropogenically induced minor upward fracture growth is not uncommon in the Marcellus Shale; however, in the area of study, significant microseismic activity was registered above the target zone. To ascertain whether out-of-zone growth might have been predictable and to identify which areas are more likely to experience brittle failure first, 3D seismic and microseismic data were analyzed with a focus on better understanding variations in the acoustic properties associated with unconventional naturally fractured reservoirs.

    Ant Tracking was used to identify areas of increased local seismic discontinuity, as these areas are generally more intensely deformed and may represent zones of increased fracture intensity. Ant Tracking results reveal that discontinuities in the Marcellus are oriented approximately N52E and N41W; the discontinuities do not coincide with the N25E-trending folds apparent in the 3D seismic, but tend to follow deeper structural trends instead. These discontinuity orientations are interpreted to result from continued movement on deeper faults throughout the Paleozoic; these faults possibly acted as seed points for fractures farther upsection and potentially precipitated the large N25E-trending imbricate backthrusts seen in the 3D seismic.

    The reservoir's response to hydraulic fracturing also provided insights into local stress anisotropy and into the optimal well and stage spacing needed to maximize drainage area and locate additional wells during the field development phase. Microseismic, well, and pump data used to gauge the reservoir's response to a hydraulic fracture treatment indicated that the number of stages, lateral length, total proppant volume, and fracture energy heavily influence how a well produces. SHmax is oriented at ~N96E in the region, and microseismic event swarms generally trend N56E. Microseismic activity that forms at acute angles to SHmax is interpreted to result from shearing on pre-existing fractures. Ideally, this study will fit into a larger framework of previous case studies that can be used to better understand shale gas reservoirs and make hydrocarbon extraction safer, more efficient, and more predictable.
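
    Read as map azimuths, ~N96E and N56E differ by roughly 40 degrees, which is the acute angle referred to above; the two-line helper below (an illustration, not part of the study's workflow) makes that calculation explicit.

        # Acute angle between two map azimuths, in degrees east of north (illustration only).
        def acute_angle(az1_deg, az2_deg):
            d = abs(az1_deg - az2_deg) % 180.0
            return min(d, 180.0 - d)

        print(acute_angle(96.0, 56.0))   # ~N96E SHmax vs N56E swarm trend -> 40.0 degrees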

    Can LLM-Generated Misinformation Be Detected?

    The advent of Large Language Models (LLMs) has had a transformative impact. However, the potential for LLMs such as ChatGPT to be exploited to generate misinformation poses a serious concern for online safety and public trust. A fundamental research question is: will LLM-generated misinformation cause more harm than human-written misinformation? We propose to tackle this question from the perspective of detection difficulty. We first build a taxonomy of LLM-generated misinformation and categorize and validate the potential real-world methods for generating misinformation with LLMs. Then, through extensive empirical investigation, we find that LLM-generated misinformation can be harder for humans and detectors to detect than human-written misinformation with the same semantics, which suggests it can have more deceptive styles and potentially cause more harm. We also discuss the implications of our findings for combating misinformation in the age of LLMs and potential countermeasures. Comment: The code, dataset, and more resources on LLMs and misinformation will be released on the project website: https://llm-misinformation.github.io
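
    The comparison described above boils down to running the same detector over paired human-written and LLM-generated items and comparing detection rates. The sketch below shows that harness with a keyword-based placeholder detector and made-up texts; neither the detector nor the data come from the paper.

        # Evaluation harness only: placeholder detector and made-up texts, not the paper's.
        def toy_detector(text):
            """Flags text containing an obviously sensational cue word."""
            return any(w in text.lower() for w in ("shocking", "miracle", "exposed"))

        human_written = ["Shocking cure exposed by insiders",
                         "Miracle food reverses aging overnight"]
        llm_generated = ["New findings suggest a widely used food may reverse aging",
                         "Internal documents reportedly point to an undisclosed cure"]

        def detection_rate(texts, detector):
            return sum(detector(t) for t in texts) / len(texts)

        gap = detection_rate(human_written, toy_detector) - detection_rate(llm_generated, toy_detector)
        print("detection-rate gap (human minus LLM):", gap)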

    A Survey on Automated Fact-Checking

    Fact-checking has become increasingly important due to the speed with which both information and misinformation can spread in the modern media ecosystem. Therefore, researchers have been exploring how fact-checking can be automated, using techniques based on natural language processing, machine learning, knowledge representation, and databases to automatically predict the veracity of claims. In this paper, we survey automated fact-checking stemming from natural language processing, and discuss its connections to related tasks and disciplines. In this process, we present an overview of existing datasets and models, aiming to unify the various definitions given and identify common concepts. Finally, we highlight challenges for future research.
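
    Automated fact-checking pipelines of the kind surveyed here are commonly decomposed into claim detection, evidence retrieval, and verdict prediction. The sketch below mimics the retrieve-then-verify part of that pipeline with crude word-overlap heuristics purely to show the structure; real systems replace each stage with learned NLP models, and the corpus, thresholds, and heuristics are assumptions for illustration.

        # Structural illustration of a retrieve-then-verify pipeline; the heuristics,
        # corpus, and thresholds are placeholders, not models from the survey.
        def retrieve(claim, corpus, k=2):
            """Rank evidence snippets by word overlap with the claim (stand-in for a retriever)."""
            claim_words = set(claim.lower().split())
            return sorted(corpus, key=lambda s: -len(claim_words & set(s.lower().split())))[:k]

        def verdict(claim, evidence):
            """Crude stand-in for a veracity classifier using overlap and a negation cue."""
            claim_words = set(claim.lower().split())
            best = max(len(claim_words & set(e.lower().split())) / len(claim_words) for e in evidence)
            if best < 0.3:
                return "NOT ENOUGH INFO"
            return "REFUTED" if any("not" in e.lower().split() for e in evidence) else "SUPPORTED"

        corpus = ["The Eiffel Tower is located in Paris.",
                  "The Great Wall is not visible from the Moon with the naked eye."]
        claim = "The Great Wall is visible from the Moon"
        evidence = retrieve(claim, corpus)
        print(verdict(claim, evidence), "|", evidence[0])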

    A Longitudinal Analysis of the Effects of News Media Messages on Health Behaviors

    Two primary research hypotheses were tested concerning aggregate effects of news media on aggregated health behaviors over time for four health behaviors: marijuana use, seatbelt use, beef consumption, and fruit consumption. Several measures of seatbelt use and fruit consumption were used. The first primary hypothesis sought to establish any evidence of news media impact on behavior and tested for effects using two different operationalizations of media coverage. The first operationalization distinguished between PRO and CON coverage. PRO coverage consisted of stories emphasizing positive aspects of performing the healthy behavior, while CON coverage consisted of stories emphasizing negative aspects of performing the healthy behavior. The second operationalization measured any media stories containing references to performing the behavior (the general behavioral media measure, or GBM). The second hypothesis proposed that media messages emphasizing the positive (PRO) and negative (CON) aspects of performing the healthy behavior would be more strongly associated with behavior change than would the more general behavioral media coverage measure (GBM) (Hypothesis 2A). It was further proposed that if there were very low levels of CON media, the PRO measure should still offer greater prediction than the general measure (Hypothesis 2B). Two methods, distributed lag regression analysis and ideodynamic models, were used to test the hypotheses. In sum, there was substantial support for Research Hypothesis 1, that trends in media coverage could explain a significant portion of the variation in trends in behavioral outcomes. Considering any measure of media coverage, any measure of behavior, and any method of analysis, there was at least one significant media/behavior association for each behavior. The conviction with which claims of causal inference could be made varied. There was less convincing evidence supporting the second set of research hypotheses, that PRO/CON (or PRO only in the absence of CON) coverage would better predict behavior change than the GBM measure. These hypotheses could only be considered if there was any evidence of an association between media coverage and the behavioral measure. Of the five significant media/behavior relationships, four provided support, in varying degrees, for the superiority of the more refined media measure(s).
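
    A minimal version of the first method, distributed lag regression, is sketched below: an aggregate behavior series is regressed on current and lagged media-coverage counts with statsmodels. The synthetic weekly series is an illustration only and is not the dissertation's data.

        # Distributed lag regression sketch on synthetic data (not the dissertation's series).
        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(0)
        weeks = 200
        media = rng.poisson(5, weeks).astype(float)                      # weekly story counts
        behavior = 10 + 0.4 * media + 0.2 * np.roll(media, 1) + rng.normal(0, 1, weeks)

        lags = 2                                                         # media_t .. media_{t-2}
        X = np.column_stack([np.roll(media, k) for k in range(lags + 1)])[lags:]
        y = behavior[lags:]
        fit = sm.OLS(y, sm.add_constant(X)).fit()
        print(fit.params)          # intercept, then lag-0, lag-1, lag-2 coefficients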

    Combating Misinformation in the Age of LLMs: Opportunities and Challenges

    Misinformation such as fake news and rumors is a serious threat to information ecosystems and public trust. The emergence of Large Language Models (LLMs) has great potential to reshape the landscape of combating misinformation. Generally, LLMs can be a double-edged sword in the fight. On the one hand, LLMs bring promising opportunities for combating misinformation due to their profound world knowledge and strong reasoning abilities. Thus, one emerging question is: how can LLMs be utilized to combat misinformation? On the other hand, the critical challenge is that LLMs can easily be leveraged to generate deceptive misinformation at scale. This raises another important question: how can LLM-generated misinformation be combated? In this paper, we first systematically review the history of combating misinformation before the advent of LLMs. Then we illustrate current efforts and present an outlook on these two fundamental questions, respectively. The goal of this survey paper is to facilitate progress in utilizing LLMs to fight misinformation and to call for interdisciplinary efforts from different stakeholders to combat LLM-generated misinformation. Comment: 9 pages for the main paper, 35 pages including 656 references; more resources on "LLMs Meet Misinformation" are on the website: https://llm-misinformation.github.io