
    SciFix: Outperforming GPT3 on Scientific Factual Error Correction

    Due to the prohibitively high cost of creating error correction datasets, most Factual Claim Correction methods rely on a powerful verification model to guide the correction process. This leads to a significant drop in performance in domains like scientific claims, where good verification models do not always exist. In this work, we introduce SciFix, a scientific claim correction system that does not require a verifier but can outperform existing methods by a considerable margin -- achieving correction accuracy of 84% on the SciFact dataset, 77% on SciFact-Open and 72% on the CovidFact dataset, compared to next best accuracies of 7%, 5%, and 15% on the same datasets respectively. Our method leverages the power of prompting with LLMs during training to create a richly annotated dataset that can be used for fully supervised training and regularization. We additionally use a claim-aware decoding procedure to improve the quality of corrected claims. Our method outperforms the very LLM that was used to generate the annotated dataset -- with Few-Shot Prompting on GPT3.5 achieving 58%, 61%, and 64% on the respective datasets, a consistently lower correction accuracy, despite using nearly 800 times as many parameters as our model. Comment: To appear in proceedings of EMNLP2023 (findings

    Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment

    Social media is awash with hateful content, much of which is often veiled with linguistic and topical diversity. The benchmark datasets used for hate speech detection do not account for such divagation as they are predominantly compiled using hate lexicons. However, capturing hate signals becomes challenging in neutrally-seeded malicious content. Thus, designing models and datasets that mimic the real-world variability of hate warrants further investigation. To this end, we present GOTHate, a large-scale code-mixed crowdsourced dataset of around 51k posts for hate speech detection from Twitter. GOTHate is neutrally seeded, encompassing different languages and topics. We conduct detailed comparisons of GOTHate with the existing hate speech datasets, highlighting its novelty. We benchmark it with 10 recent baselines. Our extensive empirical and benchmarking experiments suggest that GOTHate is hard to classify in a text-only setup. Thus, we investigate how adding endogenous signals enhances the hate speech detection task. We augment GOTHate with the user's timeline information and ego network, bringing the overall data source closer to the real-world setup for understanding hateful content. Our proposed solution HEN-mBERT is a modular, multilingual, mixture-of-experts model that enriches the linguistic subspace with latent endogenous signals from history, topology, and exemplars. HEN-mBERT outperforms the best baseline by 2.5% and 5% in overall macro-F1 and hate class F1, respectively. Inspired by our experiments, in partnership with Wipro AI, we are developing a semi-automated pipeline to detect hateful content as a part of their mission to tackle online harm. Comment: 15 pages, 4 figures, 11 tables. Accepted at SIGKDD'2

    Critical Analysis of Heat Exchanger Cycle for its Maintainability Using Failure Modes and Effect Analysis and Pareto Analysis

    Failure Modes and Effect Analysis (FMEA) is an efficient evaluation technique for identifying potential failures in products, processes, and services. FMEA is designed to identify and prioritize failure modes. It is a useful method for identifying and correcting possible failures at the earliest possible stage, so that the consequences of poor performance can be avoided. In this paper, the FMEA tool is used to detect failures of various components of a heat exchanger cycle and to identify critical component failures that may hamper the system's performance. Further, a detailed Pareto analysis is carried out to find the most critical components of the cycle, the causes of their failures, and possible recommended actions. This paper can be used as a checklist to help maintain the system.
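    The combination of FMEA and Pareto analysis described in this abstract can be illustrated with a minimal sketch. It assumes the conventional risk priority number (RPN = severity x occurrence x detection, each rated 1 to 10) and an 80% cumulative cut-off; the abstract does not state which scoring scheme or threshold the paper uses. The component names and ratings are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA row; ratings are assumed to be on the usual 1-10 scales."""
    component: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        # Conventional risk priority number (an assumption, see lead-in).
        return self.severity * self.occurrence * self.detection


def pareto_critical(modes, threshold=0.8):
    """Return the highest-RPN modes that together reach `threshold` of total RPN."""
    ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
    total = sum(m.rpn for m in ranked)
    critical, cumulative = [], 0
    for mode in ranked:
        # Stop once the modes collected so far already cover the threshold.
        if total == 0 or cumulative / total >= threshold:
            break
        critical.append(mode)
        cumulative += mode.rpn
    return critical


# Hypothetical ratings, for illustration only.
example_modes = [
    FailureMode("tube bundle", severity=8, occurrence=6, detection=5),
    FailureMode("gasket", severity=7, occurrence=5, detection=4),
    FailureMode("pump", severity=4, occurrence=3, detection=3),
    FailureMode("valve", severity=3, occurrence=2, detection=2),
]

if __name__ == "__main__":
    for mode in pareto_critical(example_modes):
        print(mode.component, mode.rpn)
```

    Ranking by RPN and cutting at a cumulative share is only one common way to combine the two techniques; a study may equally rank by failure frequency or cost.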

    Characterizing the Entities in Harmful Memes: Who is the Hero, the Villain, the Victim?

    Memes can sway people's opinions over social media as they combine visual and textual information in an easy-to-consume manner. Since memes instantly turn viral, it becomes crucial to infer their intent and potentially associated harmfulness to take timely measures as needed. A common problem associated with meme comprehension lies in detecting the entities referenced and characterizing the role of each of these entities. Here, we aim to understand whether the meme glorifies, vilifies, or victimizes each entity it refers to. To this end, we address the task of role identification of entities in harmful memes, i.e., detecting who is the 'hero', the 'villain', and the 'victim' in the meme, if any. We utilize HVVMemes - a memes dataset on US Politics and Covid-19 memes, released recently as part of the CONSTRAINT@ACL-2022 shared-task. It contains memes, entities referenced, and their associated roles: hero, villain, victim, and other. We further design VECTOR (Visual-semantic role dEteCToR), a robust multi-modal framework for the task, which integrates entity-based contextual information in the multi-modal representation and compare it to several standard unimodal (text-only or image-only) or multi-modal (image+text) models. Our experimental results show that our proposed model achieves an improvement of 4% over the best baseline and 1% over the best competing stand-alone submission from the shared-task. Besides divulging an extensive experimental setup with comparative analyses, we finally highlight the challenges encountered in addressing the complex task of semantic role labeling within memes. Comment: Accepted at EACL 2023 (Main Track). 9 Pages (main content), Limitations, Ethical Considerations + 4 Pages (Refs.) + Appendix; 8 Figures; 5 Tables; Paper ID: 80

    Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model

    Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading. Comment: EMNLP 202

    The gravitational-wave background null hypothesis: Characterizing noise in millisecond pulsar arrival times with the Parkes Pulsar Timing Array

    The noise in millisecond pulsar (MSP) timing data can include contributions from observing instruments, the interstellar medium, the solar wind, solar system ephemeris errors, and the pulsars themselves. The noise environment must be accurately characterized in order to form the null hypothesis against which signal models can be compared, including the signature induced by nanohertz-frequency gravitational waves (GWs). Here we describe the noise models developed for each of the MSPs in the Parkes Pulsar Timing Array (PPTA) third data release, which have been used as the basis of a search for the isotropic stochastic GW background. We model pulsar spin noise, dispersion measure variations, scattering variations, events in the pulsar magnetospheres, solar wind variability, and instrumental effects. We also search for new timing model parameters and detect Shapiro delays in PSR J0614−3329 and PSR J1902−5105. The noise and timing models are validated by testing the normalized and whitened timing residuals for Gaussianity and residual correlations with time. We demonstrate that the choice of noise models significantly affects the inferred properties of a common-spectrum process. Using our detailed models, the recovered common-spectrum noise in the PPTA is consistent with a power law with a spectral index of γ = 13/3, the value predicted for a stochastic GW background from a population of supermassive black hole binaries driven solely by GW emission. Comment: 18 pages, 10 figures. Accepted for publication in ApJ
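    For readers unfamiliar with the value 13/3 quoted in this abstract, the standard argument can be sketched as follows. The symbols $A$, $\alpha$, $f_{\rm yr}$ and $P(f)$ are the usual pulsar-timing conventions and are assumed here; the abstract itself does not define them.

```latex
% Characteristic strain of the background, modelled as a power law:
h_c(f) = A \left( \frac{f}{f_{\rm yr}} \right)^{\alpha},
\qquad \alpha = -\tfrac{2}{3}
\ \text{for circular binaries driven solely by GW emission.}

% Corresponding power spectral density of the timing residuals:
P(f) = \frac{h_c^2(f)}{12 \pi^2 f^3}
     = \frac{A^2}{12 \pi^2}
       \left( \frac{f}{f_{\rm yr}} \right)^{-\gamma} f_{\rm yr}^{-3},
\qquad \gamma = 3 - 2\alpha = 3 + \tfrac{4}{3} = \tfrac{13}{3}.
```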