Automatic Fact-guided Sentence Modification
Online encyclopedias like Wikipedia contain large amounts of text that need
frequent corrections and updates. New information may contradict existing
encyclopedia content. In this paper, we focus on rewriting such dynamically
changing articles. This is a challenging constrained generation task, as the
output must be consistent with the new information and fit into the rest of the
existing document. To this end, we propose a two-step solution: (1) We identify
and remove the contradicting components in a target text for a given claim,
using a neutralizing stance model; (2) We expand the remaining text to be
consistent with the given claim, using a novel two-encoder sequence-to-sequence
model with copy attention. Applied to a Wikipedia fact update dataset, our
method successfully generates updated sentences for new claims, achieving the
highest SARI score. Furthermore, we demonstrate that generating synthetic data
through such rewritten sentences can successfully augment the FEVER
fact-checking training dataset, leading to a relative error reduction of 13%.
Comment: AAAI 2020
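The two-step procedure described above can be illustrated with a toy sketch. Both steps below are hypothetical heuristics standing in for the paper's neural models (the stance model and the two-encoder copy-attention generator), not the authors' implementation:

```python
# Toy sketch of the two-step fact-guided rewriting pipeline.
# Both functions are hypothetical stand-ins, not the paper's models.

def neutralize(sentence: str, claim: str) -> list:
    # Stand-in for the neutralizing stance model: mask numeric tokens,
    # a common source of contradiction with an updated claim.
    return ["<mask>" if tok.rstrip(".").isdigit() else tok
            for tok in sentence.split()]

def expand(masked: list, claim: str) -> str:
    # Stand-in for the two-encoder seq2seq with copy attention:
    # copy numeric tokens from the claim into the masked slots.
    numbers = [t for t in claim.split() if t.rstrip(".").isdigit()]
    out = []
    for tok in masked:
        out.append(numbers.pop(0) if tok == "<mask>" and numbers else tok)
    return " ".join(out)

def rewrite(sentence: str, claim: str) -> str:
    # Step 1 removes the contradicting component; step 2 fills it back
    # in so the result is consistent with the claim.
    return expand(neutralize(sentence, claim), claim)
```

For example, `rewrite("The city has 50000 residents.", "Recent figures put the population at 62000")` returns `"The city has 62000 residents."`, mirroring the remove-then-expand structure of the method.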
Fast non-autoregressive inverse folding with discrete diffusion
Generating protein sequences that fold into an intended 3D structure is a
fundamental step in de novo protein design. De facto methods utilize
autoregressive generation, but this forgoes higher-order interactions that
could be exploited to improve inference speed. We describe a non-autoregressive
alternative that performs inference using a constant number of calls, resulting
in a 23-fold speedup without a loss in performance on the CATH benchmark.
Conditioned on the 3D structure, we fine-tune ProteinMPNN to perform discrete
diffusion with a purity prior over the index sampling order. Our approach offers
flexibility in trading off inference speed and accuracy by modulating the
diffusion speed. Code: https://github.com/johnyang101/pmpnndiff
Comment: NeurIPS Machine Learning for Structural Biology workshop
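The constant-call claim can be illustrated with a toy cost comparison. `toy_model` below is a hypothetical stand-in, not ProteinMPNN, and the block-wise commit schedule crudely mimics iterative denoising in place of the paper's purity prior:

```python
# Toy comparison of decoding cost: autoregressive generation needs one
# network call per position, while iterative denoising uses a fixed
# number of calls regardless of sequence length.

def toy_model(tokens):
    # Hypothetical stand-in network: propose a residue ("A") for every
    # unfilled position; real models would predict a distribution.
    return ["A" if tok is None else tok for tok in tokens]

def autoregressive(length: int):
    seq, calls = [], 0
    for _ in range(length):            # O(length) network calls
        calls += 1
        seq.append(toy_model([None])[0])
    return seq, calls

def denoise(length: int, steps: int = 4):
    seq, calls = [None] * length, 0
    for step in range(steps):          # O(steps) calls, independent of length
        calls += 1
        proposal = toy_model(seq)
        upto = (step + 1) * length // steps
        seq[:upto] = proposal[:upto]   # commit a block of positions per step
    return seq, calls
```

Decoding a 100-residue sequence costs 100 calls autoregressively but only `steps` calls with the denoiser; raising or lowering `steps` is the speed/accuracy knob the abstract refers to as modulating the diffusion speed.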
The Limitations of Stylometry for Detecting Machine-Generated Fake News
Recent developments in neural language models (LMs) have raised concerns
about their potential misuse for automatically spreading misinformation. In
light of these concerns, several studies have proposed to detect
machine-generated fake news by capturing their stylistic differences from
human-written text. These approaches, broadly termed stylometry, have found
success in source attribution and misinformation detection in human-written
texts. However, in this work, we show that stylometry is limited against
machine-generated misinformation. While humans speak differently when trying to
deceive, LMs generate stylistically consistent text, regardless of underlying
motive. Thus, though stylometry can successfully prevent impersonation by
identifying text provenance, it fails to distinguish legitimate LM applications
from those that introduce false information. We create two benchmarks
demonstrating the stylistic similarity between malicious and legitimate uses of
LMs, employed in auto-completion and editing-assistance settings. Our findings
highlight the need for non-stylometry approaches in detecting machine-generated
misinformation, and open up the discussion on the desired evaluation
benchmarks.
Comment: Accepted for the Computational Linguistics journal (squib). Previously
posted with the title "Are We Safe Yet? The Limitations of Distributional
Features for Fake News Detection"