Learning to Prove Theorems via Interacting with Proof Assistants
Humans prove theorems by relying on substantial high-level reasoning and
problem-specific insights. Proof assistants offer a formalism that resembles
human mathematical reasoning, representing theorems in higher-order logic and
proofs as high-level tactics. However, human experts have to construct proofs
manually by entering tactics into the proof assistant. In this paper, we study
the problem of using machine learning to automate the interaction with proof
assistants. We construct CoqGym, a large-scale dataset and learning environment
containing 71K human-written proofs from 123 projects developed with the Coq
proof assistant. We develop ASTactic, a deep learning-based model that
generates tactics as programs in the form of abstract syntax trees (ASTs).
Experiments show that ASTactic trained on CoqGym can generate effective tactics
and can be used to prove new theorems not previously provable by automated
methods. Code is available at https://github.com/princeton-vl/CoqGym.
Comment: Accepted to ICML 201
SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition
Understanding the spatial relations between objects in images is a
surprisingly challenging task. A chair may be "behind" a person even if it
appears to the left of the person in the image (depending on which way the
person is facing). Two students that appear close to each other in the image
may not in fact be "next to" each other if there is a third student between
them.
We introduce SpatialSense, a dataset specializing in spatial relation
recognition which captures a broad spectrum of such challenges, allowing for
proper benchmarking of computer vision techniques. SpatialSense is constructed
through adversarial crowdsourcing, in which human annotators are tasked with
finding spatial relations that are difficult to predict using simple cues such
as 2D spatial configuration or language priors. Adversarial crowdsourcing
significantly reduces dataset bias and samples more interesting relations in
the long tail compared to existing datasets. On SpatialSense, state-of-the-art
recognition models perform comparably to simple baselines, suggesting that they
rely on straightforward cues instead of fully reasoning about this complex
task. The SpatialSense benchmark provides a path forward to advancing the
spatial reasoning capabilities of computer vision systems. The dataset and code
are available at https://github.com/princeton-vl/SpatialSense.
Comment: Accepted to ICCV 201
Rel3D: A Minimally Contrastive Benchmark for Grounding Spatial Relations in 3D
Understanding spatial relations (e.g., "laptop on table") in visual input is
important for both humans and robots. Existing datasets are insufficient as
they lack large-scale, high-quality 3D ground truth information, which is
critical for learning spatial relations. In this paper, we fill this gap by
constructing Rel3D: the first large-scale, human-annotated dataset for
grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness
of 3D information in predicting spatial relations on large-scale human data.
Moreover, we propose minimally contrastive data collection -- a novel
crowdsourcing method for reducing dataset bias. The 3D scenes in our dataset
come in minimally contrastive pairs: two scenes in a pair are almost identical,
but a spatial relation holds in one and fails in the other. We empirically
validate that minimally contrastive examples can diagnose issues with current
relation detection models as well as lead to sample-efficient training. Code
and data are available at https://github.com/princeton-vl/Rel3D.
Comment: Accepted to NeurIPS 202
A Study of Face Obfuscation in ImageNet
Face obfuscation (blurring, mosaicing, etc.) has been shown to be effective
for privacy protection; nevertheless, object recognition research typically
assumes access to complete, unobfuscated images. In this paper, we explore the
effects of face obfuscation on the popular ImageNet challenge visual
recognition benchmark. Most categories in the ImageNet challenge are not people
categories; however, many incidental people appear in the images, and their
privacy is a concern. We first annotate faces in the dataset. Then we
demonstrate that face obfuscation has minimal impact on the accuracy of
recognition models. Concretely, we benchmark multiple deep neural networks on
obfuscated images and observe that the overall recognition accuracy drops only
slightly (<= 1.0%). Further, we experiment with transfer learning to 4
downstream tasks (object recognition, scene recognition, face attribute
classification, and object detection) and show that features learned on
obfuscated images are equally transferable. Our work demonstrates the
feasibility of privacy-aware visual recognition, improves the highly-used
ImageNet challenge benchmark, and suggests an important path for future visual
datasets. Data and code are available at
https://github.com/princetonvisualai/imagenet-face-obfuscation.
Comment: Accepted to ICML 202
Strongly Incremental Constituency Parsing with Graph Neural Networks
Parsing sentences into syntax trees can benefit downstream applications in NLP. Transition-based parsers build trees by executing actions in a state transition system. They are computationally efficient, and can leverage machine learning to predict actions based on partial trees. However, existing transition-based parsers are predominantly based on the shift-reduce transition system, which does not align with how humans are known to parse sentences. Psycholinguistic research suggests that human parsing is strongly incremental—humans grow a single parse tree by adding exactly one token at each step. In this paper, we propose a novel transition system called attach-juxtapose. It is strongly incremental; it represents a partial sentence using a single tree; each action adds exactly one token into the partial tree. Based on our transition system, we develop a strongly incremental parser. At each step, it encodes the partial tree using a graph neural network and predicts an action. We evaluate our parser on Penn Treebank (PTB) and Chinese Treebank (CTB). On PTB, it outperforms existing parsers trained with only constituency trees; and it performs on par with state-of-the-art parsers that use dependency trees as additional training data. On CTB, our parser establishes a new state of the art. Code is available at https://github.com/princeton-vl/attach-juxtapose-parser
Generating Natural Language Proofs with Verifier-Guided Search
Deductive reasoning (drawing conclusions from assumptions) is a challenging
problem in NLP. In this work, we focus on proof generation: given a hypothesis
and a set of supporting facts in natural language, the model generates a proof
tree indicating how to deduce the hypothesis from supporting facts. Instead of
generating the entire proof in one shot, prior work has demonstrated the
promise of stepwise generation but achieved limited success on real-world data.
Existing stepwise methods struggle to generate proof steps that are both valid
and relevant. In this paper, we present a novel stepwise method NLProofS
(Natural Language Proof Search), which learns to generate relevant steps
conditioning on the hypothesis. At the core of our approach, we train an
independent verifier to check the validity of proof steps. Instead of
generating steps greedily, we search for proofs maximizing a global proof score
judged by the verifier. NLProofS achieves state-of-the-art performance on
EntailmentBank and RuleTaker. For example, it improves the percentage of
correctly predicted proofs from 20.9% to 33.3% in the distractor setting of
EntailmentBank. This is the first time stepwise methods have led to better
generation of challenging human-authored proofs.
Systematical Characterization of the AT-Hook Gene Family in Juglans regia L. and the Functional Analysis of the JrAHL2 in Flower Induction and Hypocotyl Elongation
AT-hook motif nuclear localization (AHL) proteins play essential roles in various plant biological processes. Yet a comprehensive understanding of AHL transcription factors in walnut (Juglans regia L.) is lacking. In this study, 37 AHL gene family members were identified in the walnut genome. Based on the evolutionary analysis, JrAHL genes were grouped into two clades, and their expansion may have occurred through segmental duplication. Cis-acting elements pointed to the stress-responsive nature of JrAHL genes, and transcriptomic data to their role in driving developmental activities. Tissue-specific expression analysis showed that JrAHLs, JrAHL2 in particular, were highly expressed in flower and shoot tip. Subcellular localization showed that JrAHL2 is anchored to the nucleus. Overexpression of JrAHL2 in Arabidopsis adversely affected hypocotyl elongation and delayed flowering. Our study, for the first time, presented a detailed analysis of JrAHL genes in walnut and provided theoretical knowledge for future genetic breeding programs.