Zero-Shot Classification by Logical Reasoning on Natural Language Explanations
Humans can classify an unseen category by reasoning on its language
explanations. This ability is due to the compositional nature of language: we
can combine previously seen concepts to describe the new category. For example,
we might describe ravens as "a kind of large bird with black feathers", so
that others can use their knowledge of the concepts "large bird" and "black
feathers" to recognize a raven. Inspired by this observation, in this work we
tackle the zero-shot classification task by logically parsing and reasoning on
natural language explanations. To this end, we propose the framework CLORE
(Classification by LOgical Reasoning on Explanations). While previous methods
usually regard textual information as implicit features, CLORE parses the
explanations into a logical structure and then reasons along this structure
on the input to produce a classification score. Experimental results on
explanation-based zero-shot classification benchmarks demonstrate that CLORE is
superior to baselines, mainly because it performs better on tasks requiring
more logical reasoning. Alongside classification decisions, CLORE can provide
the logical parsing and reasoning process as a form of rationale. Through
empirical analysis we demonstrate that CLORE is also less affected by
linguistic biases than baselines.
Comment: 8 pages, 5 figures
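The idea sketched in this abstract, parsing an explanation into concepts and reasoning over them conjunctively, can be illustrated with a toy example. The sketch below is not CLORE's actual parser or reasoner; the 'and'/'with' splitting heuristic and the token-overlap matcher are hypothetical stand-ins for its learned components.

```python
import re

def parse_explanation(explanation: str) -> list[str]:
    """Toy parser: split an explanation into concept phrases on 'and'/'with'."""
    parts = re.split(r"\band\b|\bwith\b", explanation)
    return [p.strip(" .,") for p in parts if p.strip(" .,")]

def concept_score(concept: str, text: str) -> float:
    """Hypothetical matcher: fraction of concept tokens found in the input."""
    tokens = set(text.lower().split())
    words = concept.lower().split()
    return sum(w in tokens for w in words) / max(len(words), 1)

def classify(text: str, explanations: dict[str, str]) -> str:
    """Score each label by combining its concept scores; the logical
    conjunction is approximated here by taking the minimum score."""
    scores = {}
    for label, explanation in explanations.items():
        concepts = parse_explanation(explanation)
        scores[label] = min(concept_score(c, text) for c in concepts)
    return max(scores, key=scores.get)

explanations = {"raven": "a kind of large bird with black feathers",
                "canary": "a small yellow songbird"}
print(classify("a large bird with black feathers sat on the fence", explanations))
# -> "raven" under this toy matcher
```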
TextGuard: Provable Defense against Backdoor Attacks on Text Classification
Backdoor attacks have become a major security threat for deploying machine
learning models in security-critical applications. Existing research endeavors
have proposed many defenses against backdoor attacks. Despite demonstrating
certain empirical defense efficacy, none of these techniques could provide a
formal and provable security guarantee against arbitrary attacks. As a result,
they can be easily broken by strong adaptive attacks, as shown in our
evaluation. In this work, we propose TextGuard, the first provable defense
against backdoor attacks on text classification. In particular, TextGuard first
divides the (backdoored) training data into sub-training sets, achieved by
splitting each training sentence into sub-sentences. This partitioning ensures
that a majority of the sub-training sets do not contain the backdoor trigger.
Subsequently, a base classifier is trained from each sub-training set, and
their ensemble provides the final prediction. We theoretically prove that when
the length of the backdoor trigger falls within a certain threshold, TextGuard
guarantees that its prediction will remain unaffected by the presence of the
triggers in training and testing inputs. In our evaluation, we demonstrate the
effectiveness of TextGuard on three benchmark text classification tasks,
surpassing the certification accuracy of existing certified defenses against
backdoor attacks. Furthermore, we propose additional strategies to enhance the
empirical performance of TextGuard. Comparisons with state-of-the-art empirical
defenses validate the superiority of TextGuard in countering multiple backdoor
attacks. Our code and data are available at
https://github.com/AI-secure/TextGuard.
Comment: Accepted by NDSS Symposium 202
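A minimal sketch of the partition-and-ensemble scheme the abstract describes, assuming a word-hashing split into g groups and a hypothetical base-classifier API; it illustrates the mechanism, not the authors' implementation or their certified bound.

```python
import hashlib
from collections import Counter

def split_into_subsentences(sentence: str, g: int) -> list[str]:
    """Deterministically hash each word into one of g groups; a trigger of
    bounded length can then contaminate only a bounded number of groups."""
    groups = [[] for _ in range(g)]
    for word in sentence.split():
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % g
        groups[idx].append(word)
    return [" ".join(words) for words in groups]

def ensemble_predict(sentence: str, classifiers: list) -> int:
    """Majority vote over base classifiers, each seeing only its own
    sub-sentence; classifiers[i].predict(text) is a hypothetical API."""
    subs = split_into_subsentences(sentence, g=len(classifiers))
    votes = Counter(clf.predict(sub) for clf, sub in zip(classifiers, subs))
    return votes.most_common(1)[0][0]

# Training (sketch): the i-th sub-training set collects the i-th sub-sentence
# of every (possibly backdoored) training example, and classifiers[i] is
# trained only on that set before voting at test time.
```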
Better Context Makes Better Code Language Models: A Case Study on Function Call Argument Completion
Pretrained code language models have enabled great progress towards program
synthesis. However, common approaches only consider in-file local context and
thus miss information and constraints imposed by other parts of the codebase
and its external dependencies. Existing code completion benchmarks also lack
such context. To resolve these restrictions, we curate a new dataset of
permissively licensed Python packages that includes full projects and their
dependencies and provide tools to extract non-local information with the help
of program analyzers. We then focus on the task of function call argument
completion, which requires predicting the arguments to function calls. We show
that existing code completion models do not yield good results on our
completion task. To better solve this task, we query a program analyzer for
information relevant to a given function call, and consider ways to provide the
analyzer results to different code completion models during inference and
training. Our experiments show that providing access to the function
implementation and function usages greatly improves the argument completion
performance. Our ablation study provides further insight into how different
types of information available from the program analyzer, and different ways of
incorporating that information, affect model performance.
Comment: 12 pages. Accepted to AAAI 202
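As a rough illustration of the kind of context the paper feeds to completion models, the sketch below uses Python's inspect module as a stand-in for a full program analyzer and a hypothetical code_model for the completion model; it is not the authors' pipeline.

```python
import inspect

def build_prompt(callee, call_prefix: str, usages: list[str]) -> str:
    """Prepend analyzer-style context (the callee's implementation and example
    call sites) to the partial call that a code model is asked to complete."""
    context = ["# Function implementation:", inspect.getsource(callee)]
    if usages:
        context.append("# Example usages:")
        context.extend(usages)
    return "\n".join(context) + "\n" + call_prefix

def scale(value: float, factor: float = 1.0) -> float:
    """Toy callee whose source and usages serve as non-local context."""
    return value * factor

prompt = build_prompt(scale, "y = scale(", usages=["z = scale(2.0, factor=0.5)"])
# completion = code_model.generate(prompt)   # hypothetical completion model
print(prompt)
```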
Reinforcement-Learning based Portfolio Management with Augmented Asset Movement Prediction States
Portfolio management (PM) is a fundamental financial planning task that aims
to achieve investment goals such as maximal profits or minimal risks. Its
decision process involves continuous derivation of valuable information from
various data sources and sequential decision optimization, which is a
prospective research direction for reinforcement learning (RL). In this paper,
we propose SARL, a novel State-Augmented RL framework for PM. Our framework
aims to address two unique challenges in financial PM: (1) data heterogeneity
-- the collected information for each asset is usually diverse, noisy and
imbalanced (e.g., news articles); and (2) environment uncertainty -- the
financial market is volatile and non-stationary. To incorporate heterogeneous
data and enhance robustness against environment uncertainty, our SARL augments
the asset information with their price movement prediction as additional
states, where the prediction can be solely based on financial data (e.g., asset
prices) or derived from alternative sources such as news. Experiments on two
real-world datasets, (i) Bitcoin market and (ii) HighTech stock market with
7-year Reuters news articles, validate the effectiveness of SARL over existing
PM approaches, both in terms of accumulated profits and risk-adjusted profits.
Moreover, extensive simulations are conducted to demonstrate the importance of
our proposed state augmentation, providing new insights and boosting
performance significantly over the standard RL-based PM method and other baselines.
Comment: AAAI 202
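A minimal sketch of what augmenting the state with movement predictions could look like, using a hypothetical movement predictor and policy network rather than SARL's actual architecture.

```python
import numpy as np

def augmented_state(prices: np.ndarray, window: int, predictor) -> np.ndarray:
    """Build an RL state from recent normalized returns and append each
    asset's predicted price movement (the state-augmentation idea).
    `predictor` is a hypothetical model: history -> (n_assets,) in {0, 1}."""
    history = prices[-window:]                   # shape: (window, n_assets)
    returns = history[1:] / history[:-1] - 1.0   # per-step relative changes
    movement = predictor(history)                # e.g. 1 = predicted to go up
    return np.concatenate([returns.ravel(), movement.astype(float)])

# The policy network then maps this augmented state to portfolio weights, e.g.
# weights = softmax(policy_net(augmented_state(prices, window=30, predictor=clf)))
```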
Improving Certified Robustness via Statistical Learning with Logical Reasoning
Intensive algorithmic efforts have recently been made to enable rapid
improvements in the certified robustness of complex ML models. However, current
robustness certification methods are only able to certify under a limited
perturbation radius. Given that existing pure data-driven statistical
approaches have reached a bottleneck, in this paper, we propose to integrate
statistical ML models with knowledge (expressed as logical rules) as a
reasoning component using Markov logic networks (MLN), so as to further improve
the overall certified robustness. This opens new research questions about
certifying the robustness of such a paradigm, especially the reasoning
component (e.g., MLN). As the first step towards understanding these questions,
we first prove that the computational complexity of certifying the robustness
of MLN is #P-hard. Guided by this hardness result, we then derive the first
certified robustness bound for MLN by carefully analyzing different model
regimes. Finally, we conduct extensive experiments on five datasets including
both high-dimensional images and natural language texts, and we show that the
certified robustness with knowledge-based logical reasoning indeed
significantly outperforms that of the state-of-the-art.
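For reference, a Markov logic network defines a distribution over possible worlds by exponentiating weighted counts of satisfied formula groundings (this is the standard MLN definition, not a result from the paper):

```latex
P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

where w_i is the weight of logical formula F_i and n_i(x) is the number of groundings of F_i satisfied in world x. Certifying the reasoning component then amounts to bounding how the MLN's predictions can change under bounded input perturbations, which the paper shows is #P-hard in general.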
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
Generative Pre-trained Transformer (GPT) models have exhibited exciting
progress in capabilities, capturing the interest of practitioners and the
public alike. Yet, while the literature on the trustworthiness of GPT models
remains limited, practitioners have proposed employing capable GPT models for
sensitive applications such as healthcare and finance, where mistakes can be
costly. To this end, this work proposes a comprehensive trustworthiness
evaluation for large language models with a focus on GPT-4 and GPT-3.5,
considering diverse perspectives - including toxicity, stereotype bias,
adversarial robustness, out-of-distribution robustness, robustness on
adversarial demonstrations, privacy, machine ethics, and fairness. Based on our
evaluations, we discover previously unpublished vulnerabilities to
trustworthiness threats. For instance, we find that GPT models can be easily
misled to generate toxic and biased outputs and leak private information in
both training data and conversation history. We also find that although GPT-4
is usually more trustworthy than GPT-3.5 on standard benchmarks, GPT-4 is more
vulnerable to jailbreaking system or user prompts, potentially because it
follows the (misleading) instructions more precisely. Our
work illustrates a comprehensive trustworthiness evaluation of GPT models and
sheds light on the trustworthiness gaps. Our benchmark is publicly available at
https://decodingtrust.github.io/
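At a high level, such a benchmark can be thought of as a per-perspective evaluation loop. The sketch below is a deliberately simplified illustration with hypothetical model and scorer interfaces, not DecodingTrust's actual harness.

```python
def evaluate_trustworthiness(model, suites: dict[str, list[str]], scorer) -> dict[str, float]:
    """Average a per-response score for each trustworthiness perspective.
    `model.generate(prompt)` and `scorer(perspective, prompt, response)` are
    hypothetical APIs standing in for the benchmark's real harness."""
    results = {}
    for perspective, prompts in suites.items():
        scores = [scorer(perspective, p, model.generate(p)) for p in prompts]
        results[perspective] = sum(scores) / max(len(scores), 1)
    return results

# `suites` might map perspectives such as "toxicity", "stereotype bias",
# "adversarial robustness", and "privacy" to their prompt sets, with the
# scorer flagging violations (toxic output, leaked data, biased completions).
```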