Automatic Chain of Thought Prompting in Large Language Models
Large language models (LLMs) can perform complex reasoning by generating
intermediate reasoning steps. Providing these steps for prompting
demonstrations is called chain-of-thought (CoT) prompting. CoT prompting has
two major paradigms. One leverages a simple prompt like "Let's think step by
step" to facilitate step-by-step thinking before answering a question. The
other uses a few manual demonstrations one by one, each composed of a question
and a reasoning chain that leads to an answer. The superior performance of the
second paradigm hinges on the hand-crafting of task-specific demonstrations one
by one. We show that such manual efforts may be eliminated by leveraging LLMs
with the "Let's think step by step" prompt to generate reasoning chains for
demonstrations one by one, i.e., let's think not just step by step, but also
one by one. However, these generated chains often come with mistakes. To
mitigate the effect of such mistakes, we find that diversity matters for
automatically constructing demonstrations. We propose an automatic CoT
prompting method: Auto-CoT. It samples questions with diversity and generates
reasoning chains to construct demonstrations. On ten public benchmark reasoning
tasks with GPT-3, Auto-CoT consistently matches or exceeds the performance of
the CoT paradigm that requires manual designs of demonstrations. Code is
available at https://github.com/amazon-research/auto-co
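
The construction described in this abstract (pick a diverse set of questions, then let the model write each reasoning chain with the "Let's think step by step" prompt) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how diversity is measured, so the sketch swaps in a toy bag-of-words similarity with greedy farthest-point selection, and `generate` is a hypothetical callable standing in for an LLM call.

```python
import re
from collections import Counter
from math import sqrt


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_diverse(questions, k):
    """Greedy farthest-point selection: start from the first question, then
    repeatedly add the question least similar to everything chosen so far."""
    vecs = [embed(q) for q in questions]
    chosen = [0]
    while len(chosen) < min(k, len(questions)):
        best = min(
            (i for i in range(len(questions)) if i not in chosen),
            key=lambda i: max(cosine(vecs[i], vecs[j]) for j in chosen),
        )
        chosen.append(best)
    return [questions[i] for i in chosen]


def build_demonstrations(questions, k, generate):
    """Build k demonstrations; `generate` (hypothetical) maps a prompt to a reasoning chain."""
    demos = []
    for q in select_diverse(questions, k):
        prompt = f"Q: {q}\nA: Let's think step by step."
        demos.append(f"{prompt} {generate(prompt)}")
    return "\n\n".join(demos)
```

The intuition behind selecting for diversity is that if the generated chains contain mistakes, spreading the demonstrations across dissimilar questions makes it less likely that they all share the same mistake.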
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Hypercomplex neural networks have been shown to reduce the overall number of parameters while maintaining strong performance by leveraging the properties of Clifford algebras. Recently, hypercomplex linear layers have been further improved through efficient parameterized Kronecker products. In this article, we define the parameterization of hypercomplex convolutional layers and introduce the family of parameterized hypercomplex neural networks (PHNNs), which are lightweight and efficient large-scale models. Our method learns the convolution rules and the filter organization directly from data without requiring a rigidly predefined domain structure. PHNNs can operate in any user-defined or tuned domain, from 1-D to nD, regardless of whether the algebra rules are preset. Such malleability allows processing multidimensional inputs in their natural domain without annexing further dimensions, as is done in quaternion neural networks (QNNs) for 3-D inputs like color images. As a result, the proposed family of PHNNs operates with 1/n of the free parameters of its analog in the real domain. We demonstrate the versatility of this approach across multiple application domains through experiments on various image and audio datasets, in which our method outperforms real and quaternion-valued counterparts.
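
To make the Kronecker-product parameterization and the 1/n parameter claim concrete, here is a small sketch. It assumes the form in which parameterized hypercomplex layers are commonly written, W = sum_i kron(A_i, F_i), with n matrices A_i of size n x n encoding the algebra rules and n blocks F_i holding the filters. It uses a dense weight matrix rather than a convolution for brevity, and all function names are illustrative, not taken from the paper's code.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def phm_weight(A, F):
    """Assemble W = sum_i kron(A[i], F[i]).
    A[i] is n x n (algebra rules), F[i] is (d_out/n) x (d_in/n) (filters)."""
    W = kron(A[0], F[0])
    for a, f in zip(A[1:], F[1:]):
        W = mat_add(W, kron(a, f))
    return W


def param_counts(n, d_out, d_in):
    """Free parameters of the Kronecker form versus a plain real-valued layer."""
    phm = n * n * n + n * (d_out // n) * (d_in // n)
    real = d_out * d_in
    return phm, real


# Illustration with n = 2: fixing A to the complex-number rules recovers the
# familiar [[a, -b], [b, a]] block structure. In a PHNN the A matrices would
# be learned from data rather than fixed like this.
A = [[[1, 0], [0, 1]], [[0, -1], [1, 0]]]
F = [[[3]], [[5]]]
W = phm_weight(A, F)
```

For n = 4 and a 64 x 64 layer, `param_counts(4, 64, 64)` gives 1088 parameters against 4096 for the real-valued layer, which is close to the 1/n ratio once the small n^3 overhead of the A matrices is accounted for.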
Privacy risk and de-anonymization in heterogeneous information networks
Anonymized user datasets are often released for research or industry applications. As an example, t.qq.com released its anonymized users’ profile, social interaction, and recommendation log data in KDD Cup 2012 to call for recommendation algorithms. Since the entities (users and so on) and edges (links among entities) are of multiple types, the released social network is a heterogeneous information network. Prior work has shown how privacy can be compromised in homogeneous information networks by the use of specific types of graph patterns. We show how the extra information derived from heterogeneity can be used to relax these assumptions. To characterize and demonstrate this added threat, we formally define privacy risk in an anonymized heterogeneous information network to identify vulnerabilities in the ways such data may be released, and further present a new de-anonymization attack that exploits these vulnerabilities. Our attack successfully de-anonymized most individuals involved in the data. We further show that the general ideas of exploiting privacy risk and de-anonymizing heterogeneous information networks can be extended to more general graphs.
Analyzing intentions from big data traces of human activities
The rapid growth of big data formed by human activities makes research on intention analysis both challenging and rewarding. We study multifaceted problems in analyzing intentions from big data traces of human activities; these problems span machine learning, optimization, and security and privacy.
We show that analyzing intentions from industry-scale human activity big data can effectively improve the accuracy of computational models. Specifically, we take query auto-completion as a case study. We identify two hitherto-undiscovered problems: adaptive query auto-completion and mobile query auto-completion. We develop two computational models by analyzing intentions from big data traces of human activities on search interface interactions and on mobile application usage respectively.
Solving the large-scale optimization problems in the proposed query auto-completion models drives deeper studies of the solvers. Hence, we consider the generalized machine learning problem settings and focus on developing lightweight stochastic algorithms as solvers to the large-scale convex optimization problems with theoretical guarantees. For optimizing strongly convex objectives, we design an accelerated stochastic block coordinate descent method with optimal sampling; for optimizing non-strongly convex objectives, we design a stochastic variance reduced alternating direction method of multipliers with the doubling-trick.
Human activities are inherently human-centric, so research on them can inform security and privacy. On the one hand, intention analysis research from human activities can be motivated from the security perspective. For instance, to reduce false alarms of medical service providers' suspicious accesses to electronic health records, we discover potential de facto diagnosis specialties that reflect such providers' genuine and permissible intentions of accessing records with certain diagnoses. On the other hand, we examine the privacy risk in anonymized heterogeneous information networks representing large-scale human activities, such as in social networking. Such data are released for external researchers to improve the prediction accuracy for users' online social networking intentions on the publishers' microblogging site. We show a negative result that makes a compelling argument: privacy must be a central goal for sensitive human activity data publishers.
In-Context Learning with Iterative Demonstration Selection
Spurred by advancements in scale, large language models (LLMs) have
demonstrated strong few-shot learning ability via in-context learning (ICL).
However, the performance of ICL has been shown to be highly sensitive to the
selection of few-shot demonstrations. Selecting the most suitable examples as
context remains an open problem. Existing literature
has highlighted the importance of selecting examples that are diverse or
semantically similar to the test sample while ignoring the fact that the
optimal selection dimension, i.e., diversity or similarity, is task-specific.
Leveraging the merits of both dimensions, we propose Iterative Demonstration
Selection (IDS). Using zero-shot chain-of-thought reasoning (Zero-shot-CoT),
IDS iteratively selects examples that are diverse but still strongly correlated
with the test sample as ICL demonstrations. Specifically, IDS applies
Zero-shot-CoT to the test sample before demonstration selection. The output
reasoning path is then used to choose demonstrations that are prepended to the
test sample for inference. The generated answer is accompanied by its
corresponding reasoning path for extracting a new set of demonstrations in the
next iteration. After several iterations, IDS adopts majority voting to obtain
the final result. Through extensive experiments on tasks including commonsense
reasoning, question answering, topic classification, and sentiment analysis, we
demonstrate that IDS can consistently outperform existing ICL demonstration
selection methods.
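
The loop described in this abstract (zero-shot chain of thought on the test sample, demonstration selection from the reasoning path, inference, repeat, then majority vote) can be sketched as below. This is a rough outline under assumptions, not the authors' code: the three callables `zero_shot_cot`, `select_demos`, and `answer_with_demos` are hypothetical placeholders for the LLM call and the retrieval step, and the default of three iterations is arbitrary.

```python
from collections import Counter


def iterative_demo_selection(test_q, zero_shot_cot, select_demos,
                             answer_with_demos, iterations=3):
    """Sketch of an iterative demonstration-selection loop.

    zero_shot_cot(question) -> reasoning path            (hypothetical)
    select_demos(reasoning) -> list of demonstrations    (hypothetical)
    answer_with_demos(demos, question) -> (answer, reasoning)  (hypothetical)
    """
    # Step 1: obtain an initial reasoning path before any demonstrations exist.
    reasoning = zero_shot_cot(test_q)
    answers = []
    for _ in range(iterations):
        # Step 2: use the current reasoning path to choose demonstrations.
        demos = select_demos(reasoning)
        # Step 3: answer with those demonstrations prepended; the new
        # reasoning path drives selection in the next iteration.
        answer, reasoning = answer_with_demos(demos, test_q)
        answers.append(answer)
    # Step 4: majority vote over the answers from all iterations.
    return Counter(answers).most_common(1)[0][0]
```

Passing the model and retriever in as callables keeps the control flow testable with simple stubs before any real model is attached.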
Is ChatGPT a General-Purpose Natural Language Processing Task Solver?
Spurred by advancements in scale, large language models (LLMs) have
demonstrated the ability to perform a variety of natural language processing
(NLP) tasks zero-shot -- i.e., without adaptation on downstream data. Recently,
the debut of ChatGPT has drawn a great deal of attention from the NLP
community because it can generate
high-quality responses to human input and self-correct previous mistakes based
on subsequent conversations. However, it is not yet known whether ChatGPT can
serve as a generalist model that can perform many NLP tasks zero-shot. In this
work, we empirically analyze the zero-shot learning ability of ChatGPT by
evaluating it on 20 popular NLP datasets covering 7 representative task
categories. With extensive empirical studies, we demonstrate both the
effectiveness and limitations of the current version of ChatGPT. We find that
ChatGPT performs well on many tasks favoring reasoning capabilities (e.g.,
arithmetic reasoning) while it still faces challenges when solving specific
tasks such as sequence tagging. We additionally provide in-depth analysis
through qualitative case studies.