EvoPrompting: Language Models for Code-Level Neural Architecture Search
Given the recent impressive accomplishments of language models (LMs) for code
generation, we explore the use of LMs as adaptive mutation and crossover
operators for an evolutionary neural architecture search (NAS) algorithm. While
NAS still proves too difficult a task for LMs to succeed at solely through
prompting, we find that the combination of evolutionary prompt engineering with
soft prompt-tuning, a method we term EvoPrompting, consistently finds diverse
and high performing models. We first demonstrate that EvoPrompting is effective
on the computationally efficient MNIST-1D dataset, where EvoPrompting produces
convolutional architecture variants that outperform both those designed by
human experts and those found via naive few-shot prompting, in terms of accuracy and model size.
We then apply our method to searching for graph neural networks on the CLRS
Algorithmic Reasoning Benchmark, where EvoPrompting is able to design novel
architectures that outperform current state-of-the-art models on 21 out of 30
algorithmic reasoning tasks while maintaining similar model size. EvoPrompting
is successful at designing accurate and efficient neural network architectures
across a variety of machine learning tasks, while also being general enough for
easy adaptation to other tasks beyond neural network design
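The evolutionary outer loop described above can be sketched in a few lines. This is a minimal toy sketch, not the authors' implementation: a "model" is just a list of layer widths, `fitness` stands in for real training and evaluation, `lm_crossover` is a hypothetical placeholder for prompting the code LM, and the soft prompt-tuning component is omitted entirely.

```python
import random

random.seed(0)

def fitness(model):
    # Toy objective standing in for "accurate and small": prefer layer-width
    # lists whose total width is close to 100. A real run would train and
    # evaluate each candidate architecture.
    return -abs(sum(model) - 100)

def lm_crossover(parents):
    # Hypothetical stand-in for the LM acting as crossover/mutation operator.
    # EvoPrompting would instead format the parent programs into a prompt and
    # sample new architecture code from the language model.
    a, b = random.sample(parents, 2)
    cut = random.randrange(1, len(a))
    child = a[:cut] + b[cut:]                 # crossover
    i = random.randrange(len(child))
    child[i] = max(1, child[i] + random.randint(-8, 8))  # small mutation
    return child

def evolve(pop, generations=30, k=4):
    for _ in range(generations):
        pop = sorted(pop, key=fitness, reverse=True)
        parents = pop[:k]                     # elitism: keep the best k
        children = [lm_crossover(parents) for _ in range(len(pop) - k)]
        pop = parents + children
    return max(pop, key=fitness)

population = [[random.randint(1, 64) for _ in range(4)] for _ in range(12)]
best = evolve(population)
```

Because the top-k parents survive each generation, the best fitness found never decreases across generations.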
Seasonal dynamics of bacterial meningitis: a time-series analysis
Background Bacterial meningitis, which is caused mainly by Neisseria meningitidis, Haemophilus influenzae, and
Streptococcus pneumoniae, inflicts a substantial burden of disease worldwide. Yet, the temporal dynamics of this
disease are poorly characterised and many questions remain about the ecology of the disease. We aimed to
comprehensively assess seasonal trends in bacterial meningitis on a global scale.
Methods We developed the first bacterial meningitis global database by compiling monthly incidence data as reported
by country-level surveillance systems. Using country-level wavelet analysis, we identified whether a 12 month periodic
component (annual seasonality) was detected in time-series that had at least 5 years of data with at least 40 cases
reported per year. We estimated the mean timing of disease activity by computing the centre of gravity of the
distribution of cases and investigated whether synchrony exists between the three pathogens responsible for most
cases of bacterial meningitis.
Findings We used country-level data from 66 countries, including from 47 countries outside the meningitis belt in
sub-Saharan Africa. A persistent seasonality was detected in 49 (96%) of the 51 time-series from 38 countries eligible
for inclusion in the wavelet analyses. The mean timing of disease activity had a latitudinal trend, with bacterial
meningitis seasons peaking during the winter months in countries in both the northern and southern hemispheres.
The three pathogens shared similar seasonality, but time-shifts differed slightly by country.
Interpretation Our findings provide key insight into the seasonal dynamics of bacterial meningitis and add to
knowledge about the global epidemiology of meningitis and the host, environment, and pathogen characteristics
driving these patterns. Comprehensive understanding of global seasonal trends in meningitis could be used to design
more effective prevention and control strategies
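The centre-of-gravity timing measure lends itself to a short illustration. One common way to compute it, assumed here rather than taken from the paper, is the circular mean of the monthly case distribution, which handles seasons that straddle the December/January boundary; the case counts below are invented, not real surveillance data.

```python
import math

def centre_of_gravity(monthly_cases):
    # Circular mean of the case distribution: treat each month as an angle on
    # a 12-month circle, weight by case counts, and map the mean angle back
    # to month units (0 = January).
    total = sum(monthly_cases)
    x = sum(c * math.cos(2 * math.pi * m / 12) for m, c in enumerate(monthly_cases))
    y = sum(c * math.sin(2 * math.pi * m / 12) for m, c in enumerate(monthly_cases))
    angle = math.atan2(y / total, x / total) % (2 * math.pi)
    return angle * 12 / (2 * math.pi)

# Illustrative winter-peaking (northern-hemisphere-style) monthly series
cases = [120, 110, 70, 40, 20, 10, 8, 10, 20, 40, 70, 100]
peak_month = centre_of_gravity(cases)
```

For this series the centre of gravity falls near the December/January boundary, which a plain arithmetic mean of month indices would badly misplace.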
Two Failures of Self-Consistency in the Multi-Step Reasoning of LLMs
Large language models (LLMs) have achieved widespread success on a variety of
in-context few-shot tasks, but this success is typically evaluated via
correctness rather than consistency. We argue that self-consistency is an
important criterion for valid multi-step reasoning in tasks where the solution
is composed of the answers to multiple sub-steps. We propose two types of
self-consistency that are particularly important for multi-step reasoning --
hypothetical consistency (a model's ability to predict what its output would be
in a hypothetical other context) and compositional consistency (consistency of
a model's final outputs when intermediate sub-steps are replaced with the
model's outputs for those steps). We demonstrate that multiple variants of the
GPT-3/-4 models exhibit poor consistency rates across both types of consistency
on a variety of tasks.
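The compositional-consistency notion can be made concrete with a toy check: compare the model's one-pass answer against the answer obtained after substituting the model's own sub-step output into the final step. `ask_model` is a hypothetical stand-in that evaluates chained-addition prompts exactly, so this toy model is perfectly consistent; the paper's point is that real LLMs often are not.

```python
def ask_model(prompt):
    # Hypothetical stand-in for an LLM: exactly evaluates prompts like "17+25+9=".
    return str(sum(int(t) for t in prompt.rstrip("=").split("+")))

def compositionally_consistent(a, b, c):
    direct = ask_model(f"{a}+{b}+{c}=")          # full problem in one pass
    sub_step = ask_model(f"{a}+{b}=")            # model's own intermediate output
    substituted = ask_model(f"{sub_step}+{c}=")  # final step with substitution
    return direct == substituted
```

A compositionally inconsistent model would return different answers for `direct` and `substituted` even when the substituted sub-step is its own output.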
Predicting The Helpfulness Of Online Product Reviewers: A Data Mining Approach
The purpose of this study is to propose a data mining approach to predict the helpfulness scores of online product reviewers. Such prediction can help consumers judge whether to trust reviews written by different reviewers, and can help e-stores or third-party product review websites target and retain quality reviewers. In this study, we identify eight independent variables, drawn from reviewers’ review behavior and trust networks, to predict the helpfulness scores of these reviewers. We adopt M5 and SVM regression as our underlying learning algorithms. Our empirical evaluation on two product categories (i.e., Car and Computer) suggests that the proposed technique can predict the helpfulness scores of online product reviewers
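As a minimal stand-in for the M5/SVM regression pipeline, an ordinary least-squares fit on a single behavioural feature illustrates the prediction task; the feature choice (number of reviews written) and the data are illustrative assumptions, not taken from the study.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y ≈ slope * x + intercept (one feature only;
    # the study fits eight features with M5 and SVM regression).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

# Illustrative pairs: (number of reviews written, observed helpfulness score)
review_counts = [2, 5, 8, 12, 20]
helpfulness = [2.0, 3.5, 5.0, 7.0, 11.0]

slope, intercept = fit_line(review_counts, helpfulness)
predicted = slope * 10 + intercept  # predicted score for a 10-review reviewer
```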
Deep Reflection Prior
Reflections are a common phenomenon in everyday photography, distracting
attention from the scene behind the glass. The problem of
removing reflection artifacts is important but challenging due to its ill-posed
nature. Recent learning-based approaches have demonstrated a significant
improvement in removing reflections. However, these methods are limited as they
require a large number of synthetic reflection/clean image pairs for
supervision, at the risk of overfitting in the synthetic image domain. In this
paper, we propose a learning-based approach that captures the reflection
statistical prior for single image reflection removal. Our algorithm is driven
by optimizing the target with joint constraints enhanced between multiple input
images during the training stage, but is able to eliminate reflections only
from a single input at evaluation time. Our framework predicts both
background and reflection via a one-branch deep neural network, which is
implemented by the controllable latent code that indicates either the
background or reflection output. We demonstrate superior performance over the
state-of-the-art methods on a large range of real-world images. We further
provide insightful analysis behind the learned latent code, which may inspire
further work
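The controllable-latent-code idea can be illustrated with a toy one-branch function whose code input selects which layer to decode. The fixed split ratios below stand in for a trained decoder and are purely illustrative; a real single-image decomposition is far from a per-pixel scaling.

```python
def one_branch(mixture, code):
    # One function body serves both outputs; the latent code selects whether
    # the background or the reflection layer is decoded.
    ratios = {(1, 0): 0.7,   # code (1, 0): decode the background layer
              (0, 1): 0.3}   # code (0, 1): decode the reflection layer
    w = ratios[tuple(code)]
    return [w * p for p in mixture]

mixture = [0.2, 0.5, 0.9]  # observed image = background + reflection
background = one_branch(mixture, (1, 0))
reflection = one_branch(mixture, (0, 1))
```

By construction the two conditioned outputs sum back to the input mixture, mirroring the additive image-formation assumption behind reflection removal.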
Training Language Models with Language Feedback at Scale
Pretrained language models often generate outputs that are not in line with
human preferences, such as harmful text or factually incorrect summaries.
Recent work approaches the above issues by learning from a simple form of human
feedback: comparisons between pairs of model-generated outputs. However,
comparison feedback only conveys limited information about human preferences.
In this paper, we introduce Imitation learning from Language Feedback (ILF), a
new approach that utilizes more informative language feedback. ILF consists of
three steps that are applied iteratively: first, conditioning the language
model on the input, an initial LM output, and feedback to generate refinements.
Second, selecting the refinement incorporating the most feedback. Third,
finetuning the language model to maximize the likelihood of the chosen
refinement given the input. We show theoretically that ILF can be viewed as
Bayesian inference, similar to reinforcement learning from human feedback. We
evaluate ILF's effectiveness on a carefully-controlled toy task and a realistic
summarization task. Our experiments demonstrate that large language models
accurately incorporate feedback and that finetuning with ILF scales well with
the dataset size, even outperforming finetuning on human summaries. Learning
from both language and comparison feedback outperforms learning from each
alone, achieving human-level summarization performance
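The three ILF steps can be sketched with toy stand-ins: `refine` plays the role of the LM conditioned on (input, initial output, feedback), word overlap crudely approximates the paper's refinement selection, and "finetuning" is reduced to collecting the chosen (input, refinement) pairs. All helper names and behaviours here are illustrative assumptions.

```python
def refine(inp, output, feedback):
    # Hypothetical stand-in for the feedback-conditioned LM: return candidate
    # refinements incorporating varying amounts of the feedback text.
    half = len(feedback) // 2
    return [output, output + " " + feedback[:half], output + " " + feedback]

def feedback_score(refinement, feedback):
    # Crude proxy for "incorporates the most feedback": count feedback words
    # present in the refinement (the paper uses an LM-based selection).
    return sum(word in refinement for word in feedback.split())

def ilf_step(inp, initial_output, feedback, finetune_set):
    candidates = refine(inp, initial_output, feedback)                  # step 1
    best = max(candidates, key=lambda r: feedback_score(r, feedback))   # step 2
    finetune_set.append((inp, best))  # step 3: finetune on (input, refinement)
    return best

dataset = []
best = ilf_step("Summarize the article.", "A short summary.",
                "mention the publication date", dataset)
```

Iterating these three steps, with the finetuned model generating the next round's outputs, gives the full ILF loop.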
Improving Code Generation by Training with Natural Language Feedback
The potential for pre-trained large language models (LLMs) to use natural
language feedback at inference time has been an exciting recent development. We
build upon this observation by formalizing an algorithm for learning from
natural language feedback at training time instead, which we call Imitation
learning from Language Feedback (ILF). ILF requires only a small amount of
human-written feedback during training and does not require the same feedback
at test time, making it both user-friendly and sample-efficient. We further
show that ILF can be seen as a form of minimizing the KL divergence to the
ground truth distribution and demonstrate a proof-of-concept on a neural
program synthesis task. We use ILF to improve a CodeGen-Mono 6.1B model's
pass@1 rate by 38% relative (and 10% absolute) on the Mostly Basic Python
Problems (MBPP) benchmark, outperforming both fine-tuning on MBPP and
fine-tuning on repaired programs written by humans. Overall, our results
suggest that learning from human-written natural language feedback is both more
effective and sample-efficient than training exclusively on demonstrations for
improving an LLM's performance on code generation tasks
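For reference, pass@1 and the absolute-versus-relative arithmetic quoted above can be written out. The baseline rate below is illustrative, chosen only so that a 10-point absolute gain corresponds to roughly a 38% relative gain; it is not a number from the paper.

```python
def pass_at_1(passed):
    # pass@1: fraction of problems whose single sampled program passes all
    # of its unit tests.
    return sum(passed) / len(passed)

# Illustrative rates reproducing the quoted arithmetic.
baseline, after_ilf = 0.26, 0.36
absolute_gain = after_ilf - baseline            # ~0.10 -> 10 points absolute
relative_gain = absolute_gain / baseline        # ~0.38 -> 38% relative
```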
A Deeper Look at Autonomous Vehicle Ethics: An Integrative Ethical Decision-Making Framework to Explain Moral Pluralism
The autonomous vehicle (AV) is one of the first commercialized AI-embedded robots to make autonomous decisions. Despite technological advancements, unavoidable AV accidents that result in life-and-death consequences cannot be completely eliminated. The emerging social concern of how an AV should make ethical decisions during unavoidable accidents is referred to as the moral dilemma of AV, which has prompted heated discussions among various stakeholders. However, research gaps remain in explainable AV ethical decision-making processes that predict which AV moral behaviors are acceptable from AV users’ perspectives. This study addresses the key question: What factors affect ethical behavioral intentions in the AV moral dilemma? To answer this question, this study draws theories from multidisciplinary research fields to propose the “Integrative ethical decision-making framework for the AV moral dilemma.” The framework includes four interdependent ethical decision-making stages: AV moral dilemma issue framing, intuitive moral reasoning, rational moral reasoning, and ethical behavioral intention making. Further, the framework includes variables (e.g., perceived moral intensity, individual factors, and personal moral philosophies) that influence the ethical decision-making process. For instance, the framework explains that AV users from Eastern cultures will tend to endorse a situationist ethics position (high idealism and high relativism), which views ethical decisions as relative to context, compared to AV users from Western cultures. This proposition is derived from the link between individual factors and personal moral philosophy. Moreover, the framework proposes a dual-process theory, which explains that both intuitive and rational moral reasoning are integral processes of ethical decision-making during the AV moral dilemma.
Further, this framework describes that ethical behavioral intentions that lead to decisions in the AV moral dilemma are not fixed, but are based on how an individual perceives the seriousness of the situation, which is shaped by their personal moral philosophy. This framework provides a step-by-step explanation of how pluralistic ethical decision-making occurs, reducing the abstractness of AV moral reasoning processes