60 research outputs found
Learning methods applied to search heuristics for constraint satisfaction problems
Summary
Motivation: Constraint programming (CP) offers a formal framework for representing and solving real-world combinatorial problems such as designing hospital staff schedules or assigning boarding gates in an airport. A constraint satisfaction problem is modelled with logical relations, or constraints, between variables. Solving the problem then amounts to finding a solution that satisfies these constraints. Even though powerful inference techniques, such as filtering algorithms, substantially reduce the solution space, this is not sufficient on its own; the search must therefore be guided by heuristics.
Significance and theoretical background: Cambazard and Jussien (2005) justify the attention given to search heuristics by calling them the "Holy Grail" of both the Operations Research (OR) and Constraint Programming (CP) communities. Among the best current heuristics are Impact-Based Search (IBS, Refalo (2004)), which uses domain reduction after branching, and maxSD (Zanarini and Pesant (2007)), which uses constraint-based solution counting. Other heuristics rely instead on an estimate of the distribution of solutions, such as Belief-Propagation (BP, Kschischang et al. (2001)) and Survey-Propagation (SP, Mezard et al. (2002)). These so-called inference methods have proved especially effective for Boolean satisfiability problems. Recently, Hsu et al. (2007) proposed a variant of these two methods, named Expectation-Maximization Belief-Propagation.
Finally, these inference methods can also reveal certain characteristics of the structure of the problem being solved. For example, such estimates make it possible to identify so-called backbone variables (Kilby et al. (2005)), i.e. variables that take the same value in every solution. An estimate of the distribution of solutions therefore provides not only useful information for a heuristic, but also information about the underlying structure of the problem.
Abstract
Motivation Constraint Programming (CP) is a paradigm to model and solve practical combinatorial problems, such as nurse scheduling or airport gate assignment. CP models such problems as Constraint Satisfaction Problems (CSPs), i.e. a set of relations between variables. Solving a CSP then amounts to finding a solution that satisfies all relations. Although strong inference techniques such as filtering algorithms lead to a much smaller solution space, the solution space typically remains too large to be explored exhaustively. This is where search heuristics come in and guide the search toward promising areas.
Significance and Theoretical Background Cambazard and Jussien (2005) emphasize the importance of search heuristics when they refer to them as the Holy Grail of both the Operations Research (OR) and Constraint Programming (CP) communities. Two of the best current search heuristics are Impact-Based Search (IBS - Refalo (2004)), which exploits domain reduction, and maxSD (Zanarini and Pesant (2007)), which exploits constraint-based solution counting. Other heuristics are designed to estimate the distribution of solutions, such as Belief-Propagation (BP - Kschischang et al. (2001)) and Survey-Propagation (SP - Mezard et al. (2002)). These inference methods have proved particularly effective for Satisfiability (SAT) problems. Recently, Hsu et al. (2007) proposed Expectation-Maximization Belief-Propagation (EMBP), a variation of these two methods.
In addition, these inference methods also provide information about the underlying structure of the problem. For example, such estimates make it possible to detect backbone variables (Kilby et al. (2005)), i.e. variables that always take the same value regardless of the solution. As a result, these estimates not only direct the search but also provide structural information about the solution space.
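To make the branching idea concrete, the following minimal Python sketch (an illustration only, not the thesis's implementation) backtracks over a toy 6-queens CSP and branches using a domain-reduction score in the spirit of impact-based search; the model, the forward-checking propagator, and all helper names are assumptions made for the example.
# Minimal sketch of a CSP backtracking solver whose branching heuristic uses
# domain reduction, in the spirit of impact-based search. The toy model
# (6-queens) and every helper name here are assumptions for the example.
N = 6
variables = list(range(N))                       # one queen per column
domains = {v: set(range(N)) for v in variables}  # candidate rows

def consistent(assignment, var, val):
    """No two queens share a row or a diagonal."""
    return all(val != row and abs(val - row) != abs(var - col)
               for col, row in assignment.items())

def propagate(domains, assignment):
    """Forward checking: drop values inconsistent with the current assignment."""
    new = {}
    for v in domains:
        if v in assignment:
            new[v] = {assignment[v]}
        else:
            new[v] = {val for val in domains[v] if consistent(assignment, v, val)}
        if not new[v]:
            return None                          # dead end: an empty domain
    return new

def search_space(domains):
    size = 1
    for d in domains.values():
        size *= len(d)
    return size

def impact(domains, assignment, var, val):
    """1 - (search space after assigning var=val) / (search space before)."""
    reduced = propagate(domains, {**assignment, var: val})
    if reduced is None:
        return 1.0, None
    return 1.0 - search_space(reduced) / search_space(domains), reduced

def solve(domains, assignment):
    if len(assignment) == len(variables):
        return assignment
    unassigned = [v for v in variables if v not in assignment]
    trials = {v: [(val, *impact(domains, assignment, v, val)) for val in sorted(domains[v])]
              for v in unassigned}
    # Branch on the variable with the largest average impact, then try its
    # values from least to most impactful (keeping the search space large).
    var = max(unassigned, key=lambda v: sum(s for _, s, _ in trials[v]) / len(trials[v]))
    for val, _, reduced in sorted(trials[var], key=lambda t: t[1]):
        if reduced is not None:
            result = solve(reduced, {**assignment, var: val})
            if result is not None:
                return result
    return None

print(solve(domains, {}))  # one conflict-free {column: row} placement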
Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
Pre-trained Transformer models like T5 and BART have advanced the state of
the art on a wide range of text generation tasks. Compressing these models into
smaller ones has become critically important for practical use. Common neural
network compression techniques such as knowledge distillation or quantization
are limited to static compression where the compression ratio is fixed. In this
paper, we introduce Modular Transformers, a modularized encoder-decoder
framework for flexible sequence-to-sequence model compression. Modular
Transformers train modularized layers that have the same function as two or
more consecutive layers in the original model via module replacing and
knowledge distillation. After training, the modularized layers can be flexibly
assembled into sequence-to-sequence models that meet different
performance-efficiency trade-offs. Experimental results show that after a
single training phase, by simply varying the assembling strategy, Modular
Transformers can achieve flexible compression ratios from 1.1x to 6x with
little to moderate relative performance drop.
Comment: ACL 2023 Findings
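As a rough illustration of the idea (not the authors' code), the following PyTorch sketch trains modular layers to mimic pairs of consecutive layers of a frozen stack via a distillation loss, then assembles models of different depths; the toy feed-forward layers, the MSE objective, and the training loop are assumptions standing in for the paper's Transformer-specific module replacing.
# Illustrative sketch (not the paper's code): train a "module" to mimic two
# consecutive layers of a frozen original stack, then assemble a shallower
# model. Layer sizes, the toy data, and the MSE distillation loss are
# assumptions for the example.
import torch
import torch.nn as nn

d = 32
original = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(6)])
for p in original.parameters():
    p.requires_grad_(False)          # the original model stays frozen

# One modular layer per pair of consecutive original layers (2x compression).
modules = nn.ModuleList([nn.Sequential(nn.Linear(d, d), nn.ReLU()) for _ in range(3)])
opt = torch.optim.Adam(modules.parameters(), lr=1e-3)

for step in range(200):
    x = torch.randn(64, d)
    h = x
    loss = 0.0
    for i, module in enumerate(modules):
        target = original[2 * i + 1](original[2 * i](h))   # two original layers
        loss = loss + nn.functional.mse_loss(module(h), target)  # distillation
        h = target                    # feed modules the original hidden states
    opt.zero_grad()
    loss.backward()
    opt.step()

def assemble(n_modular):
    """Mix modular and original layers to hit a performance/efficiency trade-off:
    the first n_modular pairs are replaced by their modules, the rest kept."""
    layers = list(modules[:n_modular]) + list(original[2 * n_modular:])
    return nn.Sequential(*layers)

fast_model = assemble(3)      # about 2x fewer layers
balanced_model = assemble(1)  # mostly original layers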
Scruples: A Corpus of Community Ethical Judgments on 32,000 Real-Life Anecdotes
As AI systems become an increasing part of people's everyday lives, it
becomes ever more important that they understand people's ethical norms.
Motivated by descriptive ethics, a field of study that focuses on people's
descriptive judgments rather than theoretical prescriptions on morality, we
investigate a novel, data-driven approach to machine ethics.
We introduce Scruples, the first large-scale dataset with 625,000 ethical
judgments over 32,000 real-life anecdotes. Each anecdote recounts a complex
ethical situation, often posing moral dilemmas, paired with a distribution of
judgments contributed by community members. Our dataset presents a major
challenge to state-of-the-art neural language models, leaving significant room
for improvement. However, when presented with simplified moral situations, the
results are considerably more promising, suggesting that neural models can
effectively learn simpler ethical building blocks.
A key take-away of our empirical analysis is that norms are not always
clean-cut; many situations are naturally divisive. We present a new method to
estimate the best possible performance on such tasks with inherently diverse
label distributions, and explore likelihood functions that separate intrinsic
from model uncertainty.
Comment: 18 pages, 14 tables, 18 figures. Accepted to AAAI 2021. For associated code and data, see https://github.com/allenai/scruple
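To see why a performance ceiling below 100% arises on divisive examples, here is a small Python sketch (not necessarily the paper's estimator) that computes an oracle accuracy and log-likelihood bound from per-anecdote annotator vote counts; the vote counts are invented for the example.
# Illustrative sketch: with inherently divisive examples, annotators disagree,
# so even a perfect model cannot reach 100% accuracy. One simple ceiling: an
# oracle that always predicts each example's majority judgment, scored against
# labels drawn from the annotator distribution. The toy vote counts are made up.
import math

# Per-anecdote annotator vote counts over two judgments (e.g. WRONG / NOT WRONG).
votes = [(9, 1), (6, 4), (5, 5), (8, 2), (3, 7)]

best_accuracy = 0.0
best_log_likelihood = 0.0
for a, b in votes:
    p = a / (a + b)                      # empirical probability of judgment 0
    best_accuracy += max(p, 1 - p)       # oracle predicts the majority judgment
    # Best attainable expected log-likelihood = negative entropy of the votes.
    for q in (p, 1 - p):
        if q > 0:
            best_log_likelihood += q * math.log(q)

n = len(votes)
print(f"accuracy ceiling: {best_accuracy / n:.3f}")
print(f"per-example log-likelihood ceiling: {best_log_likelihood / n:.3f}")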
Dynamic Neuro-Symbolic Knowledge Graph Construction for Zero-shot Commonsense Question Answering
Understanding narratives requires reasoning about implicit world knowledge
related to the causes, effects, and states of situations described in text. At
the core of this challenge is how to access contextually relevant knowledge on
demand and reason over it.
In this paper, we present initial studies toward zero-shot commonsense
question answering by formulating the task as inference over dynamically
generated commonsense knowledge graphs. In contrast to previous studies for
knowledge integration that rely on retrieval of existing knowledge from static
knowledge graphs, our study requires commonsense knowledge integration where
contextually relevant knowledge is often not present in existing knowledge
bases. Therefore, we present a novel approach that generates
contextually-relevant symbolic knowledge structures on demand using generative
neural commonsense knowledge models.
Empirical results on two datasets demonstrate the efficacy of our
neuro-symbolic approach for dynamically constructing knowledge graphs for
reasoning. Our approach achieves significant performance boosts over pretrained
language models and vanilla knowledge models, all while providing interpretable
reasoning paths for its predictions.
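The overall recipe can be sketched as follows (an illustration, not the paper's implementation): query a generative commonsense knowledge model for inferences about the question context, collect them as graph edges, and score answer choices against the generated nodes. The generate_inferences function below is a hypothetical stand-in for such a model (COMET is one example of this model family), and the word-overlap scorer replaces the language-model scoring used in the paper.
# Illustrative sketch of dynamic knowledge graph construction for zero-shot QA.
# generate_inferences is a hypothetical stand-in for a generative commonsense
# knowledge model; here it returns canned strings so the sketch runs on its own.
RELATIONS = ["xIntent", "xEffect", "xReact"]

def generate_inferences(context, relation):
    # Stand-in for a neural knowledge model queried on demand.
    canned = {
        "xIntent": ["to feel less pain", "to get help"],
        "xEffect": ["goes to the hospital", "takes medicine"],
        "xReact": ["relieved", "worried"],
    }
    return canned[relation]

def build_graph(context):
    """Dynamically construct a (context, relation, inference) edge list."""
    return [(context, rel, inf) for rel in RELATIONS
            for inf in generate_inferences(context, rel)]

def score(answer, graph):
    """Toy scorer: word overlap between the answer and generated inferences;
    the paper instead scores reasoning paths with a pretrained language model."""
    answer_words = set(answer.lower().split())
    return sum(len(answer_words & set(inf.lower().split())) for _, _, inf in graph)

context = "Alex broke an arm and is in a lot of pain."
graph = build_graph(context)
choices = ["Alex goes to the hospital", "Alex buys a new car"]
print(max(choices, key=lambda c: score(c, graph)))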
Commonsense Knowledge Transfer for Pre-trained Language Models
Despite serving as the foundation models for a wide range of NLP benchmarks,
pre-trained language models have shown limited capabilities of acquiring
implicit commonsense knowledge from self-supervision alone, compared to
learning linguistic and factual knowledge that appears more explicitly in the
surface patterns of text. In this work, we introduce commonsense knowledge
transfer, a framework to transfer the commonsense knowledge stored in a neural
commonsense knowledge model to a general-purpose pre-trained language model. It
first exploits general texts to form queries for extracting commonsense
knowledge from the neural commonsense knowledge model and then refines the
language model with two self-supervised objectives: commonsense mask infilling
and commonsense relation prediction, which align human language with the
underlying commonsense knowledge. Empirical results show that our approach
consistently improves the model's performance on downstream tasks that require
commonsense reasoning. Moreover, we find that the improvement is more
significant in the few-shot setting. This suggests that our approach helps
language models better transfer to downstream tasks without extensive
supervision by injecting commonsense knowledge into their parameters.
Comment: ACL 2023 Findings
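As a hedged illustration of how the two objectives might be instantiated (the paper's actual query construction and objectives may differ), the sketch below builds one mask-infilling example and one relation-prediction example from a single extracted commonsense triple; the triple, the relation verbalizer, and the mask token are assumptions for the example.
# Illustrative sketch only: how the two self-supervised objectives could be
# instantiated from one extracted commonsense triple (head, relation, tail).
# The triple, verbalizer, and mask token are assumptions for this example.
import random

MASK = "<mask>"
triple = ("PersonX takes medicine", "xEffect", "PersonX feels better")
verbalize = {"xEffect": "as a result,"}

def mask_infilling_example(head, relation, tail):
    """Commonsense mask infilling: hide the tail and ask the LM to recover it."""
    text = f"{head}, {verbalize[relation]} {tail}."
    return text.replace(tail, MASK), tail

def relation_prediction_example(head, relation, tail, all_relations):
    """Commonsense relation prediction: classify the relation linking head and tail."""
    negatives = [r for r in all_relations if r != relation]
    candidates = [relation] + random.sample(negatives, k=min(2, len(negatives)))
    random.shuffle(candidates)
    return (head, tail, candidates), relation

print(mask_infilling_example(*triple))
print(relation_prediction_example(*triple, ["xEffect", "xIntent", "xWant"]))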
Polynomial Time Construction for Spatially Balanced Latin Squares
In this paper we propose a construction that generates spatially balanced
Latin squares (SBLSs) in polynomial time. These structures are central to
the design of agronomic experiments, as they avoid biases that are otherwise
unintentionally introduced due to spatial auto-correlation. Previous
approaches were able to generate SBLSs of order up to 35 and required
about two weeks of computation. Our algorithm runs in O(n²) and generates
SBLSs of arbitrary order n where 2n + 1 is prime. For example, this
algorithm generates an SBLS of order 999 in a fraction of a second.
Funding: National Science Foundation (NSF Expeditions in Computing award for Computational Sustainability, grant 0832782; NSF IIS award, grant 0514429), Intelligent Information Systems Institute, Cornell University (Air Force Office of Scientific Research, AFOSR, grant FA9550-04-1-0151), Natural Sciences and Engineering Research Council of Canada (NSERC)
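For reference, spatial balance can be checked directly from its definition: in a spatially balanced Latin square of order n, every pair of symbols has the same total column distance summed over all rows, namely n(n+1)/3. The Python sketch below implements only this check on toy cyclic squares; it is not the paper's O(n²) construction.
# Illustrative checker (not the paper's construction): verify that every pair
# of symbols has total column distance n*(n+1)/3 across the rows. The cyclic
# squares below are only a demo of the check.
from itertools import combinations

def is_spatially_balanced(square):
    n = len(square)
    positions = [{sym: col for col, sym in enumerate(row)} for row in square]
    target = n * (n + 1) / 3
    return all(
        sum(abs(pos[a] - pos[b]) for pos in positions) == target
        for a, b in combinations(range(1, n + 1), 2)
    )

def cyclic_square(n):
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]

print(is_spatially_balanced(cyclic_square(3)))  # True: order 3 (2n + 1 = 7 is prime)
print(is_spatially_balanced(cyclic_square(4)))  # False: the cyclic square of order 4 is unbalanced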
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Dogwhistles are coded expressions that simultaneously convey one meaning to a
broad audience and a second one, often hateful or provocative, to a narrow
in-group; they are deployed to evade both political repercussions and
algorithmic content moderation. For example, in the sentence 'we need to end
the cosmopolitan experiment,' the word 'cosmopolitan' likely means 'worldly' to
many, but secretly means 'Jewish' to a select few. We present the first
large-scale computational investigation of dogwhistles. We develop a typology
of dogwhistles, curate the largest-to-date glossary of over 300 dogwhistles
with rich contextual information and examples, and analyze their usage in
historical U.S. politicians' speeches. We then assess whether a large language
model (GPT-3) can identify dogwhistles and their meanings, and find that
GPT-3's performance varies widely across types of dogwhistles and targeted
groups. Finally, we show that harmful content containing dogwhistles avoids
toxicity detection, highlighting online risks of such coded language. This work
sheds light on the theoretical and applied importance of dogwhistles in both
NLP and computational social science, and provides resources for future
research in modeling dogwhistles and mitigating their online harms.
Comment: ACL 2023, see https://dogwhistles.allen.ai/ for the glossary and other material
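As a toy illustration of the evasion finding (not the paper's experimental setup), the sketch below scores the paper's example sentence and its decoded counterpart with one publicly available toxicity classifier; the classifier choice (unitary/toxic-bert) and the simple substitution used for decoding are assumptions for this sketch.
# Illustrative sketch only: compare toxicity scores for a coded dogwhistle
# sentence versus its decoded form. The classifier and the substitution are
# assumptions; the paper's evaluation is broader and more careful.
from transformers import pipeline

toxicity = pipeline("text-classification", model="unitary/toxic-bert")

coded = "We need to end the cosmopolitan experiment."
decoded = coded.replace("cosmopolitan", "Jewish")  # the covert meaning from the paper's example

for text in (coded, decoded):
    result = toxicity(text)[0]
    print(f"{result['label']:>10} {result['score']:.3f}  {text}")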
- …