ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models
Generative retrieval (Wang et al., 2022; Tay et al., 2022) is a new approach
for end-to-end document retrieval that directly generates document identifiers
given an input query. Techniques for designing effective, high-quality document
IDs remain largely unexplored. We introduce ACID, in which each document's ID
is composed of abstractive keyphrases generated by a large language model,
rather than an integer ID sequence as done in past work. We compare our method
with the current state-of-the-art technique for ID generation, which produces
IDs through hierarchical clustering of document embeddings. We also examine
simpler methods for generating natural-language document IDs, including the
naive approach of using as the ID either the first k words of each document or
words with high BM25 scores in that document. We show that using ACID improves top-10 and
top-20 accuracy by 15.6% and 14.4% (relative) respectively versus the
state-of-the-art baseline on the MSMARCO 100k retrieval task, and 4.4% and 4.0%
respectively on the Natural Questions 100k retrieval task. Our results
demonstrate the effectiveness of human-readable, natural-language IDs in
generative retrieval with LMs. The code for reproducing our results and the
keyword-augmented datasets will be released upon formal publication.
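
As a concrete illustration of the ID scheme, the sketch below builds keyphrase-based document IDs and resolves a generated ID back to a document. It is a minimal, hypothetical Python sketch: `build_keyphrase_id`, `retrieve`, and `generate_id` are illustrative names, not the authors' released code, and a real system would decode with a trie-constrained seq2seq model.

```python
# Minimal sketch of keyphrase-based document IDs for generative retrieval.
# All names are illustrative; this is not the authors' released code.

def build_keyphrase_id(keyphrases, max_phrases=5, sep=", "):
    """Join LLM-generated keyphrases into a single natural-language ID."""
    return sep.join(keyphrases[:max_phrases])

# Keyphrases would be generated offline by a large language model.
corpus_keyphrases = {
    "doc_a": ["generative retrieval", "document identifiers", "language models"],
    "doc_b": ["pebble accretion", "planet formation", "population synthesis"],
}
id_to_doc = {build_keyphrase_id(kps): doc for doc, kps in corpus_keyphrases.items()}

def retrieve(query, generate_id):
    """Retrieve by generating the document's keyphrase ID from the query.

    `generate_id` stands in for a fine-tuned seq2seq model; in practice
    decoding is constrained (e.g. with a prefix trie) so that only IDs
    present in the corpus can be generated.
    """
    return id_to_doc.get(generate_id(query))
```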
Pebbles versus planetesimals
In the core accretion scenario, a massive core forms first and then accretes an envelope. Views diverge on how this core forms: early scenarios of planet formation predict the accretion of km-sized bodies called planetesimals, while more recent work suggests growth by the accretion of pebbles, which are cm-sized objects. These two accretion models are often discussed separately, and we aim here to compare the outcomes of the two models under identical initial conditions. We use two distinct codes: one computing planetesimal accretion, the other pebble accretion. Using a population synthesis approach, we compare planet simulations and study the impact of the two solid accretion models, focusing on the formation of single planets. We find that the planetesimal model predicts the formation of more giant planets, while the pebble accretion model forms more super-Earth-mass planets. This is due to the pebble isolation mass concept, which prevents planets formed by pebble accretion from accreting gas efficiently before reaching M_iso. This translates into a population of planets that are not massive enough to accrete a substantial envelope but that lie in a mass range where type I migration is very efficient. We also find higher gas mass fractions for a given core mass in the pebble model than in the planetesimal one, caused by luminosity differences. This also implies planets with lower densities, which could be confirmed observationally. Focusing on giant planets, we conclude that the sensitivity of their formation differs between the two models: for pebble accretion, the time at which the embryos form and the period over which solids are accreted strongly impact the results, while for the planetesimal model the outcome depends on the planetesimal size and on the fraction of solids available to form planetesimals.
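
Since the argument hinges on the pebble isolation mass, a small sketch may help. It uses the leading-order scaling of Lambrechts et al. (2014), M_iso ≈ 20 M⊕ (h/0.05)³ with h = H/r the disc aspect ratio; this choice is our assumption, as the population synthesis code in the paper likely uses a more refined prescription.

```python
# Hedged sketch of the pebble isolation mass, using the simple scaling of
# Lambrechts et al. (2014): M_iso ~ 20 M_Earth * (h / 0.05)**3, where
# h = H/r is the disc aspect ratio. Population synthesis codes typically
# use refined prescriptions; this is only the leading-order behaviour.

def pebble_isolation_mass(aspect_ratio: float) -> float:
    """Pebble isolation mass in Earth masses for aspect ratio h = H/r."""
    return 20.0 * (aspect_ratio / 0.05) ** 3

# A planet reaching M_iso stops accreting pebbles; only then can it cool
# and accrete gas efficiently, which is why pebble-formed cores below
# M_iso stall as envelope-poor super-Earths.
print(pebble_isolation_mass(0.05))  # 20.0 Earth masses
print(pebble_isolation_mass(0.03))  # ~4.3 Earth masses: stalls much earlier
```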
How Much Does Attention Actually Attend? Questioning the Importance of Attention in Pretrained Transformers
The attention mechanism is considered the backbone of the widely-used
Transformer architecture. It contextualizes the input by computing
input-specific attention matrices. We find that this mechanism, while powerful
and elegant, is not as important as typically thought for pretrained language
models. We introduce PAPA, a new probing method that replaces the
input-dependent attention matrices with constant ones -- the average attention
weights over multiple inputs. We use PAPA to analyze several established
pretrained Transformers on six downstream tasks. We find that without any
input-dependent attention, all models achieve competitive performance -- an
average relative drop of only 8% from the probing baseline. Further, little or
no performance drop is observed when replacing half of the input-dependent
attention matrices with constant (input-independent) ones. Interestingly, we
show that better-performing models lose more from applying our method than
weaker models, suggesting that the utilization of the input-dependent attention
mechanism might be a factor in their success. Our results motivate research on
simpler alternatives to input-dependent attention, as well as on methods for
better utilization of this mechanism in the Transformer architecture.
Comment: Findings of EMNLP 2022
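
To make the probing idea concrete, here is a hedged PyTorch sketch of the constant-attention replacement: average the attention matrices over a sample of inputs, then reuse the frozen average in place of the input-dependent softmax(QKᵀ). Function and variable names are illustrative, not the authors' implementation.

```python
# Hedged sketch of a PAPA-style probe: replace input-dependent attention
# with the average attention computed over many inputs. Names are
# illustrative; the paper's implementation differs in details
# (per-layer handling, variable sequence lengths, etc.).
import torch

@torch.no_grad()
def average_attention(attention_fn, batches):
    """Average (heads, seq, seq) attention matrices over a sample of inputs."""
    acc, n = None, 0
    for x in batches:
        attn = attention_fn(x)  # (heads, seq, seq), rows sum to 1
        acc = attn.clone() if acc is None else acc + attn
        n += 1
    return acc / n

def constant_attention(avg_attn, values):
    """Contextualize values with the frozen average instead of softmax(QK^T)."""
    # avg_attn: (heads, seq, seq); values: (heads, seq, d_head)
    return avg_attn @ values
```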
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Despite the efforts of thousands of researchers, engineers, and artists working
to improve text-to-image generation models, these systems often fail to produce
images that accurately align with their text inputs. We introduce TIFA (Text-to-Image
Faithfulness evaluation with question Answering), an automatic evaluation
metric that measures the faithfulness of a generated image to its text input
via visual question answering (VQA). Specifically, given a text input, we
automatically generate several question-answer pairs using a language model. We
calculate image faithfulness by checking whether existing VQA models can answer
these questions using the generated image. TIFA is a reference-free metric that
allows for fine-grained and interpretable evaluations of generated images. TIFA
also has better correlations with human judgments than existing metrics. Based
on this approach, we introduce TIFA v1.0, a benchmark consisting of 4K diverse
text inputs and 25K questions across 12 categories (object, counting, etc.). We
present a comprehensive evaluation of existing text-to-image models using TIFA
v1.0 and highlight the limitations and challenges of current models. For
instance, we find that current text-to-image models, despite doing well on
color and material, still struggle with counting, spatial relations, and
composing multiple objects. We hope our benchmark will help carefully measure
research progress in text-to-image synthesis and provide valuable insights
for further research.
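
The metric itself reduces to a simple accuracy over generated questions; the sketch below shows that computation. `generate_qa_pairs` (a language model) and `vqa_answer` (a VQA model) are hypothetical placeholders, not the released TIFA code, and real answer matching is looser than exact string equality.

```python
# Hedged sketch of a TIFA-style faithfulness score: the fraction of
# LM-generated questions about the text input that a VQA model answers
# correctly on the generated image. `generate_qa_pairs` and `vqa_answer`
# are placeholders for an LM and a VQA model, not the released TIFA code.

def tifa_style_score(text_input, image, generate_qa_pairs, vqa_answer):
    """Return faithfulness in [0, 1] for one (text, image) pair."""
    qa_pairs = generate_qa_pairs(text_input)  # [(question, gold_answer), ...]
    if not qa_pairs:
        return 0.0
    correct = sum(
        vqa_answer(image, question).strip().lower() == gold.strip().lower()
        for question, gold in qa_pairs
    )
    return correct / len(qa_pairs)
```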
Multidisciplinary design of a more electric regional aircraft including certification constraints
The use of electrified on-board systems is increasingly required to reduce aircraft
complexity, polluting emissions, and life cycle cost. However, more-electric and
all-electric aircraft configurations are still uncommon in civil aviation, and their
certifiability has yet to be proven in some aircraft segments. The aim of the present
paper is to define a multidisciplinary design problem that includes disciplines
pertaining to the certification domain. In particular, the study focuses on the
preliminary design of a 19-passenger small regional turboprop aircraft. Different
on-board systems architectures with increasing electrification levels are considered.
These architectures imply the use of bleedless technologies, including electrified
ice protection and environmental control systems. The use of electric actuators for
secondary surfaces and landing gear is also considered. The aircraft design, which
includes the aerodynamic, structural, systems, and propulsion domains, is then
assessed by several certification disciplines. In particular, minimum performance,
external noise, and safety assessments are included in the workflow, giving some
insight into the aircraft's certifiability. The results show reductions of 3% in MTOM
and 3% in fuel mass, depending on the systems architecture selected. From the
certification side, the design has proven to be certifiable, and the margins with
respect to the certification constraints can be controlled to improve the
overall design.
GENIE: A Leaderboard for Human-in-the-Loop Evaluation of Text Generation
Leaderboards have eased model development for many NLP datasets by
standardizing their evaluation and delegating it to an independent external
repository. Their adoption, however, is so far limited to tasks that can be
reliably evaluated in an automatic manner. This work introduces GENIE, an
extensible human evaluation leaderboard, which brings the ease of leaderboards
to text generation tasks. GENIE automatically posts leaderboard submissions to
crowdsourcing platforms asking human annotators to evaluate them on various
axes (e.g., correctness, conciseness, fluency) and compares their answers to
various automatic metrics. We introduce several datasets in English to GENIE,
representing four core challenges in text generation: machine translation,
summarization, commonsense reasoning, and machine comprehension. We provide
formal granular evaluation metrics and identify areas for future research. We
make GENIE publicly available and hope that it will spur progress in language
generation models as well as their automatic and manual evaluation.
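
One analysis step GENIE enables, comparing human ratings against automatic metrics across submissions, can be sketched as a plain correlation. The names and numbers below are illustrative only, not GENIE's API or data.

```python
# Hedged sketch of correlating crowdsourced human ratings with an automatic
# metric across submissions. GENIE itself handles submission, crowdsourcing,
# and aggregation; this shows only the comparison idea, with made-up data.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# Per-submission averages: human fluency ratings vs. an automatic metric.
human_fluency = [4.2, 3.1, 4.8, 2.9]
metric_scores = [0.71, 0.55, 0.80, 0.58]
print(pearson_r(human_fluency, metric_scores))
```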