A Typology to Explore the Mitigation of Shortcut Behavior
As machine learning models become increasingly large and are trained weakly supervised on large, possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models, in order to mitigate shortcut learning and to guarantee that their learned knowledge is aligned with human knowledge. The recently proposed XIL framework was developed for this purpose, and several such methods have been introduced, each with individual motivations and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, also future XIL approaches. In addition, we discuss existing measures and benchmarks, and introduce novel ones, for evaluating the overall abilities of an XIL method. Given this extensive toolbox, including our typology, measures, and benchmarks, we finally compare several recent XIL methods methodologically and quantitatively. In our evaluations, all methods successfully revise a model. However, we found remarkable differences on individual benchmark tasks, revealing valuable application-relevant aspects for integrating these benchmarks into the development of future methods.
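The interaction pattern such methods share can be sketched as a loop over a few basic modules; the decomposition below (select, explain, obtain feedback, revise) is an illustrative reading of the typology idea, not the paper's exact interface, and the linear "model" and masking-based revision are toy stand-ins.

```python
import numpy as np

# Toy XIL-style loop: a linear model latches onto a shortcut feature;
# "explaining" the model exposes this, user feedback flags the feature
# as a confound, and the "revise" step masks it out before refitting.
# Module names and the masking mechanism are illustrative assumptions.

rng = np.random.default_rng(0)

def make_data(n=200):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] > 0).astype(float)          # true signal: feature 0
    X[:, 2] = y + 0.01 * rng.normal(size=n)  # shortcut: feature 2 leaks the label
    return X, y

def fit(X, y, mask):
    Xm = X * mask                            # revise: ignore masked-out features
    w, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    return w * mask

def explain(w):
    return np.abs(w)                         # crude attribution: |weight|

X, y = make_data()
mask = np.ones(3)
w = fit(X, y, mask)
assert explain(w).argmax() == 2              # inspect: model relies on the shortcut

# Obtain feedback: the user flags feature 2 as a confound; revise and refit.
mask[2] = 0.0
w = fit(X, y, mask)
print(explain(w).argmax())                   # model now relies on the true signal
```

After revision the attribution peaks on the genuine feature, which is the kind of behavioral change the benchmarks discussed above are meant to measure.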
Revision Transformers: Instructing Language Models to Change their Values
Current transformer language models (LMs) are large-scale models with billions of parameters. They have been shown to deliver high performance on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally and interpersonally. In this work, we question the current common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM, which inherently but diffusely encodes world knowledge, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback, demonstrating strong performance in model revision even with little data. This way, users can easily tailor a model to their preferences, paving the way for more transparent AI models.
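The core idea of keeping revisions outside the parameters can be sketched as a retrieval step placed in front of a frozen LM: user corrections live in a store and are looked up at inference time. The string-similarity matcher and the threshold below are illustrative placeholders, not RiT's actual retrieval mechanism.

```python
from difflib import SequenceMatcher

# Minimal sketch of a retrieval-based revision engine: corrections are
# stored as (prompt, corrected response) pairs and consulted before the
# frozen base model answers. Similarity metric and threshold are toy
# assumptions for illustration.

class RevisionStore:
    def __init__(self, threshold=0.8):
        self.revisions = {}          # prompt -> corrected response
        self.threshold = threshold

    def add(self, prompt, correction):
        self.revisions[prompt] = correction

    def lookup(self, prompt):
        best, score = None, 0.0
        for stored, corr in self.revisions.items():
            s = SequenceMatcher(None, prompt.lower(), stored.lower()).ratio()
            if s > score:
                best, score = corr, s
        return best if score >= self.threshold else None

def answer(prompt, store, base_lm):
    # Consult the revision store first; fall back to the frozen LM.
    revised = store.lookup(prompt)
    return revised if revised is not None else base_lm(prompt)

base_lm = lambda p: "base model output"   # stand-in for the frozen LM
store = RevisionStore()
store.add("Is it ok to lie?", "Lying is generally considered wrong.")
print(answer("Is it OK to lie?", store, base_lm))
```

Updating the model's "values" then amounts to editing the store, with no parameter adjustment at all, which is what makes such revisions cheap.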
Boosting Object Representation Learning via Motion and Object Continuity
Recent unsupervised multi-object detection models have shown impressive performance improvements, largely attributed to novel architectural inductive biases. Unfortunately, they may produce suboptimal object encodings for downstream tasks. To overcome this, we propose to exploit object motion and continuity, i.e., the fact that objects do not pop in and out of existence. This is accomplished through two mechanisms: (i) providing priors on the location of objects through the integration of optical flow, and (ii) a contrastive object continuity loss across consecutive image frames. Rather than developing an explicit deep architecture, the resulting Motion and Object Continuity (MOC) scheme can be instantiated with any baseline object detection model. Our results show large improvements in the performance of a SOTA model in terms of object discovery, convergence speed, and overall latent object representations, particularly for playing Atari games. Overall, we show clear benefits of integrating motion and object continuity for downstream tasks, moving beyond object representation learning based only on reconstruction.
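Mechanism (ii) can be sketched as an InfoNCE-style objective over per-object slot embeddings: the same object in consecutive frames forms a positive pair, all other cross-frame pairs are negatives. This is a toy reading of the continuity idea, not the paper's exact loss, and the alignment of slots across frames is assumed given.

```python
import numpy as np

# Illustrative object-continuity contrastive loss: rows of z_t and z_t1
# are slot embeddings for frames t and t+1, assumed aligned so row i in
# both frames is the same object. Same-object pairs (the diagonal) are
# positives; everything else is a negative (InfoNCE-style).

def continuity_loss(z_t, z_t1, temperature=0.1):
    z_t = z_t / np.linalg.norm(z_t, axis=1, keepdims=True)
    z_t1 = z_t1 / np.linalg.norm(z_t1, axis=1, keepdims=True)
    sim = z_t @ z_t1.T / temperature          # (n, n) cosine similarities
    # Cross-entropy with the diagonal (same object) as the target class.
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(logsumexp - np.diag(sim)))

rng = np.random.default_rng(0)
z = rng.normal(size=(4, 8))                   # 4 objects, 8-dim embeddings
drift = z + 0.01 * rng.normal(size=(4, 8))    # same objects, slight motion
loss_matched = continuity_loss(z, drift)
loss_random = continuity_loss(z, rng.normal(size=(4, 8)))
print(loss_matched < loss_random)
```

Embeddings that change smoothly with motion incur a low loss, while slots that "swap" or reappear as different objects are penalized, which is exactly the continuity prior being encoded.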
V-LoL: A Diagnostic Dataset for Visual Logical Learning
Despite the successes of recent developments in visual AI, various shortcomings remain, from a lack of exact logical reasoning, to limited abstract generalization abilities, to difficulties in understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack the visual component. To address this, we propose the visual logical learning dataset V-LoL, which seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Trains, a visual rendition of a classic benchmark in symbolic AI, the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Trains provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even state-of-the-art AI faces difficulties in dealing with visual logical learning challenges, highlighting advantages and limitations unique to each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing the current abilities of AI systems in visual logical learning.
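The underlying symbolic task that V-LoL-Trains renders visually can be stated in a few lines: each train is a list of cars with attributes, and (in the classic rule as commonly stated for the Michalski problem) a train is eastbound iff it contains a car that is both short and closed. The attribute set below is deliberately reduced for illustration.

```python
from dataclasses import dataclass

# Toy symbolic version of the Michalski train problem: classify a train
# as eastbound or westbound from its cars' attributes. Only two of the
# many car attributes are modeled here, for illustration.

@dataclass
class Car:
    length: str   # "short" | "long"
    roof: str     # "closed" | "open"

def direction(train):
    # Classic target rule (as commonly stated): eastbound iff the train
    # has a car that is both short and closed.
    has_short_closed = any(c.length == "short" and c.roof == "closed"
                           for c in train)
    return "eastbound" if has_short_closed else "westbound"

east = [Car("long", "open"), Car("short", "closed")]
west = [Car("long", "closed"), Car("short", "open")]
print(direction(east), direction(west))   # eastbound westbound
```

The visual variant replaces these symbolic attribute lists with rendered scenes, so a model must first perceive the attributes before it can apply (or learn) the rule — the combination of challenges the dataset is built around.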
MiKlip - a National Research Project on Decadal Climate Prediction
A German national project coordinates research on improving a global decadal climate prediction system for future operational use.
MiKlip, an eight-year German national research project on decadal climate prediction, is organized around a global prediction system comprising the climate model MPI-ESM together with an initialization procedure and a model evaluation system. This paper summarizes the lessons learned from MiKlip so far; some are purely scientific, others concern strategies and structures of research that targets future operational use.
Three prediction-system generations have been constructed, characterized by alternative initialization strategies; the later generations show a marked improvement in hindcast skill for surface temperature. Hindcast skill is also identified for multi-year-mean European summer surface temperatures, extra-tropical cyclone tracks, the Quasi-Biennial Oscillation, and ocean carbon uptake, among others. Regionalization maintains or slightly enhances the skill in European surface temperature inherited from the global model and also displays hindcast skill for wind-energy output. A new volcano code package permits rapid modification of the predictions in response to a future eruption.
MiKlip has demonstrated the efficacy of subjecting a single global prediction system to a major research effort. The benefits of this strategy include the rapid cycling through the prediction-system generations, the development of a sophisticated evaluation package usable by all MiKlip researchers, and regional applications of the global predictions. Open research questions include the optimal balance between model resolution and ensemble size, the appropriate method for constructing a prediction ensemble, and the decision between full-field and anomaly initialization.
Operational use of the MiKlip system is targeted for the end of the current decade, with a recommended generational cycle of two to three years.
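The open question of full-field versus anomaly initialization mentioned above admits a one-line contrast: full-field initialization inserts the observed state directly (risking drift back toward the model's own climate), while anomaly initialization adds the observed anomaly to the model's climatology. The temperature values below are made up for illustration.

```python
import numpy as np

# Schematic contrast between full-field and anomaly initialization for
# a decadal prediction system. All numbers are illustrative.

obs_state = np.array([288.4, 289.1])    # observed temperatures (K)
obs_clim = np.array([288.0, 288.8])     # observational climatology
model_clim = np.array([287.2, 288.1])   # model's (biased) climatology

# Full-field: initialize directly on observations.
full_field = obs_state
# Anomaly: initialize on the model climate plus the observed anomaly,
# avoiding the shock of starting from a state the model cannot sustain.
anomaly = model_clim + (obs_state - obs_clim)

print(full_field, anomaly)
```

The trade-off is that full-field starts closer to reality but drifts, while anomaly initialization sacrifices absolute accuracy to stay consistent with the model's climate — hence the design decision is left open in the text above.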
Commitment to Active Research Data and Software Management in Large Research Consortia
We acknowledge the importance of research data and software for our research processes and regard the publication of research data and software as an essential part of scholarly publishing. To this end, as a consortium, we support our researchers in handling data and software according to the FAIR principles, in accordance with the DFG code of conduct "Leitlinien zur Sicherung guter wissenschaftlicher Praxis" (Guidelines for Safeguarding Good Research Practice). In cooperation with our institutions and research communities, we provide adequate research data management tools and services and enable our researchers to use them. In doing so, we preferably build on existing offerings and, in return, strive to have them adapted to our needs. We pursue measures to define and ensure the quality of our research data and software. We preferably use existing data and metadata standards and, where possible, network with the corresponding national and international initiatives to create and implement new standards. We follow developments in the field of research data and software management and promptly assess newly emerging recommendations and guidelines for their applicability.
Initialization and ensemble generation for decadal climate predictions: A comparison of different methods
Five initialization and ensemble generation methods are investigated with respect to their impact on the prediction skill of the German decadal prediction system "Mittelfristige Klimaprognose" (MiKlip). Among the tested methods, three tackle aspects of model-consistent initialization: the ensemble Kalman filter (EnKF), the filtered anomaly initialization (FAI), and initialization by partially coupled spin-up (MODINI). The remaining two methods alter the ensemble generation: the ensemble dispersion filter (EDF) corrects each ensemble member with the ensemble mean during model integration, and the bred vectors (BV) method perturbs the climate state using the fastest-growing modes. The new methods are compared against the latest MiKlip system in the low-resolution configuration (Preop-LR), which generates its ensemble by lagging the climate state by a few days and is initialized by nudging toward ocean and atmosphere reanalyses. Results show that the tested methods provide added value for prediction skill compared to Preop-LR in that they improve skill over the eastern and central Pacific and different regions of the North Atlantic Ocean. In this respect, the EnKF and FAI show the most distinct improvements over Preop-LR for surface temperatures and upper-ocean heat content, followed by BV, EDF, and MODINI. However, no single method is superior to the others with respect to all metrics considered. In particular, all methods affect the Atlantic Meridional Overturning Circulation in different ways, both with respect to the basin-wide long-term mean and variability, and with respect to the temporal evolution at 26° N.
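Of the methods compared above, the EnKF has a particularly compact core: each forecast ensemble member is nudged toward a perturbed observation, weighted by a Kalman gain built from the ensemble's own sample covariance. The sketch below is a generic stochastic EnKF analysis step with made-up dimensions and values, not the MiKlip implementation.

```python
import numpy as np

# Minimal stochastic ensemble Kalman filter (EnKF) analysis step of the
# kind used for model-consistent initialization. State size, ensemble
# size, and observation values are illustrative.

rng = np.random.default_rng(0)

def enkf_update(X, y, H, R):
    """X: (n_state, n_ens) forecast ensemble; y: (n_obs,) observations;
    H: (n_obs, n_state) observation operator; R: (n_obs, n_obs) obs-error cov."""
    n_ens = X.shape[1]
    Xm = X - X.mean(axis=1, keepdims=True)
    P = Xm @ Xm.T / (n_ens - 1)                   # ensemble sample covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
    # Perturbed observations: one noisy copy of y per ensemble member.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n_ens).T
    return X + K @ (Y - H @ X)                    # analysis ensemble

X = rng.normal(loc=5.0, scale=2.0, size=(2, 50))  # forecast ensemble
y = np.array([3.0])                               # single observation
H = np.array([[1.0, 0.0]])                        # observe first state variable
R = np.array([[0.1]])
Xa = enkf_update(X, y, H, R)
print(abs(Xa[0].mean() - 3.0) < abs(X[0].mean() - 3.0))  # pulled toward obs
```

Because the gain is estimated from the ensemble itself, the update stays on (an approximation of) the model's attractor, which is what makes the initialization "model-consistent" in the sense used above.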