
    A Typology to Explore the Mitigation of Shortcut Behavior

    As machine learning models grow ever larger and are trained, weakly supervised, on large and possibly uncurated data sets, it becomes increasingly important to establish mechanisms for inspecting, interacting with, and revising models in order to mitigate learning shortcuts and ensure that their learned knowledge is aligned with human knowledge. The recently proposed XIL framework was developed for this purpose, and several such methods have been introduced, each with individual motivations and methodological details. In this work, we unify various XIL methods into a single typology by establishing a common set of basic modules. In doing so, we pave the way for a principled comparison of existing and, importantly, future XIL approaches. In addition, we discuss existing measures and benchmarks and introduce novel ones for evaluating the overall abilities of a XIL method. Given this extensive toolbox, including our typology, measures, and benchmarks, we finally compare several recent XIL methods methodologically and quantitatively. In our evaluations, all methods prove able to revise a model successfully. However, we found remarkable differences on individual benchmark tasks, revealing valuable application-relevant aspects for integrating these benchmarks into the development of future methods.
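    The abstract does not name the typology's modules, but the decomposition it describes can be pictured as a small set of interchangeable components plus a shared interaction loop. The sketch below is a hypothetical illustration of that idea; the module names, signatures, and loop are assumptions for exposition, not the paper's definitions.

```python
from abc import ABC, abstractmethod


class XILMethod(ABC):
    """Hypothetical decomposition of an XIL method into common modules.

    The module names and signatures are illustrative assumptions,
    not the typology defined in the paper.
    """

    @abstractmethod
    def select(self, dataset):
        """Pick the examples shown to the user for inspection."""

    @abstractmethod
    def explain(self, model, x):
        """Produce the model's explanation (e.g. a saliency map) for x."""

    @abstractmethod
    def obtain_feedback(self, explanation):
        """Collect user feedback on the explanation, e.g. a mask marking
        input regions the model should not rely on."""

    @abstractmethod
    def revise(self, model, x, y, feedback):
        """Update the model so its reasoning agrees with the feedback,
        e.g. via an explanation-penalty term in the loss."""


def xil_loop(method: XILMethod, model, dataset, epochs: int = 1):
    """Generic interaction loop shared by all methods in such a typology."""
    for _ in range(epochs):
        for x, y in method.select(dataset):
            explanation = method.explain(model, x)
            feedback = method.obtain_feedback(explanation)
            model = method.revise(model, x, y, feedback)
    return model
```

    Comparing concrete XIL methods then amounts to swapping implementations of the individual modules while keeping the loop fixed.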

    Revision Transformers: Instructing Language Models to Change their Values

    Current transformer language models (LMs) are large-scale models with billions of parameters. They have been shown to achieve high performance on a variety of tasks but are also prone to shortcut learning and bias. Addressing such incorrect model behavior via parameter adjustments is very costly. This is particularly problematic for updating dynamic concepts, such as moral values, which vary culturally and interpersonally. In this work, we question the common practice of storing all information in the model parameters and propose the Revision Transformer (RiT) to facilitate easy model updating. The specific combination of a large-scale pre-trained LM, which inherently but also diffusely encodes world knowledge, with a clearly structured revision engine makes it possible to update the model's knowledge with little effort and with the help of user interaction. We exemplify RiT on a moral dataset and simulate user feedback, demonstrating strong performance in model revision even with little data. In this way, users can easily tailor a model to their preferences, paving the way for more transparent AI models.
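    The core idea of keeping revisions outside the model parameters can be illustrated with a minimal retrieval-based sketch: user corrections are stored in an external index and, when a similar query arrives, the closest correction is prepended to the prompt of a frozen LM. This is an assumed simplification, not RiT's actual architecture; `embed` and `generate` stand in for an off-the-shelf sentence encoder and a pre-trained LM.

```python
import numpy as np

class RevisionStore:
    """Keeps user-provided corrections outside of the model parameters."""

    def __init__(self):
        self.keys = []     # embeddings of the queries users corrected
        self.values = []   # the corrected judgements supplied by users

    def add(self, query_text, corrected_answer, embed):
        self.keys.append(embed(query_text))
        self.values.append(corrected_answer)

    def retrieve(self, query_text, embed, threshold=0.8):
        """Return the stored correction closest to the query, if similar enough."""
        if not self.keys:
            return None
        q = embed(query_text)
        keys = np.stack(self.keys)
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-9)
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= threshold else None


def answer(query, store, embed, generate):
    """Condition the frozen LM on a retrieved correction instead of retraining it."""
    correction = store.retrieve(query, embed)
    prompt = query if correction is None else f"Guideline: {correction}\n{query}"
    return generate(prompt)
```

    Because corrections live in the store rather than in the weights, updating the model's "values" reduces to adding or editing entries.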

    Boosting Object Representation Learning via Motion and Object Continuity

    Recent unsupervised multi-object detection models have shown impressive performance improvements, largely attributed to novel architectural inductive biases. Unfortunately, they may produce suboptimal object encodings for downstream tasks. To overcome this, we propose to exploit object motion and continuity, i.e., the fact that objects do not pop in and out of existence. This is accomplished through two mechanisms: (i) providing priors on the location of objects by integrating optical flow, and (ii) a contrastive object continuity loss across consecutive image frames. Rather than developing an explicit new deep architecture, the resulting Motion and Object Continuity (MOC) scheme can be instantiated with any baseline object detection model. Our results show large improvements in the performance of a SOTA model in terms of object discovery, convergence speed, and overall latent object representations, particularly for playing Atari games. Overall, we show clear benefits of integrating motion and object continuity for downstream tasks, moving beyond object representation learning based only on reconstruction.
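    A contrastive continuity loss of the kind described in (ii) can be sketched as follows: object encodings from frame t are pulled toward their counterparts at frame t+1 and pushed away from all other slots. The slot matching by index is an assumption of this sketch (e.g. obtained from an optical-flow-based assignment), not necessarily the paper's exact procedure.

```python
import torch
import torch.nn.functional as F

def object_continuity_loss(slots_t, slots_t1, temperature=0.1):
    """Contrastive loss encouraging each object slot at frame t to stay close
    to its counterpart at frame t+1 and far from all other slots.

    slots_t, slots_t1: (num_slots, dim) object encodings of consecutive frames,
    assumed to be matched by index.
    """
    z_t = F.normalize(slots_t, dim=-1)
    z_t1 = F.normalize(slots_t1, dim=-1)
    logits = z_t @ z_t1.T / temperature      # similarity of every slot pair
    targets = torch.arange(z_t.size(0))      # positive pair = same slot index
    return F.cross_entropy(logits, targets)

# Example: 8 object slots with 64-dim encodings from two consecutive frames.
loss = object_continuity_loss(torch.randn(8, 64), torch.randn(8, 64))
```

    Because the loss only consumes slot encodings, it can be added on top of any baseline object detector, which is the sense in which MOC is model-agnostic.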

    V-LoL: A Diagnostic Dataset for Visual Logical Learning

    Despite the successes of recent developments in visual AI, various shortcomings remain, from missing exact logical reasoning, to limited abstract generalization abilities, to difficulties in understanding complex and noisy scenes. Unfortunately, existing benchmarks were not designed to capture more than a few of these aspects. Whereas deep learning datasets focus on visually complex data but simple visual reasoning tasks, inductive logic datasets involve complex logical learning tasks but lack the visual component. To address this, we propose the visual logical learning dataset V-LoL, which seamlessly combines visual and logical challenges. Notably, we introduce the first instantiation of V-LoL, V-LoL-Trains, a visual rendition of a classic benchmark in symbolic AI: the Michalski train problem. By incorporating intricate visual scenes and flexible logical reasoning tasks within a versatile framework, V-LoL-Trains provides a platform for investigating a wide range of visual logical learning challenges. We evaluate a variety of AI systems, including traditional symbolic AI, neural AI, and neuro-symbolic AI. Our evaluations demonstrate that even state-of-the-art AI faces difficulties in dealing with visual logical learning challenges, highlighting unique advantages and limitations specific to each methodology. Overall, V-LoL opens up new avenues for understanding and enhancing the current abilities of AI systems in visual logical learning.
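    To make the symbolic side of the Michalski train problem concrete, the toy sketch below encodes trains as lists of attributed cars and applies the textbook decision rule often quoted for the original east/west puzzle ("a train is eastbound if it has a short, closed car"). The attributes and the rule are illustrative assumptions; V-LoL-Trains renders such logical targets as visual scenes and varies the underlying rules.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Car:
    length: str   # "short" or "long"
    roof: str     # "open" or "closed"
    load: str     # e.g. "circle", "triangle", ...

@dataclass
class Train:
    cars: List[Car]

def is_eastbound(train: Train) -> bool:
    """Classic example rule: eastbound iff the train has a short, closed car."""
    return any(c.length == "short" and c.roof == "closed" for c in train.cars)

example = Train(cars=[Car("long", "open", "circle"),
                      Car("short", "closed", "triangle")])
print(is_eastbound(example))  # True under this rule
```

    The benchmark's difficulty comes from having to recover rules of this kind from rendered images rather than from symbolic descriptions.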

    MiKlip - a National Research Project on Decadal Climate Prediction

    A German national project coordinates research on improving a global decadal climate prediction system for future operational use. MiKlip, an eight-year German national research project on decadal climate prediction, is organized around a global prediction system comprising the climate model MPI-ESM together with an initialization procedure and a model evaluation system. This paper summarizes the lessons learned from MiKlip so far; some are purely scientific, others concern strategies and structures of research that targets future operational use. Three prediction-system generations have been constructed, characterized by alternative initialization strategies; the later generations show a marked improvement in hindcast skill for surface temperature. Hindcast skill is also identified for multi-year-mean European summer surface temperatures, extra-tropical cyclone tracks, the Quasi-Biennial Oscillation, and ocean carbon uptake, among others. Regionalization maintains or slightly enhances the skill in European surface temperature inherited from the global model and also displays hindcast skill for wind-energy output. A new volcano code package permits rapid modification of the predictions in response to a future eruption. MiKlip has demonstrated the efficacy of subjecting a single global prediction system to a major research effort. The benefits of this strategy include the rapid cycling through the prediction-system generations, the development of a sophisticated evaluation package usable by all MiKlip researchers, and regional applications of the global predictions. Open research questions include the optimal balance between model resolution and ensemble size, the appropriate method for constructing a prediction ensemble, and the decision between full-field and anomaly initialization. Operational use of the MiKlip system is targeted for the end of the current decade, with a recommended generational cycle of two to three years.
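    As a rough illustration of what "hindcast skill" means in this context, one common deterministic skill measure is the anomaly correlation coefficient between hindcast and observed anomalies over the verification period. The sketch below shows that generic measure only; the MiKlip evaluation package computes its own suite of metrics, and the synthetic numbers are not project results.

```python
import numpy as np

def anomaly_correlation(hindcast, observed, climatology):
    """Anomaly correlation coefficient, a common deterministic hindcast skill score.

    hindcast, observed, climatology: 1-D arrays over verification years.
    """
    f = hindcast - climatology
    o = observed - climatology
    return float(np.sum(f * o) / np.sqrt(np.sum(f**2) * np.sum(o**2)))

# Example with synthetic multi-year-mean surface temperature values (degrees C).
years = np.arange(2000, 2010)
clim = np.full(years.size, 14.0)
obs = clim + 0.3 * np.random.randn(years.size)
fc = obs + 0.2 * np.random.randn(years.size)   # a skilful but imperfect hindcast
print(anomaly_correlation(fc, obs, clim))
```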

    Commitment to Active Research Data and Software Management in Large Research Consortia

    We recognize the importance of research data and software for our research processes and regard the publication of research data and software as an essential part of scholarly publishing. To this end, we as a consortium support our researchers in handling data and software according to the FAIR principles, in line with the DFG code of conduct "Leitlinien zur Sicherung guter wissenschaftlicher Praxis" (Guidelines for Safeguarding Good Research Practice). In collaboration with our institutions and scientific communities, we provide adequate research data management tools and services and enable our researchers to use them. In doing so, we preferably build on existing offerings and, in return, strive to have them adapted to our needs. We pursue measures for defining and assuring the quality of our research data and software. We preferably use existing data and metadata standards and, where possible, coordinate with relevant national and international initiatives on the creation and implementation of new standards. We follow developments in research data and software management and promptly assess emerging recommendations and guidelines for their applicability.

    Initialization and ensemble generation for decadal climate predictions: A comparison of different methods

    Five initialization and ensemble generation methods are investigated with respect to their impact on the prediction skill of the German decadal prediction system "Mittelfristige Klimaprognose" (MiKlip). Among the tested methods, three tackle aspects of model‐consistent initialization: the ensemble Kalman filter (EnKF), the filtered anomaly initialization (FAI), and the initialization by partially coupled spin‐up (MODINI). The remaining two methods alter the ensemble generation: the ensemble dispersion filter (EDF) corrects each ensemble member with the ensemble mean during model integration, and the bred vectors (BV) perturb the climate state using the fastest-growing modes. The new methods are compared against the latest MiKlip system in its low‐resolution configuration (Preop‐LR), which generates the ensemble by lagging the climate state by a few days and is initialized by nudging toward ocean and atmosphere reanalyses. Results show that the tested methods add value to the prediction skill compared with Preop‐LR, improving skill over the eastern and central Pacific and over different regions of the North Atlantic Ocean. In this respect, the EnKF and FAI show the most distinct improvements over Preop‐LR for surface temperatures and upper-ocean heat content, followed by the BV, the EDF, and MODINI. However, no single method is superior to the others with respect to all metrics considered. In particular, all methods affect the Atlantic Meridional Overturning Circulation in different ways, both with respect to the basin‐wide long‐term mean and variability and with respect to the temporal evolution at 26°N.
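    The abstract only names the breeding idea behind the BV method; a generic textbook breeding cycle looks like the sketch below, where a perturbed and an unperturbed run are integrated side by side and their difference is periodically rescaled so that it becomes dominated by the fastest-growing modes. The `step` function standing in for the climate model, and the cycle itself, are assumptions of this sketch rather than the MiKlip implementation.

```python
import numpy as np

def bred_vector_cycle(step, x0, perturbation, amplitude, n_cycles):
    """Generic breeding cycle for flow-dependent ensemble perturbations.

    step:          advances a model state by one breeding cycle
                   (stands in for the climate model)
    x0:            unperturbed (control) initial state as a NumPy array
    perturbation:  initial perturbation added to the control state
    amplitude:     target size the bred perturbation is rescaled to
    Returns the bred vector after n_cycles.
    """
    control = x0.copy()
    perturbed = x0 + perturbation
    for _ in range(n_cycles):
        control = step(control)
        perturbed = step(perturbed)
        diff = perturbed - control
        diff *= amplitude / (np.linalg.norm(diff) + 1e-12)  # rescale to fixed size
        perturbed = control + diff                          # re-seed next cycle
    return perturbed - control
```

    Ensemble members are then generated by adding differently seeded bred vectors to the analysed initial state, in contrast to the lagged-start approach used by Preop‐LR.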