155 research outputs found

NCL - a workhorse for data analysis and visualization in climate research

Authors: Böttinger M.; Haley M.; Meier-Fleischer K.
Publication date: 01/01/2015

Shortcut Removal for Improved OOD-Generalization

Authors: Böttinger Konstantin; Jacobs Jochen; Müller Nicolas M.; Williams Jennifer
Publication date: 24/11/2022

Machine learning is a data-driven discipline, and learning success is largely dependent on the quality of the underlying datasets. However, it is becoming increasingly clear that even high performance on held-out test data does not necessarily mean that a model generalizes or learns anything meaningful at all. One reason for this is the presence of machine learning shortcuts, i.e., hints in the data that are predictive but accidental and semantically unconnected to the problem. We present a new approach to detect such shortcuts and a technique to automatically remove them from datasets. Using an adversarially trained lens, any small and highly predictive clues in images can be detected and removed. We show that this approach 1) does not cause degradation of model performance in the absence of these shortcuts, and 2) reliably identifies and neutralizes shortcuts from different image datasets. In our experiments, we are able to recover up to 93.8% of model performance in the presence of different shortcuts. Finally, we apply our model to a real-world dataset from the medical domain consisting of chest x-rays and identify and remove several types of shortcuts that are known to hinder real-world applicability. Thus, we hope that our proposed approach fosters real-world applicability of machine learning.

arXiv.org e-Print Archive

MEVA - An interactive visualization application for validation of multifaceted meteorological data with multiple 3D devices

Authors: Bauer H.; Bilke L.; Böttinger M.; Helbig C.; Kolditz O.
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2015

To achieve more realistic simulations, meteorologists develop and use models with increasing spatial and temporal resolution. Analyzing, comparing, and visualizing the resulting simulations becomes more and more challenging due to the growing amount and multifaceted character of the data. Various data sources, numerous variables, and multiple simulations lead to a complex database. Although a variety of software suited to the visualization of meteorological data exists, none of it fulfills all of the typical domain-specific requirements: support for quasi-standard data formats and different grid types, standard visualization techniques for scalar and vector data, visualization of the context (e.g., topography) and other static data, support for multiple presentation devices used in modern sciences (e.g., virtual reality), a user-friendly interface, and suitability for cooperative work.

CiteSeerX

Directory of Open Access Journals

PubMed Central

MPG.PuRe

Shortcut Detection with Variational Autoencoders

Authors: Böttinger Konstantin; Khan Shahbaz; Müller Nicolas M.; Roschmann Simon; Sperl Philip
Publication date: 21/07/2023

For real-world applications of machine learning (ML), it is essential that models make predictions based on well-generalizing features rather than spurious correlations in the data. The identification of such spurious correlations, also known as shortcuts, is a challenging problem and has so far been scarcely addressed. In this work, we present a novel approach to detect shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs). The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts. We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before.

Comment: Accepted at the ICML 2023 Workshop on Spurious Correlations, Invariance and Stabilit

arXiv.org e-Print Archive

Protecting Publicly Available Data With Machine Learning Shortcuts

Authors: Burgert Maximilian; Böttinger Konstantin; Debus Pascal; Müller Nicolas M.; Sperl Philip; Williams Jennifer
Publication date: 30/10/2023

Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult to detect by explainable AI methods. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult to notice in human perception. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling.

Comment: Published at BMVC 202

arXiv.org e-Print Archive

Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

Authors: Böttinger Konstantin; Canals Roman; Czempin Pavel; Dieckmann Franziska; Müller Nicolas M.; Williams Jennifer
Publication date: 23/06/2021

We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenomenon and its impact in depth. We compare several types of models trained on a) only the duration of the leading silence and b) only the duration of leading and trailing silence. Results show that models trained on only the duration of the leading silence perform particularly well, and achieve up to 85% accuracy and an equal error rate (EER) of 15.1%. At the same time, we observe that trimming silence during pre-processing and then training established antispoofing models using signal-based features leads to comparatively worse performance. In that case, EER increases from 3.6% (with silence) to 15.5% (trimmed silence). Our findings suggest that previous work may, in part, have inadvertently learned the spoof/bonafide distinction by relying on the duration of silence as it appears in the official challenge dataset. We discuss the potential consequences that this has for interpreting system scores in the challenge and discuss how the ASV community may further consider this issue.
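
As an illustration of the two quantities this abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes silence is measured with a simple amplitude threshold (the abstract does not say how silence was detected) and approximates the EER by scanning score thresholds. The function names `leading_silence_seconds` and `equal_error_rate` and the default threshold of 0.01 are hypothetical.

```python
from typing import Sequence


def leading_silence_seconds(samples: Sequence[float], sample_rate: int,
                            threshold: float = 0.01) -> float:
    """Time in seconds before the first sample whose magnitude exceeds threshold.

    Assumption: amplitude thresholding stands in for whatever silence
    detector the original study used.
    """
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            return i / sample_rate
    return len(samples) / sample_rate  # the whole clip is below threshold


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate EER by scanning every observed score as a threshold.

    labels: 1 = bonafide, 0 = spoof. A higher score means "more bonafide".
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(scores)):
        far = sum(s >= t for s in neg) / len(neg)  # spoof accepted as bonafide
        frr = sum(s < t for s in pos) / len(pos)   # bonafide rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Using the leading-silence duration itself as the score for each utterance would give a one-feature baseline in the spirit of setting a) above; any resulting numbers depend on the chosen threshold and data and are not expected to reproduce the reported figures.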

arXiv.org e-Print Archive

Southampton (e-Prints Soton)

DYAMOND++: A high resolution climate model setup

Authors: Brownlee C.; Böttinger M.; Esch M.; Hohenegger C.; Mauritsen T.; Migliore M.; Redler R.; Röber N.; Stevens B.; Ziemen F.
Publication date: 01/01/2020

MPG.PuRe

Mendelian randomization indicates causal effects of estradiol levels on kidney function in males

Authors: Alexander Teumer; Claudia Schurmann; Erwin P. Böttinger; M. Kamal Nasr
Publication venue: Frontiers Media S.A.
Publication date: 01/12/2023

Context: Chronic kidney disease (CKD) is a public health burden worldwide. Epidemiological studies observed an association between sex hormones, including estradiol, and kidney function.

Objective: We conducted a Mendelian randomization (MR) study to assess a possible causal effect of estradiol levels on kidney function in males and females.

Design: We performed a bidirectional two-sample MR using published genetic associations of serum levels of estradiol in men (n = 206,927) and women (n = 229,966), and of kidney traits represented by estimated glomerular filtration rate (eGFR, n = 567,460), urine albumin-to-creatinine ratio (UACR, n = 547,361), and CKD (n = 41,395 cases and n = 439,303 controls) using data obtained from the CKDGen Consortium. Additionally, we conducted a genome-wide association study using UK Biobank cohort study data (n = 11,798 men and n = 6,835 women) to identify novel genetic associations with levels of estradiol, and then used these variants as instruments in a one-sample MR.

Results: The two-sample MR indicated that genetically predicted estradiol levels are significantly associated with eGFR in men (beta = 0.077; p = 5.2E-05). We identified a single locus at chromosome 14 associated with estradiol levels in men being significant in the one-sample MR on eGFR (beta = 0.199; p = 0.017). We revealed significant results with eGFR in postmenopausal women and with UACR in premenopausal women, which did not reach statistical significance in the sensitivity MR analyses. No causal effect of eGFR or UACR on estradiol levels was found.

Conclusions: We conclude that serum estradiol levels may have a causal effect on kidney function. Our MR results provide starting points for studies to develop therapeutic strategies to reduce kidney disease.

Directory of Open Access Journals