    Shortcut Removal for Improved OOD-Generalization

    Machine learning is a data-driven discipline, and learning success depends largely on the quality of the underlying datasets. However, it is becoming increasingly clear that even high performance on held-out test data does not necessarily mean that a model generalizes or learns anything meaningful at all. One reason for this is the presence of machine learning shortcuts, i.e., hints in the data that are predictive but accidental and semantically unconnected to the problem. We present a new approach to detect such shortcuts and a technique to automatically remove them from datasets. Using an adversarially trained lens, any small and highly predictive clues in images can be detected and removed. We show that this approach 1) does not degrade model performance in the absence of these shortcuts, and 2) reliably identifies and neutralizes shortcuts in different image datasets. In our experiments, we are able to recover up to 93.8% of model performance in the presence of different shortcuts. Finally, we apply our model to a real-world dataset from the medical domain consisting of chest x-rays and identify and remove several types of shortcuts that are known to hinder real-world applicability. Thus, we hope that our proposed approach fosters real-world applicability of machine learning.

    MEVA - An interactive visualization application for validation of multifaceted meteorological data with multiple 3D devices

    To achieve more realistic simulations, meteorologists develop and use models with increasing spatial and temporal resolution. Analyzing, comparing, and visualizing the resulting simulations becomes increasingly challenging due to the growing volume and multifaceted character of the data. Various data sources, numerous variables, and multiple simulations lead to a complex database. Although a variety of software suited to visualizing meteorological data exists, none of it fulfills all of the typical domain-specific requirements: support for quasi-standard data formats and different grid types, standard visualization techniques for scalar and vector data, visualization of the context (e.g., topography) and other static data, support for multiple presentation devices used in modern sciences (e.g., virtual reality), a user-friendly interface, and suitability for cooperative work.

    Shortcut Detection with Variational Autoencoders

    For real-world applications of machine learning (ML), it is essential that models make predictions based on well-generalizing features rather than spurious correlations in the data. Identifying such spurious correlations, also known as shortcuts, is a challenging problem that has so far been scarcely addressed. In this work, we present a novel approach to detect shortcuts in image and audio datasets by leveraging variational autoencoders (VAEs). The disentanglement of features in the latent space of VAEs allows us to discover feature-target correlations in datasets and semi-automatically evaluate them for ML shortcuts. We demonstrate the applicability of our method on several real-world datasets and identify shortcuts that have not been discovered before. Comment: Accepted at the ICML 2023 Workshop on Spurious Correlations, Invariance and Stability.

    Protecting Publicly Available Data With Machine Learning Shortcuts

    Machine-learning (ML) shortcuts or spurious correlations are artifacts in datasets that lead to very good training and test performance but severely limit the model's generalization capability. Such shortcuts are insidious because they go unnoticed due to good in-domain test performance. In this paper, we explore the influence of different shortcuts and show that even simple shortcuts are difficult for explainable AI methods to detect. We then exploit this fact and design an approach to defend online databases against crawlers: providers such as dating platforms, clothing manufacturers, or used car dealers have to deal with a professionalized crawling industry that grabs and resells data points on a large scale. We show that a deterrent can be created by deliberately adding ML shortcuts. Such augmented datasets are then unusable for ML use cases, which deters crawlers and the unauthorized use of data from the internet. Using real-world data from three use cases, we show that the proposed approach renders such collected data unusable, while the shortcut is at the same time difficult for humans to notice. Thus, our proposed approach can serve as a proactive protection against illegitimate data crawling. Comment: Published at BMVC 202

    Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn?

    We present our analysis of a significant data artifact in the official 2019/2021 ASVspoof Challenge Dataset. We identify an uneven distribution of silence duration in the training and test splits, which tends to correlate with the target prediction label. Bonafide instances tend to have significantly longer leading and trailing silences than spoofed instances. In this paper, we explore this phenomenon and its impact in depth. We compare several types of models trained on a) only the duration of the leading silence and b) only the duration of the leading and trailing silence. Results show that models trained on only the duration of the leading silence perform particularly well, achieving up to 85% accuracy and an equal error rate (EER) of 15.1%. At the same time, we observe that trimming silence during pre-processing and then training established antispoofing models using signal-based features leads to comparatively worse performance. In that case, EER increases from 3.6% (with silence) to 15.5% (trimmed silence). Our findings suggest that previous work may, in part, have inadvertently learned the spoof/bonafide distinction by relying on the duration of silence as it appears in the official challenge dataset. We discuss the potential consequences that this has for interpreting system scores in the challenge and discuss how the ASV community may further consider this issue.
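
The two quantities this abstract relies on, leading-silence duration and equal error rate, can be illustrated with a minimal sketch. It is not the authors' pipeline: the amplitude threshold, the function names, and the brute-force threshold sweep are all assumptions made for illustration.

```python
import numpy as np

def leading_silence_seconds(wave, sample_rate, threshold=1e-3):
    """Time in seconds before the first sample whose magnitude exceeds
    `threshold` (a hypothetical silence criterion, not from the paper)."""
    wave = np.asarray(wave, dtype=float)
    loud = np.flatnonzero(np.abs(wave) > threshold)
    if loud.size == 0:
        # The whole clip counts as silence.
        return len(wave) / sample_rate
    return loud[0] / sample_rate

def equal_error_rate(scores, labels):
    """Approximate EER, assuming higher scores mean 'bonafide' (label 1).

    Sweeps every observed score as a decision threshold and returns the
    mean of false-acceptance and false-rejection rates at the threshold
    where the two are closest.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    best_gap, eer = np.inf, 1.0
    for t in np.unique(scores):
        accept = scores >= t
        far = np.mean(accept[labels == 0])   # spoofed clips accepted
        frr = np.mean(~accept[labels == 1])  # bonafide clips rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

Passing the leading-silence durations directly as `scores` gives a rough check of how far silence alone separates the two classes in a given split.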

    Mendelian randomization indicates causal effects of estradiol levels on kidney function in males

    Context: Chronic kidney disease (CKD) is a public health burden worldwide. Epidemiological studies observed an association between sex hormones, including estradiol, and kidney function. Objective: We conducted a Mendelian randomization (MR) study to assess a possible causal effect of estradiol levels on kidney function in males and females. Design: We performed a bidirectional two-sample MR using published genetic associations of serum levels of estradiol in men (n = 206,927) and women (n = 229,966), and of kidney traits represented by estimated glomerular filtration rate (eGFR, n = 567,460), urine albumin-to-creatinine ratio (UACR, n = 547,361), and CKD (n = 41,395 cases and n = 439,303 controls) using data obtained from the CKDGen Consortium. Additionally, we conducted a genome-wide association study using UK Biobank cohort study data (n = 11,798 men and n = 6,835 women) to identify novel genetic associations with levels of estradiol, and then used these variants as instruments in a one-sample MR. Results: The two-sample MR indicated that genetically predicted estradiol levels are significantly associated with eGFR in men (beta = 0.077; p = 5.2E-05). We identified a single locus at chromosome 14 associated with estradiol levels in men being significant in the one-sample MR on eGFR (beta = 0.199; p = 0.017). We revealed significant results with eGFR in postmenopausal women and with UACR in premenopausal women, which did not reach statistical significance in the sensitivity MR analyses. No causal effect of eGFR or UACR on estradiol levels was found. Conclusions: We conclude that serum estradiol levels may have a causal effect on kidney function. Our MR results provide starting points for studies to develop therapeutic strategies to reduce kidney disease.
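
A two-sample MR combines per-variant associations with the exposure and with the outcome into one causal estimate. The abstract does not say which estimator was used, so the sketch below assumes the common inverse-variance weighted (IVW) estimator under a fixed-effect model. The function name and the numbers in the usage example are illustrative only and are not taken from the study.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted MR estimate.

    Each variant j contributes a Wald ratio beta_outcome[j] / beta_exposure[j],
    weighted by beta_exposure[j]**2 / se_outcome[j]**2. Returns the pooled
    estimate and its standard error.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    denom = np.sum(w * bx ** 2)
    estimate = np.sum(w * bx * by) / denom
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error

# Illustrative, made-up summary statistics for two variants.
estimate, std_error = ivw_estimate([0.1, 0.2], [0.02, 0.04], [0.01, 0.01])
```

With a single variant, this reduces to the Wald ratio of the outcome association to the exposure association.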