16 research outputs found

    Poison is Not Traceless: Fully-Agnostic Detection of Poisoning Attacks

    The performance of machine learning models depends on the quality of the underlying data. Malicious actors can attack the model by poisoning the training data. Current detectors are tied to either specific data types, models, or attacks, and therefore have limited applicability in real-world scenarios. This paper presents a novel fully-agnostic framework, DIVA (Detecting InVisible Attacks), that detects attacks solely relying on analyzing the potentially poisoned data set. DIVA is based on the idea that poisoning attacks can be detected by comparing the classifier's accuracy on poisoned and clean data; it pre-trains a meta-learner using Complexity Measures to estimate the otherwise unknown accuracy on a hypothetical clean dataset. The framework applies to generic poisoning attacks. For evaluation purposes, in this paper, we test DIVA on label-flipping attacks. Comment: 8 pages
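    The comparison at the core of this abstract can be illustrated with a minimal sketch. It is not the DIVA implementation: the function name, the fixed threshold, and the assumption that an estimate of the clean accuracy is already available (in DIVA it would come from the pre-trained meta-learner) are all hypothetical.

```python
def flag_poisoning(observed_accuracy: float,
                   estimated_clean_accuracy: float,
                   threshold: float = 0.05) -> bool:
    """Flag a dataset as possibly poisoned.

    observed_accuracy: accuracy of a classifier trained on the data under test.
    estimated_clean_accuracy: accuracy expected on a hypothetical clean
        version of the data (assumed to be supplied by some estimator).
    threshold: hypothetical tolerance for the gap between the two values.
    """
    gap = estimated_clean_accuracy - observed_accuracy
    return gap > threshold


if __name__ == "__main__":
    # A large shortfall relative to the estimate is treated as suspicious.
    print(flag_poisoning(0.70, 0.90))
    # A small shortfall is treated as ordinary variation.
    print(flag_poisoning(0.88, 0.90))
```

    The threshold would need to be calibrated in practice; the value used here is only a placeholder.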

    Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments

    Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focussing on their project. To simplify the process, in this paper, we introduce Memento, a Python package that is designed to aid researchers and data scientists in the efficient management and execution of computationally intensive experiments. Memento has the capacity to streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to concurrently run experiments across multiple threads. A demonstration of Memento is available at: https://wickerlab.org/publication/memento
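    To make the idea of a configuration matrix run across multiple threads concrete, here is a minimal sketch using only the Python standard library. It does not use Memento's actual API; `expand_matrix`, `run_all`, and the example parameters are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def expand_matrix(matrix):
    """Turn {"param": [values, ...]} into one dict per parameter combination."""
    keys = list(matrix)
    return [dict(zip(keys, values))
            for values in product(*(matrix[k] for k in keys))]


def run_all(matrix, experiment, max_workers=4):
    """Run `experiment(config)` for every combination, using a thread pool."""
    configs = expand_matrix(matrix)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `configs` in its results.
        results = list(pool.map(experiment, configs))
    return list(zip(configs, results))


if __name__ == "__main__":
    matrix = {"model": ["svm", "tree"], "seed": [0, 1, 2]}
    for config, result in run_all(matrix, lambda c: f"{c['model']}-{c['seed']}"):
        print(config, result)
```

    Caching and checkpointing, which the abstract also mentions, are left out of this sketch.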

    BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability

    Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model's output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario

    Hitting the target: stopping active learning at the cost-based optimum

    Active learning allows machine learning models to be trained using fewer labels while retaining similar performance to traditional supervised learning. An active learner selects the most informative data points, requests their labels, and retrains itself. While this approach is promising, it raises the question of how to determine when the model is ‘good enough’ without the additional labels required for traditional evaluation. Previously, different stopping criteria have been proposed aiming to identify the optimal stopping point. Yet, optimality can only be expressed as a domain-dependent trade-off between accuracy and the number of labels, and no criterion is superior in all applications. As a further complication, a comparison of criteria for a particular real-world application would require practitioners to collect additional labelled data they are aiming to avoid by using active learning in the first place. This work enables practitioners to employ active learning by providing actionable recommendations for which stopping criteria are best for a given real-world scenario. We contribute the first large-scale comparison of stopping criteria for pool-based active learning, using a cost measure to quantify the accuracy/label trade-off, public implementations of all stopping criteria we evaluate, and an open-source framework for evaluating stopping criteria. Our research enables practitioners to substantially reduce labelling costs by utilizing the stopping criterion which best suits their domain

    The Swiss STAR trial – an evaluation of target groups for sexually transmitted infection screening in the sub-sample of women

    OBJECTIVES: In Switzerland, universal health insurance does not cover any routine testing for sexually transmitted infections (STIs), not even in individuals at high risk, and extra-genital swabbing is not standard of care. We compared STI prevalence in a multicentre prospective observational cohort of multi-partner women with/without sex work and evaluated associated risk factors. MATERIALS AND METHODS: Between January 2016 and June 2017, we offered free STI testing to women with multiple sexual partners (three or more in the previous 12 months), with follow-up examinations every 6 months. We used multiplex polymerase chain-reaction testing (for Neisseria gonorrhoeae, Chlamydia trachomatis, Trichomonas vaginalis, Mycoplasma genitalium) for pooled swabs (pharynx, urethra/vagina, anus), and antibody tests for human immunodeficiency virus (HIV) and Treponema pallidum at every visit, and for hepatitis B and C at baseline. RESULTS: We screened 490 female sex workers (FSWs), including 17 trans women, and 92 other multi-partner women. More than half reported a steady partner. Previously undiagnosed HIV was found in 0.2% vs 0.0%, respectively, and T. pallidum antibodies in 5.9% vs 0.0%. STIs requiring antibiotic treatment comprised: active syphilis 1.2% vs 0.0%; N. gonorrhoeae 4.9% vs 0.0%; C. trachomatis 6.3% vs 5.4%; T. vaginalis 10.4% vs 0.0%; M. genitalium 6.7% vs 6.5%. One in four FSWs vs one in nine other women had one or more of these STIs at baseline. 15.8% vs 3.8% had a history of hepatitis B, 45.5% vs 22.8% had no immunity (HBs-AB <10 IU/l). Two FSWs had hepatitis C virus antibodies (0.4%) without concurrent HIV infection. Non-condom-use (last three months) for anal/vaginal sex was not associated with STIs. Independent risk factors were group sex (adjusted odds ratio [aOR] 2.1, 95% confidence interval [CI] 1.1–4.0), age less than 25 (aOR 3.7, 95% CI 1.6–8.9), and being active in sex work for less than 1 year (aOR 2.7, 95% CI 1.3–5.3). CONCLUSION: HIV and HCV do not appear to pose a major public health problem among FSWs in Switzerland, whereas vaccination against HBV should be promoted. FSWs showed high rates of STIs requiring treatment to reduce transmission to clients and/or steady partners. FSWs should be offered low-cost or free STI screening as a public health priority.

    Fear of COVID-19 among homeless individuals in Germany in mid-2021.

    AIMS: To investigate the prevalence and the correlates of fear of COVID-19 among homeless individuals. METHODS: We used data from the "national survey on psychiatric and somatic health of homeless individuals during the COVID-19 pandemic" (NAPSHI-study) which took place in several large cities in Germany in mid-2021 (n = 666 in the analytical sample). Mean age equaled 43.3 years (SD: 12.1 years), ranging from 18 to 80 years. Multiple linear regressions were performed. RESULTS: In our study, 70.9% of the homeless individuals reported no fear of COVID-19. Furthermore, 14.0% reported a little fear of COVID-19, 8.4% reported some fear of COVID-19 and 6.7% reported severe fear of COVID-19. Multiple linear regressions revealed that fear of COVID-19 was higher among individuals aged 50-64 years (compared to individuals aged 18-29 years: β = 0.28, p < 0.05), among individuals with a higher perceived own risk of contracting the coronavirus one day (β = 0.28, p < 0.001) as well as among individuals with a higher agreement that a diagnosis of the coronavirus would ruin his/her life (β = 0.15, p < 0.001). CONCLUSIONS: Only a small proportion of homeless individuals reported fear of COVID-19 in mid-2021 in Germany. Such knowledge about the correlates of higher levels of fear of COVID-19 may be helpful for addressing certain risk groups (e.g., homeless individuals aged 50-64 years). In a further step, avoiding extraordinarily high levels of fear of COVID-19 may be beneficial to avoid irrational thinking and acting regarding COVID-19 in this group.

    Determinants of health-related quality of life (HRQoL) among homeless individuals during the COVID-19 pandemic

    OBJECTIVE: Thus far, there is very limited knowledge regarding homeless individuals during the COVID-19 pandemic, particularly related to the health-related quality of life (HRQoL). Thus, our aim was to evaluate HRQoL and to clarify the determinants of HRQoL among homeless individuals during the COVID-19 pandemic in Germany. METHODS: Data were taken from the national survey on psychiatric and somatic health of homeless individuals during the COVID-19 pandemic-NAPSHI (n = 616). The established EQ-5D-5L was used to quantify problems in five health dimensions, and its visual analogue scale (EQ-VAS) was used to record self-rated health status. Sociodemographic factors were included in regression analysis. RESULTS: Pain/discomfort was the most frequently reported problem (45.3%), thereafter anxiety/depression (35.9%), mobility (25.4%), usual activities (18.5%) and self-care (11.4%). Average EQ-VAS score was 68.97 (SD: 23.83), and the mean EQ-5D-5L index was 0.85 (SD: 0.24). Regressions showed that higher age and having a health insurance were associated with several problem dimensions. Being married was associated with higher EQ-VAS scores. CONCLUSIONS: Overall, our study findings showed a quite high HRQoL among homeless individuals during the COVID-19 pandemic in Germany. Some important determinants of HRQoL were identified (e.g., age or marital status). Longitudinal studies are required to confirm our findings

    Combatting over-specialization bias in growing chemical databases

    Background: Predicting in advance the behavior of new chemical compounds can support the design process of new products by directing the research toward the most promising candidates and ruling out others. Such predictive models can be data-driven using Machine Learning or based on researchers’ experience and depend on the collection of past results. In either case, models (or researchers) can only make reliable assumptions about compounds that are similar to what they have seen before. Therefore, consistent use of these predictive models shapes the dataset and causes a continuous specialization, shrinking the applicability domain of all models trained on this dataset in the future and increasingly harming model-based exploration of the space. Proposed solution: In this paper, we propose cancels (CounterActiNg Compound spEciaLization biaS), a technique that helps to break the dataset specialization spiral. Aiming for a smooth distribution of the compounds in the dataset, we identify areas in the space that fall short and suggest additional experiments that help bridge the gap. Thereby, we generally improve the dataset quality in an entirely unsupervised manner and create awareness of potential flaws in the data. cancels does not aim to cover the entire compound space and hence retains a desirable degree of specialization to a specified research domain. Results: An extensive set of experiments on the use-case of biodegradation pathway prediction not only reveals that the bias spiral can indeed be observed but also that cancels produces meaningful results. Additionally, we demonstrate that mitigating the observed bias is crucial, as it not only interrupts the continuous specialization process but also significantly improves a predictor’s performance while reducing the number of required experiments. Overall, we believe that cancels can support researchers in their experimentation process to not only better understand their data and potential flaws, but also to grow the dataset in a sustainable way. All code is available under github.com/KatDost/Cancels

    AdductHunter: identifying protein-metal complex adducts in mass spectra

    Mass spectrometry (MS) is an analytical technique for molecule identification that can be used for investigating protein-metal complex interactions. Once the MS data is collected, the mass spectra are usually interpreted manually to identify the adducts formed as a result of the interactions between proteins and metal-based species. However, with increasing resolution, dataset size, and species complexity, the time required to identify adducts and the error-prone nature of manual assignment have become limiting factors in MS analysis. AdductHunter is an open-source web-based analysis tool that automates the peak identification process using constraint integer optimization to find feasible combinations of protein and fragments, and dynamic time warping to calculate the dissimilarity between the theoretical isotope pattern of a species and its experimental isotope peak distribution. Empirical evaluation on a collection of 22 unique MS datasets shows fast and accurate identification of protein-metal complex adducts in deconvoluted mass spectra.
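    Dynamic time warping, mentioned above as the dissimilarity measure between a theoretical and an experimental isotope pattern, can be sketched as follows. This is the textbook dynamic-programming formulation with an absolute-difference local cost, not AdductHunter's code; the function name and the choice of local cost are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences.

    cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    The local cost is the absolute difference (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


if __name__ == "__main__":
    theoretical = [0.1, 0.6, 1.0, 0.5, 0.1]   # made-up intensities
    experimental = [0.1, 0.5, 1.0, 1.0, 0.4, 0.1]
    print(dtw_distance(theoretical, experimental))
```

    Because the warping path may repeat elements, two patterns that differ only by a stretched peak can still obtain a small distance.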

    Clinical presentation and long-term outcome of patients with KCNJ11/ABCC8 variants: Neonatal diabetes or MODY in the DPV registry from Germany and Austria

    Objective: To describe clinical presentation/long-term outcomes of patients with ABCC8/KCNJ11 variants in a large cohort of patients with diabetes. Research Design and Methods: We analyzed patients in the Diabetes Prospective Follow-up (DPV) registry with diabetes and pathogenic variants in the ABCC8/KCNJ11 genes. For patients with available data at three specific time-points (classification as K+-channel variant, 2-year follow-up and most recent visit), the longitudinal course was evaluated in addition to the cross-sectional examination. Results: We identified 93 cases with ABCC8 (n = 54)/KCNJ11 (n = 39) variants, 63 of them with neonatal diabetes. For 22 patients, follow-up data were available. Of these, 19 were treated with insulin at diagnosis, and the majority of patients were switched to sulfonylurea thereafter. However, insulin was still administered in six patients at the most recent visit. Patients were in good metabolic control with a median (IQR) A1c level of 6.0% (5.5–6.7), that is, 42.1 (36.6–49.7) mmol/mol after 2 years and 6.7% (6.0–8.0), that is, 49.7 (42.1–63.9) mmol/mol at the most recent visit. Five patients were temporarily without medication for a median (IQR) time of 4.0 (3.5–4.4) years, while two other patients continue to be off medication at the last follow-up. Conclusions: ABCC8/KCNJ11 variants should be suspected in children diagnosed with diabetes below the age of 6 months, as a high percentage can be switched from insulin to oral antidiabetic drugs. Thirty patients with diabetes due to pathogenic variants of ABCC8 or KCNJ11 were diagnosed beyond the neonatal period. Patients maintain good metabolic control even after a diabetes duration of up to 11 years.
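    The paired A1c values in this abstract (percent and mmol/mol) are consistent with the standard NGSP-to-IFCC master equation, IFCC = 10.93 × NGSP − 23.50. The short sketch below reproduces the reported figures; the function name is hypothetical, and the abstract itself does not state which conversion the authors applied.

```python
def a1c_percent_to_mmol_per_mol(percent: float) -> float:
    """Convert HbA1c from NGSP units (%) to IFCC units (mmol/mol).

    Uses the NGSP/IFCC master equation: IFCC = 10.93 * NGSP - 23.50.
    The result is rounded to one decimal, as in the abstract.
    """
    return round(10.93 * percent - 23.50, 1)


if __name__ == "__main__":
    # Values reported in the abstract: medians and IQR bounds.
    for percent in (5.5, 6.0, 6.7, 8.0):
        print(f"{percent}% -> {a1c_percent_to_mmol_per_mol(percent)} mmol/mol")
```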