Conformal Prediction for Federated Uncertainty Quantification Under Label Shift
Federated Learning (FL) is a machine learning framework where many clients
collaboratively train models while keeping the training data decentralized.
Despite recent advances in FL, uncertainty quantification (UQ) remains only
partially addressed. Among UQ methods, conformal prediction (CP)
approaches provide distribution-free guarantees under minimal assumptions. We
develop a new federated conformal prediction method based on quantile
regression and take into account privacy constraints. This method takes
advantage of importance weighting to effectively address the label shift
between agents and provides theoretical guarantees for both valid coverage of
the prediction sets and differential privacy. Extensive experimental studies
demonstrate that this method outperforms current competitors. Comment: ICML 202
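To illustrate the importance-weighting idea mentioned in this abstract, here is a minimal sketch of split conformal prediction with label-shift weights. It is a centralized, non-private illustration, not the paper's federated method: the function names, the score `1 - p(y|x)`, and the assumption that the label weights `w(y) = p_target(y) / p_source(y)` are already known are all choices made for this example.

```python
import math

def weighted_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score whose normalized cumulative weight reaches 1 - alpha.

    The test point contributes `test_weight` of mass at +infinity, so the
    result is math.inf when the calibration mass alone cannot reach 1 - alpha.
    """
    total = sum(weights) + test_weight
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1 - alpha:
            return score
    return math.inf

def prediction_set(probs, cal_scores, cal_labels, label_weights, alpha):
    """Labels y whose score 1 - probs[y] is within the weighted quantile.

    label_weights[y] is the assumed ratio p_target(y) / p_source(y).
    """
    cal_weights = [label_weights[y] for y in cal_labels]
    selected = []
    for y, p in enumerate(probs):
        # The candidate label determines the weight of the test point.
        q = weighted_quantile(cal_scores, cal_weights, label_weights[y], alpha)
        if 1.0 - p <= q:
            selected.append(y)
    return selected
```

With all weights equal to 1 this reduces to ordinary split conformal prediction; unequal weights shift the quantile toward the calibration points whose labels are more common on the target client.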
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data.
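For readers unfamiliar with count featurization, the sketch below shows the generic technique for a binary label: a categorical value is replaced by how often it was seen and how often it co-occurred with the positive label. This is an illustration only, not Pyramid's implementation; the `CountTable` class and its method names are hypothetical.

```python
from collections import defaultdict

class CountTable:
    """Per-value label counts for one categorical feature (binary labels 0/1)."""

    def __init__(self):
        # value -> [count of label 0, count of label 1]
        self.counts = defaultdict(lambda: [0, 0])

    def update(self, value, label):
        """Record one observation of `value` with the given 0/1 label."""
        self.counts[value][label] += 1

    def featurize(self, value):
        """Return [total observations, positive rate] for `value`.

        Unseen values map to [0, 0.0]; .get() avoids creating a table entry.
        """
        neg, pos = self.counts.get(value, (0, 0))
        total = neg + pos
        rate = pos / total if total else 0.0
        return [total, rate]

table = CountTable()
for value, label in [("ad1", 1), ("ad1", 0), ("ad1", 1), ("ad2", 0)]:
    table.update(value, label)
features = table.featurize("ad1")  # total count and positive rate for "ad1"
```

Once the tables are built, a model can train on these compact features, which is why the raw historical records no longer need to stay in accessible storage.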
Use of record-linkage to handle non-response and improve alcohol consumption estimates in health survey data: a study protocol
<p>Introduction: Reliable estimates of health-related behaviours, such as levels of alcohol consumption in the population, are required to formulate and evaluate policies. National surveys provide such data; validity depends on generalisability, but this is threatened by declining response levels. Attempts to address bias arising from non-response are typically limited to survey weights based on sociodemographic characteristics, which do not capture differential health and related behaviours within categories. This project aims to explore and address non-response bias in health surveys with a focus on alcohol consumption.</p>
<p>Methods and analysis: The Scottish Health Surveys (SHeS) aim to provide estimates representative of the Scottish population living in private households. Survey data of consenting participants (92% of the achieved sample) have been record-linked to routine hospital admission (Scottish Morbidity Records (SMR)) and mortality (from National Records of Scotland (NRS)) data for surveys conducted in 1995, 1998, 2003, 2008, 2009 and 2010 (total adult sample size around 40 000), with maximum follow-up of 16 years. Also available are census information and SMR/NRS data for the general population. Comparisons of alcohol-related mortality and hospital admission rates in the linked SHeS-SMR/NRS with those in the general population will be made. Survey data will be augmented by quantification of differences to refine alcohol consumption estimates through the application of multiple imputation or inverse probability weighting. The resulting corrected estimates of population alcohol consumption will enable superior policy evaluation. An advanced weighting procedure will be developed for wider use.</p>
<p>Ethics and dissemination: Ethics approval for SHeS has been given by the National Health Service (NHS) Multi-Centre Research Ethics Committee and use of linked data has been approved by the Privacy Advisory Committee to the Board of NHS National Services Scotland and Registrar General. Funding has been granted by the MRC. The outputs will include four or five public health and statistical methodological international journal and conference papers.</p>
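The inverse probability weighting mentioned in the methods can be sketched as follows. This is a simplified, stratum-based illustration with made-up numbers, not the protocol's procedure: the function `ipw_mean`, the strata, and the outcome values are hypothetical, and the response probability is simply estimated as respondents divided by sampled people in each stratum.

```python
from collections import Counter

def ipw_mean(respondents, sampled_per_stratum):
    """Inverse-probability-weighted mean outcome among respondents.

    respondents: list of (stratum, outcome) pairs for people who responded.
    sampled_per_stratum: number of people sampled in each stratum.
    Each respondent is weighted by 1 / (estimated response probability),
    i.e. sampled / responded within their stratum.
    """
    responded = Counter(stratum for stratum, _ in respondents)
    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, outcome in respondents:
        weight = sampled_per_stratum[stratum] / responded[stratum]
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total

# Hypothetical data: stratum "A" responds rarely and reports higher values.
example_respondents = [("A", 4.0), ("A", 6.0)] + [("B", 1.0)] * 5
example_sampled = {"A": 10, "B": 10}
corrected = ipw_mean(example_respondents, example_sampled)
```

In this toy example the low-response stratum is up-weighted, so the corrected mean is higher than the unweighted mean of the respondents.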
What is wrong with non-respondents? Alcohol-, drug- and smoking related mortality and morbidity in a 12-year follow up study of respondents and non-respondents in the Danish Health and Morbidity Survey
Aim:
Response rates in health surveys have diminished over the last two decades, making it difficult to obtain reliable information on health and health-related risk factors in different population groups. This study compared cause-specific mortality and morbidity among survey respondents and different types of non-respondents to estimate alcohol-, drug- and smoking-related mortality and morbidity among non-respondents.
Design:
Prospective follow-up study of respondents and non-respondents in two cross-sectional health surveys.
Setting:
Denmark.
Participants:
A total sample of 39,540 Danish citizens aged 16 or older.
Measurements:
Register-based information on cause-specific mortality and morbidity at the individual level was obtained for respondents (n=28,072) and different types of non-respondents (refusals n=8,954; illness/disabled n=731; uncontactable n=1,593). Cox proportional hazards models were used to examine differences in alcohol-, drug- and smoking-related mortality and morbidity, respectively, in a 12-year follow-up period.
Findings:
Overall, non-response was associated with a significantly increased hazard ratio of 1.56 (95% CI: 1.36–1.78) for alcohol-related morbidity, 1.88 (95% CI: 1.38–2.57) for alcohol-related mortality, 1.55 (95% CI: 1.27–1.88) for drug-related morbidity, 3.04 (95% CI: 1.57–5.89) for drug-related mortality and 1.15 (95% CI: 1.03–1.29) for smoking-related morbidity. The hazard ratio for smoking-related mortality also tended to be higher among non-respondents than among respondents, although the association was not significant (HR: 1.14; 95% CI: 0.95–1.36). Uncontactable and ill/disabled non-respondents generally had a higher hazard ratio of alcohol-, drug- and smoking-related mortality and morbidity compared with refusal non-respondents.
Conclusion:
Health survey non-respondents in Denmark have an increased hazard ratio of alcohol-, drug-, and smoking-related mortality and morbidity compared with respondents, which may indicate more unfavourable health behaviours among non-respondents.
Explaining Data-Driven Decisions made by AI Systems: The Counterfactual Approach
We examine counterfactual explanations for explaining the decisions made by
model-based AI systems. The counterfactual approach we consider defines an
explanation as a set of the system's data inputs that causally drives the
decision (i.e., changing the inputs in the set changes the decision) and is
irreducible (i.e., changing any subset of the inputs does not change the
decision). We (1) demonstrate how this framework may be used to provide
explanations for decisions made by general, data-driven AI systems that may
incorporate features with arbitrary data types and multiple predictive models,
and (2) propose a heuristic procedure to find the most useful explanations
depending on the context. We then contrast counterfactual explanations with
methods that explain model predictions by weighting features according to their
importance (e.g., SHAP, LIME) and present two fundamental reasons why we should
carefully consider whether importance-weight explanations are well-suited to
explain system decisions. Specifically, we show that (i) features that have a
large importance weight for a model prediction may not affect the corresponding
decision, and (ii) importance weights are insufficient to communicate whether
and how features influence decisions. We demonstrate this with several concise
examples and three detailed case studies that compare the counterfactual
approach with SHAP to illustrate various conditions under which counterfactual
explanations explain data-driven decisions better than importance weights.
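The definition of a counterfactual explanation given in this abstract (a set of inputs that changes the decision and is irreducible) can be illustrated with a brute-force search. This is a minimal sketch, not the authors' heuristic procedure: it assumes that "changing" an input means replacing it with a value from a caller-supplied `reference` instance, and the function name and example decision rules are hypothetical.

```python
from itertools import combinations

def counterfactual_explanations(decide, instance, reference):
    """Return all irreducible feature sets whose change flips the decision.

    decide: function mapping a feature dict to a decision.
    instance: the feature dict being explained.
    reference: replacement value for each feature (an assumption of this sketch).
    Subsets are tried smallest first; any superset of an explanation already
    found is skipped, so every returned set is irreducible.
    """
    features = sorted(instance)
    original = decide(instance)
    found = []
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            if any(set(e) <= set(subset) for e in found):
                continue  # reducible: a smaller explanation is contained in it
            changed = dict(instance)
            for f in subset:
                changed[f] = reference[f]
            if decide(changed) != original:
                found.append(subset)
    return found

# Hypothetical credit rule: deny if debt is high OR there are many missed payments.
def deny(x):
    return x["debt"] > 50 or x["missed"] > 2

applicant = {"age": 40, "debt": 80, "missed": 5}
baseline = {"age": 30, "debt": 0, "missed": 0}
explanations = counterfactual_explanations(deny, applicant, baseline)
```

Here neither input alone changes the denial, so the only explanation is the pair of debt and missed payments together. The search is exponential in the number of features, which is one reason a heuristic is needed in practice.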