Feedback-based integration of the whole process of data anonymization in a graphical interface
The interactive, web-based point-and-click application presented in this article allows anonymizing data without any knowledge of a programming language. Anonymization is relevant in data mining, but creating safe, anonymized data is by no means a trivial task. Both methodological issues and know-how from subject matter specialists should be taken into account when anonymizing data. Even though specialized software such as sdcMicro exists, it is often difficult for users who are not experts in a particular software package and who lack programming skills to actually anonymize datasets without an appropriate app. The presented app is not restricted to applying disclosure limitation techniques but rather facilitates the entire anonymization process. The interface allows uploading data to the system, modifying them, and creating an object that defines the disclosure scenario. Once such a statistical disclosure control (SDC) problem has been defined, users can apply anonymization techniques to this object and get instant feedback on the impact on risk and data utility after SDC methods have been applied. Additional features, such as an Undo button, the possibility to export the anonymized dataset or the code required for reproducibility, as well as its interactive features, make it convenient for both experts and nonexperts in R (the free software environment for statistical computing and graphics) to protect a dataset using this app.
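The risk and utility feedback loop described in this abstract can be illustrated with a minimal Python sketch. It is not the sdcMicro API or the app's implementation: the function names, the toy records, the choice of k-anonymity as the risk measure, and the mean absolute change in age as the utility measure are all assumptions made for illustration.

```python
from collections import Counter


def k_anonymity_violations(records, key_vars, k):
    """Count records whose key-variable combination occurs fewer than k times."""
    combos = Counter(tuple(r[v] for v in key_vars) for r in records)
    return sum(1 for r in records if combos[tuple(r[v] for v in key_vars)] < k)


def recode_age(records, width):
    """Global recoding (hypothetical helper): replace each age by the lower bound of its band."""
    return [{**r, "age": (r["age"] // width) * width} for r in records]


def mean_abs_age_change(original, anonymized):
    """Crude utility measure (an assumption): average absolute change in age."""
    diffs = [abs(o["age"] - a["age"]) for o, a in zip(original, anonymized)]
    return sum(diffs) / len(diffs)


# Toy microdata, invented for this example.
records = [
    {"age": 23, "sex": "f", "region": "N"},
    {"age": 27, "sex": "f", "region": "N"},
    {"age": 24, "sex": "f", "region": "N"},
    {"age": 41, "sex": "m", "region": "S"},
    {"age": 45, "sex": "m", "region": "S"},
    {"age": 62, "sex": "m", "region": "S"},
]
key_vars = ["age", "sex", "region"]

# Risk before anonymization, then risk and information loss after recoding.
risk_before = k_anonymity_violations(records, key_vars, k=2)
recoded = recode_age(records, width=10)
risk_after = k_anonymity_violations(recoded, key_vars, k=2)
info_loss = mean_abs_age_change(records, recoded)

print(f"records violating 2-anonymity: {risk_before} -> {risk_after}")
print(f"mean absolute change in age: {info_loss:.2f}")
```

Recomputing both numbers after every method application is one simple way to give the kind of immediate feedback the abstract mentions; a real tool would use richer risk and utility measures.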
Synthetic sequence generator for recommender systems - memory biased random walk on sequence multilayer network
Personalized recommender systems rely on each user's personal usage data in
the system, in order to assist in decision making. However, privacy policies
protecting users' rights prevent these highly personal data from being publicly
available to a wider research audience. In this work, we propose a memory
biased random walk model on a multilayer sequence network as a generator of
synthetic sequential data for recommender systems. We demonstrate the
applicability of the synthetic data in training recommender system models for
cases when privacy policies restrict clickstream publishing. Comment: The new updated version of the pape
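The abstract does not specify the model, so the following is only a generic Python sketch of a random walk with a memory (revisit) bias, not the authors' method. The single-layer item graph, the `memory_bias` parameter, and the rule of jumping back to a uniformly chosen earlier item are assumptions for illustration; the multilayer structure mentioned in the abstract is not modeled here.

```python
import random


def memory_biased_walk(graph, start, length, memory_bias, rng):
    """Generate one synthetic item sequence.

    With probability `memory_bias` the walker jumps back to an item it has
    already visited; otherwise it moves to a random neighbour of the current
    item. Both rules are illustrative assumptions.
    """
    sequence = [start]
    current = start
    while len(sequence) < length:
        if rng.random() < memory_bias:
            current = rng.choice(sequence)        # revisit something from memory
        else:
            current = rng.choice(graph[current])  # ordinary random-walk step
        sequence.append(current)
    return sequence


# Hypothetical item graph: an edge links items that tend to be viewed together.
graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

rng = random.Random(0)  # fixed seed so the synthetic session is reproducible
session = memory_biased_walk(graph, start="a", length=10, memory_bias=0.3, rng=rng)
print(session)
```

Sequences generated this way contain no real user's clickstream, which is the property that makes synthetic data attractive when privacy policies prevent publishing the original sessions.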
Clinical Scores for Dyspnoea Severity in Children: A Prospective Validation Study
In acutely dyspnoeic children, assessment of dyspnoea severity and treatment response is frequently based on clinical dyspnoea scores. Our study aim was to validate five commonly used paediatric dyspnoea scores. Fifty children aged 0-8 years with acute dyspnoea were clinically assessed before and after bronchodilator treatment; a subset of 27 children were videotaped and assessed twice by nine observers. The observers scored the clinical signs necessary to calculate the Asthma Score (AS), Asthma Severity Score (ASS), Clinical Asthma Evaluation Score 2 (CAES-2), Pediatric Respiratory Assessment Measure (PRAM) and the respiratory rate, accessory muscle use, decreased breath sounds (RAD) score. A total of 1120 observations were used to assess fourteen measurement properties within the domains of validity, reliability and utility. All five dyspnoea scores showed overall poor results, scoring insufficiently on more than half of the quality criteria for measurement properties. The AS and PRAM were the most valid, with good values on six and moderate values on three properties. Poor results were mainly due to insufficient measurement properties in the validity and reliability domains, whereas utility properties were moderate to good in all scores. This study shows that commonly used dyspnoea scores have insufficient validity and reliability to allow for clinical use without caution.
Assessing Maine’s ERAM experiment
Maine’s utility regulators have occasionally ventured into the uncharted waters of utility regulation reform. Some such efforts have been more successful than others. Leslie Hudson and Stephanie Seguino document the process and outcomes of one such attempt at alternative electric utility regulation, the Electric Revenue Adjustment Mechanism, or ERAM. They endeavor to answer several questions arising from this brief and failed, but interesting, regulatory experiment.
Peak-ratio analysis method for enhancement of LOM protection using M class PMUs
A novel technique for loss of mains (LOM) detection, using Phasor Measurement Unit (PMU) data, is described in this paper. The technique, known as the Peak Ratio Analysis Method (PRAM), improves both the sensitivity and the stability of LOM protection when compared to prevailing techniques. The technique is based on a Rate of Change of Frequency (ROCOF) measurement from M-class PMUs, but its key novelty is that it employs a new “peak-ratio” analysis of the measured ROCOF waveform during any frequency disturbance to determine whether the potentially islanded element of the network is grid-connected or not. The proposed technique is described, and several examples of its operation are compared with three competing LOM protection methods that have all been widely used by industry and/or reported in the literature: standard ROCOF, Phase Offset Relay (POR) and Phase Angle Difference (PAD) methods. It is shown that the PRAM technique exhibits comparable performance to the others, and in many cases improves upon their abilities, in particular for systems where the inertia of the main power system is reduced, which may arise in future systems with increased penetrations of renewable generation and HVDC infeed.
Avoiding disclosure of individually identifiable health information: a literature review
Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss. Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility