
    Data Shuffling Procedure for Masking Data

    A method for data shuffling to preserve data confidentiality is provided. The method comprises masking particular attributes of a dataset that are to be kept confidential, followed by a shuffling step in which the transformed dataset and a transformed confidential attribute are sorted according to the same rank-order criteria. For normally distributed datasets, the transformation may be achieved by general additive data perturbation, followed by generating a normalized perturbed value of the confidential attribute using the conditional distribution of the confidential and non-confidential attributes. In another aspect, a software program for accomplishing the method of the present invention is provided. The method of the invention provides greater security and utility for the data, and increases user comfort by allowing use of the actual data values without revealing their origin.
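The rank-order shuffling step can be sketched as follows. This is an illustrative reconstruction, not the patented procedure: the `rank_shuffle` helper and all parameters are hypothetical, and the perturbation shown is plain additive noise rather than the full GADP model described in the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

def rank_shuffle(confidential, perturbed):
    """Give each record the real confidential value whose rank matches the
    rank of that record's perturbed value: the released column keeps the
    true marginal distribution, but record-level linkage is broken."""
    ranks = np.argsort(np.argsort(perturbed))   # rank of each perturbed value
    return np.sort(confidential)[ranks]         # real values, reordered by rank

# Hypothetical example: income is the confidential attribute.
income = rng.normal(50_000, 10_000, size=1_000)
perturbed = income + rng.normal(0, 5_000, size=income.size)  # additive perturbation
released = rank_shuffle(income, perturbed)

print("marginal preserved:", np.allclose(np.sort(released), np.sort(income)))
```

Because the released column contains the actual data values, only in shuffled positions, aggregate statistics computed from it match the original exactly, which is the utility claim made above.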

    Fool's Gold: An Illustrated Critique of Differential Privacy

    Differential privacy has taken the privacy community by storm. Computer scientists developed this technique to allow researchers to submit queries to databases without being able to glean sensitive information about the individuals described in the data. Legal scholars champion differential privacy as a practical solution to the competing interests in research and confidentiality, and policymakers are poised to adopt it as the gold standard for data privacy. It would be a disastrous mistake. This Article provides an illustrated guide to the virtues and pitfalls of differential privacy. While the technique is suitable for a narrow set of research uses, the great majority of analyses would produce results that are beyond absurd--average income in the negative millions or correlations well above 1.0, for example. The legal community mistakenly believes that differential privacy can offer the benefits of data research without sacrificing privacy. In fact, differential privacy will usually produce either very wrong research results or very useless privacy protections. Policymakers and data stewards will have to rely on a mix of approaches--perhaps differential privacy where it is well suited to the task and other disclosure prevention techniques in the great majority of situations where it isn't.
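The kind of absurd output the Article describes is easy to reproduce with a toy Laplace mechanism. The dataset, income bound, and privacy budget below are all hypothetical; the point is only that, for a small group and a strict epsilon, the noise scale dwarfs the statistic being released.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical survey of 50 incomes, assumed bounded by $1,000,000.
incomes = rng.lognormal(mean=10.8, sigma=0.6, size=50)
true_mean = incomes.mean()

# Laplace mechanism for a mean query: noise scale b = sensitivity / epsilon,
# with sensitivity = bound / n for a bounded mean.
bound, epsilon = 1_000_000, 0.01        # a strict privacy budget
scale = bound / (incomes.size * epsilon)

noisy_means = true_mean + rng.laplace(0, scale, size=10_000)
print(f"true mean: {true_mean:,.0f}")
print(f"noisy answers that are negative: {np.mean(noisy_means < 0):.1%}")
```

With these (assumed) settings the noise scale is $2,000,000, so nearly half of the released "average incomes" come out negative, which is the failure mode the abstract illustrates.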

    Appropriate sampling procedures for load research

    The Public Utilities Regulatory Policies Act (PURPA) requires electric utility companies with annual sales exceeding 500 million kilowatt-hours of electricity to report specific types of load research data to the Federal Energy Regulatory Commission and the state Public Utility Commissions. One of the key requirements is the reporting of the estimated customer demand for electricity. PURPA also specifies that, in estimating the customer demand for electricity, the level of confidence must be at least 90% and the level of reliability must be 0.1 or less. In order to estimate the customer demand for electricity, electric utility companies must sample from their customer populations. The Load Research Manual (1980), published by the United States Department of Energy, outlines a sampling procedure that could be used in load research. The procedure outlined by the manual assumes that the sampling distribution of the average annual demand for electricity is normally distributed. However, there is research evidence to indicate that the annual demand for electricity, scaled by the average annual demand, is best described by the Gamma, Weibull, or Log-Normal distribution (Liittschwager, 1971). Hence, for relatively small sample sizes, the assumption of normality of the sampling distribution of the sample mean will not be satisfied. Further, the procedure suggested by the Load Research Manual (1980) also fails to address the reliability requirement. The objective of this research is to provide an appropriate sampling procedure that can be used by electric utility companies to estimate the customer demand for electricity and to satisfy the requirements of PURPA.
Specifically, using simulated distributions of electrical demand, the following key issues in estimating the customer demand for electricity were addressed: (1) the determination of the sample size necessary to satisfy the requirements of PURPA, (2) the determination of the effectiveness of stratified sampling procedures, (3) the determination of the effectiveness of using data transformations to normality and then using "normal theory" principles to determine sample size, and (4) the determination of the effectiveness of a statistical technique called bootstrapping to monitor changes in the customer population over time. The results indicate that the methodologies suggested by this study will indeed provide electric utility companies with a sampling procedure that is appropriate for estimating the customer demand for electricity according to PURPA requirements.
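The normal-theory calculation at issue can be sketched as follows, together with a small simulation of how it fares under a skewed demand distribution. The Gamma shape parameter and all simulation settings are illustrative assumptions, not values from the study; the sample-size formula is the standard one for a relative-precision requirement (90% confidence, ±10% reliability).

```python
import numpy as np

rng = np.random.default_rng(2)

def purpa_sample_size(cv, z=1.645, r=0.10):
    """Normal-theory sample size for +/- r relative precision at 90%
    confidence: n = (z * cv / r)^2, where cv is the coefficient of
    variation of demand."""
    return int(np.ceil((z * cv / r) ** 2))

# Assume demand follows a Gamma distribution (per Liittschwager, 1971).
shape = 4.0                       # hypothetical shape parameter
cv = 1 / np.sqrt(shape)          # Gamma CV = 1/sqrt(shape) = 0.5
n = purpa_sample_size(cv)
print("required n:", n)          # (1.645 * 0.5 / 0.1)^2 rounds up to 68

# Empirical coverage of the +/-10% interval under the skewed population.
pop_mean = shape                 # Gamma(shape, scale=1) has mean = shape
hits = 0
for _ in range(5_000):
    sample = rng.gamma(shape, 1.0, size=n)
    hits += abs(sample.mean() - pop_mean) <= 0.10 * pop_mean
print(f"coverage: {hits / 5_000:.1%}")
```

At this sample size the central limit theorem has largely taken hold, so coverage lands near the nominal 90%; with a smaller n or a more skewed distribution the shortfall the dissertation investigates becomes visible.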

    The Bootstrap Approach for Testing Skewness Persistence

    This study presents a new methodology for testing changes in skewness between time periods (or samples) using the bootstrap method. A Monte Carlo simulation experiment was conducted to compare the effectiveness of the bootstrap method with the method suggested by Lau, Wingender and Lau (1989) to test skewness persistence. The results show the bootstrap method to be more powerful than the other method. The bootstrap method was also used to determine the persistence of skewness in stock returns. The results show that, in a large percentage of stocks, skewness persists over time.
    Keywords: bootstrap method, skewness estimation, distribution of stock returns
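A bootstrap test of this kind can be sketched roughly as follows. This is not the authors' exact procedure (their resampling scheme and test statistic may differ), and all sample data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)

def skew(x):
    """Sample skewness: third central moment over variance^1.5."""
    x = np.asarray(x)
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5

def bootstrap_skew_change(a, b, n_boot=2_000, rng=rng):
    """Bootstrap the difference in skewness between two samples, resampling
    from the pooled data under H0 (no change), and return the observed
    difference with a two-sided p-value."""
    observed = skew(b) - skew(a)
    pooled = np.concatenate([a, b])
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        ra = rng.choice(pooled, size=len(a), replace=True)
        rb = rng.choice(pooled, size=len(b), replace=True)
        diffs[i] = skew(rb) - skew(ra)
    return observed, np.mean(np.abs(diffs) >= abs(observed))

# Toy returns: period A symmetric, period B right-skewed (mean-centered).
a = rng.normal(0, 1, 500)
b = rng.lognormal(0, 0.7, 500) - np.exp(0.7 ** 2 / 2)
obs, p = bootstrap_skew_change(a, b)
print(f"skewness change: {obs:.2f}, p = {p:.3f}")
```

A small p-value rejects the hypothesis that skewness is unchanged between the two periods; applied stock by stock, the fraction of non-rejections measures how often skewness persists.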

    A Free Ride: Data Brokers' Rent-Seeking Behavior and the Future of Data Inequality

    Historically, researchers obtained data from independent studies and government data. However, as public outcry for privacy regarding the government's maintenance of data has increased, the discretionary release of government data has decreased or become so anonymized that its relevance is limited. Research necessarily requires access to complete and accurate data. As such, researchers are turning to data brokers for the same, and often more, data than they can obtain from the government. Data brokers base their products and services on data gathered from a variety of free public sources and via the government-created Internet. Data brokers then recategorize the existing free data and combine them with privately collected data. They sell the linked data at a profit while simultaneously preventing the public, whose data they sold, from learning how the data were gathered based on their trade secret protections. To the Authors' knowledge, research has not explored data brokers' rent-seeking behavior and how it will further inequality in accessing credible data--or data inequality. The Authors contend that without a federal mission to ensure cost-free access to personal data for research and public access purposes, data brokers' sale of such data will potentially lead to biased or inaccurate research results. This development would further the interests of the educated wealthy at the expense of the general public. To resolve this growing data inequality, this Article recommends a variety of legal and voluntary solutions.

    Evaluating Laplace Noise Addition to Satisfy Differential Privacy for Numeric Data

    Laplace noise addition is often advanced as an approach for satisfying differential privacy. There have been several illustrations of the application of Laplace noise addition for count data, but no evaluation of its performance for numeric data. In this study we evaluate the privacy and utility performance of Laplace noise addition for numeric data. Our results indicate that Laplace noise addition delivers the promised level of privacy only by adding a large quantity of noise, even for relatively large subsets. Because of this, even for simple mean queries, the responses from a masking mechanism that uses Laplace noise addition are of little value. We also show that Laplace noise addition may be vulnerable to a tracker attack. To avoid this, it may be necessary to increase the variance of the added noise as a function of the number of queries issued, which would further reduce the utility of the responses. These results raise serious questions regarding the viability of Laplace-based noise addition for masking numeric data.
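The tracker-style vulnerability can be illustrated with a short simulation. The dataset and epsilon are hypothetical, and the sensitivity uses the common (max - min)/n simplification for a bounded mean query; the paper's own attack construction may differ.

```python
import numpy as np

rng = np.random.default_rng(4)

data = rng.normal(100, 15, size=200)        # hypothetical numeric attribute
true_mean = data.mean()
sensitivity = (data.max() - data.min()) / data.size  # bounded mean query
epsilon = 0.5
scale = sensitivity / epsilon               # Laplace scale b = sensitivity / epsilon

def answer_mean_query():
    """One differentially private answer to the same mean query."""
    return true_mean + rng.laplace(0, scale)

# Averaging attack: repeat the identical query; zero-mean noise averages out.
k = 10_000
estimate = np.mean([answer_mean_query() for _ in range(k)])
print(f"true mean {true_mean:.3f}, attacker's estimate {estimate:.3f}")
```

Under standard sequential composition, k identical epsilon-queries consume a total budget of k * epsilon, so holding the overall guarantee fixed requires noise whose scale grows with the number of queries, which is exactly the further utility loss the abstract describes.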