110 research outputs found

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Get PDF
    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S. statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy

    Preserving user privacy in social media data processing

    Get PDF
    Social media data is used for analytics, e.g., in science, authorities or the industry. Privacy is often considered a secondary problem. However, protecting the privacy of social media users is demanded by laws and ethics. In order to prevent subsequent abuse, theft or public exposure of collected datasets, privacy-aware data processing is crucial. This dissertation presents a concept to process social media data with social media userā€™s privacy in mind. It features a data storage concept based on the cardinality estimator HyperLogLog to store social media data, so that it is not possible to extract individual items from it, but only to estimate the cardinality of items within a certain set, plus running set operations over multiple sets to extend analytical ranges. Applying this method requires to define the scope of the result before even gathering the data. This prevents the data from being misused for other purposes at a later point in time and thus follows the privacy by design principles. This work further shows methods to increase privacy through the implementation of abstraction layers. An included case study demonstrates the presented methods to be suitable for application in the field.:1 Introduction 1.1 Problem 1.2 Research objectives 1.3 Document structure 2 Related work 2.1 The notion of privacy 2.2 Privacy by design 2.3 Differential privacy 2.4 Geoprivacy 2.5 Probabilistic Data Structures 3 Concept and methods 3.1 Collateral data 3.2 Disposable data 3.3 Cardinality estimation 3.4 Data precision 3.5 Extendability 3.6 Abstraction 3.7 Time consideration 4 Summary of publications 4.1 HyperLogLog Introduction 4.2 VOST Case Study 4.3 Real-time Streaming 4.4 Abstraction Layers 4.5 VGIscience Book Chapter 4.6 Supplementary Software Materials 5 Discussion 5.1 Prevent accidental data disclosure 5.2 Feasibility in the field 5.3 Adjustability for different use cases 5.4 Limitations of HLL 5.5 Security 5.6 Outlook and further research 6 Conclusion Appendix References Publication

    Sales and Demographic Data Visualization, Analysis and Forecasting

    Get PDF
    Computerized data collection is constantly increasing. With technological progress more data is becoming available at any time on public databases. Also, private companies are collecting more data. Having thousands of observations in data sets makes it impossible for human to grasp trends and patterns. This raises a need for data mining and visualization for business intelligence. In order to optimize the use of sales territories and support companies growth it is important to understand the underlying patterns and associations between sales results and demographics. This thesis aims to accomplish three main objectives. Firstly, develop a web service to visualize demographic and sales data. Secondly, analyze demographics and sales data obtained, from company to get insight if success in sales is determined by placing representatives in ā€œgoodā€ areas, or are there other factors that might predict success. The third aim is to create a predictive model that could predict sales results

    A Taxonomy of Privacy-Preserving Record Linkage Techniques

    Get PDF
    The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research

    An Economic Analysis of Privacy Protection and Statistical Accuracy as Social Choices

    Get PDF
    Statistical agencies face a dual mandate to publish accurate statistics while protecting respondent privacy. Increasing privacy protection requires decreased accuracy. Recognizing this as a resource allocation problem, we propose an economic solution: operate where the marginal cost of increasing privacy equals the marginal benefit. Our model of production, from computer science, assumes data are published using an efficient differentially private algorithm. Optimal choice weighs the demand for accurate statistics against the demand for privacy. Examples from U.S.\ statistical programs show how our framework can guide decision-making. Further progress requires a better understanding of willingness-to-pay for privacy and statistical accuracy.Comment: Forthcoming in American Economic Revie

    New Directions in the Development of Population Estimates in the United States?

    Get PDF
    The advent of a continuously updated Master Area File (MAF) following the 2000 census represents an information resource that can be tapped for purposes of developing timely, cost-effective, and precise population estimates for even the smallest of geographical units (e.g., census blocks). We argue that the MAF can be enhanced (EMAF) for these purposes. In support of our argument we describe a set of activities needed to develop EMAF, each of which is well within the current capabilities of the U.S. Census Bureau and discuss various costs and benefits of each. We also describe how EMAF would provide population estimates containing a wide range of demographic (e.g., age, race, and sex) and socio-economic characteristics (e.g., educational attainment, income, and employment). As such, it could largely negate and eliminate the need for many of the traditional demographic methods of population estimation and possibly reduce the number of sample surveys. We identify important challenges that must be surmounted in order to realize EMAF and make suggestions for doing so. We conclude by noting that the idea of the EMAF could be of interest to other countries with MAF files and strong administrative records systems that, like the United States, are facing the challenge of producing good population information in the face of increasing census costs

    Guide to Social Science Data Preparation and Archiving: Best Practice Throughout the Data Life Cycle

    Full text link
    http://deepblue.lib.umich.edu/bitstream/2027.42/134032/1/dataprep.pdfDescription of dataprep.pdf : Boo

    Advancing security information and event management frameworks in managed enterprises using geolocation

    Get PDF
    Includes bibliographical referencesSecurity Information and Event Management (SIEM) technology supports security threat detection and response through real-time and historical analysis of security events from a range of data sources. Through the retrieval of mass feedback from many components and security systems within a computing environment, SIEMs are able to correlate and analyse events with a view to incident detection. The hypothesis of this study is that existing Security Information and Event Management techniques and solutions can be complemented by location-based information provided by feeder systems. In addition, and associated with the introduction of location information, it is hypothesised that privacy-enforcing procedures on geolocation data in SIEMs and meta- systems alike are necessary and enforceable. The method for the study was to augment a SIEM, established for the collection of events in an enterprise service management environment, with geo-location data. Through introducing the location dimension, it was possible to expand the correlation rules of the SIEM with location attributes and to see how this improved security confidence. An important co-consideration is the effect on privacy, where location information of an individual or system is propagated to a SIEM. With a theoretical consideration of the current privacy directives and regulations (specifically as promulgated in the European Union), privacy supporting techniques are introduced to diminish the accuracy of the location information - while still enabling enhanced security analysis. In the context of a European Union FP7 project relating to next generation SIEMs, the results of this work have been implemented based on systems, data, techniques and resilient features of the MASSIF project. In particular, AlienVault has been used as a platform for augmentation of a SIEM and an event set of several million events, collected over a three month period, have formed the basis for the implementation and experimentation. A "brute-force attack" misuse case scenario was selected to highlight the benefits of geolocation information as an enhancement to SIEM detection (and false-positive prevention). With respect to privacy, a privacy model is introduced for SIEM frameworks. This model utilises existing privacy legislation, that is most stringent in terms of privacy, as a basis. An analysis of the implementation and testing is conducted, focusing equally on data security and privacy, that is, assessing location-based information in enhancing SIEM capability in advanced security detection, and, determining if privacy-enforcing procedures on geolocation in SIEMs and other meta-systems are achievable and enforceable. Opportunities for geolocation enhancing various security techniques are considered, specifically for solving misuse cases identified as existing problems in enterprise environments. In summary, the research shows that additional security confidence and insight can be achieved through the augmentation of SIEM event information with geo-location information. Through the use of spatial cloaking it is also possible to incorporate location information without com- promising individual privacy. Overall the research reveals that there are significant benefits for SIEMs to make use of geo-location in their analysis calculations, and that this can be effectively conducted in ways which are acceptable to privacy considerations when considered against prevailing privacy legislation and guidelines
    • ā€¦
    corecore