7 research outputs found

    A data recipient centered de-identification method to retain statistical attributes

    Privacy has always been a great concern of patients and medical service providers. As a result of the recent advances in information technology and the government’s push for the use of Electronic Health Record (EHR) systems, a large amount of medical data is collected and stored electronically. This data needs to be made available for analysis but at the same time patient privacy has to be protected through de-identification. Although biomedical researchers often describe their research plans when they request anonymized data, most existing anonymization methods do not use this information when de-identifying the data. As a result, the anonymized data may not be useful for the planned research project. This paper proposes a data recipient centered approach to tailor the de-identification method based on input from the recipient of the data. We demonstrate our approach through an anonymization project for biomedical researchers with specific goals to improve the utility of the anonymized data for statistical models used for their research project. The selected algorithm improves a privacy protection method called Condensation by Aggarwal et al. Our methods were tested and validated on real cancer surveillance data provided by the Kentucky Cancer Registry.

    Protecting Privacy When Releasing Search Results from Medical Document Data

    Health information technologies have greatly facilitated sharing of personal health data for secondary use, which is critical to medical and health research. However, there is a growing concern about privacy due to data sharing and publishing. Medical and health data typically contain unstructured text documents, such as clinical narratives, pathology reports, and discharge summaries. This study concerns privacy-preserving extraction, summary, and release of information from medical documents. Existing studies on privacy-preserving data mining and publishing focus mostly on structured data. We propose a novel approach to enable privacy-preserving extraction, summarization, querying, and reporting of patients’ demographic, health, and medical information from medical documents. The extracted data is represented in a semi-structured, set-valued data format, which can be stored in a health information system for query and analysis. The privacy-preserving mechanism is based on the cutting-edge idea of differential privacy, which offers a rigorous privacy guarantee.

    BIG DATA ANALYTICS TOOLS: A BIBLIOMETRIC LITERATURE REVIEW

    Analytics as a field is rapidly growing because businesses need to tackle increasingly large and complex datasets to gain a competitive edge in the business environment. Currently there are two distinct types of analytics technologies: proprietary and open-source. Each has its distinct strengths and weaknesses, but which is relevant in today's business environment? Questions like this are important for businesses wanting to utilize these technologies in order to become larger and more efficient. This research answered this question, and it was found that both proprietary and open-source technologies are equally relevant in current analytics research. Because of this, businesses should be more aware of the analytics field and how both types of technologies could benefit their current operations, and should strategically utilize both types to meet their specific data needs.

    Protecting Time Series Data with Minimal Forecast Loss

    Forecasting could be negatively impacted due to anonymization requirements in data protection legislation. To measure the potential severity of this problem, we derive theoretical bounds for the loss to forecasts from additive exponential smoothing models using protected data. Following the guidelines of anonymization from the General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA), we develop the k-nearest Time Series (k-nTS) Swapping and k-means Time Series (k-mTS) Shuffling methods to create protected time series data that minimizes the loss to forecasts while preventing a data intruder from detecting privacy issues. For efficient and effective decision making, we formally model an integer programming problem for a perfect matching for simultaneous data swapping in each cluster. We call it a two-party data privacy framework since our optimization model includes the utilities of a data provider and data intruder. We apply our data protection methods to thousands of time series and find that they maintain the forecasts and patterns (level, trend, and seasonality) of time series well compared to standard data protection methods suggested in legislation. Substantively, our paper addresses the challenge of protecting time series data when used for forecasting. Our findings suggest the managerial importance of incorporating the concerns of forecasters into the data protection itself.

    Social Media Analytics and Information Privacy Decisions: Impact of User Intimate Knowledge and Co-ownership Perceptions

    Social media analytics has been recognized as a distinct research field in the analytics subdomain that is developed by processing social media content to generate important business knowledge. Understanding the factors that influence privacy decisions around its use is important as it is often perceived to be opaque and mismanaged. Social media users have been reported to have low intimate knowledge and co-ownership perception of social media analytics and its information privacy decisions. This deficiency leads them to perceive privacy violations if firms make privacy decisions that conflict with their expectations. Such perceived privacy violations often lead to business disruptions caused by user rebellions, regulatory interventions, firm reputation damage, and other business continuity threats. Existing research has developed theoretical frameworks for multilevel information privacy management and called for empirical testing of which constructs would increase user self-efficacy in negotiating with firms for joint social media analytics decision making. A response to this call was studied by measuring the constructs in the literature that lead to normative social media analytics and its information privacy decisions. The study model was developed by combining the relevant constructs from the theory of psychological ownership in organizations and the theory of multilevel information privacy. From psychological ownership theory, the impact that intimate knowledge had on co-ownership perception of social media analytics was added. From the theory of multilevel information privacy, the impact of co-ownership perception on the antecedents of information privacy decisions (the social identity assumed and the information privacy norms used) was examined. In addition, the moderating role of the cost and benefit components of the privacy calculus on the relationship between information privacy norms and expected information privacy decisions was measured.
A quantitative research approach was used to measure these factors. A web-based survey was developed using survey items obtained from prior studies that measured these constructs, with only minor wording changes made. A pilot study of 34 participants was conducted to test and finalize the instrument. The survey was distributed to adult social media users in the United States of America on a crowdsourcing marketplace using a commercial online survey service. A total of 372 responses were accepted and analyzed. The partial least squares structural equation modeling method was used to assess the model and analyze the data using the SmartPLS 3 statistical software package. An increase in intimate knowledge of social media analytics led to higher co-ownership perception among social media users. Higher levels of co-ownership perception led to higher expectation of adoption of a salient social identity and higher expected information privacy norms. In addition, higher levels of expectation of social information privacy norm use led to normative privacy decisions. Higher levels of benefit estimation in the privacy calculus negatively moderated the relationship between social norms and privacy decision making. Co-ownership perception did not have a significant effect on the cost estimation in the social media analytics privacy calculus. Similarly, the cost estimation in the privacy calculus did not have a significant effect on the relationship between information privacy norm adoption and the expectation of a normative information privacy decision. The findings of the study are a notable information systems literature contribution in both theory and practice. The study is one of the few to further develop multilevel information privacy theory by adding the intimate knowledge construct.
The study model is a contribution to the literature since it is one of the first to combine and validate elements of the theory of psychological ownership in organizations with the theory of multilevel information privacy in order to understand what social media users expect when social media analytics information privacy decisions are made. The study also contributes by suggesting approaches practitioners can use to collaboratively manage their social media analytics information privacy decisions, which were previously perceived to be opaque and underexamined. Practical suggestions were offered that social media firms could use to decrease negative user affectations and engender deeper information privacy collaboration with users as they seek benefit from social media analytics.

    Watchers, Watched, and Watching in the Digital Age: Reconceptualization of IT Monitoring as Complex Action Nets

    Despite increasing studies of information technology (IT) monitoring, our understanding of how IT mediates relations between the watcher and watched remains limited in two areas. First, either traditional actor-centric frameworks assuming predefined watcher-watched relationships (e.g., panopticon or synopticon) are adopted or monitoring actors are removed to focus on data flows (e.g., dataveillance, assemblages, panspectron). Second, IT monitoring research predominantly assumes IT artifacts to be stable, bounded, designed objects with prescribed uses, which provides an oversimplified view of actor relationships. To redress these limitations, a conceptual framework of veillance applicable to a variety of possible IT or non-IT-mediated relationships between watcher and watched is developed. Using the framework, we conduct a conceptual review of the literature, identifying IT-enabled monitoring and transformations of actors, goals, mechanisms and foci, and develop an action net model of IT veillance where IT artifacts are theorized as equivocal, distributable and open for diverse use, open to edits and contributions by unbounded sets of heterogeneous actors characterized by diverse goals and capabilities. The action net of IT veillance is defined as a flexible decentralized interconnected web shaped by multidirectional watcher-watched relationships, enabling multiple dynamic goals and foci. Cumulative contributions by heterogeneous participants organize and manipulate the net, having an impact through influencing dispositions, visibilities and the inclusion/exclusion of self and others. The model makes three important theoretical contributions to our understanding of IT monitoring of watchers and watched and their relationships. We discuss implications and avenues for future studies on IT veillance.