264 research outputs found

    A Classification of non-Cryptographic Anonymization Techniques Ensuring Privacy in Big Data

    Get PDF
    Recently, Big Data processing becomes crucial to most enterprise and government applications due to the fast growth of the collected data. However, this data often includes private personal information that arise new security and privacy concerns. Moreover, it is widely agreed that the sheer scale of big data makes many privacy preserving techniques unavailing. Therefore, in order to ensure privacy in big data, anonymization is suggested as one of the most efficient approaches. In this paper, we will provide a new detailed classification of the most used non-cryptographic anonymization techniques related to big data including generalization and randomization approaches. Besides, the paper evaluates the presented techniques through integrity, confidentiality and credibility criteria. In addition, three relevant anonymization techniques including k-anonymity, l-diversity and t-closeness are tested on an extract of a huge real data set

    EPOS Security & GDPR Compliance

    Get PDF
    Since May 2018, companies have been required to comply with the General Data Protection Regulation (GDPR). This means that many companies had to change their methods of collecting and processing EU citizens’ data. The compliance process can be very expensive, for example, more specialized human resources are needed, who need to study the regulations and then implement the changes in the IT applications and infrastructures. As a result, new measures and methods need to be developed and implemented, making this process expensive. This project is part of the EPOS project. EPOS allows data on earth sciences from various research institutes in Europe to be shared and used. The data is stored in a database and in some file systems and in addition, there is web services for data mining and control. The EPOS project is a complex distributed system and therefore it is important to guarantee not only its security, but also that it is compatible with GDPR. The need to automate and facilitate this compliance and verification process was identified, in particular the need to develop a tool capable of analyzing applications web. This tool can provide companies in general an easier and faster way to check the degree of compliance with the GDPR in order to assess and implement any necessary changes. With this, PADRES was developed that contains the main points of GDPR organized by principles in the form of checklist which are answered manually. When submitted, a security analysis is also performed based on NMAP and ZAP together with the cookie analyzer. Finally, a report is generated with the information obtained together with a set of suggestions based on the responses obtained from the checklist. Applying this tool to EPOS, most of the points related to GDPR were answered as being in compliance although the rest of the suggestions were generated to help improve the level of compliance and also improve general data management. In the exploitation of vulnerabilities, some were found to be classified as high risk, but most were found to be classified as medium risk.Desde maio de 2018 que as empresas precisam de cumprir o Regulamento Geral de Proteção de Dados (GDPR). Isso significa que muitas empresas tiveram que mudar seus métodos de como recolhem e processam os dados dos cidadãos da UE. O processo de conformidade pode ser muito caro, por exemplo, são necessários recursos humanos mais especializados, que precisam estudar os regulamentos e depois implementar as alterações nos aplicativos e infraestruturas de TI. Com isso novas medidas e métodos precisam ser desenvolvidos e implementados, tornando esse processo caro. Este projeto está inserido no projeto European Plate Observing System (EPOS). O EPOS permite que dados sobre ciências da terra de vários institutos de pesquisa na Europa sejam compartilhados e usados. Os dados são armazenados em base de dados e em alguns sistema de ficheiros e além disso, existem web services para controle e mineração de dados. O projeto EPOS é um sistema distribuído complexo e portanto, é importante garantir não apenas sua segurança, mas também que seja compatível com o GDPR. Foi identificada a necessidade de automatizar e facilitar esse processo, em particular a necessidade de desenvolver uma ferramenta capaz de analisar aplicações web. Essa ferramenta, chamada PrivAcy, Data REgulation and Security (PADRES) pode fornecer às empresas uma maneira mais fácil e rápida de verificar o grau de conformidade com o GDPR com o objetivo de avaliar e implementar quaisquer alterações necessárias. Com isto, esta ferramenta contém os pontos principais do General Data Protection Regulation (GDPR) organizado por princípios em forma duma lista de verificação, os quais são respondidos manualmente. Como os conceitos de privacidade e segurança se complementam, foi também incluída a procura por vulnerabilidades em aplicações web. Ao integrar as ferramentas de código aberto como o Network Mapper (NMAP) ou Zed Attack Proxy (ZAP), é possível então testar a aplicações contra as vulnerabilidades mais frequentes segundo o Open Web Application Security Project (OWASP) Top 10. Aplicando esta ferramenta no EPOS, a maioria dos pontos relativos ao GDPR foram respondidos como estando em conformidade apesar de nos restantes terem sido geradas as respetivas sugestões para ajudar a melhorar o nível de conformidade e também melhorar o gerenciamento geral dos dados. Na exploração das vulnerabilidades foram encontradas algumas classificadas com risco elevado mas na maioria foram encontradas mais com classificação média

    Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing

    Get PDF
    Advancements in Industry 4.0 brought tremendous improvements in the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing the benefits of such improvements. In general, healthcare data holds sensitive information about individuals. Hence, sharing such data is challenging because of various security and privacy issues. According to privacy regulations and ethical requirements, it is essential to preserve the privacy of patients before sharing data for medical research. State-of-the-art literature on privacy preserving studies either uses cryptographic approaches to protect the privacy or uses anonymizing techniques regardless of the type of attributes, this results in poor protection and data utility. In this paper, we propose an attribute-focused privacy preserving data publishing scheme. The proposed scheme is two-fold, comprising a fixed-interval approach to protect numerical attributes and an improved l -diverse slicing approach to protect the categorical and sensitive attributes. In the fixed-interval approach, the original values of the healthcare data are replaced with an equivalent computed value. The improved l -diverse slicing approach partitions the data both horizontally and vertically to avoid privacy leaks. Extensive experiments with real-world datasets are conducted to evaluate the performance of the proposed scheme. The classification models built on anonymized dataset yields approximately 13% better accuracy than benchmarked algorithms. Experimental analyses show that the average information loss which is measured by normalized certainty penalty (NCP) is reduced by 12% compared to similar approaches. The attribute focused scheme not only provides data utility but also prevents the data from membership disclosures, attribute disclosures, and identity disclosures

    Reconciliation of anti-money laundering instruments and European data protection requirements in permissionless blockchain spaces

    Get PDF
    Artykuł ten zmierza do pogodzenia wymagań unijnego rozporządzenia o ochronie danych osobowych (RODO) i instrumentów przeciwdziałania praniu brudnych pieniędzy i finansowania terroryzmu (AML/CFT) wykorzystywanych w dostępnych publicznie ekosystemach permissionless bazujących na technologi rozproszonych rejestrów (DLT). Dotychczasowe analizy skupiają się zazwyczaj jedynie na jednej z tych regulacji. Natomiast poddanie analizie ich wzajemnych oddziaływań ujawnia brak ich koherencji w sieciach permissionless DLT. RODO zmusza członków społeczności blockchain do wykorzystywania technologii anonimizujących dane albo przynajmniej zapewniających silną pseudonimizację, aby zapewnić zgodność przetwarzania danych z wymogami RODO. Jednocześnie instrumenty globalnej polityki AML/CFT, które są obecnie implementowane w wielu państwach stosowanie do wymogów ustanawianych przez Financial Action Task Force (FATF), przeciwdziałają wykorzystywaniu technologii anonimizacyjnych wbudowanych w protokoły sieci blockchain. Rozwiązania proponowane w tym artykule mają na celu spowodowanie kształtowania sieci blockchain w taki sposób, aby jednocześnie zabezpieczały one dane osobowe użytkowników zgodnie z wysokimi wymogami RODO, jednocześnie adresując ryzyka AML/CFT kreowane przez transakcje w takiej anonimowej lub silnie pseudonimowej przestrzeni. Poszukiwanie nowych instrumentów polityki państw jest konieczne aby zapewnić że państwa nie będą zwalczać rozwoju wszystkich anonimowych sieci blockchian, gdyż jest to konieczne do zapewnienia ich zdolność do realizacji wysokich wymogów RODO w zakresie ochrony danych przetwarzanych na blockchain. Ten artykuł wskazuje narzędzia AML/CFT, które mogą być pomocne do tworzenia blockchainów wspierających prywatność przy jednoczesnym zapewnieniu wykonalności tych narzędzi AML/CFT. Pierwszym z tych narzędzi jest wyjątkowy dostęp państwa do danych transakcyjnych zapisanych na zasadniczo nie-trantsparentnym rejestrze, chronionych technologiami anonimizacyjnymi. Takie narzędzie powinno być jedynie opcjonalne dla danej sieci (finansowej platformy), jak długo inne narzędzia AML/CFT są wykonalne i są zapewniane przez sieć. Jeżeli żadne takie narzędzie nie jest dostępne, a dana sieć nie zapewni wyjątkowego dostępu państwu (państwom), wówczas regulacje powinny pozwalać danemu państwu na zwalczanie danej sieci (platformy finansowej) jako całości. Efektywne narzędzia w tym zakresie powinny obejmować uderzenie przez państwo (państwa) w wartość natywnej kryptowaluty, a nie ściganie indywidualnych jej użytkowników. Takie narzędzia mogą obejmować atak (cyberatak) państwa lub państw który podważy zaufanie użytkowników do danej sieci.This article is an attempt to reconcile the requirements of the EU General Data Protection Regulation (GDPR) and anti-money laundering and combat terrorist financing (AML/CFT) instruments used in permissionless ecosystems based on distributed ledger technology (DLT). Usually, analysis is focused only on one of these regulations. Covering by this research the interplay between both regulations reveals their incoherencies in relation to permissionless DLT. The GDPR requirements force permissionless blockchain communities to use anonymization or, at the very least, strong pseudonymization technologies to ensure compliance of data processing with the GDPR. At the same time, instruments of global AML/CFT policy that are presently being implemented in many countries following the recommendations of the Financial Action Task Force, counteract the anonymity-enhanced technologies built into blockchain protocols. Solutions suggested in this article aim to induce the shaping of permissionless DLT-based networks in ways that at the same time would secure the protection of personal data according to the GDPR rules, while also addressing the money laundering and terrorist financing risks created by transactions in anonymous blockchain spaces or those with strong pseudonyms. Searching for new policy instruments is necessary to ensure that governments do not combat the development of all privacy-blockchains so as to enable a high level of privacy protection and GDPR-compliant data processing. This article indicates two AML/CFT tools which may be helpful for shaping privacy-blockchains that can enable the feasibility of such tools. The first tool is exceptional government access to transactional data written on non-transparent ledgers, obfuscated by advanced anonymization cryptography. The tool should be optional for networks as long as another effective AML/CFT measures are accessible for the intermediaries or for the government in relation to a given network. If these other measures are not available and the network does not grant exceptional access, the regulations should allow governments to combat the development of those networks. Effective tools in that scope should target the value of privacy-cryptocurrency, not its users. Such tools could include, as a tool of last resort, state attacks which would undermine the trust of the community in a specific network

    Anonymization of Event Logs for Network Security Monitoring

    Get PDF
    A managed security service provider (MSSP) must collect security event logs from their customers’ network for monitoring and cybersecurity protection. These logs need to be processed by the MSSP before displaying it to the security operation center (SOC) analysts. The employees generate event logs during their working hours at the customers’ site. One challenge is that collected event logs consist of personally identifiable information (PII) data; visible in clear text to the SOC analysts or any user with access to the SIEM platform. We explore how pseudonymization can be applied to security event logs to help protect individuals’ identities from the SOC analysts while preserving data utility when possible. We compare the impact of using different pseudonymization functions on sensitive information or PII. Non-deterministic methods provide higher level of privacy but reduced utility of the data. Our contribution in this thesis is threefold. First, we study available architectures with different threat models, including their strengths and weaknesses. Second, we study pseudonymization functions and their application to PII fields; we benchmark them individually, as well as in our experimental platform. Last, we obtain valuable feedbacks and lessons from SOC analysts based on their experience. Existing works[43, 44, 48, 39] are generally restricting to the anonymization of the IP traces, which is only one part of the SOC analysts’ investigation of PCAP files inspection. In one of the closest work[47], the authors provide useful, practical anonymization methods for the IP addresses, ports, and raw logs

    An architecture for secure data management in medical research and aided diagnosis

    Get PDF
    Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións. 5032V01[Resumo] O Regulamento Xeral de Proteccion de Datos (GDPR) implantouse o 25 de maio de 2018 e considerase o desenvolvemento mais importante na regulacion da privacidade de datos dos ultimos 20 anos. As multas fortes definense por violar esas regras e non e algo que os centros sanitarios poidan permitirse ignorar. O obxectivo principal desta tese e estudar e proponer unha capa segura/integracion para os curadores de datos sanitarios, onde: a conectividade entre sistemas illados (localizacions), a unificacion de rexistros nunha vision centrada no paciente e a comparticion de datos coa aprobacion do consentimento sexan as pedras angulares de a arquitectura controlar a sua identidade, os perfis de privacidade e as subvencions de acceso. Ten como obxectivo minimizar o medo a responsabilidade legal ao compartir os rexistros medicos mediante o uso da anonimizacion e facendo que os pacientes sexan responsables de protexer os seus propios rexistros medicos, pero preservando a calidade do tratamento do paciente. A nosa hipotese principal e: os conceptos Distributed Ledger e Self-Sovereign Identity son unha simbiose natural para resolver os retos do GDPR no contexto da saude? Requirense solucions para que os medicos e investigadores poidan manter os seus fluxos de traballo de colaboracion sen comprometer as regulacions. A arquitectura proposta logra eses obxectivos nun ambiente descentralizado adoptando perfis de privacidade de datos illados.[Resumen] El Reglamento General de Proteccion de Datos (GDPR) se implemento el 25 de mayo de 2018 y se considera el desarrollo mas importante en la regulacion de privacidad de datos en los ultimos 20 anos. Las fuertes multas estan definidas por violar esas reglas y no es algo que los centros de salud puedan darse el lujo de ignorar. El objetivo principal de esta tesis es estudiar y proponer una capa segura/de integración para curadores de datos de atencion medica, donde: la conectividad entre sistemas aislados (ubicaciones), la unificacion de registros en una vista centrada en el paciente y el intercambio de datos con la aprobacion del consentimiento son los pilares de la arquitectura propuesta. Esta propuesta otorga al titular de los datos un rol central, que le permite controlar su identidad, perfiles de privacidad y permisos de acceso. Su objetivo es minimizar el temor a la responsabilidad legal al compartir registros medicos utilizando el anonimato y haciendo que los pacientes sean responsables de proteger sus propios registros medicos, preservando al mismo tiempo la calidad del tratamiento del paciente. Nuestra hipotesis principal es: .son los conceptos de libro mayor distribuido e identidad autosuficiente una simbiosis natural para resolver los desafios del RGPD en el contexto de la atencion medica? Se requieren soluciones para que los medicos y los investigadores puedan mantener sus flujos de trabajo de colaboracion sin comprometer las regulaciones. La arquitectura propuesta logra esos objetivos en un entorno descentralizado mediante la adopcion de perfiles de privacidad de datos aislados.[Abstract] The General Data Protection Regulation (GDPR) was implemented on 25 May 2018 and is considered the most important development in data privacy regulation in the last 20 years. Heavy fines are defined for violating those rules and is not something that healthcare centers can afford to ignore. The main goal of this thesis is to study and propose a secure/integration layer for healthcare data curators, where: connectivity between isolated systems (locations), unification of records in a patientcentric view and data sharing with consent approval are the cornerstones of the proposed architecture. This proposal empowers the data subject with a central role, which allows to control their identity, privacy profiles and access grants. It aims to minimize the fear of legal liability when sharing medical records by using anonymisation and making patients responsible for securing their own medical records, yet preserving the patient’s quality of treatment. Our main hypothesis is: are the Distributed Ledger and Self-Sovereign Identity concepts a natural symbiosis to solve the GDPR challenges in the context of healthcare? Solutions are required so that clinicians and researchers can maintain their collaboration workflows without compromising regulations. The proposed architecture accomplishes those objectives in a decentralized environment by adopting isolated data privacy profiles

    The right to privacy in a Big Data society. Merits and limits of the GDPR

    Get PDF
    With the non-stop development of technology, Big Data generation has seen a rise like no other. The rise of Big Data has given a possibility to numerous ways in which personal data of consumers could be used leaving the people vulnerable. The European Union came up with GDPR as the latest way of protecting the rights of citizens. In this paper, we analyze different aspects of Big Data such as legal framework, consent, and anonymization and see in what ways GDPR has benefitted in protecting personal data and what its limitations are

    Improving Mild to Moderate Depression With an App-Based Self-Guided Intervention: Protocol for a Randomized Controlled Trial

    Get PDF
    Background: Depression is one of the most prevalent mental disorders and frequently co-occurs with other mental disorders. Despite the high direct and indirect costs to both individuals and society, more than 80% of those diagnosed with depression remain with their primary care physician and do not receive specialized treatment. Self-guided digital interventions have been shown to improve depression and, due to their scalability, have a large potential public health impact. Current digital interventions often focus on specific disorders, while recent research suggests that transdiagnostic approaches are more suitable. Objective: This paper presents the protocol for a study that aims to assess the efficacy of a self-guided transdiagnostic app-based self-management intervention in patients with mild or moderate depression with and without comorbid mental disorders. Specifically, we are investigating the impact of the intervention on symptoms of depression, quality of life, anxiety symptoms, and mental health–related patient empowerment and self-management skills. Methods: The intervention under investigation, MindDoc with Prescription, is a self-guided digital intervention aimed at supporting individuals with mild to moderate mental disorders from the internalizing spectrum, including depression. The app can be used as a low-threshold psychosocial intervention. Up to 570 adult patients will be randomized to either receive the intervention in addition to care as usual or only care as usual. We are including adults with a permanent residency in Germany and mild or moderate depression according to International Classification of Diseases, 10th Revision, criteria (F32.0, F32.1, F33.0, and F33.1). Clinical interviews will be conducted to confirm the diagnosis. Data will be collected at baseline as well as 8 weeks and 6 months after randomization. The primary outcome will be depression symptom severity after 8 weeks. Secondary outcomes will be quality of life, anxiety symptom severity, and patient empowerment and self-management behaviors. Data will be analyzed using multiple imputations, using the intention-to-treat principle, while sensitivity analyses will be based on additional imputation strategies and a per-protocol analysis. Results: Recruitment for the trial started on February 7, 2023, and the first participant was randomized on February 14, 2023. As of September 5, 2023, 275 participants have been included in the trial and 176 have provided the primary outcome. The rate of missing values in the primary outcome is approximately 20%. Conclusions: Data from this efficacy trial will be used to establish whether access to the intervention is associated with an improvement in depression symptoms in individuals diagnosed with mild or moderate depression. The study will contribute to expanding the evidence base on transdiagnostic digital interventions. Trial Registration: German Registry of Clinical Trials DRKS00030852; https://drks.de/search/de/trial/DRKS0003085

    Anonymization procedures for tabular data: an explanatory technical and legal synthesis

    Get PDF
    In the European Union, Data Controllers and Data Processors, who work with personal data, have to comply with the General Data Protection Regulation and other applicable laws. This affects the storing and processing of personal data. But some data processing in data mining or statistical analyses does not require any personal reference to the data. Thus, personal context can be removed. For these use cases, to comply with applicable laws, any existing personal information has to be removed by applying the so-called anonymization. However, anonymization should maintain data utility. Therefore, the concept of anonymization is a double-edged sword with an intrinsic trade-off: privacy enforcement vs. utility preservation. The former might not be entirely guaranteed when anonymized data are published as Open Data. In theory and practice, there exist diverse approaches to conduct and score anonymization. This explanatory synthesis discusses the technical perspectives on the anonymization of tabular data with a special emphasis on the European Union’s legal base. The studied methods for conducting anonymization, and scoring the anonymization procedure and the resulting anonymity are explained in unifying terminology. The examined methods and scores cover both categorical and numerical data. The examined scores involve data utility, information preservation, and privacy models. In practice-relevant examples, methods and scores are experimentally tested on records from the UCI Machine Learning Repository’s “Census Income (Adult)” dataset
    corecore