A Classification of non-Cryptographic Anonymization Techniques Ensuring Privacy in Big Data
Big Data processing has recently become crucial to most enterprise and government applications due to the rapid growth of collected data. However, this data often includes private personal information that raises new security and privacy concerns. Moreover, it is widely agreed that the sheer scale of big data renders many privacy-preserving techniques ineffective. Therefore, anonymization is suggested as one of the most efficient approaches to ensuring privacy in big data. In this paper, we provide a new, detailed classification of the most widely used non-cryptographic anonymization techniques for big data, including generalization and randomization approaches. The paper also evaluates the presented techniques against integrity, confidentiality, and credibility criteria. In addition, three relevant anonymization techniques, k-anonymity, l-diversity, and t-closeness, are tested on an extract of a huge real dataset.
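The three privacy models named above can be checked on a generalized table with a few lines of code. The sketch below is a minimal illustration, not the paper's implementation: it assumes records are dictionaries, uses the simple "distinct" variant of l-diversity, and measures t-closeness for a categorical sensitive attribute with variational distance (Earth Mover's Distance under an equal ground distance). The toy records and attribute names (`age`, `zip`, `disease`) are hypothetical.

```python
from collections import Counter, defaultdict


def equivalence_classes(records, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    classes = defaultdict(list)
    for row in records:
        classes[tuple(row[qi] for qi in quasi_identifiers)].append(row)
    return classes


def k_anonymity(records, quasi_identifiers):
    """Smallest equivalence-class size: the table is k-anonymous for this k."""
    return min(len(c) for c in equivalence_classes(records, quasi_identifiers).values())


def l_diversity(records, quasi_identifiers, sensitive):
    """Distinct l-diversity: fewest distinct sensitive values in any class."""
    return min(len({r[sensitive] for r in c})
               for c in equivalence_classes(records, quasi_identifiers).values())


def t_closeness(records, quasi_identifiers, sensitive):
    """Largest distance between a class's sensitive-value distribution and the
    whole table's, using variational distance (EMD with equal ground distance)."""
    n = len(records)
    overall = Counter(r[sensitive] for r in records)
    worst = 0.0
    for c in equivalence_classes(records, quasi_identifiers).values():
        local = Counter(r[sensitive] for r in c)
        dist = 0.5 * sum(abs(local[v] / len(c) - overall[v] / n) for v in overall)
        worst = max(worst, dist)
    return worst


# Hypothetical, already-generalized toy table.
records = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cancer"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
QI = ["age", "zip"]
print(k_anonymity(records, QI), l_diversity(records, QI, "disease"),
      t_closeness(records, QI, "disease"))  # 2 1 0.25
```

The example table is 2-anonymous but only 1-diverse, because the second class reveals that both people have the flu; this is exactly the homogeneity weakness that l-diversity and t-closeness were introduced to catch.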
EPOS Security & GDPR Compliance
Since May 2018, companies have been required to comply with the General Data Protection Regulation (GDPR). As a result, many companies have had to change how they collect and process EU citizens' data. The compliance process can be very expensive: it requires more specialized staff, who must study the regulation and then implement the necessary changes in IT applications and infrastructure, and new measures and methods must be developed and put in place.
This project is part of the European Plate Observing System (EPOS) project. EPOS enables earth-science data from research institutes across Europe to be shared and reused. The data is stored in a database and in several file systems, and web services are provided for data mining and control. Because EPOS is a complex distributed system, it is important to guarantee not only its security but also its compliance with the GDPR. A need was identified to automate and simplify this compliance-verification process, in particular by developing a tool capable of analyzing web applications. Such a tool can give companies in general an easier and faster way to check their degree of GDPR compliance so that they can assess and implement any necessary changes.
To this end, PrivAcy, Data REgulation and Security (PADRES) was developed. It contains the main points of the GDPR, organized by principle in the form of a checklist that is answered manually. Because privacy and security complement each other, PADRES also searches for vulnerabilities in web applications: when the checklist is submitted, a security analysis is performed by integrating open-source tools such as Network Mapper (NMAP) and Zed Attack Proxy (ZAP), together with a cookie analyzer, to test the application against the most frequent vulnerabilities in the Open Web Application Security Project (OWASP) Top 10. Finally, a report is generated with the information obtained, together with a set of suggestions based on the checklist responses.
When the tool was applied to EPOS, most of the GDPR-related points were answered as compliant; for the remaining points, suggestions were generated to help raise the level of compliance and improve overall data management. The vulnerability analysis found some issues classified as high risk, but most were classified as medium risk.
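The checklist-to-report step described above can be sketched as follows. This is a hypothetical illustration, not PADRES's actual data model: the `ChecklistItem` fields, the `build_report` function, and the example questions and suggestions are all assumptions. It only shows the idea of grouping manual answers by GDPR principle and emitting a suggestion for every point answered as non-compliant; the NMAP/ZAP scanning part is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    principle: str    # GDPR principle the question belongs to (hypothetical)
    question: str
    compliant: bool   # manual answer given by the assessor
    suggestion: str   # advice emitted when the answer is "not compliant"


def build_report(items):
    """Count compliant answers per principle and collect suggestions."""
    summary = {}
    suggestions = []
    for item in items:
        done, total = summary.get(item.principle, (0, 0))
        summary[item.principle] = (done + int(item.compliant), total + 1)
        if not item.compliant:
            suggestions.append(f"[{item.principle}] {item.suggestion}")
    return summary, suggestions


# Hypothetical checklist answers.
items = [
    ChecklistItem("Transparency", "Is a privacy notice published?", True,
                  "Publish a privacy notice."),
    ChecklistItem("Storage limitation", "Are retention periods defined?", False,
                  "Define and document retention periods."),
    ChecklistItem("Storage limitation", "Is expired data deleted?", True,
                  "Automate deletion of expired data."),
]
summary, suggestions = build_report(items)
print(summary)      # {'Transparency': (1, 1), 'Storage limitation': (1, 2)}
print(suggestions)  # ['[Storage limitation] Define and document retention periods.']
```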
Privacy Preserving Attribute-Focused Anonymization Scheme for Healthcare Data Publishing
Advancements in Industry 4.0 have brought tremendous improvements to the healthcare sector, such as better quality of treatment, enhanced communication, remote monitoring, and reduced cost. Sharing healthcare data with healthcare providers is crucial for harnessing these benefits. However, healthcare data generally holds sensitive information about individuals, so sharing it is challenging because of various security and privacy issues. According to privacy regulations and ethical requirements, patient privacy must be preserved before data is shared for medical research. State-of-the-art privacy-preserving studies either use cryptographic approaches or apply anonymization techniques regardless of attribute type, which results in poor protection and poor data utility. In this paper, we propose an attribute-focused privacy-preserving data publishing scheme. The scheme has two parts: a fixed-interval approach to protect numerical attributes and an improved l-diverse slicing approach to protect categorical and sensitive attributes. In the fixed-interval approach, the original values of the healthcare data are replaced with an equivalent computed value. The improved l-diverse slicing approach partitions the data both horizontally and vertically to avoid privacy leaks. Extensive experiments on real-world datasets are conducted to evaluate the performance of the proposed scheme. Classification models built on the anonymized dataset yield approximately 13% better accuracy than benchmark algorithms. Experimental analyses show that the average information loss, measured by the normalized certainty penalty (NCP), is reduced by 12% compared with similar approaches. The attribute-focused scheme not only preserves data utility but also protects the data against membership, attribute, and identity disclosure.
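A fixed-interval generalization and its NCP cost can be illustrated as below. The abstract does not say which "equivalent computed value" replaces each original value, so this sketch assumes the interval midpoint; `fixed_interval`, `ncp_numeric`, the interval width, and the 0 to 100 age domain are hypothetical choices. For a numerical attribute, NCP is the width of the generalized interval divided by the width of the attribute's domain, averaged here over all records.

```python
def fixed_interval(values, width):
    """Map each value to the midpoint of its fixed-width interval [low, low + width).
    Returns (anonymized values, list of (low, high) intervals)."""
    anonymized, intervals = [], []
    for v in values:
        low = (v // width) * width
        high = low + width
        intervals.append((low, high))
        anonymized.append((low + high) / 2)  # midpoint: an assumption, not the paper's rule
    return anonymized, intervals


def ncp_numeric(intervals, domain_min, domain_max):
    """Average normalized certainty penalty: interval width / domain width."""
    span = domain_max - domain_min
    return sum((high - low) / span for low, high in intervals) / len(intervals)


ages = [23, 27, 35, 41]
anon, intervals = fixed_interval(ages, width=10)
print(anon)                                # [25.0, 25.0, 35.0, 45.0]
print(round(ncp_numeric(intervals, 0, 100), 3))  # 0.1
```

Widening the interval lowers re-identification risk but raises NCP proportionally, which is the utility trade-off the scheme's 12% NCP reduction is measured against.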
Reconciliation of anti-money laundering instruments and European data protection requirements in permissionless blockchain spaces
This article is an attempt to reconcile the requirements of the EU General Data Protection Regulation (GDPR) with the anti-money laundering and countering the financing of terrorism (AML/CFT) instruments used in permissionless ecosystems based on distributed ledger technology (DLT). Existing analyses usually focus on only one of these regulations. Examining the interplay between the two reveals their incoherence with respect to permissionless DLT. The GDPR requirements force permissionless blockchain communities to use anonymization or, at the very least, strong pseudonymization technologies to ensure that data processing complies with the GDPR. At the same time, global AML/CFT policy instruments, currently being implemented in many countries following the recommendations of the Financial Action Task Force (FATF), counteract the anonymity-enhancing technologies built into blockchain protocols. The solutions suggested in this article aim to shape permissionless DLT-based networks so that they protect personal data in line with the GDPR while also addressing the money laundering and terrorist financing risks created by transactions in anonymous or strongly pseudonymous blockchain spaces.
Searching for new policy instruments is necessary to ensure that governments do not combat the development of all privacy blockchains, since such networks are needed to enable a high level of privacy protection and GDPR-compliant data processing. This article identifies two AML/CFT tools that may help shape privacy blockchains in which those tools remain enforceable. The first tool is exceptional government access to transaction data written on non-transparent ledgers and obfuscated by advanced anonymization cryptography. This tool should be optional for a network as long as other effective AML/CFT measures are available to intermediaries or to the government for that network. If no such measures are available and the network does not grant exceptional access, regulations should allow governments to combat the network as a whole. Effective tools in that scope should target the value of the privacy cryptocurrency rather than its users. As a last resort, such tools could include state attacks that undermine the community's trust in a specific network.
Anonymization of Event Logs for Network Security Monitoring
A managed security service provider (MSSP) must collect security event logs from its customers' networks for monitoring and cybersecurity protection. These logs need to be processed by the MSSP before they are displayed to security operations center (SOC) analysts. The logs are generated by employees during their working hours at the customers' sites. One challenge is that the collected event logs contain personally identifiable information (PII) that is visible in clear text to the SOC analysts and to any user with access to the SIEM platform.
We explore how pseudonymization can be applied to security event logs to help protect individuals' identities from SOC analysts while preserving data utility where possible. We compare the impact of applying different pseudonymization functions to sensitive information and PII. Non-deterministic methods provide a higher level of privacy but reduce the utility of the data.
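The privacy/utility difference between deterministic and non-deterministic pseudonymization can be seen in a short sketch. This uses Python's standard `hmac` and `secrets` modules; the function names, the 16-hex-character truncation, and the key are illustrative assumptions rather than the functions benchmarked in the thesis. A keyed HMAC maps the same user to the same pseudonym, so analysts can still correlate that user's events across logs; a fresh random token per occurrence breaks that linkability.

```python
import hashlib
import hmac
import secrets


def pseudonymize_deterministic(value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256, truncated: same input and key -> same pseudonym,
    so events from one user remain linkable."""
    return hmac.new(key, value.encode(), hashlib.sha256).hexdigest()[:16]


def pseudonymize_random(value: str) -> str:
    """Fresh random token per occurrence (value is ignored): events can no
    longer be linked to each other."""
    return secrets.token_hex(8)


KEY = b"example-secret-key"  # hypothetical; in practice kept away from analysts
events = [{"user": "alice", "event": "login_failed"},
          {"user": "alice", "event": "login_ok"}]

det = [pseudonymize_deterministic(e["user"], KEY) for e in events]
rnd = [pseudonymize_random(e["user"]) for e in events]
print(det[0] == det[1])  # True: analysts can group both events by user
print(rnd[0] == rnd[1])  # False (with overwhelming probability)
```

Whoever holds the HMAC key can re-compute pseudonyms for candidate users, so in this design the key itself becomes the secret that separates SOC analysts from identities.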
Our contribution in this thesis is threefold. First, we study available architectures under different threat models, including their strengths and weaknesses. Second, we study pseudonymization functions and their application to PII fields, benchmarking them both individually and in our experimental platform. Last, we gather feedback and lessons learned from SOC analysts based on their experience.
Existing works [43, 44, 48, 39] are generally restricted to the anonymization of IP traces, which covers only one part of SOC analysts' investigations: the inspection of PCAP files. One of the closest works [47] provides useful, practical anonymization methods for IP addresses, ports, and raw logs.
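One common, simple IP-address anonymization method is prefix truncation: keep the network prefix and zero the host bits, so analysts still see which subnet an event came from. The sketch below uses Python's standard `ipaddress` module; the `truncate_ip` name and the /24 (IPv4) and /48 (IPv6) defaults are assumptions, and this is not necessarily the method used in [47]. Stronger schemes such as prefix-preserving encryption keep more structure but are more involved.

```python
import ipaddress


def truncate_ip(addr: str, ipv4_prefix: int = 24, ipv6_prefix: int = 48) -> str:
    """Zero the host bits of an address, keeping only its network prefix."""
    ip = ipaddress.ip_address(addr)
    prefix = ipv4_prefix if ip.version == 4 else ipv6_prefix
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


print(truncate_ip("192.168.10.57"))        # 192.168.10.0
print(truncate_ip("2001:db8:abcd:12::1"))  # 2001:db8:abcd::
```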
An architecture for secure data management in medical research and aided diagnosis
Official Doctoral Programme in Information and Communication Technologies (Programa Oficial de Doutoramento en Tecnoloxías da Información e as Comunicacións), 5032V01. The General Data Protection Regulation (GDPR) was implemented on 25 May 2018 and is considered the most important development in data privacy regulation in the last 20 years. Heavy fines are imposed for violating its rules, so healthcare centers cannot afford to ignore it. The main goal of this thesis is to study and propose a secure integration layer for healthcare data curators in which connectivity between isolated systems (locations), unification of records into a patient-centric view, and data sharing with consent approval are the cornerstones of the proposed architecture. The proposal gives the data subject a central role, allowing them to control their identity, privacy profiles, and access grants. It aims to minimize the fear of legal liability when sharing medical records by using anonymization and making patients responsible for securing their own medical records, while preserving the quality of patient treatment. The main hypothesis is that the Distributed Ledger and Self-Sovereign Identity concepts form a natural symbiosis for solving GDPR challenges in healthcare. Solutions are required so that clinicians and researchers can maintain their collaborative workflows without compromising regulatory compliance. The proposed architecture accomplishes these objectives in a decentralized environment by adopting isolated data privacy profiles.
Novel reversible text data de-identification techniques based on native data structures
Technological development in today's digital world has resulted in the collection and storage of large amounts of personal data. These data enable both direct services and indirect activities, known as secondary use. The secondary use of data can improve decision-making, service experiences, and healthcare systems. However, the widespread reuse of personal data raises significant privacy and policy issues, especially for health-related information, which may contain sensitive details and lead to privacy breaches if compromised. Legal systems establish laws to protect the privacy of personal data disclosed for secondary use. A well-known example is the General Data Protection Regulation (GDPR), which sets out specific rules for sharing and storing personal data to protect individual privacy. The GDPR explicitly points to data de-identification, especially pseudonymization, as one measure that can help meet the requirements for processing personal data.
The literature on privacy preservation approaches has largely been developed in the field of data anonymization, where personal data are irreversibly removed or obfuscated and there is no means by which to recover an individual's identity if needed. By contrast, pseudonymization is a promising technique to protect privacy while enabling the recovery of de-identified data. Significantly, many existing approaches for pseudonymization were developed long before the GDPR requirements were established, and so they may fail to satisfy its provisions. Therefore, it is worthwhile to offer technical solutions to preserve privacy while supporting the legitimate use of data.
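The reversibility that distinguishes pseudonymization from anonymization can be shown with a minimal token-vault sketch. This is a plainly different technique from ARTPHIL's E-ART encryption: identifiers are replaced with random tokens, and a separately held lookup table lets an authorized party recover the original value. The `PseudonymVault` class, the `PID-` token format, and the assumption that an upstream detector has already found the identifiers are all hypothetical.

```python
import secrets


class PseudonymVault:
    """Reversible pseudonymization via a lookup table kept apart from the data."""

    def __init__(self):
        self._forward = {}  # original value -> pseudonym
        self._reverse = {}  # pseudonym -> original value

    def pseudonymize(self, value: str) -> str:
        """Return a stable random token for value, creating one if needed."""
        if value not in self._forward:
            token = "PID-" + secrets.token_hex(6)
            while token in self._reverse:  # guard against a (very unlikely) collision
                token = "PID-" + secrets.token_hex(6)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def reidentify(self, token: str) -> str:
        """Recover the original value; only the vault holder can do this."""
        return self._reverse[token]


vault = PseudonymVault()
note = "John Smith was seen by Dr. Lee."
for name in ["John Smith", "Dr. Lee"]:  # assume a detector found these identifiers
    note = note.replace(name, vault.pseudonymize(name))
print(note)  # e.g. "PID-3f9a1c0b7d2e was seen by PID-a4417be09c53."
```

With anonymization, the equivalent of the `_reverse` table would simply be destroyed, which is why an individual's identity could no longer be recovered.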
This thesis proposes a novel de-identification system for unstructured textual data, called ARTPHIL, that generates de-identified data in compliance with the GDPR requirement for strong pseudonymization. The system was evaluated on the 2014 i2b2 test data and achieved a recall of 96.93% in detecting and encrypting personal health information as defined by the guidelines of the Health Insurance Portability and Accountability Act (HIPAA). The system uses a novel, lightweight cryptographic algorithm, E-ART, to encrypt personal data cost-effectively without compromising security. The main novelty of E-ART is its use of the reflection property of a balanced binary tree data structure as a substitution method, instead of complex, multiple iterations. The performance and security of the proposed algorithm were compared with two symmetric encryption algorithms, the Advanced Encryption Standard (AES) and the Data Encryption Standard (DES). The security analysis showed comparable results, while the performance analysis indicated that E-ART had the shortest ciphertext and running time with comparable memory usage, suggesting that ARTPHIL is feasible for delay-sensitive or data-intensive applications.
The right to privacy in a Big Data society. Merits and limits of the GDPR
With the continuous development of technology, Big Data generation has grown like never before. The rise of Big Data has opened up numerous ways in which consumers' personal data could be used, leaving people vulnerable. The European Union introduced the GDPR as its latest means of protecting citizens' rights. In this paper, we analyze different aspects of Big Data, such as the legal framework, consent, and anonymization, and examine how the GDPR has helped protect personal data and what its limitations are.
Improving Mild to Moderate Depression With an App-Based Self-Guided Intervention: Protocol for a Randomized Controlled Trial
Background: Depression is one of the most prevalent mental disorders and frequently co-occurs with other mental disorders. Despite the high direct and indirect costs to both individuals and society, more than 80% of those diagnosed with depression remain with their primary care physician and do not receive specialized treatment. Self-guided digital interventions have been shown to improve depression and, due to their scalability, have a large potential public health impact. Current digital interventions often focus on specific disorders, while recent research suggests that transdiagnostic approaches are more suitable.
Objective: This paper presents the protocol for a study that aims to assess the efficacy of a self-guided transdiagnostic app-based self-management intervention in patients with mild or moderate depression with and without comorbid mental disorders. Specifically, we are investigating the impact of the intervention on symptoms of depression, quality of life, anxiety symptoms, and mental health–related patient empowerment and self-management skills.
Methods: The intervention under investigation, MindDoc with Prescription, is a self-guided digital intervention aimed at supporting individuals with mild to moderate mental disorders from the internalizing spectrum, including depression. The app can be used as a low-threshold psychosocial intervention. Up to 570 adult patients will be randomized to receive either the intervention in addition to care as usual or care as usual only. We are including adults with permanent residency in Germany and mild or moderate depression according to International Classification of Diseases, 10th Revision, criteria (F32.0, F32.1, F33.0, and F33.1). Clinical interviews will be conducted to confirm the diagnosis. Data will be collected at baseline as well as 8 weeks and 6 months after randomization. The primary outcome will be depression symptom severity after 8 weeks. Secondary outcomes will be quality of life, anxiety symptom severity, and patient empowerment and self-management behaviors. Data will be analyzed using multiple imputation under the intention-to-treat principle, while sensitivity analyses will be based on additional imputation strategies and a per-protocol analysis.
Results: Recruitment for the trial started on February 7, 2023, and the first participant was randomized on February 14, 2023. As of September 5, 2023, 275 participants have been included in the trial and 176 have provided the primary outcome. The rate of missing values in the primary outcome is approximately 20%.
Conclusions: Data from this efficacy trial will be used to establish whether access to the intervention is associated with an improvement in depression symptoms in individuals diagnosed with mild or moderate depression. The study will contribute to expanding the evidence base on transdiagnostic digital interventions.
Trial Registration: German Registry of Clinical Trials DRKS00030852; https://drks.de/search/de/trial/DRKS0003085
Anonymization procedures for tabular data: an explanatory technical and legal synthesis
In the European Union, Data Controllers and Data Processors who work with personal data must comply with the General Data Protection Regulation and other applicable laws. This affects how personal data is stored and processed. However, some data processing in data mining or statistical analysis does not require any personal reference in the data, so the personal context can be removed. For these use cases, complying with applicable laws requires removing any existing personal information by applying so-called anonymization. At the same time, anonymization should maintain data utility. The concept of anonymization is therefore a double-edged sword with an intrinsic trade-off: privacy enforcement versus utility preservation. The former might not be fully guaranteed when anonymized data are published as Open Data. In theory and practice, there are diverse approaches to conducting and scoring anonymization. This explanatory synthesis discusses technical perspectives on the anonymization of tabular data, with special emphasis on the European Union's legal basis. The studied methods for conducting anonymization and for scoring both the anonymization procedure and the resulting anonymity are explained in unified terminology. The examined methods and scores cover both categorical and numerical data, and the scores involve data utility, information preservation, and privacy models. In practice-relevant examples, the methods and scores are tested experimentally on records from the UCI Machine Learning Repository's “Census Income (Adult)” dataset.
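The privacy-versus-utility trade-off can be made concrete by scoring a table before and after generalization. The sketch below is an illustration under stated assumptions, not the synthesis's own procedure: it generalizes a numerical attribute into 10-year bands and reports one privacy score (k, the smallest equivalence-class size) and one utility score (the discernibility metric, the sum of squared class sizes, where lower means more useful data). The four toy rows only borrow column names in the style of the Adult dataset; they are hypothetical, not records from it.

```python
from collections import Counter


def generalize_age(age: int, band: int = 10) -> str:
    """Replace an exact age with its band, e.g. 39 -> '30-39'."""
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def scores(rows, quasi_identifiers):
    """Return (k, discernibility metric) for the given quasi-identifiers."""
    sizes = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows).values()
    return min(sizes), sum(s * s for s in sizes)


rows = [  # hypothetical Adult-style records
    {"age": 39, "sex": "Male", "education": "Bachelors"},
    {"age": 31, "sex": "Male", "education": "HS-grad"},
    {"age": 52, "sex": "Female", "education": "Masters"},
    {"age": 58, "sex": "Female", "education": "HS-grad"},
]
generalized = [{**r, "age": generalize_age(r["age"])} for r in rows]

print(scores(rows, ["age", "sex"]))         # (1, 4): every row is unique
print(scores(generalized, ["age", "sex"]))  # (2, 8): more private, less precise
```

Generalization doubles k here, but the discernibility metric also doubles, quantifying the utility given up for the added privacy.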