9 research outputs found

    A System for the Anonymization of Structured Data (Sistema de anonimización de datos estructurados)

    The approaches most widely used in industry to protect private data reduce the data's usefulness for analytics. For this reason, this work proposes Anonylitics, a system for anonymizing structured data that preserves the distribution of numerical data while guaranteeing their privacy. The proposal keeps the data useful for business analytics, as evidenced by a validation in which two real-world datasets were anonymized, demonstrating the potential of the system and its algorithms. Degree: Magíster en Ingeniería de Sistemas y Computación (Master's in Systems and Computing Engineering).
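    The abstract does not describe Anonylitics' algorithms. Purely as a point of reference, the sketch below shows rank swapping, a standard statistical disclosure control technique (not necessarily the one Anonylitics uses) that leaves the marginal distribution of a numeric column exactly intact: each value is only exchanged with another value whose rank lies within a window of `p` percent of the column length. The function name `rank_swap`, the parameter `p` and the sample salaries are illustrative assumptions.

```python
import random
from typing import List, Optional


def rank_swap(values: List[float], p: float = 5.0, seed: Optional[int] = None) -> List[float]:
    """Rank swapping for one numeric column (illustrative sketch).

    Each value is swapped with another value whose rank differs by at most
    p percent of the column length. Because values are only exchanged, the
    multiset of values (and hence the marginal distribution) is unchanged,
    while the link between a record and its original value is broken.
    """
    rng = random.Random(seed)
    n = len(values)
    # Record indices ordered by value (rank order).
    order = sorted(range(n), key=lambda i: values[i])
    window = max(1, int(n * p / 100))
    ranked = [values[i] for i in order]
    swapped = [False] * n
    for r in range(n):
        if swapped[r]:
            continue
        # Candidate partners: not yet swapped, within the rank window above r.
        candidates = [s for s in range(r + 1, min(n, r + window + 1)) if not swapped[s]]
        if not candidates:
            continue
        s = rng.choice(candidates)
        ranked[r], ranked[s] = ranked[s], ranked[r]
        swapped[r] = swapped[s] = True
    # Write the swapped values back to their original record positions.
    result = [0.0] * n
    for r, i in enumerate(order):
        result[i] = ranked[r]
    return result


salaries = [3100, 2800, 4500, 3900, 5200, 2950, 4100, 3600]  # hypothetical data
anon = rank_swap(salaries, p=25, seed=42)
print(sorted(anon) == sorted(salaries))  # True: same distribution, shuffled links
```

    A larger `p` breaks more of the link between records and values at the cost of distorting correlations with other columns; the marginal distribution itself is preserved for any `p`.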

    Supporting Autonomic Management of Clouds: Service Clustering with Random Forest

    A promising solution for managing services in clouds, as fostered by autonomic computing, is self-management. However, the obfuscation of the underlying details of cloud services, partly due to privacy requirements, reduces the effectiveness of autonomic managers. Data-driven approaches, in particular those relying on service clustering based on machine learning techniques, can assist autonomic management and support decisions concerning, e.g., the scheduling and deployment of services. Unfortunately, applying such approaches is further complicated by the coexistence of different types of data in the information provided by cloud monitoring: both continuous (e.g., CPU load) and categorical (e.g., VM instance type) data are available. Current approaches deal with this problem heuristically. In this paper, instead, we propose an approach that uses all types of data and learns the similarities and patterns among services in a data-driven fashion. More specifically, we design an unsupervised formulation of random forest to calculate service similarities and provide them as input to a clustering algorithm. For efficiency, and to meet the dynamism requirement of autonomic clouds, our methodology consists of two steps: 1) off-line clustering and 2) on-line prediction. Using datasets from real-world clouds, we demonstrate the superiority of our solution with respect to others and validate the accuracy of the on-line prediction. Moreover, to show the applicability of our approach, we devise a service scheduler that uses similarity among services and evaluate its performance in a cloud test-bed using realistic data.
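    The abstract does not give the paper's exact formulation, so the following is only a minimal sketch of the general idea behind random-forest similarities: two services are similar when they fall into the same leaf in many trees. It assumes the leaf indices have already been produced by a forest trained without labels (a common setup trains a classifier to separate real records from synthetic ones whose columns are sampled independently; scikit-learn's `RandomForestClassifier.apply` returns such a per-tree leaf-index matrix). Because trees split on both continuous and categorical columns, the proximity naturally covers mixed data. The average-linkage clustering and the mean-proximity assignment used for the off-line and on-line steps are assumptions chosen for brevity, and all function names are hypothetical.

```python
from typing import List, Sequence

Leaves = Sequence[int]  # leaf index reached in each tree, one entry per tree


def proximity(a: Leaves, b: Leaves) -> float:
    """Fraction of trees in which two services land in the same leaf."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def offline_clustering(leaves: List[Leaves], k: int) -> List[List[int]]:
    """Off-line step: average-linkage agglomerative clustering on 1 - proximity."""
    n = len(leaves)
    prox = [[proximity(leaves[i], leaves[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean pairwise proximity between the two clusters.
                s = sum(prox[i][j] for i in clusters[a] for j in clusters[b])
                avg = s / (len(clusters[a]) * len(clusters[b]))
                if avg > best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def online_predict(new: Leaves, leaves: List[Leaves], clusters: List[List[int]]) -> int:
    """On-line step: assign a new service to the cluster with highest mean proximity."""
    scores = [sum(proximity(new, leaves[i]) for i in c) / len(c) for c in clusters]
    return max(range(len(clusters)), key=lambda c: scores[c])


# Hypothetical leaf indices for four services across three trees.
leaves = [
    [1, 4, 2],  # service 0
    [1, 4, 3],  # service 1
    [5, 6, 7],  # service 2
    [5, 6, 8],  # service 3
]
clusters = offline_clustering(leaves, k=2)
print(clusters)  # [[0, 1], [2, 3]]
print(online_predict([5, 6, 9], leaves, clusters))  # 1
```

    Splitting the work this way keeps the quadratic clustering cost off-line, while an on-line prediction only needs one proximity computation per stored service.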

    Visualisation of Large-Scale Call-Centre Data

    The contact centre industry employs 4% of the entire working population of the United Kingdom and the United States and generates gigabytes of operational data that require analysis to provide insight and improve efficiency. This thesis is the result of a collaboration with QPC Limited, who provide data collection and analysis products for call centres. They provided a large data set featuring almost 5 million calls to be analysed. This thesis uses novel visualisation techniques to create tools for exploring the large, complex call centre data set and to facilitate unique observations into the data. A survey of information visualisation books is presented, providing a thorough background of the field. Following this, a feature-rich application that visualises large call centre data sets using scatterplots that support millions of points is presented. The application uses both CPU and GPU acceleration for processing and filtering and is demonstrated with millions of call events. This is expanded upon with the use of glyphs to depict agent behaviour in a call centre. A technique is developed to cluster overlapping glyphs into a single parent glyph depending on zoom level and a customizable distance metric. This hierarchical glyph represents the mean value of all child agent glyphs, removing overlap and reducing visual clutter. A novel technique for visualising individually tailored glyphs using a Graphics Processing Unit is also presented and demonstrated rendering over 100,000 glyphs at interactive frame rates. An open-source code example is provided for reproducibility. Finally, a novel interaction and layout method is introduced for improving the scalability of chord diagrams to visualise call transfers. Sketch-based methods for showing multiple links and direction are explored, and a sketch-based brushing technique for filtering is proposed. Feedback from domain experts in the call centre industry is reported for all applications developed.
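    The abstract does not spell out the clustering procedure, so this is a minimal sketch of the zoom-dependent idea under stated assumptions: a greedy single-pass grouping, Euclidean distance as the default (swappable) metric, and a pixel threshold. Glyph positions are scaled by the zoom factor into screen space, glyphs closer than the threshold are merged, and each parent glyph stores the mean position and mean values of its children. The `Glyph` type, `cluster_glyphs` function and `min_px` parameter are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Point = Tuple[float, float]


@dataclass
class Glyph:
    x: float  # world-space position
    y: float
    values: List[float]  # per-agent metrics encoded by the glyph
    children: List["Glyph"] = field(default_factory=list)


def euclidean(a: Point, b: Point) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


def cluster_glyphs(glyphs: List[Glyph], zoom: float, min_px: float,
                   metric: Callable[[Point, Point], float] = euclidean) -> List[Glyph]:
    """Greedily merge glyphs whose screen-space distance is below min_px.

    Zooming in (larger zoom) spreads glyphs apart on screen, so fewer merge.
    """
    groups: List[List[Glyph]] = []
    for g in glyphs:
        screen = (g.x * zoom, g.y * zoom)
        for grp in groups:
            first = grp[0]  # compare against the first glyph in the group
            if metric(screen, (first.x * zoom, first.y * zoom)) < min_px:
                grp.append(g)
                break
        else:
            groups.append([g])
    parents: List[Glyph] = []
    for grp in groups:
        if len(grp) == 1:
            parents.append(grp[0])
            continue
        n = len(grp)
        # Parent glyph: mean position and mean value of all children.
        mean_vals = [sum(vals) / n for vals in zip(*(c.values for c in grp))]
        parents.append(Glyph(sum(c.x for c in grp) / n, sum(c.y for c in grp) / n,
                             mean_vals, children=grp))
    return parents


agents = [Glyph(0.0, 0.0, [2.0, 10.0]), Glyph(1.0, 0.0, [4.0, 20.0]),
          Glyph(50.0, 50.0, [1.0, 5.0])]
zoomed_out = cluster_glyphs(agents, zoom=1.0, min_px=5.0)
print(len(zoomed_out), zoomed_out[0].values)  # 2 [3.0, 15.0]
print(len(cluster_glyphs(agents, zoom=10.0, min_px=5.0)))  # 3
```

    Because the threshold is in pixels, the same data yields coarser parents when zoomed out and individual agent glyphs when zoomed in; keeping `children` on each parent lets a renderer expand a cluster without recomputing it.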

    BRHIM - Base de Registros Hospitalares para Informações e Metadados (Hospital Records Base for Information and Metadata)

    The risk of re-identifying hospital data is high, and there is demand for such data in projects that develop and validate Artificial Intelligence (AI). This work addresses the main methods for preparing hospital records used in observational studies, with a focus on assessing the risk of re-identification and the impact that the information loss caused by anonymization has on AI results. A review of the subject is presented first, followed by two articles, always considering the context of using hospital records in epidemiological studies. The first article proposes a domain ontology to define a scope for addressing anonymization. It presents the types of attacks, the types of data and attributes, the privacy models, the types of AI use and the different study designs. An example instance of the ontology was built in WebProtégé, a tool made available by Stanford University for building ontologies, which allows the ontology to be replicated. The second article aims to define a 5-step recipe for preparing hospital records that implements pseudo-anonymization, de-identification and anonymization of the data, and to compare the effects of these steps on an AI application. To this end, a Datathon was held to develop an AI predictor of in-hospital mortality. Comparing the AI results obtained with the original data and with the anonymized data showed a difference of less than 1% in AUC-ROC, while the risk of a patient being identified was reduced by 95%. This demonstrates that data preparation can be systematized, adding privacy and quantifying the information loss so that both are transparent.
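    The abstract reports a 95% reduction in re-identification risk but does not state which risk measure was used. As a hedged illustration only, the sketch below uses one common measure, prosecutor risk, in which a record's risk is 1 divided by the number of records sharing its quasi-identifier values (its equivalence class), and shows how a generalization step lowers it. The column names, the generalization rules (10-year age bands, 3-digit postcode prefix), the toy records and the function names are all hypothetical and are not the 5-step recipe from the work.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Record = Dict[str, object]


def reidentification_risks(records: List[Record], quasi_ids: Sequence[str]) -> List[float]:
    """Prosecutor risk per record: 1 / size of its equivalence class."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    class_sizes = Counter(keys)
    return [1.0 / class_sizes[k] for k in keys]


def generalize(records: List[Record], rules: Dict[str, Callable]) -> List[Record]:
    """Apply a generalization function to selected columns of every record."""
    return [{c: rules[c](v) if c in rules else v for c, v in r.items()} for r in records]


# Hypothetical toy records; "died" is the outcome an AI model would predict.
patients = [
    {"age": 34, "zip": "90210", "sex": "F", "died": 0},
    {"age": 37, "zip": "90213", "sex": "F", "died": 0},
    {"age": 36, "zip": "90219", "sex": "F", "died": 1},
    {"age": 71, "zip": "90410", "sex": "M", "died": 1},
    {"age": 75, "zip": "90415", "sex": "M", "died": 0},
]
qi = ["age", "zip", "sex"]
rules = {
    "age": lambda a: f"{a // 10 * 10}-{a // 10 * 10 + 9}",  # 10-year age band
    "zip": lambda z: z[:3] + "**",  # keep only the 3-digit prefix
}

before = reidentification_risks(patients, qi)
after = reidentification_risks(generalize(patients, rules), qi)
print(max(before), max(after))  # 1.0 0.5
```

    The same loop of "measure risk, generalize, re-measure" can be paired with retraining the model on each version of the data, which is how the trade-off between privacy gain and AUC-ROC loss becomes visible.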

    Optimizing the Privacy Risk - Utility Framework in Data Publication


    Proceedings of the 6th National Conference (JNIC2021 LIVE)

    This conference has become a meeting forum for the most relevant actors in the field of cybersecurity in Spain. It not only presents some of the leading scientific work in the various areas of cybersecurity, but also pays special attention to training and educational innovation in cybersecurity, and to the connection with industry through technology-transfer proposals. Indeed, this year some changes to how the Transfer Program operates and is run are presented, designed to improve it and make it more valuable to the entire cybersecurity research community.