
    Secured Data Masking Framework and Technique for Preserving Privacy in a Business Intelligence Analytics Platform

    The main concept behind business intelligence (BI) is how to use integrated data across different business systems within an enterprise to make strategic decisions. It is difficult to map internal and external BI users to subsets of the enterprise’s data warehouse (DW), so protecting the privacy of this data while maintaining its utility is a challenging task. Today, such DW systems constitute one of the most serious privacy breach threats an enterprise might face when many internal users of different security levels have access to BI components. This thesis proposes a data masking framework (iMaskU: Identify, Map, Apply, Sign, Keep testing, Utilize) for a BI platform to protect data at rest, preserve the data format, and maintain data utility at the on-the-fly querying level. A new reversible data masking technique (COntent BAsed Data masking, COBAD) is developed as an implementation of iMaskU. The masking algorithm in COBAD is based on the statistical content of the extracted dataset, so that the masked data cannot be linked to specific individuals or re-identified by any means. The re-identification risk of the COBAD technique was computed using a supercomputer under three attack scenarios: a) a brute-force attack needs, on average, 55 years to crack the key of each record; b) a dictionary attack needs 231 days to crack the same key for the entire extracted dataset (containing 50,000 records); c) under a data linkage attack, the re-identification risk is very low when the common linked attributes are used. The performance of the COBAD masking technique was validated using the TPC-H decision support benchmark with a 1 GB database schema. The execution times of the selected TPC-H queries show that COBAD is considerably faster than AES-128 and 3DES encryption. Theoretical and experimental results show that the proposed solution provides a reasonable trade-off between data security and data utility.
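
    To make the masking idea concrete, here is a minimal, self-contained sketch of a reversible, format-preserving mask whose parameters are derived from the statistics of the dataset, in the spirit of COBAD. It is not the COBAD algorithm; the keyed-shift scheme, field names, and key format are illustrative assumptions only.

```python
# Illustrative sketch only: a toy reversible, format-preserving mask for a
# numeric column, keyed per record. NOT the COBAD algorithm; it demonstrates
# the general idea of deriving mask parameters from the statistical content
# of the extracted dataset.
import hashlib
import statistics

def mask_value(value: int, record_key: str, modulus: int) -> int:
    """Shift the value by a keyed offset, staying within the column's range."""
    offset = int(hashlib.sha256(record_key.encode()).hexdigest(), 16) % modulus
    return (value + offset) % modulus

def unmask_value(masked: int, record_key: str, modulus: int) -> int:
    """Reverse the keyed shift (the masking is reversible given the key)."""
    offset = int(hashlib.sha256(record_key.encode()).hexdigest(), 16) % modulus
    return (masked - offset) % modulus

ages = [23, 45, 31, 62, 57]
# Derive the modulus from dataset statistics so masked values keep the format.
modulus = max(ages) + int(statistics.stdev(ages)) + 1
masked = [mask_value(a, f"rec-{i}", modulus) for i, a in enumerate(ages)]
restored = [unmask_value(m, f"rec-{i}", modulus) for i, m in enumerate(masked)]
assert restored == ages
```

    A real scheme would derive the per-record keys from a protected master secret; the point here is only that a mask keyed on dataset statistics can be inverted exactly by key holders while keeping the masked column in a plausible range.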

    Forecasting day-ahead electricity prices in Europe: the importance of considering market integration

    Motivated by the increasing integration among electricity markets, in this paper we propose two different methods to incorporate market integration in electricity price forecasting and to improve predictive performance. First, we propose a deep neural network that considers features from connected markets to improve the predictive accuracy in a local market. To measure the importance of these features, we propose a novel feature selection algorithm that, using Bayesian optimization and functional analysis of variance, evaluates the effect of the features on the algorithm's performance. In addition, using market integration, we propose a second model that, by simultaneously predicting prices from two markets, improves the forecasting accuracy even further. As a case study, we consider the electricity market in Belgium and the improvements in forecasting accuracy obtained when using various French electricity features. We show that the two proposed models lead to improvements that are statistically significant. In particular, due to market integration, the predictive accuracy improves from 15.7% to 12.5% sMAPE (symmetric mean absolute percentage error). In addition, we show that the proposed feature selection algorithm is able to perform a correct assessment, i.e. to discard the irrelevant features.
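
    For readers unfamiliar with the accuracy metric quoted above, the sketch below computes sMAPE for a hypothetical set of day-ahead prices. The definition shown (absolute error over the mean of the absolute actual and forecast values) is a common variant and may differ in detail from the paper's exact formulation; all data values are made up.

```python
# Sketch of the sMAPE metric quoted in the abstract (common variant).
import numpy as np

def smape(actual: np.ndarray, forecast: np.ndarray) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(forecast - actual) / denom)

prices = np.array([42.1, 38.5, 55.0, 61.2])      # hypothetical day-ahead prices
predicted = np.array([40.0, 39.9, 52.3, 64.0])   # hypothetical forecasts
print(f"sMAPE: {smape(prices, predicted):.2f}%")
```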

    FuzzTheREST - Intelligent Automated Blackbox RESTful API Fuzzer

    In recent years, the pervasive influence of technology has deeply intertwined with human life, impacting diverse fields. This relationship has evolved into a dependency, with software systems playing a pivotal role and therefore requiring a high level of trust. Today, a substantial portion of software is accessed through Application Programming Interfaces, particularly web APIs, which predominantly adhere to the Representational State Transfer architecture. This architectural choice, however, introduces a wide range of potential vulnerabilities that are exposed and accessible at the network level. The significance of software testing becomes evident when considering the widespread use of software in daily tasks that affect personal safety and security, making the identification and assessment of faulty software paramount. In this thesis, FuzzTheREST, a black-box RESTful API fuzz testing framework, is introduced with the primary aim of addressing the challenges of understanding the context of each system under test and conducting comprehensive automated testing with diverse inputs. Operating from a black-box perspective, the fuzzer leverages Reinforcement Learning to efficiently uncover vulnerabilities in RESTful APIs by optimizing input values and combinations, relying on mutation methods for input exploration. The system's value is further enhanced by a thoroughly documented vulnerability discovery process for the user. The proposal stands out for its emphasis on explainability and its use of RL to learn the context of each API, eliminating the need for source-code knowledge and expediting the testing process. The developed solution adheres to software engineering best practices and incorporates a novel Reinforcement Learning algorithm, comprising a customized environment for API fuzz testing and a multi-table Q-Learning agent. The quality and applicability of the tool are assessed on two case studies: the Petstore API and an Emotion Detection module that was part of the CyberFactory#1 European research project. The results demonstrate the tool's effectiveness in discovering vulnerabilities, having found seven distinct vulnerabilities, and the agents' ability to learn different API contexts from API responses while maintaining reasonable code coverage levels.
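
    As a rough illustration of how tabular Q-learning can drive a black-box fuzzer, the sketch below rewards mutation actions that provoke server errors. It is not the FuzzTheREST implementation (which uses multiple Q-tables and a custom environment); the state encoding, action set, reward values, and the simulated API call are all hypothetical.

```python
# Minimal sketch of tabular Q-learning selecting input-mutation operators for
# API fuzzing. Hypothetical names throughout; not the FuzzTheREST code.
import random
from collections import defaultdict

ACTIONS = ["flip_char", "insert_long_string", "drop_field", "random_int"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

q_table = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state: str) -> str:
    """Epsilon-greedy selection over mutation operators."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_table[state], key=q_table[state].get)

def update(state: str, action: str, reward: float, next_state: str) -> None:
    """Standard one-step Q-learning update."""
    best_next = max(q_table[next_state].values())
    td_target = reward + GAMMA * best_next
    q_table[state][action] += ALPHA * (td_target - q_table[state][action])

# One hypothetical fuzzing step: the state is the last HTTP status class,
# and a 5xx response (possible crash) earns a high reward.
state = "status_200"
action = choose_action(state)
status = random.choice([200, 400, 500])   # stand-in for a real API call
reward = 10.0 if status >= 500 else -0.1
update(state, action, reward, f"status_{status}")
```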

    Learning control policies of driverless vehicles from UAV video streams in complex urban environments

    © 2019 by the authors. The way we drive, and transport more generally, are going through radical changes. Intelligent mobility aims to improve the efficiency of traditional transportation through advanced digital technologies, such as robotics, artificial intelligence and the Internet of Things. Central to the development of intelligent mobility technology is the emergence of connected autonomous vehicles (CAVs), where vehicles are capable of navigating environments autonomously. For this to be achieved, autonomous vehicles must be safe and trusted by passengers and other drivers. However, it is practically impossible to train autonomous vehicles on all the possible traffic conditions they may encounter. The work in this paper presents an alternative solution: using infrastructure to help CAVs learn driving policies, specifically for complex junctions, which require local experience and knowledge to handle. The proposal is to learn safe driving policies through data-driven imitation learning of human-driven vehicles at a junction, using data about vehicle movements captured by surveillance devices at the junction. The proposed framework is demonstrated by processing video datasets, captured by uncrewed aerial vehicles (UAVs), of three intersections around Europe containing vehicle trajectories. An imitation learning algorithm based on a long short-term memory (LSTM) neural network is proposed to learn and predict safe vehicle trajectories. The proposed framework can be used for many purposes in intelligent mobility, such as augmenting the intelligent control algorithms in driverless vehicles, benchmarking driver behavior for insurance purposes, and providing insights for city planning.
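
    A minimal sketch of the kind of LSTM trajectory predictor the abstract describes is shown below, assuming PyTorch: a window of past (x, y) positions is encoded by an LSTM, and the last hidden state is mapped to the next position. Input shapes, hidden size, and the prediction head are illustrative assumptions, not the paper's architecture.

```python
# Toy LSTM trajectory predictor: past (x, y) window -> next (x, y) position.
import torch
import torch.nn as nn

class TrajectoryLSTM(nn.Module):
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 2)   # next (x, y)

    def forward(self, past: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(past)                # (batch, seq_len, hidden)
        return self.head(out[:, -1, :])         # predict from the last step

model = TrajectoryLSTM()
window = torch.randn(8, 20, 2)                  # batch of 8 tracks, 20 past steps
next_pos = model(window)                        # (8, 2) predicted positions
print(next_pos.shape)
```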

    Data-driven approach for creating synthetic electronic medical records

    Background: New algorithms for disease outbreak detection are being developed to take advantage of full electronic medical records (EMRs) that contain a wealth of patient information. However, due to privacy concerns, even anonymized EMRs cannot be shared among researchers, resulting in great difficulty in comparing the effectiveness of these algorithms. To bridge the gap between novel bio-surveillance algorithms operating on full EMRs and the lack of non-identifiable EMR data, a method for generating complete and synthetic EMRs was developed.
    Methods: This paper describes a novel methodology for generating complete synthetic EMRs both for an outbreak illness of interest (tularemia) and for background records. The method has three major steps: 1) synthetic patient identity and basic information generation; 2) identification of care patterns that the synthetic patients would receive, based on the information present in real EMR data for similar health problems; 3) adaptation of these care patterns to the synthetic patient population.
    Results: We generated EMRs, including visit records, clinical activity, laboratory orders/results and radiology orders/results, for 203 synthetic tularemia outbreak patients. Validation of the records by a medical expert revealed problems in 19% of the records; these were subsequently corrected. We also generated background EMRs for over 3,000 patients in the 4-11 year age group. Validation of those records by a medical expert revealed problems in fewer than 3% of these background patient EMRs, and the errors were subsequently rectified.
    Conclusions: A data-driven method was developed for generating fully synthetic EMRs. The method is general and can be applied to any data set that has similar data elements (such as laboratory and radiology orders and results, clinical activity, and prescription orders). The pilot synthetic outbreak records were for tularemia, but our approach may be adapted to other infectious diseases. The pilot synthetic background records were for the 4-11 year old age group; the adaptations that must be made to the algorithms to produce synthetic background EMRs for other age groups are indicated.
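
    The three-step pipeline above can be pictured with the following toy sketch. Every field, care pattern, and timing rule here is invented for illustration; the real method mines care patterns from actual EMR data rather than hard-coding them.

```python
# Toy sketch of the three-step pipeline: (1) generate a synthetic identity,
# (2) pick a care pattern learned from real records for a similar problem,
# (3) adapt it to the synthetic patient. All values are made up.
import random

CARE_PATTERNS = {  # hypothetical patterns mined from real EMRs (step 2)
    "tularemia": [["visit", "cbc_panel", "chest_xray"],
                  ["visit", "blood_culture", "visit"]],
}

def synthesize_patient(pid: int) -> dict:
    """Step 1: synthetic identity and basic demographics."""
    return {"id": pid, "age": random.randint(4, 11), "sex": random.choice("MF")}

def adapt_pattern(patient: dict, condition: str) -> list:
    """Step 3: instantiate a care pattern with patient-specific timing."""
    pattern = random.choice(CARE_PATTERNS[condition])
    day, events = 0, []
    for activity in pattern:
        events.append({"patient": patient["id"], "day": day,
                       "activity": activity})
        day += random.randint(1, 3)   # illustrative inter-event gap
    return events

record = adapt_pattern(synthesize_patient(1), "tularemia")
```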

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impose some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and non-game domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
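
    The selection rule at the heart of the canonical MCTS algorithm the survey covers is UCB1 applied to trees (UCT): descend to the child that maximizes mean reward plus an exploration bonus. A minimal sketch, assuming a simple node structure that is not taken from the paper:

```python
# Sketch of the UCT selection rule used in the selection phase of MCTS.
import math

class Node:
    def __init__(self):
        self.visits = 0
        self.total_reward = 0.0
        self.children = {}   # action -> Node

def uct_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    """Exploitation (mean reward) plus exploration bonus."""
    if child.visits == 0:
        return float("inf")   # always try unvisited children first
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)

def select_child(node: Node):
    """Selection phase: descend via the child maximizing UCT."""
    return max(node.children.items(), key=lambda kv: uct_score(node, kv[1]))

root = Node()
root.visits = 10
root.children = {"a": Node(), "b": Node()}
root.children["a"].visits, root.children["a"].total_reward = 5, 3.0
root.children["b"].visits, root.children["b"].total_reward = 5, 2.0
action, child = select_child(root)   # picks "a": higher mean, equal bonus
```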

    Privacy Preserving Sensitive Data Publishing using (k,n,m) Anonymity Approach

    The Open Science movement has enabled extensive knowledge sharing by making research publications, software, data and samples available to society and researchers. The demand for data sharing is increasing day by day due to the tremendous knowledge hidden in the digital data generated by humans and machines. However, data cannot be published as-is because of the information leaks that can occur by linking the published data with other publicly available datasets or with the help of background knowledge. Various anonymization techniques have been proposed by researchers for privacy-preserving publishing of sensitive data. This paper proposes a (k,n,m) anonymity approach for sensitive data publishing that builds on the traditional k-anonymity technique. The selection of quasi-identifiers is automated in this approach using graph-theoretic algorithms and is further enhanced by choosing similar quasi-identifiers based on the derived and composite attributes. The usual method of choosing a single value of ‘k’ is modified in this technique by selecting different values of ‘k’ for the same dataset, based on the risk of exposure and the sensitivity rank of the sensitive attributes. The proposed anonymity approach can also be used for sensitive big data publishing after applying a few extension mechanisms. Experimental results show that the proposed technique is practical and can be implemented efficiently on a wide range of datasets.
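
    As background for the (k,n,m) approach, the sketch below illustrates the classical k-anonymity building block it extends: quasi-identifiers are generalized until every combination occurs at least k times. The field names, generalization rule, and data are assumptions for illustration; the paper's automated quasi-identifier selection and per-attribute choice of k are not modeled here.

```python
# Toy k-anonymity: generalize quasi-identifiers (age band, coarsened ZIP)
# until every quasi-identifier combination occurs at least k times.
from collections import Counter

def generalize_age(age: int, width: int) -> str:
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def is_k_anonymous(rows: list, k: int) -> bool:
    counts = Counter((r["age_band"], r["zip"]) for r in rows)
    return all(c >= k for c in counts.values())

records = [{"age": a, "zip": z} for a, z in
           [(23, "10001"), (27, "10001"), (24, "10001"),
            (51, "10002"), (55, "10002")]]

width = 5
while True:
    for r in records:
        r["age_band"] = generalize_age(r["age"], width)
        r["zip"] = r["zip"][:3] + "**"   # coarsen the ZIP code
    if is_k_anonymous(records, k=2):
        break
    width *= 2                           # widen age bands until k holds
```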