290 research outputs found

    Integrated Intrusion Detection in Databases


    Security Architecture for Tanzania Higher Learning Institutions’ Data Warehouse

    In this paper we developed a security architecture for higher learning institutions in Tanzania that considers the security measures to be taken at the different levels of a higher learning institution's data warehouse architecture. The primary objectives of the study were to identify the security requirements of higher learning institutions' data warehouses, to study the existing security systems, and finally to develop an architecture based on the requirements extracted from the study. The study was carried out at three universities in Tanzania through interviews, examination of the existing systems at the respective institutions, and a literature review of existing data warehouse systems and architectures. The result was a set of identified security requirements, which led to the development of a security architecture covering security in source systems; the data and services offered by the DW; the applications that use the DW; and the networks and other physical infrastructure. The architecture focuses on security controls such as authentication, role-based access control, role separation for privileged users, storage of data, secure transfer of data, protective monitoring/intrusion detection, penetration testing, trusted/secure endpoints, and physical protection. Keywords: data warehouse, security architecture, higher learning institution

    Enhancing Data Security in Data Warehousing

    Doctoral thesis of the Doctoral Programme in Information Sciences and Technologies, presented to the Faculdade de Ciências e Tecnologia da Universidade de Coimbra.
    Data Warehouses (DWs) store sensitive data that encloses many business secrets. They have become the most common data source used by analytical tools for producing business intelligence and supporting decision making in most enterprises. This makes them an extremely appealing target for both inside and outside attackers. Given these facts, securing them against data damage and information leakage is critical. This thesis proposes a security framework for integrating data confidentiality solutions and intrusion detection in DWs. Deployed as a middle tier between end-user interfaces and the database server, the framework describes how the different solutions should interact with the remaining tiers. To the best of our knowledge, this framework is the first to integrate confidentiality solutions such as data masking and encryption together with intrusion detection in a single blueprint, providing a broad-scope data security architecture. Packaged database encryption solutions have been well accepted as the best way to protect data confidentiality while keeping database performance high. However, this thesis demonstrates that they heavily increase storage space and introduce extremely large response time overheads, among other drawbacks. Although their usefulness for security itself is indisputable, the thesis discusses the issues concerning their feasibility and efficiency in data warehousing environments. Solutions specifically tailored for DWs (i.e., that account for the particular characteristics of their data and workloads) are thus capable of delivering better tradeoffs between security and performance than those offered by standard algorithms and previous research.
This thesis proposes a reversible data masking function and a novel encryption algorithm that provide diverse levels of significant security strength while adding small response time and storage space overheads. Both techniques take numerical input and produce numerical output, using data type preservation to minimize storage space overhead, and rely only on arithmetic operators mixed with eXclusive OR (XOR) and modulus operators in their data transformations. The operations used in these transformations are native to standard SQL, which enables both solutions to use transparent SQL rewriting to mask or encrypt data. Transparently rewriting SQL eliminates data roundtrips between the database and the encryption/decryption mechanisms, thus avoiding I/O and network bandwidth bottlenecks. Using operations and operators native to standard SQL also makes both solutions fully portable to any type of DataBase Management System (DBMS) and/or DW. Experimental evaluation demonstrates that the proposed techniques outperform standard and state-of-the-art research algorithms while providing substantial security strength. From an intrusion detection view, most Database Intrusion Detection Systems (DIDS) rely on command-syntax analysis to compute data access patterns and dependencies for building user profiles that represent what they consider typical user activity. However, the considerably ad hoc nature of DW user workloads makes it extremely difficult to distinguish between normal and abnormal user behavior, generating huge numbers of alerts that mostly turn out to be false alarms. Most DIDS also fail to assess the damage intrusions might cause, while many allow various intrusions to pass undetected or only inspect user actions after their execution, which jeopardizes intrusion damage containment.
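The masking idea described above can be sketched in a few lines. This is an illustrative reconstruction, not the thesis's actual function: a reversible numeric transformation built only from addition, XOR, and modulus, so the same steps could be expressed in SQL rewriting. `MOD`, `KEY_ADD`, and `KEY_XOR` are hypothetical parameters standing in for secret keys.

```python
MOD = 10**9        # keeps masked values in a fixed numeric range (illustrative)
KEY_ADD = 123457   # hypothetical secret additive key
KEY_XOR = 0x5A5A5A # hypothetical secret XOR key

def mask(value: int) -> int:
    """Mask a non-negative integer below MOD: add a key, reduce modulo
    MOD, then XOR with a second key. All operations are expressible
    with SQL-native operators."""
    return ((value + KEY_ADD) % MOD) ^ KEY_XOR

def unmask(masked: int) -> int:
    """Invert mask(): XOR the second key back out, then subtract the
    additive key modulo MOD."""
    return ((masked ^ KEY_XOR) - KEY_ADD) % MOD
```

Because input and output are both integers in a fixed range, the masked column can keep the original numeric data type, which is the storage-preservation point the abstract makes.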
This thesis proposes a DIDS specifically tailored for DWs, integrating a real-time intrusion detector and response manager at the SQL command level that acts transparently as an extension of the database server. User profiles and intrusion detection processes rely on analyzing several distinct aspects of typical DW workloads: the user command, the data processed, and the results of processing the command. An SQL-like rule set extends data access control, and statistical models are built for each feature to obtain individual user profiles, using statistical tests for intrusion detection. A self-calibration formula computes the contribution of each feature to the overall intrusion detection process. A risk exposure method is used for alert management, which proves more efficient at damage containment than using alert correlation techniques to deal with the generation of large numbers of alerts. Experiments demonstrate the overall efficiency of the proposed DIDS.
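A minimal sketch of the per-feature statistical profiling idea the abstract describes. This is illustrative only: the feature names, the normal-distribution assumption, and the equal feature weights are stand-ins (the thesis self-calibrates each feature's contribution, which is not reproduced here).

```python
from statistics import mean, stdev

class UserProfile:
    """Mean/stddev profile of a user's workload features, built from a
    history of past commands (each a dict of feature -> value)."""

    def __init__(self, history):
        self.stats = {}
        for feature in history[0]:
            values = [h[feature] for h in history]
            # guard against zero spread so division below is safe
            self.stats[feature] = (mean(values), stdev(values) or 1.0)

    def anomaly_score(self, command_features):
        """Average absolute z-score of a new command's features against
        the profile; larger means more unusual for this user."""
        zs = [abs(command_features[f] - mu) / sd
              for f, (mu, sd) in self.stats.items()]
        return sum(zs) / len(zs)
```

A detector would compare the score against a per-user threshold and hand high-scoring commands to the response manager before execution completes.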

    Diverse intrusion-tolerant database replication

    Master's thesis in Information Security, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2012.
    The combination of database replication with Byzantine fault tolerance mechanisms is a recent field of research, with projects appearing in the last few years. However, most of the prototypes produced are either focused on very specific problems or are based on assumptions that are very hard to satisfy in a real-world scenario (e.g., a trusted component). In this thesis we present DivDB, a Diverse Intrusion-Tolerant Database Replication system. It is designed to be incorporated inside a JDBC driver so that it abstracts the user from any added complexity of the Byzantine fault tolerance mechanism. DivDB is based on state machine replication combined with a transaction handling algorithm in order to enhance its performance. DivDB is also able to have a different database management system connected at each replica, enabling diversity. We proposed, solved and implemented three open problems in the design of a replicated database system: authentication, transaction handling and state transfer. This makes DivDB unique, since it is the only system that comprises all three features in a single database replication system. Our implementation is robust enough to operate reliably in a simple Online Transaction Processing system. To test this, we used TPC-C, a benchmark tool that simulates that kind of environment.
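The core acceptance rule behind intrusion-tolerant replication with diverse replicas can be sketched simply. This is a generic illustration of Byzantine-style result voting, not DivDB's actual protocol: with at most `f` faulty replicas, a client-side driver accepts a query result once `f + 1` replicas agree on it.

```python
from collections import Counter

def vote(replies, f):
    """Accept the result reported by at least f + 1 replicas, else None.

    replies: answers returned by the diverse replicas for one query
    f:       maximum number of Byzantine (arbitrarily faulty) replicas
    """
    result, count = Counter(replies).most_common(1)[0]
    return result if count >= f + 1 else None
```

With diverse DBMSs behind each replica, a single product-specific bug or compromise cannot forge `f + 1` matching answers, which is the motivation for diversity in the abstract.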

    Pragmatic development of service based real-time change data capture

    This thesis contributes to the Change Data Capture (CDC) field by providing an empirical evaluation of the performance of CDC architectures in the context of real-time data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architectures: pull architectures and push architectures. There is little data on the performance of CDC architectures in a real-time environment, yet such data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC. However, push CDC architectures are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service-based push CDC solution that addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment. We synthesize the DAS literature and pragmatically develop DAS that efficiently execute transactions in an OLTP environment. Essentially, we develop efficient RESTful DAS that expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment using the industry-recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment will capture change data with minimal latency and will have a negligible effect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database. A standard definition of capture latency, and of how to measure it, does not exist in the field; we create this definition and extend the TPC-C benchmark to make the capture latency measurement.
The results of our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC, on the other hand, is capable of real-time CDC and places minimal overhead on the transaction rate, although this performance comes at the expense of CPU resources.
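The push idea and the capture-latency definition above can be sketched together. This is an illustrative stand-in, not the thesis's TAAR implementation: the data access service executes the OLTP write and, in the same call, pushes a change event onto a stream, so latency is simply capture time minus commit time, with no polling of the source database.

```python
import queue
import time

change_stream = queue.Queue()  # stand-in for the CDC transport to the warehouse

def execute_transaction(db, stmt, params):
    """Hypothetical DAS entry point: apply the OLTP write, then push
    the change event so CDC needs no pull-based polling."""
    db.append((stmt, params))  # stand-in for the actual database write
    change_stream.put({"stmt": stmt, "params": params,
                       "committed_at": time.time()})

def capture_next():
    """Warehouse-side consumer: capture one change and compute its
    capture latency (capture time minus commit time, in seconds)."""
    event = change_stream.get()
    return event, time.time() - event["committed_at"]
```

In a pull architecture the latency term would instead include the polling interval, which is why pull CDC degrades as concurrency and change volume grow.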

    Data Warehouse Technology and Application in Data Centre Design for E-government



    Sequential Step Towards Pattern Warehousing

    With the massive increase in data, demand has grown among analysts for proper repositories in which they can analyse specific data patterns in order to make smart, quick decisions for the welfare and benefit of a business, organization, or social initiative. The pattern warehouse has proved to be the best solution. This paper discusses the existing architecture and, moreover, the algorithms needed for retrieving optimal patterns from the pattern warehouse. It also includes a detailed study of the sequential emergence of association rule algorithms, which initially derive patterns that are later optimized according to the interests of the analyst.
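The association-rule step the paper builds on starts from frequent itemset mining. A minimal, generic sketch of the support-counting core (Apriori-style, without the candidate-pruning optimizations real algorithms add), with illustrative data:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=2):
    """Return itemsets (up to max_size items) whose support, i.e. the
    fraction of transactions containing them, meets min_support."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # canonical order so itemsets compare equal
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    n = len(transactions)
    return {s: c / n for s, c in counts.items() if c / n >= min_support}
```

A pattern warehouse would store the resulting itemsets (and rules derived from them) rather than the raw transactions, which is what lets analysts query patterns directly.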

    On the detection of privacy and security anomalies

    Data analytics over generated personal data has the potential to derive meaningful insights, enabling clarity of trends and predictions, for instance, disease outbreak prediction, and allowing data-driven decision making for contemporary organisations. Predominantly, the collected personal data is managed, stored, and accessed using a Database Management System (DBMS) by insiders, such as the employees of an organisation. One data security and privacy concern is insider threats, where legitimate users of the system abuse the access privileges they hold. Insider threats come in two flavours: one is an insider threat to data security (security attacks), and the other is an insider threat to data privacy (privacy attacks). The insider threat to data security means that an insider steals or leaks sensitive personal information. The insider threat to data privacy arises when an insider maliciously accesses information, resulting in a violation of an individual's privacy, for instance, browsing through customers' bank account balances or attempting to re-identify the individual who has the highest salary. Much past work has been done on detecting security attacks by insiders using behavioural anomaly detection approaches. This dissertation examines to what extent these kinds of techniques can be used to detect privacy attacks by insiders. It proposes approaches for modelling insider querying behaviour by considering sequence-based and frequency-based correlations in order to identify anomalous correlations between SQL queries in the querying behaviour of a malicious insider. A behavioural anomaly detection approach based on n-grams is proposed that considers sequences of SQL queries to model querying behaviour. The results demonstrate the effectiveness of detecting malicious insider accesses to the DBMS as anomalies, based on query correlations.
This dissertation also models normative behaviour from a DBMS perspective and proposes a record/DBMS-oriented approach that considers frequency-based correlations to detect potentially malicious insider accesses as anomalies. Additionally, it investigates modelling malicious insider SQL querying behaviour as rare behaviour by considering sequence- and frequency-based correlations using (frequent and rare) itemset mining. The dissertation proposes the notion of 'privacy-anomaly detection' and considers whether behavioural anomaly detection approaches can have a privacy-semantic interpretation and whether the detected anomalies can be related to conventional (formal) definitions of privacy semantics such as k-anonymity and the discrimination rate privacy metric. It considers privacy attacks (violations of a formal privacy definition) based on sequences of SQL queries (query correlations), and shows that interactive querying settings are vulnerable to privacy attacks based on query correlation. Whether these types of privacy attacks can manifest themselves as anomalies, specifically as privacy-anomalies, is investigated. One result is that privacy attacks can be detected as privacy-anomalies by applying behavioural anomaly detection using n-grams over the logs of interactive querying mechanisms.
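The n-gram modelling of querying behaviour can be sketched as follows. This is a generic illustration, not the dissertation's exact model: sessions are sequences of query templates (the string labels are hypothetical), the profile is the set of bigrams seen in training, and a session is scored by the fraction of its bigrams absent from the profile.

```python
def ngrams(seq, n=2):
    """All contiguous n-grams of a sequence, as a set of tuples."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}

def train(sessions, n=2):
    """Build a user's profile: the union of n-grams over training sessions."""
    profile = set()
    for session in sessions:
        profile |= ngrams(session, n)
    return profile

def anomaly_ratio(profile, session, n=2):
    """Fraction of the session's n-grams not seen during training;
    higher values suggest anomalous query correlations."""
    grams = ngrams(session, n)
    return len(grams - profile) / len(grams) if grams else 0.0
```

Raising the ratio threshold trades missed detections for fewer false alarms, which is the tuning question behind relating detected anomalies to formal privacy violations.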

    A Systematic Review of Data Quality in CPS and IoT for Industry 4.0

    The Internet of Things (IoT) and Cyber-Physical Systems (CPS) are the backbones of Industry 4.0, where data quality is crucial for decision support. Data quality in these systems can deteriorate due to sensor failures or uncertain operating environments. Our objective is to summarize and assess the research efforts that address data quality in data-centric CPS/IoT industrial applications. We systematically review the state-of-the-art data quality techniques for CPS and IoT in Industry 4.0 through a systematic literature review (SLR) study. We pose three research questions, define selection and exclusion criteria for primary studies, and extract and synthesize data from these studies to answer our research questions. Our most significant results are (i) the list of data quality issues, their sources, and application domains, (ii) the best practices and metrics for managing data quality, (iii) the software engineering solutions employed to manage data quality, and (iv) the state of the data quality techniques (data repair, cleaning, and monitoring) in the application domains. The results of our SLR can help researchers obtain an overview of existing data quality issues, techniques, metrics, and best practices. We suggest research directions that require attention from the research community for follow-up work.