19 research outputs found

    Practical Detection of Entropy Loss in Pseudo-Random Number Generators: Extended Version

    Pseudo-random number generators (PRNGs) are critical infrastructure for cryptography and for the security of many computer applications. At the same time, PRNGs are surprisingly difficult to design, implement, and debug. This paper presents the first static analysis technique specifically aimed at quality assurance of cryptographic PRNG implementations. The analysis targets a particular kind of implementation defect: entropy loss. Entropy loss occurs when the entropy contained in the PRNG seed is not used to its full extent for generating the pseudo-random output stream. The Debian OpenSSL disaster, probably the most prominent PRNG-related security incident, was one manifestation of such a defect, but not the only one. Together with the static analysis technique, we present its implementation, a tool named Entroposcope. The tool offers a high degree of automation and practicality. We have applied the tool to five real-world PRNGs of different designs and show that it effectively detects both known and previously unknown instances of entropy loss.
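
    The defect class is easiest to see in a toy sketch. The following hypothetical Python fragment (not taken from Entroposcope or from any of the analysed PRNGs) shows a generator whose seed nominally carries 256 bits of entropy while only 32 of them ever influence the output stream:

        # Hypothetical illustration of entropy loss: the seed carries 256 bits
        # of entropy, but only the first 4 bytes ever reach the output stream.
        import hashlib

        def keystream_lossy(seed: bytes, n_blocks: int):
            """Buggy: truncates the seed, so at most 2**32 distinct streams exist."""
            state = seed[:4]  # defect: 28 of the 32 seed bytes are discarded
            for _ in range(n_blocks):
                state = hashlib.sha256(state).digest()
                yield state

        def keystream_sound(seed: bytes, n_blocks: int):
            """Fixed: the full seed is absorbed into the initial state."""
            state = hashlib.sha256(seed).digest()
            for _ in range(n_blocks):
                state = hashlib.sha256(state).digest()
                yield state

    An analysis of the kind described here would flag the first variant, because the entropy of its output distribution is bounded by 32 bits regardless of the seed's size.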

    StringENT test suite: ENT battery revisited for efficient P value computation

    Random numbers play a key role in a wide variety of applications, ranging from mathematical simulation to cryptography. Generating random or pseudo-random numbers is not an easy task, especially when hardware, time, and energy constraints are considered. To assess whether generators behave in a random fashion, several statistical test batteries exist. ENT is one of the simplest and most popular, at least in part due to its efficacy and speed. Nonetheless, only one of the tests in this suite provides a p value, which is the most useful and standard way to determine, at a given significance level, whether the randomness hypothesis holds. As a consequence, rather arbitrary and at times misleading bounds are set to decide which intervals are acceptable for its results. This paper introduces an extension of the battery, named StringENT, which, while retaining the speed that makes ENT popular and useful, succeeds in providing p values with which sound decisions can be made about the randomness of a sequence. It also highlights a flagrant randomness flaw that the classical ENT battery is not capable of detecting but the new StringENT notices, and it introduces two additional tests.
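
    To illustrate the difference a p value makes, here is a hedged sketch (not StringENT's actual code) of turning a chi-square byte-frequency statistic, the kind of figure ENT reports, into a p value via the chi-square survival function:

        # Sketch: a chi-square test on byte frequencies with a proper p value.
        # Under the randomness hypothesis the statistic follows a chi-square
        # distribution with 255 degrees of freedom.
        import collections
        import os
        from scipy.stats import chi2

        def byte_chisquare_pvalue(data: bytes) -> float:
            expected = len(data) / 256.0  # uniform model: each byte value equally likely
            counts = collections.Counter(data)
            stat = sum((counts.get(b, 0) - expected) ** 2 / expected
                       for b in range(256))
            return chi2.sf(stat, df=255)  # upper-tail probability

        print(byte_chisquare_pvalue(os.urandom(1 << 16)))  # ~Uniform(0, 1) for random input

    A fixed significance level (say 0.01) then yields a principled accept/reject decision instead of the ad hoc interval bounds the paper criticises.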

    Understanding and protecting closed-source systems through dynamic analysis

    In this dissertation, we focus on dynamic analyses that examine the data handled by programs and operating systems in order to divine the undocumented constraints and implementation details that determine their behavior in the field. First, we introduce a novel technique for uncovering the constraints actually used in OS kernels to decide whether a given instance of a kernel data structure is valid. Next, we tackle the semantic gap problem in virtual machine security: we present a pair of systems that allow, on the one hand, the automatic extraction of whole-system algorithms for collecting information about a running system and, on the other, the rapid identification of “hook points” within a system or program where security tools can interpose to be notified of security-relevant events. Finally, we present and evaluate a new dynamic measure of code similarity that examines the content of the data handled by the code, rather than the syntactic structure of the code itself. This problem has implications both for understanding the capabilities of novel malware and for understanding large binary code bases such as operating system kernels.
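
    The third contribution can be sketched in a few lines: instead of comparing the syntax of two code fragments, run both on the same inputs and compare the values they handle. The snippet below is a toy illustration of that idea, not the dissertation's actual metric:

        # Toy data-centric similarity: record the values each function produces
        # on shared inputs and compare the value sets with Jaccard similarity.
        def observed_values(fn, inputs):
            return {fn(x) for x in inputs}

        def jaccard(a: set, b: set) -> float:
            return len(a & b) / len(a | b) if (a | b) else 1.0

        # Two syntactically different implementations of the same computation
        # score as identical under this measure:
        f = lambda x: x * 2
        g = lambda x: x + x
        print(jaccard(observed_values(f, range(1000)),
                      observed_values(g, range(1000))))  # 1.0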

    Security and Data Analysis: Three Case Studies

    In recent years, techniques to automatically analyze large amounts of data have advanced significantly. The possibility to gather and analyze such data has challenged security research in two unique ways. First, the analysis of Big Data can threaten users’ privacy by merging and connecting data from different sources. Chapter 2 studies how patients’ medical data can be protected in a world where Big Data techniques can be used to easily analyze large amounts of DNA data. Second, Big Data techniques can be used to improve the security of software systems. In Chapter 4 I analyzed data gathered from internet-wide certificate scans to make recommendations on which certificate authorities can be removed from trust stores. In Chapter 5 I analyzed open source repositories to make predictions of which commits introduced security-critical bugs. In total, I present three case studies that explore the application of data analysis – “Big Data” – to system security. By considering not just isolated examples but whole ecosystems, the insights become much more solid, and the results and recommendations become much stronger. Instead of manually analyzing a couple of mobile apps, we have the ability to consider a security-critical mistake across all applications of a given platform. We can identify systemic errors that all developers of a given platform, a given programming language, or a given security paradigm make – and fix them with the certainty that we have truly found the core of the problem. Instead of manually analyzing the SSL installation of a couple of websites, we can consider all certificates – in times of Certificate Transparency, even with historical data of issued certificates – and draw conclusions based on the whole ecosystem. We can identify rogue certificate authorities, monitor the deployment of new TLS versions and features, and make recommendations based on those. And instead of manually analyzing open source code bases for vulnerabilities, we can apply the same techniques and again consider all projects on, e.g., GitHub. Then, instead of just fixing one vulnerability after the other, we can use these insights to develop better tooling, easier-to-use security APIs, and safer programming languages.
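
    As a flavour of the Chapter 5 approach, the sketch below fits a linear classifier on simple per-commit features; the feature set and toy labels are hypothetical, not those used in the thesis:

        # Hedged sketch: score commits by their risk of introducing a
        # security-critical bug, using simple metadata features.
        from sklearn.linear_model import LogisticRegression

        # features: [lines added, lines deleted, files touched, touches parser (0/1)]
        X = [[120, 4, 9, 1], [3, 1, 1, 0], [450, 200, 30, 1], [10, 2, 2, 0]]
        y = [1, 0, 1, 0]  # toy labels: 1 = later linked to a security fix

        clf = LogisticRegression().fit(X, y)
        print(clf.predict_proba([[200, 50, 12, 1]])[0][1])  # risk score for a new commit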

    New Classes of Binary Random Sequences for Cryptography

    The vision for the advancement of 5G wireless communications yields new security prerequisites and challenges; in response, we propose a catalog of three new classes of pseudorandom sequence generators. This dissertation starts with a review of the requirements of 5G wireless networking systems and the most recent developments in the wireless security services applied to 5G, such as private-key generation, key protection, and flexible authentication. The dissertation proposes new complexity-theory-based, number-theoretic approaches to generate lightweight pseudorandom sequences, which protect private information using spread spectrum techniques. For the new classes of pseudorandom sequences, we obtain generalizations. Authentication issues of communicating parties in the basic model of Piggy Bank cryptography are considered, and a flexible authentication scheme using a certified authority is proposed.
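
    The dissertation's three classes are new constructions, but the genre can be illustrated with a classic number-theoretic generator, Blum-Blum-Shub, sketched below with a toy modulus (far too small for real security):

        # Blum-Blum-Shub: square modulo n = p*q, where p and q are primes
        # congruent to 3 (mod 4), and emit the low bit of each state.
        def bbs_bits(seed: int, p: int = 11, q: int = 19, count: int = 16):
            n = p * q
            x = (seed * seed) % n  # start from a quadratic residue mod n
            for _ in range(count):
                x = (x * x) % n
                yield x & 1

        print(list(bbs_bits(seed=3)))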

    Automated tailoring of system software stacks

    In many industrial sectors, device manufacturers are moving away from expensive special-purpose hardware units and consolidate their systems on commodity hardware. As part of this change, developers are enabled to run their applications on general-purpose operating systems like Linux, which already supports thousands of different devices out of the box and can be used in a wide range of target scenarios. Furthermore, the Linux ecosystem allows them to integrate existing implementations of standard functionality in the form of shared libraries. However, as the libraries and the Linux kernel are designed as generic building blocks in order to support as many applications as possible, they cannot make assumptions about specific use cases for a single-purpose device. This generality leads to unnecessary overheads in narrowly defined target scenarios, as unneeded components not only take up space on the target system but also have to be maintained over the lifetime of the device. While the Linux kernel provides a configuration system to disable unneeded functionality like device drivers, determining the required features from over 16,000 options is an infeasible task. Even worse, most shared libraries cannot be customized even though only around 10 percent of their functions are ever used by applications. In this thesis, I present my approaches for the automated identification and removal of unnecessary components in all layers of the software stack. As the configuration system is an integral part of the Linux kernel, we embrace its presence and automatically generate custom-fitted configurations for observed target scenarios with the help of an extracted variability model. For the much more diverse realm of shared libraries, with different programming languages, build systems, and a lack of configurability, I demonstrate a different approach. By identifying individual functions as logically distinct units, we construct a symbol-level dependency graph across the applications and all their required libraries. We then remove unneeded code at the binary level and rearrange the remaining parts to take up minimal space in the binary file by formulating their placement as an optimization problem. To lower the number of unnecessary updates to unused components in a deployed system, I lastly present an automated method to determine the impact of software changes on a target scenario and provide guidance for developers on whether they need to update their systems. Applying these techniques to different target systems, I demonstrate that we can disable up to 87 percent of configuration options in a Debian Linux kernel, shrink the size of an embedded OpenWrt kernel by 59 percent, and speed up the boot process of the embedded system by 21 percent. As part of the shared library tailoring process, we can remove 13060 functions from all libraries in OpenWrt and reduce their total size by 31 percent. In the memcached Docker container, we identify 381 entirely unneeded shared libraries and shrink the container image size by 82 percent. An analysis of the development history of two large library projects over the course of more than two years further shows that between 68 and 82 percent of all changes are not required for an OpenWrt appliance, reducing the number of patch days by up to 69 percent. These results demonstrate the broad applicability of our automated methods for both the Linux kernel and shared libraries to a wide range of scenarios. From embedded systems to server applications, custom-tailored system software stacks contribute to the reduction of overheads in space and time.
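
    The core of the library-tailoring step is reachability over a symbol-level dependency graph; a minimal sketch (with made-up symbol names) looks like this:

        # Keep only functions reachable from the application's entry points;
        # everything else is a candidate for removal at the binary level.
        def reachable(deps, roots):
            keep, stack = set(), list(roots)
            while stack:
                sym = stack.pop()
                if sym not in keep:
                    keep.add(sym)
                    stack.extend(deps.get(sym, ()))
            return keep

        deps = {
            "main": {"libfoo:parse", "libc:printf"},
            "libfoo:parse": {"libc:malloc"},
            "libfoo:debug_dump": {"libc:printf"},  # never referenced -> removable
        }
        all_syms = set(deps).union(*deps.values())
        print(sorted(all_syms - reachable(deps, {"main"})))  # ['libfoo:debug_dump']

    The placement of the surviving code is then, as the abstract notes, an optimization problem over the binary layout.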

    Machine Learning and Probabilistic Methods for Network Security Assessment

    Computer networks composed of many hosts are vulnerable to cyber attacks. One attack can take the form of the exploitation of multiple vulnerabilities in the network along with lateral movement between hosts. To analyse the security of a network, it is common practice to run a vulnerability scan that reports the presence of vulnerabilities in the network and prioritises them with an importance score. The scoring mechanism used primarily in the literature and in industry ignores how multiple vulnerabilities could be used in conjunction with one another to achieve a goal that was previously not possible. Attack graphs are a common solution to this problem: a scan, together with the topology of the network, is turned into a graph that models how hosts and vulnerabilities can be connected. For a large network these attack graphs can be thousands of nodes in size, so in order to gain insight from them in an automated way, they can be turned into Bayesian attack graphs (BAGs) to model the security of the network probabilistically. The aim of this thesis is to work towards the automation of gathering insight from vulnerability scans of a network, primarily through the generation of BAGs. The main contributions of this thesis are as follows:
    1. Creation of a unified formalism for the structure of BAGs and how other graphs can be translated into this formalism.
    2. Classification of vulnerabilities using neural networks.
    3. Design and evaluation of a novel technique for approximating the computation of access probabilities in BAGs (referred to in the literature as the static analysis of BAGs) with no requirement for the base graph to be acyclic.
    4. Implementation and comparison of three stochastic simulation techniques for inference on BAGs with evidence (referred to in the literature as the dynamic analysis of BAGs), enabling security-measure evaluation and sensitivity analysis.
    5. Demonstration of a sensitivity analysis for BAG priors and a novel method for quick computation of sensitivities that is more readily analysed than the traditional technique.
    6. Development and demonstration of a fully containerised pipeline to automatically process vulnerability scans and generate the corresponding attack graph.
    With a single formalism for attack graphs, alongside an open-source attack-graph generation pipeline, our work enables future progress and collaboration in the field of processing vulnerability scans using attack graphs by simplifying the generation of the graphs and providing a mathematical basis for their evaluation. We design, implement, and evaluate various techniques for calculations on BAGs. For the computation of access probabilities we provide an algorithm that requires no preprocessing or trimming of the initial graph, and for inference on BAGs we recommend likelihood weighting as the best-performing sampling technique of the three we implement. We also show how inference techniques can be applied to sensitivity analysis on BAGs, and provide a new method that allows for more efficient and interpretable sensitivity analysis, enabling more productive research in the area in future. This research was originally undertaken in collaboration with XQ Cyber.
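
    Contribution 4 can be made concrete with a small example. The sketch below runs likelihood weighting on a made-up three-node BAG with noisy-OR semantics; the topology and probabilities are illustrative only:

        # Likelihood-weighted inference on a tiny Bayesian attack graph.
        # Each node is compromised via any already-compromised parent with
        # the per-edge exploit probability (noisy-OR, no leak term).
        import random

        parents = {                          # node -> [(parent, exploit prob)]
            "web": [],
            "app": [("web", 0.8)],
            "db":  [("app", 0.5), ("web", 0.2)],
        }
        prior = {"web": 0.6}                 # P(attacker reaches the entry point)

        def prob_compromised(state, node):
            if not parents[node]:
                return prior[node]
            p_none = 1.0
            for parent, p in parents[node]:
                if state[parent]:
                    p_none *= 1.0 - p
            return 1.0 - p_none

        def likelihood_weighting(query, evidence, n=100_000):
            order = ["web", "app", "db"]     # topological order of the DAG
            num = den = 0.0
            for _ in range(n):
                state, w = {}, 1.0
                for node in order:
                    p = prob_compromised(state, node)
                    if node in evidence:     # clamp evidence, accumulate weight
                        state[node] = evidence[node]
                        w *= p if evidence[node] else 1.0 - p
                    else:
                        state[node] = random.random() < p
                num += w * state[query]
                den += w
            return num / den

        # P(db compromised | app observed compromised)
        print(likelihood_weighting("db", {"app": True}))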

    (Phase 2) using data mining. Case study: MOTOS ELECTROMUEBLES - Arauca Department

    Motos Electromuebles is a company with six years of experience in the Arauca market; its commercial activity consists of selling spare parts for Honda and Hero motorcycles and providing motorcycle maintenance services. It currently has four branches in the Department of Arauca: Arauca, Tame, Arauquita, and Saravena. The company has a transactional web system that stores all the records of its different departments. This proposal presents the design and development of a CRM (Customer Relationship Management) module adapted to Motos Electromuebles de Arauca, applying a data-mining model. Data mining is a mechanism for exploring large amounts of data and turning them into information; in this case it is used to channel the company's customer data, organize it, and obtain the information that is requested. A CRM is an important commercial and marketing tool for any company, focused on the company-customer relationship. It is the pillar on which customer loyalty rests and on which marketing actions are applied. The state of the art records a precise and clear definition of both concepts, since a clear definition is required to produce the analysis and the module with the marketing strategies. The technological basis for the development of the proposal is the MySQL database manager and the PHP programming language, and a development plan suited to this company is laid out. In addition, a requirements-gathering phase is carried out, which is used to classify access roles and to manage and classify customers so that the company can make decisions and improve sales management.
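
    As a flavour of the kind of data-mining step such a CRM module could apply, the hedged sketch below computes RFM (recency, frequency, monetary) scores, a common basis for classifying customers; the transaction records are made up:

        # RFM scoring: days since last purchase, number of purchases,
        # and total amount spent, per customer.
        from datetime import date

        sales = [  # (customer, purchase date, amount)
            ("c1", date(2024, 1, 5), 120_000),
            ("c1", date(2024, 3, 2), 80_000),
            ("c2", date(2023, 6, 9), 500_000),
        ]

        def rfm(sales, today=date(2024, 4, 1)):
            out = {}
            for cust, day, amount in sales:
                last, f, m = out.get(cust, (day, 0, 0))
                out[cust] = (max(last, day), f + 1, m + amount)
            return {c: ((today - last).days, f, m)
                    for c, (last, f, m) in out.items()}

        print(rfm(sales))  # {'c1': (30, 2, 200000), 'c2': (297, 1, 500000)}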