Security and Data Analysis : Three Case Studies

Abstract

In recent years, techniques to automatically analyze lots of data have advanced significantly. The possibility to gather and analyze large amounts of data has challenged security research in two unique ways. First, the analysis of Big Data can threaten users’ privacy by merging and connecting data from different sources. Chapter 2 studies how patients’ medical data can be protected in a world where Big Data techniques can be used to easily analyze large amounts of DNA data. Second, Big Data techniques can be used to improve the security of software systems. In Chapter 4 I analyzed data gathered from internet-wide certificate scans to make recommendations on which certificate authorities can be removed from trust stores. In Chapter 5 I analyzed open source repositories to make predicitions of which commits introduced security-critical bugs. In total, I present three case studies that explore the application of data analysis – “Big Data” – to system security. By considering not just isolated examples but whole ecosystems, the insights become much more solid, and the results and recommendations become much stronger. Instead of manually analyzing a couple of mobile apps, we have the ability to consider a security-critical mistake in all applications of a given platform. We can identify systemic errors all developers of a given platform, a given programming language or a given security paradigm make – and fix it with the certainty that we truly found the core of the problem. Instead of manually analyzing the SSL installation of a couple of websites, we can consider all certificates – in times of Certificate Transparency even with historical data of issued certificates – and make conclusions based on the whole ecosystem. We can identify rogue certificate authorities as well as monitor the deployment of new TLS versions and features and make recommendations based on those. And instead of manually analyzing open source code bases for vulnerabilities, we can apply the same techniques and again consider all projects on e.g. GitHub. Then, instead of just fixing one vulnerability after the other, we can use these insights to develop better tooling, easier-to-use security APIs and safer programming languages

    Similar works