Essays on software vulnerability coordination
Software vulnerabilities are software bugs with security implications. Exposure to a security bug makes a software system behave in unexpected ways when the bug is exploited. As software vulnerabilities are thus a classical way to compromise a software system, these have long been coordinated in the global software industry in order to lessen the risks. This dissertation claims that the coordination occurs in a complex and open socio-technical system composed of decentralized software units and heterogeneous software agents, including not only software engineers but also other actors, from security specialists and software testers to attackers with malicious motives. Vulnerability disclosure is a classical example of the associated coordination; a security bug is made known to a software vendor by the discoverer of the bug, a third-party coordinator, or public media. The disclosure is then used to patch the bug. In addition to patching, the bug is typically archived to databases, cataloged and quantified for additional information, and communicated to users with a security advisory. Although commercial solutions have become increasingly important, the underlying coordination system is still governed by multiple stakeholders with vested interests. This governance has continued to result in different inefficiencies. Thus, this dissertation examines four themes: (i) disclosure of software vulnerabilities; (ii) coordination of these; (iii) evolution of these across time; and (iv) automation potential. The philosophical position is rooted in scientific realism and positivism, while regression analysis forms the kernel of the methodology. Based on these themes, the results indicate that (a) when vulnerability disclosure has worked, it has been relatively efficient; the obstacles have been social rather than technical in nature, originating from the diverging interests of the stakeholders who have different incentives. 
Furthermore, (b) the efficiency also applies to the coordination of different identifiers and classifications for the vulnerabilities disclosed. Longitudinally, (c) the evolution of software vulnerabilities across time reflects distinct software and vulnerability life cycle models and the underlying incentives. Finally, (d) there is potential to improve the coordination efficiency through software automation.
Backup To The Rescue: Automated Forensic Techniques For Advanced Website-Targeting Cyber Attacks
The last decade has seen a significant rise in non-technical users gaining a web presence, often via the easy-to-use functionalities of Content Management Systems (CMS). In fact, over 60% of the world’s websites run on CMSs. Unfortunately, this huge user population has made CMS-based websites a high-profile target for hackers. Worse still, the vast majority of the website hosting industry has shifted to a “backup and restore” model of security, which relies on error-prone AV scanners to prompt non-technical users to roll back to a pre-infection nightly snapshot. My cyber forensics research directly addresses this emergent problem by developing next-generation techniques for the investigation of advanced cyber crimes.
Driven by economic incentives, attackers abuse the trust in this economy: selling malware on legitimate marketplaces, pirating popular website plugins, and infecting websites post-deployment. Furthermore, attackers are exploiting these websites at scale by carelessly dropping thousands of obfuscated and packed malicious files on the webserver. This is counter-intuitive since attackers are assumed to be stealthy. Despite the rise in web attacks, efficiently locating and accurately analyzing the malware dropped on compromised webservers has remained an open research challenge.
This dissertation posits that the webserver nightly backup snapshots already being collected contain all the information required to enable automated and scalable detection of website compromises. It presents a web attack forensics framework that leverages program analysis to automatically understand the webserver’s nightly backup snapshots, enabling the recovery of the temporal phases of a webserver compromise and its origin within the website supply chain.
A study of EU data protection regulation and appropriate security for digital services and platforms
A law often has more than one purpose, more than one intention, and more than one interpretation. A meticulously formulated and context agnostic law text will still, when faced with a field propelled by intense innovation, eventually become obsolete. The European Data Protection Directive is a good example of such legislation. It may be argued that the technological modifications brought on by the EU General Data Protection Regulation (GDPR) are nominal in comparison to the previous Directive, but from a business perspective the changes are significant and important. The Directive’s lack of direct economic incentive for companies to protect personal data has changed with the Regulation, as companies may now have to pay severe fines for violating the legislation.
The objective of the thesis is to establish the notion of trust as a key design goal for information systems handling personal data. This includes interpreting the EU legislation on data protection and using the interpretation as a foundation for further investigation. This interpretation is connected to the areas of analytics, security, and privacy concerns for intelligent service development. Finally, the centralised platform business model and its challenges are examined, and three main resolution themes for regulating platform privacy are proposed. The aims of the proposed resolutions are to create a more trustful relationship between providers and data subjects, while also improving the conditions for competition and thus providing data subjects with service alternatives.
The thesis contributes new insights into the evolving privacy practices in the digital society at an important time of transition from service-driven business models to platform business models. Firstly, privacy-related regulation and state-of-the-art analytics development are examined to understand their implications for intelligent services that are based on automated processing and profiling. The ability to choose between providers of intelligent services is identified as the core challenge. Secondly, the thesis examines what is meant by appropriate security for systems that handle personal data, something the GDPR requires organisations to use without, however, specifying what can be considered appropriate. We propose a method for active network security in web software that is developed through the use of analytics for detection and by inserting data generators into a software installation. The active network security method is proposed as a framework for achieving compliance with the GDPR requirements for services and platforms to use appropriate security. Thirdly, the platform business model is considered from the privacy point of view, together with the implications of “processing silos” for intelligent services. The centralised platform model is considered problematic from both the data subject and the competition standpoints. A resolution is offered for enabling user-initiated open data flow to counter the centralised “processing silos”, and thereby to facilitate the introduction of decentralised platforms.
The thesis provides an interdisciplinary analysis that considers the law as it stands (lex lata) and, through argumentativist legal dogmatics, defines a resolution (de lege ferenda) for how the legal framework ought to be adapted to fit the described environment. User-friendly Legal Science is applied as a theory framework to provide a holistic approach to answering the research questions. The User-friendly Legal Science theory has its roots in design science and offers a way towards achieving interdisciplinary research in the fields of information systems and legal science.
Empirical Notes on the Interaction Between Continuous Kernel Fuzzing and Development
Fuzzing has been studied and applied ever since the 1990s. Automated and
continuous fuzzing has recently also been applied to open source software
projects, including the Linux and BSD kernels. This paper concentrates on the
practical aspects of continuous kernel fuzzing in four open source kernels.
According to the results, there are over 800 unresolved crashes reported for
the four kernels by the syzkaller/syzbot framework. Many of these have been
reported relatively long ago. Interestingly, fuzzing-induced bugs have been
resolved in the BSD kernels more rapidly. Furthermore, assertions and debug
checks, use-after-frees, and general protection faults account for the majority
of bug types in the Linux kernel. About 23% of the fixed bugs in the Linux
kernel have gone through either code review or additional testing. Finally,
only code churn provides a weak statistical signal for explaining the
associated bug fixing times in the Linux kernel.
Comment: The 4th IEEE International Workshop on Reliability and Security Data Analysis (RSDA), 2019 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), Berlin, IEEE.
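The churn finding above can be illustrated with a toy regression. The data below is synthetic and only mirrors the paper's general setup (bug fixing time regressed on code churn); it is not the paper's dataset, and the coefficients are illustrative.

```python
import numpy as np

# Synthetic illustration: does code churn (lines changed in the fixing
# commit) explain bug fixing time? A weak true slope plus large noise
# mimics a "weak statistical signal".
rng = np.random.default_rng(0)
churn = rng.integers(1, 500, size=100).astype(float)   # lines changed per fix
days = 5 + 0.01 * churn + rng.normal(0, 5, size=100)   # fixing time in days

# Ordinary least squares: days ~ intercept + slope * churn
X = np.column_stack([np.ones_like(churn), churn])
beta, *_ = np.linalg.lstsq(X, days, rcond=None)

resid = days - X @ beta
r2 = 1 - resid.var() / days.var()
print(f"slope={beta[1]:.4f}, R^2={r2:.3f}")  # R^2 stays modest: a weak signal
```

With an intercept included, the residual variance cannot exceed the outcome variance, so R² lands in [0, 1]; the point of the sketch is that a real but small slope can coexist with low explanatory power.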
Assessing the security benefits of defence in depth
Most modern computer systems are connected to the Internet. This brings many opportunities for revenue generation via e-commerce and information sharing, but also threats due to the exposure of these systems to malicious adversaries. Therefore, almost all organisations deploy security tools to improve overall detection capabilities. However, all security tools have limitations: they may fail to detect attacks, fail to uncover all vulnerabilities or generate alarms for non-malicious traffic or non-vulnerable code. Using terminology from signalling theory, we can state that security tools suffer from two types of failures: failure to correctly label a malicious event as malicious (False Negatives); and failure to correctly label a non-malicious event as non-malicious (False Positives). These failures may vary from one tool to another, since security tools are diverse in their weaknesses as well as their strengths. Therefore, an obvious design paradigm when deploying these defences is Diversity or Defence in Depth: the expectation is that employing multiple tools increases the chance of detecting malicious behaviour.
This thesis presents research to assess the benefits (or harm) of using diversity. The thesis begins with a literature review on defence in depth, diversity and fault tolerance, while identifying areas for further research. This review is followed by the presentation of the overall methodology that we have used to perform the diversity assessment for three types of defence tools, namely AntiVirus (AV) products, Intrusion Detection Systems (IDS) and Static Analysis Tools (SAT). The context of this project is inspired by the EPSRC D3S project in the Centre for Software Reliability (CSR) at City, University of London, as well as previous work on diversity conducted at the same centre and elsewhere in the world. This thesis presents the results using the well-known metrics for binary classifiers: Sensitivity and Specificity; and assesses the various forms of adjudication that may be used: 1-out-of-N (1ooN – raise an alarm as long as ANY of the defences do so), N-out-of-N (NooN – raise an alarm only if ALL the defences do so), majority voting (raise an alarm where a MAJORITY of the defences do so) or optimal adjudication (raise an alarm in such a way that it minimises the overall loss to the system from a failure).
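The adjudication rules above can be sketched in a few lines; this is a minimal illustration of the voting schemes as defined in the text, assuming each defence tool reports a boolean verdict (True meaning "raise an alarm"). The example verdicts are invented, not taken from the thesis.

```python
def one_out_of_n(verdicts):
    """1ooN: raise an alarm as long as ANY of the defences does so."""
    return any(verdicts)

def n_out_of_n(verdicts):
    """NooN: raise an alarm only if ALL the defences do so."""
    return all(verdicts)

def majority(verdicts):
    """Raise an alarm where a strict MAJORITY of the defences do so."""
    return sum(verdicts) > len(verdicts) / 2

# Hypothetical event judged by three diverse tools (e.g. AV, IDS, SAT):
verdicts = [True, False, True]
print(one_out_of_n(verdicts))  # True
print(n_out_of_n(verdicts))    # False
print(majority(verdicts))      # True
```

Optimal adjudication is deliberately omitted here, since it requires the per-tool failure rates and loss values that the thesis estimates empirically.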
The first study compares the detection capabilities of nine different AV products. Additionally, for each vendor, the detection capabilities of the version of the product that is available for free in the VirusTotal platform are compared with the full capability version of that product that is available from the same vendor’s website. Counterintuitively, the free version of AVs from VirusTotal performed better (in most cases) than the commercial versions from the same vendor.
The second study compares the detection capabilities of IDS when deployed in a combined configuration. The functionally diverse combinations are shown to increase the true positive rate significantly while experiencing smaller increases in false positive rate.
The third study analyses the improvements and deteriorations from using diverse SATs to detect web vulnerabilities. The largest improvements in sensitivity, with the least deterioration in specificity, were observed with the 1ooN configurations; in NooN configurations there is an improvement in specificity compared with individual systems, but a deterioration in sensitivity.
Finally, the benefits of “optimal adjudication” were also investigated: the result shows that the total loss that can result from the two types of failures considered (False Positives and False Negatives) can be significantly reduced with optimal adjudication configurations compared with more conventional methods of adjudication such as 1ooN, NooN or majority voting.
In conclusion, using diverse security protection tools is shown to improve the detection capability of three different families of products, and optimal adjudication techniques can help balance the benefits of improved detection while lowering the false positive rates.
Practical and Efficient Runtime Taint Tracking
Runtime taint tracking is a technique for controlling data propagation in applications.
It is typically used to prevent disclosure of confidential information or
to avoid application vulnerabilities. Taint tracking systems intercept application
operations at runtime, associate meta-data with the data being processed and
inspect the meta-data to detect unauthorised data propagation. To keep metadata
up-to-date, every attempt of the application to access and process data is
intercepted. To ensure that all data propagation is monitored, different categories
of data (e.g. confidential and public data) are kept isolated.
In practice, the interception of application operations and the isolation of different categories of data are hard to achieve. Existing applications, language
interpreters and operating systems need to be re-engineered, while keeping metadata
up-to-date incurs significant overhead at runtime. In this thesis we show
that runtime taint tracking can be implemented with minimal changes to existing
infrastructure and with reduced overhead compared to previous approaches. In
other words, we suggest methods to achieve both practical and efficient runtime
taint tracking.
Our key observation is that applications in specific domains are typically implemented
in high-level languages and use a subset of the available language
features. This facilitates the implementation of a taint tracking system because
it needs to support only parts of a programming language and it may leverage
features of the execution platform. This thesis explores three different application
domains. We start with event processing applications in Java, for which
we introduce a novel solution to achieve isolation and a practical method to
declare restrictions about data propagation. We then focus on securing PHP
web applications. We show that if taint tracking is restricted to a small part of
an application, the runtime overhead is significantly reduced without sacrificing effectiveness. Finally, we target accidental data disclosure in Ruby web applications.
Ruby emerges as an ideal choice for a practical taint tracking system
because it supports meta-programming facilities that simplify interception and
isolation.
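The mechanism described above (meta-data attached to data, interception of operations, and inspection at sensitive sinks) can be sketched in miniature. This is not the thesis's implementation; it is a hypothetical Python illustration in which a string subclass carries a taint flag, concatenation is the intercepted operation, and a logging sink checks the flag.

```python
class TaintedStr(str):
    """A string whose instances carry a 'tainted' flag as meta-data."""
    def __new__(cls, value, tainted=False):
        obj = super().__new__(cls, value)
        obj.tainted = tainted
        return obj

    def __add__(self, other):
        # Interception point: concatenation propagates taint to the result.
        combined = str(self) + str(other)
        tainted = self.tainted or getattr(other, "tainted", False)
        return TaintedStr(combined, tainted)

def write_to_public_log(s):
    # Sink: inspect the meta-data to detect unauthorised propagation.
    if getattr(s, "tainted", False):
        raise PermissionError("tainted data must not reach a public sink")
    return s

secret = TaintedStr("card=1234", tainted=True)
public = TaintedStr("user=alice")
print(write_to_public_log(public))        # allowed: untainted data
# write_to_public_log(public + secret)    # would raise PermissionError
```

A real system must intercept far more operations (slicing, formatting, I/O) and isolate taint categories from each other, which is exactly the engineering cost the thesis argues can be reduced by exploiting high-level language features.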
A Forensic Web Log Analysis Tool: Techniques and Implementation
Methodologies presently in use to perform forensic analysis of web applications are decidedly
lacking. Although the number of log analysis tools available is exceedingly large, most only employ
simple statistical analysis or rudimentary search capabilities. More precisely, these tools were not
designed to be forensically capable. The threat of online assault, the ever growing reliance on the
performance of necessary services conducted online, and the lack of efficient forensic methods in this
area provide a background outlining the need for such a tool. The culmination of study emanating
from this thesis not only presents a forensic log analysis framework, but also outlines an innovative
methodology of analyzing log files based on a concept that uses regular expressions, and a variety
of solutions to problems associated with existing tools. The implementation is designed to detect
critical web application security flaws gleaned from event data contained within the access log files
of the underlying Apache Web Service (AWS).
Of utmost importance to a forensic investigator or incident responder is the generation of an event
timeline preceding the incident under investigation. Regular expressions power the search capability
of our framework by enabling the detection of a variety of injection-based attacks that represent
significant timeline interactions. The knowledge of the underlying event structure of each access log
entry is essential to efficiently parse log files and determine timeline interactions. Another feature
added to our tool includes the ability to modify, remove, or add regular expressions. This feature
addresses the need for investigators to adapt the environment to include investigation specific queries
along with suggested default signatures. The regular expressions are signature definitions used to
detect attacks toward both applications whose functionality requires a web service and the service
itself. The tool provides a variety of default vulnerability signatures to scan for and outputs resulting
detections.
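Signature-based scanning of access-log entries, as described above, can be sketched as follows. The patterns and log lines here are simplified, hypothetical examples in the spirit of the tool's default signatures, not its actual signature set; real deployments would also handle percent-encoded payloads and let investigators add or modify patterns.

```python
import re

# Illustrative injection signatures keyed by attack name.
SIGNATURES = {
    "sql_injection": re.compile(r"(\bUNION\b.*\bSELECT\b|'\s*OR\s*'1'\s*=\s*'1)", re.I),
    "xss": re.compile(r"<script\b", re.I),
    "path_traversal": re.compile(r"\.\./"),
}

def scan_line(line):
    """Return the names of all signatures matching one access-log line."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(line)]

log = [
    '10.0.0.1 - - [12/Mar/2010:10:00:01] "GET /index.php?id=1 HTTP/1.1" 200 512',
    '10.0.0.2 - - [12/Mar/2010:10:00:02] "GET /item.php?id=1 UNION SELECT pass FROM users HTTP/1.1" 200 77',
    '10.0.0.3 - - [12/Mar/2010:10:00:03] "GET /view.php?f=../../etc/passwd HTTP/1.1" 404 23',
]

for line in log:
    hits = scan_line(line)
    if hits:
        print(hits, line.split('"')[1])  # matched signatures and the request
```

Because matches carry the timestamp already present in each log entry, hits like these can be ordered directly into the event timeline an investigator needs.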
A Common Digital Twin Platform for Education, Training and Collaboration
The world is in transition driven by digitalization; industrial companies and educational institutions are adopting Industry 4.0 and Education 4.0 technologies enabled by digitalization. Furthermore, digitalization and the availability of smart devices and virtual environments have evolved to produce a generation of digital natives. These digital natives, whose smart devices have surrounded them since birth, have developed a new way to process information; instead of reading literature and writing essays, the digital native generation uses search engines, discussion forums, and online video content to study and learn. The evolved learning process of the digital native generation challenges the educational and industrial sectors to create natural training, learning, and collaboration environments for digital natives.
Digitalization provides the tools to overcome the aforementioned challenge; extended reality and digital twins enable high-level user interfaces that are natural for the digital natives and their interaction with physical devices. Simulated training and education environments enable a risk-free way of training safety aspects, programming, and controlling robots. To create a more realistic training environment, digital twins enable interfacing virtual and physical robots to train and learn on real devices utilizing the virtual environment. This thesis proposes a common digital twin platform for education, training, and collaboration. The proposed solution enables the teleoperation of physical robots from distant locations, enabling location and time-independent training and collaboration in robotics.
In addition to teleoperation, the proposed platform supports social communication, video streaming, and resource sharing for efficient collaboration and education. The proposed solution enables research collaboration in robotics by allowing collaborators to utilize each other’s equipment independent of the distance between the physical locations. Sharing of resources saves time and travel costs. Social communication provides the possibility to exchange ideas and discuss research. The students and trainees can utilize the platform to learn new skills in robotic programming, controlling, and safety aspects.
Cybersecurity is considered from the planning phase to the implementation phase. Only cybersecure methods, protocols, services, and components are used to implement the presented platform. Securing the low-level communication layer of the digital twins is essential to ensuring the safe teleoperation of the robots. Cybersecurity is the key enabler of the proposed platform, and after implementation, periodic vulnerability scans and updates help maintain cybersecurity. This thesis discusses solutions and methods for cyber-securing an online digital twin platform.
In conclusion, the thesis presents a common digital twin platform for education, training, and collaboration. The presented solution is cybersecure and accessible using mobile devices. The proposed platform, digital twin, and extended reality user interfaces contribute to the transitions to Education 4.0 and Industry 4.0.
Blogs as Infrastructure for Scholarly Communication.
This project systematically analyzes digital humanities blogs as an infrastructure for scholarly communication. This exploratory research maps the discourses of a scholarly community to understand the infrastructural dynamics of blogs and the Open Web. The text contents of 106,804 individual blog posts from a corpus of 396 blogs were analyzed using a mix of computational and qualitative methods. Analysis uses an experimental methodology (trace ethnography) combined with unsupervised machine learning (topic modeling) to perform an interpretive analysis at scale. Methodological findings show topic modeling can be integrated with qualitative and interpretive analysis. Special attention must be paid to data fitness, or the shape and re-shaping practices involved with preparing data for machine learning algorithms. Quantitative analysis of computationally generated topics indicates that while the community writes about diverse subject matter, individual scholars focus their attention on only a couple of topics. Four categories of informal scholarly communication emerged from the qualitative analysis: quasi-academic, para-academic, meta-academic, and extra-academic. The quasi- and para-academic categories represent discourse with scholarly value within the digital humanities community, but do not necessarily have an obvious path into formal publication and preservation. A conceptual model, the (in)visible college, is introduced for situating scholarly communication on blogs and the Open Web. An (in)visible college is a kind of scholarly communication that is informal, yet visible at scale. This combination of factors opens up a new space for the study of scholarly communities and communication. While (in)visible colleges are programmatically observable, care must be taken with any effort to count and measure knowledge work in these spaces.
This is the first systematic, data-driven analysis of the digital humanities and lays the groundwork for subsequent social studies of digital humanities. Ph.D. in Information, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/111592/1/mcburton_1.pd