
    Human Factors in Secure Software Development

    While security research has made significant progress in the development of theoretically secure methods, software, and algorithms, software still ships with many possible exploits, many of which target the human factor. The human factor is often called "the weakest link" in software security. To address this, human factors research in security and privacy focuses on the users of technology and considers their security needs. The research then asks how technology can serve users while minimizing risks and empowering them to retain control over their own data. However, these concepts have to be implemented by developers, whose security errors may proliferate to all of their software's users. For example, software that stores data insecurely, fails to secure network traffic correctly, or otherwise does not adhere to secure programming best practices puts all of its users at risk. It is therefore critical that software developers implement security correctly. However, in addition to security rarely being a primary concern while producing software, developers may also lack awareness, knowledge, training, or experience in secure development. A lack of focus on usability in the libraries, documentation, and tools that they must use for security-critical components may exacerbate the problem by inflating the time and effort needed to "get security right". This dissertation focuses on how to support developers throughout the process of implementing software securely. The research aims to understand developers' use of resources, their mindsets as they develop, and how their background affects code security outcomes. Qualitative, quantitative, and mixed methods were employed online and in the laboratory, and large-scale datasets were analyzed. The research found that the information sources developers use can contribute to code (in)security: copying and pasting code from online forums produces functional code more quickly than official documentation does, but may introduce vulnerable code. We also compared the usability of cryptographic APIs, finding that poor usability, unsafe (possibly obsolete) defaults, and unhelpful documentation likewise lead to insecure code. Conversely, well-thought-out documentation and abstraction levels can improve an API's usability and may contribute to secure API usage. We found that developer experience can contribute to better security outcomes, and that studying students in lieu of professional developers can produce meaningful insights into developers' experiences with secure programming. We found that there is a multitude of online secure development advice, but that these advice sources are incomplete and may be insufficient for developers to find the help they need, which may cause them to turn to unvetted and potentially insecure resources. This dissertation supports the theses that (a) secure development is subject to human factor challenges and (b) security can be improved by addressing these challenges and supporting developers. The work presented in this dissertation has been seminal in establishing human factors in secure development research within the security and privacy community and has advanced the dialogue about the rigorous use of empirical methods in security and privacy research.
    In these research projects, we repeatedly found that usability issues in security and privacy mechanisms, development practices, and operation routines lead to the majority of security and privacy failures that affect millions of end users.
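
    The finding that unhelpful defaults and copied snippets lead to insecure cryptography can be made concrete with a small sketch. The snippet below is not taken from the dissertation; it assumes the PyCryptodome library and contrasts an ECB-mode call of the kind often pasted from forum answers with an authenticated GCM call.

        # Illustrative sketch (assumed library: PyCryptodome), not code from the
        # dissertation: contrasting an insecure cipher-mode choice with an
        # authenticated-encryption alternative.
        from Crypto.Cipher import AES
        from Crypto.Random import get_random_bytes

        key = get_random_bytes(16)
        message = b"credit card: 4111 1111 1111 1111"   # 32 bytes, a multiple of the block size

        # Insecure: ECB mode leaks plaintext structure (equal blocks encrypt equally)
        # and offers no integrity protection.
        ecb = AES.new(key, AES.MODE_ECB)
        insecure_ciphertext = ecb.encrypt(message)

        # Safer: GCM provides confidentiality plus an authentication tag; the nonce
        # must be stored with the ciphertext and never reused under the same key.
        gcm = AES.new(key, AES.MODE_GCM)
        ciphertext, tag = gcm.encrypt_and_digest(message)
        stored = (gcm.nonce, ciphertext, tag)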

    Dos and Don'ts of Machine Learning in Computer Security

    With the growing processing power of computing systems and the increasing availability of massive datasets, machine learning algorithms have led to major breakthroughs in many different areas. This development has influenced computer security, spawning a series of work on learning-based security systems, such as for malware detection, vulnerability discovery, and binary code analysis. Despite great potential, machine learning in security is prone to subtle pitfalls that undermine its performance and render learning-based systems potentially unsuitable for security tasks and practical deployment. In this paper, we take a critical look at this problem. First, we identify common pitfalls in the design, implementation, and evaluation of learning-based security systems. We conduct a study of 30 papers from top-tier security conferences within the past 10 years, confirming that these pitfalls are widespread in the current security literature. In an empirical analysis, we further demonstrate how individual pitfalls can lead to unrealistic performance and interpretations, obstructing the understanding of the security problem at hand. As a remedy, we propose actionable recommendations to support researchers in avoiding or mitigating the pitfalls where possible. Furthermore, we identify open problems when applying machine learning in security and provide directions for further research.
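
    One recurring pitfall in this space, data snooping, can be illustrated with a short sketch. The example below is not drawn from the paper; it assumes scikit-learn and synthetic data, and shows how fitting preprocessing on the full dataset leaks test information into training, while a pipeline fitted only on the training split avoids this.

        # Hedged sketch of the data-snooping pitfall (assumed library: scikit-learn;
        # synthetic data stands in for malware/benign feature vectors).
        from sklearn.datasets import make_classification
        from sklearn.model_selection import train_test_split
        from sklearn.preprocessing import StandardScaler
        from sklearn.pipeline import make_pipeline
        from sklearn.svm import LinearSVC

        X, y = make_classification(n_samples=500, n_features=20, random_state=0)

        # Pitfall: the scaler is fitted on all data, so statistics from future
        # test samples leak into the training process.
        X_scaled = StandardScaler().fit_transform(X)
        X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

        # Remedy: split first, then fit every preprocessing step inside a pipeline
        # on the training data only.
        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model = make_pipeline(StandardScaler(), LinearSVC()).fit(X_train, y_train)
        print(model.score(X_test, y_test))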

    Challenging Software Developers: Dialectic as a Foundation for Security Assurance Techniques

    Development teams are increasingly expected to deliver secure code, but how can they best achieve this? Traditional security practice, which emphasises 'telling developers what to do' using checklists, processes and errors to avoid, has proved difficult to introduce. From analysis of industry interviews with a dozen experts in app development security, we find that secure development requires dialectic: a challenging dialogue between the developers and a range of counterparties, continued throughout the development cycle. Analysing a further survey of sixteen industry developer security advocates, we identify the six assurance techniques that are most effective at achieving this dialectic in existing development teams, and conclude that the introduction of these techniques is best driven by the developers themselves. Concentrating on these six assurance techniques, and the dialectical interactions they involve, has the potential to increase the security of development activities and thus improve software security for everyone.

    The enchanted house: An analysis of the interaction of intelligent personal home assistants (IPHAs) with the private sphere and its legal protection

    In less than five years, Alexa has become a familiar presence in many households, and even those who do not own one have stumbled into it, be it at a friend's house or in the news. Amazon Alexa and its friend Google Assistant represent an evolution of IoT: they have an advanced 'intelligence' based on cloud computing and machine learning; they collect data and process them to profile and understand users; and they are placed inside our home. I refer to them as intelligent personal and home assistants, or IPHAs. This research applies multidisciplinary resources to explore the phenomenon of IPHAs from two perspectives. From a socio-technical angle, the research reflects upon what happens to the private sphere and the home once IPHAs enter it. To do so, it draws on theories and concepts borrowed from history, behavioural science, STS, philosophy, and behavioural design. All these disciplines help highlight different attributes that individuals and society associate with the private sphere and the home. When the functioning of IPHAs is mapped against these attributes, it becomes possible to identify where Alexa and Assistant might have an impact: there is a potential conflict between the privacy expectations and norms existing in the home (as sanctuary of the private sphere) and the marketing interests introduced into the home by IPHAs' profiling. Because of their voice interaction, IPHAs are also potentially highly persuasive: they can influence and manipulate users and affect their autonomy and control in their daily lives. From the legal perspective, the research explores the application of the GDPR and the proposed e-Privacy Regulation to IPHAs, as legislative tools for the protection of the private sphere in horizontal relationships. The analysis focuses in particular on those provisions whose application to IPHAs is most challenging, based on the technology as well as on the socio-technical analysis above. Special attention is dedicated to users' consent to processing, the general principles of the GDPR, attributing the role of controller or processor to the stakeholders involved, profiling and automated decisions, data protection by design and by default, as well as spam and robocalls. For some of the issues, suggestions are offered on how to interpret and apply the legal framework in order to mitigate undesired effects. This is the case, for instance, of determining whether the owners of IPHAs should be considered controllers vis-à-vis the data of their guests, or of the implications of data protection by design and by default for the design of IPHAs. Some questions, however, require a wider debate at the societal and political level. This is the case of the behavioural design techniques used to entice users and stimulate them to use the vocal assistants, which present high levels of persuasion and can affect the agency and autonomy of individuals. The research brings forward the necessity of determining where the line should be drawn between acceptable practices and unacceptable ones.

    Beyond Traditional Software Development: Studying and Supporting the Role of Reusing Crowdsourced Knowledge in Software Development

    As software development becomes increasingly complex, developers often need to reuse others' code or knowledge made available online to tackle problems encountered during software development and maintenance. This phenomenon of using others' code or knowledge, often found on online forums, is referred to as crowdsourcing. A good example of crowdsourcing is posting a coding question on the Stack Overflow website and having others contribute code that solves it. Recently, the phenomenon of crowdsourcing has attracted much attention from researchers and practitioners, and recent studies show that crowdsourcing improves productivity and reduces time-to-market. However, like any solution, crowdsourcing brings with it challenges such as quality, maintenance, and even legal issues. The research presented in this thesis is the result of a series of large-scale empirical studies involving some of the most popular crowdsourcing platforms, such as Stack Overflow, Node Package Manager (npm), and the Python Package Index (PyPI). The focus of these empirical studies is to investigate the role of reusing crowdsourced knowledge, and more particularly crowd code, in the software development process. We first present two empirical studies on the reuse of knowledge from a crowdsourcing platform, namely Stack Overflow. We found that reusing knowledge from this platform has the potential to assist software development practices, specifically through source code reuse. However, relying on such crowdsourced knowledge might also negatively affect the quality of software projects. Second, we empirically examine the type of development knowledge constructed on crowdsourcing platforms, studying the use of trivial packages on the npm and PyPI platforms. We found that trivial packages are common and that developers tend to use them because they provide well-tested and well-implemented code. However, developers are concerned about the maintenance overhead of these trivial packages due to the extra dependencies they introduce. Finally, we used the gained knowledge to propose a pragmatic solution to improve the efficiency of relying on the crowd in software development: a rule-based technique that automatically detects commits that can skip the continuous integration process. We evaluate the performance of the proposed technique on a dataset of open-source Java projects. Our results show that such a technique can make continuous integration more efficient for projects that reuse code from crowdsourcing platforms. Among the findings of this thesis is that the way software is developed has changed dramatically: developers rely on crowdsourcing to address problems encountered during software development and maintenance. The results presented in this thesis provide new insights into how knowledge from these crowdsourced platforms is reused in software systems and how some of this knowledge can be better integrated into current software development processes and best practices.
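
    The rule-based CI-skip idea can be sketched in a few lines. The rules and file categories below are hypothetical illustrations, not the thesis' actual rules: a commit is flagged as skippable only if every changed file is non-source material such as documentation or repository metadata.

        # Minimal sketch of a rule-based CI-skip check (hypothetical rules, not the
        # technique evaluated in the thesis).
        DOC_SUFFIXES = (".md", ".rst", ".txt")
        META_FILES = {".gitignore", "LICENSE", "AUTHORS"}

        def can_skip_ci(changed_files):
            """Return True if the commit plausibly affects neither build nor tests."""
            if not changed_files:
                return False                      # stay on the safe side for empty commits
            for path in changed_files:
                name = path.rsplit("/", 1)[-1]
                if name in META_FILES or name.endswith(DOC_SUFFIXES):
                    continue
                return False                      # any source or config change keeps CI on
            return True

        # A docs-only commit would be skipped; a Java change would not.
        print(can_skip_ci(["README.md", "docs/usage.rst"]))   # True
        print(can_skip_ci(["src/main/java/App.java"]))        # False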

    The Example Guru: Suggesting Examples to Novice Programmers in an Artifact-Based Context

    Programmers in artifact-based contexts could likely benefit from skills that they do not realize exist. We define artifact-based contexts as contexts where programmers have a goal project, like an application or game, which they must figure out how to accomplish and can change along the way. Artifact-based contexts do not have quantifiable goal states, like the solution to a puzzle or the resolution of a bug in task-based contexts. Currently, programmers in artifact-based contexts have to seek out information, but may be unaware of useful information or choose not to seek out new skills. This is especially problematic for young novice programmers in blocks programming environments. Blocks programming environments often lack even minimal in-context support, such as auto-complete or in-context documentation. Novices programming independently in these blocks-based environments often plateau in the programming skills and API methods they use. This work aims to encourage novices in artifact-based programming contexts to explore new API methods and skills. One way to support novices may be with examples, as examples are effective for learning and highly available. In order to better understand how to use examples to support novice programmers, I first ran two studies exploring novices' use of and focus on example code. I used those results to design a system called the Example Guru. The Example Guru suggests example snippets to novice programmers that contain previously unused API methods or code concepts. Finally, I present an approach for semi-automatically generating content for this type of suggestion system, which reduces the amount of expert effort required to create suggestions. This work makes three contributions: 1) a better understanding of the difficulties novices have using example code, 2) a system that encourages exploration and use of new programming skills, and 3) an approach for generating content for a suggestion system with less expert effort.
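
    The core of the suggestion idea can be sketched as a comparison between the API methods a project already calls and a catalogue of example snippets. The catalogue entries and method names below are hypothetical; they are not the Example Guru's actual data or implementation.

        # Sketch of example suggestion for unused API methods (hypothetical catalogue
        # and method names, not the Example Guru's implementation).
        EXAMPLE_CATALOGUE = {
            "playSound": "example: trigger a sound effect when the sprite is clicked",
            "glideTo": "example: animate a sprite gliding to a target position",
            "broadcast": "example: coordinate two sprites with a broadcast message",
        }

        def suggest_examples(used_methods):
            """Return example descriptions for API methods the project has not used yet."""
            unused = set(EXAMPLE_CATALOGUE) - set(used_methods)
            return [EXAMPLE_CATALOGUE[method] for method in sorted(unused)]

        # A project that only moves sprites would be nudged toward sound and messaging.
        print(suggest_examples(["glideTo"]))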

    Code similarity and clone search in large-scale source code data

    Software development has benefited tremendously from the Internet: online code corpora enable instant sharing of source code, and online developer guides and documentation are readily available. Nowadays, duplicated code (i.e., code clones) exists not only within or across software projects but also between online code repositories and websites. We call these "online code clones." Like classic code clones between software systems, they can lead to license violations, bug propagation, and reuse of outdated code. Unfortunately, they are difficult to locate and fix since the search space in online code corpora is large and no longer confined to a local repository. This thesis presents a combined study of code similarity and online code clones. We empirically show that many code snippets on Stack Overflow are cloned from open source projects. Several of them have become outdated or violate their original license and are potentially harmful to reuse. To develop a solution for finding online code clones, we study various code similarity techniques to gain insights into their strengths and weaknesses. A framework called OCD for evaluating code similarity and clone search tools is introduced and used to compare 34 state-of-the-art techniques on pervasively modified code and boilerplate code. We also found that clone detection techniques can be enhanced by compilation and decompilation. Using the knowledge from the comparison of code similarity analysers, we create and evaluate Siamese, a scalable token-based clone search technique that uses multiple code representations. Our evaluation shows that Siamese scales to large-scale source code data of 365 million lines of code and offers high search precision and recall. Its clone search precision is comparable to that of seven state-of-the-art clone detection tools on the OCD framework. Finally, we demonstrate the usefulness of Siamese by applying the tool to find online code clones, automatically analyse clone licenses, and recommend tests for reuse.
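
    The token-based similarity idea behind clone search can be illustrated with a short sketch. The code below is not Siamese's actual pipeline, which combines several code representations with an inverted index for scale; it only shows how comparing token n-grams under a raw and an identifier-normalised representation can reveal a clone that renaming would otherwise hide.

        # Illustrative sketch of token-based clone similarity (not Siamese itself).
        import re

        KEYWORDS = {"for", "while", "if", "return", "int"}

        def tokens(code, normalise=False):
            toks = re.findall(r"[A-Za-z_]\w*|\S", code)
            if normalise:
                # second representation: collapse identifiers so renamed variables
                # no longer hide a clone
                toks = ["ID" if re.fullmatch(r"[A-Za-z_]\w*", t) and t not in KEYWORDS else t
                        for t in toks]
            return toks

        def jaccard(a, b, n=3, normalise=False):
            grams = lambda ts: {tuple(ts[i:i + n]) for i in range(len(ts) - n + 1)}
            ga, gb = grams(tokens(a, normalise)), grams(tokens(b, normalise))
            return len(ga & gb) / len(ga | gb) if ga | gb else 0.0

        original = "for (int i = 0; i < n; i++) { sum += a[i]; }"
        renamed = "for (int j = 0; j < n; j++) { total += a[j]; }"
        print(jaccard(original, renamed))                   # low: renaming hides the clone
        print(jaccard(original, renamed, normalise=True))   # 1.0: normalised view reveals it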

    Using Graph Databases to Address Network Complexity Problems that can Hinder Security Incident Response

    The network complexity problem in computer security incident response concerns the difficulty of understanding a computer network as it grows in size and scale. The larger the network grows, the more difficult reconnaissance becomes, yet reconnaissance is necessary to execute the correction and prevention measures that address issues arising during security incident response. Graph databases can help solve the problems relational databases face with large, tree-like structures such as computer networks, and they add the flexibility needed to cope with the mutability of those networks. This paper focuses on using graph databases to discover the blast radius of zero-day vulnerabilities on the fly, using the properties of graph databases to intuitively identify infection vectors that may be present during a zero-day vulnerability. Additionally, options for visualizing security data in ways that make the data more actionable are explored.
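
    The blast-radius idea can be sketched with a small in-memory graph. The hosts and edges below are hypothetical, and an in-memory graph library stands in for a graph database such as Neo4j; the point is only that reachability from the compromised host yields the potential blast radius of a zero-day.

        # Minimal sketch of a blast-radius query (hypothetical hosts; networkx used
        # in place of a graph database).
        import networkx as nx

        network = nx.DiGraph()
        network.add_edges_from([
            ("admin-vpn", "web-01"),
            ("web-01", "app-01"),      # web tier can reach the app tier
            ("app-01", "db-01"),       # app tier can reach the database
            ("app-01", "cache-01"),
            ("db-01", "backup-01"),
        ])

        def blast_radius(graph, compromised_host):
            """Hosts an attacker could plausibly reach from the compromised host."""
            return nx.descendants(graph, compromised_host)

        # If the zero-day lands on the web tier, everything downstream is in scope.
        print(blast_radius(network, "web-01"))   # {'app-01', 'db-01', 'cache-01', 'backup-01'}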