11 research outputs found

    Taxonomy of Attacks on Open-Source Software Supply Chains

    The widespread dependency on open-source software makes it a fruitful target for malicious actors, as demonstrated by recurring attacks. The complexity of today's open-source supply chains results in a significant attack surface, giving attackers numerous opportunities to reach their goal of injecting malicious code into open-source artifacts, which is then downloaded and executed by victims. This work proposes a general taxonomy for attacks on open-source supply chains, independent of specific programming languages or ecosystems, covering all supply chain stages from code contributions to package distribution. Taking the form of an attack tree, it covers 107 unique vectors, linked to 94 real-world incidents and mapped to 33 mitigating safeguards. User surveys conducted with 17 domain experts and 134 software developers positively validated the correctness, comprehensiveness, and comprehensibility of the taxonomy, as well as its suitability for various use cases. Survey participants also assessed the utility and costs of the identified safeguards, and whether they are used.
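The structure described above, an attack tree whose vectors are linked to incidents and mapped to safeguards, can be illustrated with a minimal sketch. The classes and sample entries below are invented for illustration; they are not the paper's actual taxonomy or its 107 vectors.

```python
# Illustrative sketch of an attack tree linking vectors to incidents
# and safeguards. All names and entries are hypothetical.

from dataclasses import dataclass, field

@dataclass
class AttackVector:
    name: str
    incidents: list = field(default_factory=list)   # real-world examples
    safeguards: list = field(default_factory=list)  # mitigating measures
    children: list = field(default_factory=list)    # sub-vectors

    def all_vectors(self):
        """Yield this vector and all sub-vectors, depth first."""
        yield self
        for child in self.children:
            yield from child.all_vectors()

# Hypothetical fragment of such a tree.
root = AttackVector(
    name="Inject malicious code into open-source artifact",
    children=[
        AttackVector(
            name="Compromise maintainer account",
            incidents=["hypothetical incident A"],
            safeguards=["multi-factor authentication"],
        ),
        AttackVector(
            name="Typosquat popular package name",
            incidents=["hypothetical incident B"],
            safeguards=["package name similarity checks"],
        ),
    ],
)

print(sum(1 for _ in root.all_vectors()))  # number of vectors in this fragment
```

Traversing the tree this way is what makes it easy to enumerate all vectors and aggregate their mapped safeguards, one of the use cases the survey participants assessed.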

    A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits

    In order to understand the state and evolution of the entirety of open source software, we need a handle on the set of distinct software projects. Most open source projects presently use Git, a distributed version control system that makes it easy to create clones, resulting in numerous repositories that are almost entirely based on some parent repository from which they were cloned. Git commits are based on a Merkle tree, and two identical commits are highly unlikely to be produced independently. Shared commits therefore appear to be an excellent way to group cloned repositories and obtain an accurate map of such repositories. We use the World of Code infrastructure, containing approximately 2B commits and 100M repositories, to create and share such a map. We discover that the largest group contains almost 14M repositories, most of which are unrelated to each other. As it turns out, developers can push Git objects to an arbitrary repository or pull objects from unrelated repositories, thus linking unrelated repositories. To address this, we apply the Louvain community detection algorithm to this very large graph of links between commits and projects. The approach successfully reduces the size of the megacluster, with the largest group of highly interconnected projects containing under 100K repositories. We expect that the resulting map of related projects, as well as the tools and methods for handling the very large graph, will serve as a reference set for mining software projects and other applications. Further work is needed to determine the different types of relationships among projects induced by shared commits and by other relationships, for example shared source code or similar filenames. Comment: 5 pages
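The first step described above, grouping repositories that share at least one commit, can be sketched with a union-find structure over a commit-to-repository mapping. This is a minimal illustration of the idea, not the authors' code; the repository names and commit IDs are invented, and the paper's actual pipeline further refines the resulting megacluster with Louvain community detection.

```python
# Sketch: group repositories connected by shared commits via union-find.
# All repository names and commit IDs below are hypothetical.

def group_by_shared_commits(repo_commits):
    """Union repositories that share any commit; return sorted groups."""
    parent = {repo: repo for repo in repo_commits}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Invert the mapping: commit -> repositories containing it.
    commit_to_repos = {}
    for repo, commits in repo_commits.items():
        for c in commits:
            commit_to_repos.setdefault(c, []).append(repo)

    # Any two repositories sharing a commit belong in the same group.
    for repos in commit_to_repos.values():
        for other in repos[1:]:
            union(repos[0], other)

    groups = {}
    for repo in repo_commits:
        groups.setdefault(find(repo), set()).add(repo)
    return sorted(sorted(g) for g in groups.values())

repo_commits = {
    "repo_a": {"c1", "c2"},
    "repo_b": {"c2", "c3"},   # clone of repo_a with one extra commit
    "repo_c": {"c9"},         # unrelated project
}
print(group_by_shared_commits(repo_commits))
# -> [['repo_a', 'repo_b'], ['repo_c']]
```

At the scale of 2B commits this naive pairwise unioning is exactly what links unrelated repositories through stray pushed objects, which is why the paper moves to community detection on the commit-project graph.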

    Risk Mitigation in Corporate Participation with Open Source Communities: Protection and Compliance in an Open Source Supply Chain

    Open source communities exist in large part through increasing participation from for-profit corporations. The balance between the seemingly conflicting ideals of open source communities and corporations creates a number of complex challenges for both. In this paper, we focus on corporate risk mitigation and the mandates on corporate participation in open source communities in light of open source license requirements. In response to these challenges, we aim to understand risk mitigation options within the dialectic of corporate participation with open source communities. Rather than emphasizing risk mitigation as an ad hoc and emergent process focused on bottom lines and shareholder interests, our interest is in formalized instruments and project management processes that can help corporations mitigate the risks associated with participation in open source communities through shared IT projects. Accordingly, we identify two key risk domains that corporations must attend to: property protection and compliance. In addition, we discuss risk mitigation sourcing, arguing that tools and processes for mitigating open source project risk do not stem solely from a corporation or solely from an open source community. Instead, they originate from the interface between the two and can be paired in a complementary fashion in an overall project management process of risk mitigation. This work has been funded through the National Science Foundation VOSS-IOS Grant: 112264

    Is Open Hardware Worthwhile? Learning from Thales' Experience with RISC-V

    Overview: In this article we frame the concept of a hardware-rich open source ecosystem (H-ROSE) that generates software and hardware components. In an H-ROSE, the designs of some components are accessible under open source licenses, while other component designs remain proprietary. We describe seven adoption factors used by the multinational French firm Thales to assess the efficacy of RISC-V for designing processors. Other companies can use these adoption factors to explore whether an open hardware initiative supported by an H-ROSE is worthwhile. Peer reviewed.

    An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain

    Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We indicate the potential threat models and show that the existing defenses are insufficient for ensuring the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.

    Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities

    The rapid acceleration of corporate engagement with open source projects is drawing out new ways for CSCW researchers to consider the dynamics of these projects. Research must now consider the complex ecosystems within which open source projects are situated, including issues of for-profit motivations, brokering foundations, and corporate collaboration. Localized project considerations cannot reveal the broader workings of an open source ecosystem, yet much empirical work is constrained to a local context. In response, we present eight observations from our eight-year engaged field study about the changing nature of open source projects. We ground these observations through 24 research questions that serve as primers to spark research ideas in this new reality of open source projects. This paper contributes to CSCW in social and crowd computing by delivering a rich and fresh look at corporately engaged open source projects, with a call for renewed focus and research into newly emergent areas of interest.

    Rethinking the Delivery Architecture of Data-Intensive Visualization

    The web has transformed the way people create and consume information. However, data-intensive science applications have rarely been able to take full advantage of the web ecosystem so far. Analysis and visualization have remained close to large datasets, on large servers and desktops, because of the vast resources that data-intensive applications require. This hampers the accessibility and on-demand availability of data-intensive science. In this work, I propose a novel architecture for the delivery of interactive, data-intensive visualization to the web ecosystem. The proposed architecture, codenamed Fabric, keeps the server side oblivious to application logic, structuring it as a set of scalable microservices that 1) manage data and 2) compute data products. Disconnected from application logic, the services allow interactive data-intensive visualization to be simultaneously accessible to many users. Meanwhile, the client side of this architecture perceives visualization applications as an interaction-in, image-out black box whose sole responsibility is keeping track of application state and mapping interactions into well-defined and structured visualization requests. Fabric essentially provides a separation of concerns that decouples the otherwise tightly coupled client and server seen in traditional data applications. Initial results show that, as a result, Fabric enables high audience scalability and scientific reproducibility, and improves control and protection of data products.
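The client-side role described above, tracking application state and mapping interactions into structured requests for an application-oblivious server, can be sketched as follows. The class, field names, and request schema are invented for illustration; they are not the Fabric codebase.

```python
# Sketch of an "interaction-in, image-out" client: it only tracks view
# state and emits structured visualization requests. All names and the
# request schema are hypothetical.

import json

class VisualizationClient:
    """Maps user interactions into well-defined render requests."""

    def __init__(self, dataset_id):
        self.state = {"dataset": dataset_id, "zoom": 1.0, "center": [0, 0]}

    def on_zoom(self, factor):
        self.state["zoom"] *= factor
        return self.build_request()

    def on_pan(self, dx, dy):
        self.state["center"][0] += dx
        self.state["center"][1] += dy
        return self.build_request()

    def build_request(self):
        # A structured request a stateless server could compute a data
        # product (e.g. an image) from, with no application logic.
        return json.dumps({
            "op": "render",
            "dataset": self.state["dataset"],
            "viewport": {
                "zoom": self.state["zoom"],
                "center": self.state["center"],
            },
        }, sort_keys=True)

client = VisualizationClient("climate-2020")
req = client.on_zoom(2.0)
print(req)
```

Because every request is self-describing, any server instance can serve it, which is what makes the many-users scalability claim plausible.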

    Security considerations in the open source software ecosystem

    Open source software plays an important role in the software supply chain, allowing stakeholders to utilize open source components as building blocks in their software, tooling, and infrastructure. But relying on the open source ecosystem introduces unique challenges, both in terms of security and trust, as well as in terms of supply chain reliability. In this dissertation, I investigate approaches, considerations, and encountered challenges of stakeholders in the context of security, privacy, and trustworthiness of the open source software supply chain. Overall, my research aims to empower and support software experts with the knowledge and resources necessary to achieve a more secure and trustworthy open source software ecosystem. In the first part of this dissertation, I describe a research study investigating the security and trust practices in open source projects by interviewing 27 owners, maintainers, and contributors from a diverse set of projects to explore their behind-the-scenes processes, guidance and policies, incident handling, and encountered challenges, finding that participants’ projects are highly diverse in terms of their deployed security measures and trust processes, as well as their underlying motivations. More on the consumer side of the open source software supply chain, I investigated the use of open source components in industry projects by interviewing 25 software developers, architects, and engineers to understand their projects’ processes, decisions, and considerations in the context of external open source code, finding that open source components play an important role in many of the industry projects, and that most projects have some form of company policy or best practice for including external code. On the side of end-user focused software, I present a study investigating the use of software obfuscation in Android applications, which is a recommended practice to protect against plagiarism and repackaging. 
The study leveraged a multi-pronged approach including a large-scale measurement, a developer survey, and a programming experiment, finding that only 24.92% of apps are obfuscated by their developers, that developers do not fear theft of their own apps, and that they have difficulties obfuscating their own apps. Lastly, to involve end users themselves, I describe a survey with 200 users of cloud office suites investigating their security and privacy perceptions and expectations, with findings suggesting that users are generally aware of basic security implications but lack the technical knowledge to envision some threat models. The key findings of this dissertation are that open source projects have highly diverse security measures, trust processes, and underlying motivations; that the projects' security and trust needs are likely best met in ways that consider their individual strengths, limitations, and project stage, especially for smaller projects with limited access to resources; and that open source components play an important role in industry projects, and while those projects often have some form of company policy or best practice for including external code, developers wish for more resources to better audit included components. This dissertation emphasizes the importance of collaboration and shared responsibility in building and maintaining the open source software ecosystem, with developers, maintainers, end users, researchers, and other stakeholders alike ensuring that the ecosystem remains a secure, trustworthy, and healthy resource for everyone to rely on.

    Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain

    "Does generative AI infringe copyright?" is an urgent question. It is also a difficult question, for two reasons. First, "generative AI" is not just one product from one company. It is a catch-all name for a massive ecosystem of loosely related technologies, including conversational text chatbots like ChatGPT, image generators like Midjourney and DALL-E, coding assistants like GitHub Copilot, and systems that compose music and create videos. These systems behave differently and raise different legal issues. The second problem is that copyright law is notoriously complicated, and generative-AI systems manage to touch on a great many corners of it: authorship, similarity, direct and indirect liability, fair use, and licensing, among much else. These issues cannot be analyzed in isolation, because there are connections everywhere. In this Article, we aim to bring order to the chaos. To do so, we introduce the generative-AI supply chain: an interconnected set of stages that transform training data (millions of pictures of cats) into generations (a new, potentially never-seen-before picture of a cat that has never existed). Breaking down generative AI into these constituent stages reveals all of the places at which companies and users make choices that have copyright consequences. It enables us to trace the effects of upstream technical designs on downstream uses, and to assess who in these complicated sociotechnical systems bears responsibility for infringement when it happens. Because we engage so closely with the technology of generative AI, we are able to shed more light on the copyright questions. We do not give definitive answers as to who should and should not be held liable. Instead, we identify the key decisions that courts will need to make as they grapple with these issues, and point out the consequences that would likely flow from different liability regimes.Comment: Forthcoming, Journal of the Copyright Society of the USA '2

    A Methodology for Implementing Information Management Models within Enterprise Resource Planning Systems: Application to Small and Medium-Sized Enterprises

    The Next Generation of Manufacturing Systems (SGSF) seeks to meet the requirements of new enterprise models in contexts of intelligence, agility, and adaptability within a global and virtual environment. Enterprise Resource Planning (ERP) with product data management (PDM) and product lifecycle management (PLM) support provides enterprise management solutions based on a coherent use of information technologies for implementation in CIM (Computer-Integrated Manufacturing) systems, with a high degree of adaptability to the desired organizational structure. In general, such implementations have long been carried out in large companies, with far smaller (almost nonexistent) uptake among SMEs. This doctoral thesis defines and develops a new implementation methodology for the automatic generation of information in the business processes of companies whose requirements match the needs of the SGSF, within enterprise resource planning (ERP) systems, taking the influence of the human factor into account. The validity of the methodology's theoretical model was verified by implementing it in an SME in the engineering sector. To establish the state of the art on this topic, a specific methodology based on the Shewhart/Deming continuous improvement cycle was designed and applied, using the bibliographic search and analysis tools available online with access to the corresponding databases.