Taxonomy of Attacks on Open-Source Software Supply Chains
The widespread dependency on open-source software makes it a fruitful target
for malicious actors, as demonstrated by recurring attacks. The complexity of
today's open-source supply chains results in a significant attack surface,
giving attackers numerous opportunities to reach the goal of injecting
malicious code into open-source artifacts that is then downloaded and executed
by victims.
This work proposes a general taxonomy for attacks on open-source supply
chains, independent of specific programming languages or ecosystems, and
covering all supply chain stages from code contributions to package
distribution. Taking the form of an attack tree, it covers 107 unique vectors,
linked to 94 real-world incidents, and mapped to 33 mitigating safeguards.
User surveys conducted with 17 domain experts and 134 software developers
positively validated the correctness, comprehensiveness and comprehensibility
of the taxonomy, as well as its suitability for various use cases. Survey
participants also assessed the utility and costs of the identified safeguards,
and whether they are used.
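As a rough illustration of the attack-tree form, the sketch below models a tree whose inner nodes are attacker goals refined by their children and whose leaves are attack vectors, each optionally mapped to safeguards. The `Node` class, the `vectors` and `unmitigated` helpers, and every node name are hypothetical and are not taken from the taxonomy itself; the sketch also assumes every inner node is an OR-refinement (any one child achieves the parent goal).

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Attack-tree node: inner nodes are OR-refined goals, leaves are attack vectors."""
    name: str
    children: list[Node] = field(default_factory=list)
    # Hypothetical annotations: safeguards mitigating this vector, known incidents.
    safeguards: set[str] = field(default_factory=set)
    incidents: list[str] = field(default_factory=list)

    def is_vector(self) -> bool:
        return not self.children


def vectors(node: Node) -> list[Node]:
    """Collect every leaf attack vector below `node`, depth-first."""
    if node.is_vector():
        return [node]
    found: list[Node] = []
    for child in node.children:
        found.extend(vectors(child))
    return found


def unmitigated(root: Node) -> list[str]:
    """Names of attack vectors with no safeguard mapped to them."""
    return [v.name for v in vectors(root) if not v.safeguards]


# Illustrative tree; names are made up for this sketch, not copied from the taxonomy.
root = Node("Inject malicious code into a distributed package", [
    Node("Publish a look-alike package", [
        Node("Typosquatting", safeguards={"Name-similarity checks"}),
    ]),
    Node("Subvert a legitimate package", [
        Node("Take over a maintainer account", safeguards={"Multi-factor authentication"}),
        Node("Tamper with the build system"),
    ]),
])

print(len(vectors(root)))  # 3
print(unmitigated(root))   # ['Tamper with the build system']
```

Keeping safeguards on the leaves makes coverage questions (which vectors have no mitigation at all?) a simple traversal.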
A Complete Set of Related Git Repositories Identified via Community Detection Approaches Based on Shared Commits
In order to understand the state and evolution of the entirety of open source
software, we need to get a handle on the set of distinct software projects. Most
open source projects presently use Git, a distributed version
control system that allows easy creation of clones and results in numerous
repositories that are almost entirely based on some parent repository from
which they were cloned. Git commits are based on a Merkle tree, so the same
commit is highly unlikely to be produced independently. Shared commits, therefore,
appear to be an excellent way to group cloned repositories and obtain an
accurate map for such repositories. We use World of Code infrastructure
containing approximately 2B commits and 100M repositories to create and share
such a map. We find that the largest group contains almost 14M repositories,
most of which are unrelated to each other. As it turns out, developers can
push Git objects to an arbitrary repository or pull objects from unrelated
repositories, thus linking unrelated repositories. To address this, we apply the
Louvain community detection algorithm to this very large graph consisting of
links between commits and projects. The approach successfully reduces the size
of the megacluster: the largest group of highly interconnected projects
contains under 100K repositories. We expect that the resulting map
of related projects, as well as the tools and methods for handling the very large graph,
will serve as a reference set for mining software projects and other
applications. Further work is needed to determine different types of
relationships among projects induced by shared commits and by other relationships,
for example, shared source code or similar filenames.
Comment: 5 pages
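The naive shared-commit grouping can be sketched with a union-find (disjoint-set) structure: any two repositories that contain the same commit land in the same group, so a single stray shared commit is enough to chain otherwise unrelated repositories together, which is how a megacluster can form. This is a minimal sketch assuming an in-memory mapping from repository name to commit identifiers; the repository names and short commit labels (`c1`, `d1`, and so on) are hypothetical stand-ins for real SHA-1 hashes, and the Louvain refinement the authors apply afterwards is not reproduced here.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path compression over arbitrary hashable items."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_by_shared_commits(repo_commits):
    """Group repositories connected through at least one shared commit.

    repo_commits maps a repository name to an iterable of commit identifiers.
    Returns the groups as sets, largest first.
    """
    ds = DisjointSet()
    first_seen = {}  # commit id -> first repository observed containing it
    for repo, commits in repo_commits.items():
        ds.find(repo)  # register repositories even if they share nothing
        for sha in commits:
            if sha in first_seen:
                ds.union(first_seen[sha], repo)
            else:
                first_seen[sha] = repo
    groups = defaultdict(set)
    for repo in repo_commits:
        groups[ds.find(repo)].add(repo)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical data: dave/misc (a clone of carol/lib) also pulled commit e1
# from the unrelated erin/tool, which links the two projects.
repos = {
    "alice/app": ["c1", "c2"],
    "bob/app-fork": ["c1", "c2", "c3"],
    "carol/lib": ["d1"],
    "dave/misc": ["d1", "e1"],
    "erin/tool": ["e1"],
    "frank/solo": ["f1"],
}
for group in group_by_shared_commits(repos):
    print(sorted(group))
```

Because union-find tracks only connectivity, it cannot distinguish a genuine fork from an accidental link such as the shared `e1` commit; separating those is the role the abstract assigns to community detection.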
Risk Mitigation in Corporate Participation with Open Source Communities: Protection and Compliance in an Open Source Supply Chain
Open source communities exist in large part through increasing participation from for-profit corporations. The balance between the seemingly conflicting ideals of open source communities and corporations creates a number of complex challenges for both. In this paper, we focus on corporate risk mitigation and the mandates on corporate participation in open source communities in light of open source license requirements. In response to these challenges, we aim to understand risk mitigation options within the dialectic of corporate participation with open source communities. Rather than viewing risk mitigation as an ad hoc and emergent process focused on bottom lines and shareholder interests, we are interested in formalized instruments and project management processes that can help corporations mitigate the risks associated with participating in open source communities through shared IT projects. Accordingly, we identify two key risk domains that corporations must attend to: property protection and compliance. In addition, we discuss risk mitigation sourcing, arguing that tools and processes for mitigating open source project risk do not stem solely from a corporation or solely from an open source community. Instead, they originate from the interface between the two and can be paired in a complementary fashion in an overall project management process of risk mitigation.
This work has been funded through the National Science Foundation VOSS-IOS Grant: 112264
Is Open Hardware Worthwhile? Learning from Thales' Experience with RISC-V
Overview: In this article, we frame the concept of a hardware-rich open source ecosystem (H-ROSE) that generates software and hardware components. In an H-ROSE, the designs of some components are accessible under open source licenses, while other component designs remain proprietary. We describe seven adoption factors used by the multinational French firm Thales to assess the efficacy of RISC-V for processor design. Other companies can use these adoption factors to explore whether an open hardware initiative supported by an H-ROSE is worthwhile.
Peer reviewed
An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain
Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective.
We present an empirical study of artifacts and security features in 8 model hubs. We identify potential threat models and show that the existing defenses are insufficient for ensuring the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.
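One basic integrity check a consumer can apply when downloading a PTM is to compare the artifact's SHA-256 digest with a digest published through a separate, trusted channel. The sketch below is a minimal illustration under that assumption; `sha256_of_file` and `verify_artifact` are hypothetical helpers, not any model hub's API. A matching digest only shows that the file is the one the publisher hashed; it says nothing about whether the model itself is benign, so it addresses only a narrow slice of the threat models described above.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file in chunks so large model weights need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(path, expected_hex):
    """True only if the file's SHA-256 matches the separately published hex digest."""
    # hexdigest() is lowercase, so normalize the published value before comparing.
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage with a throwaway file standing in for downloaded model weights.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"placeholder model weights")
    path = tmp.name
published = hashlib.sha256(b"placeholder model weights").hexdigest()
print(verify_artifact(path, published))  # True
print(verify_artifact(path, "0" * 64))   # False
os.remove(path)
```

The check is only as trustworthy as the channel the expected digest came from; a digest hosted next to a tampered file on the same compromised hub proves nothing.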
Eight Observations and 24 Research Questions About Open Source Projects: Illuminating New Realities
The rapid acceleration of corporate engagement with open source projects is drawing out new ways for CSCW researchers to consider the dynamics of these projects. Research must now consider the complex ecosystems within which open source projects are situated, including issues of for-profit motivations, brokering foundations, and corporate collaboration. Localized project considerations cannot reveal broader workings of an open source ecosystem, yet much empirical work is constrained to a local context. In response, we present eight observations from our eight-year engaged field study about the changing nature of open source projects. We ground these observations through 24 research questions that serve as primers to spark research ideas in this new reality of open source projects. This paper contributes to CSCW in social and crowd computing by delivering a rich and fresh look at corporately engaged open source projects with a call for renewed focus and research into newly emergent areas of interest.
Rethinking the Delivery Architecture of Data-Intensive Visualization
The web has transformed the way people create and consume information. However, data-intensive science applications have so far rarely been able to take full advantage of the web ecosystem. Analysis and visualization have remained close to large datasets on large servers and desktops because of the vast resources that data-intensive applications require. This hampers the accessibility and on-demand availability of data-intensive science. In this work, I propose a novel architecture for delivering interactive, data-intensive visualization to the web ecosystem. The proposed architecture, codenamed Fabric, keeps the server side oblivious of application logic, structuring it as a set of scalable microservices that 1) manage data and 2) compute data products. Disconnected from application logic, these services allow interactive, data-intensive visualization to be accessible to many users simultaneously. Meanwhile, the client side of this architecture treats visualization applications as an interaction-in, image-out black box whose sole responsibility is keeping track of application state and mapping interactions into well-defined, structured visualization requests. Fabric essentially provides a separation of concerns that decouples the client and server, which are tightly coupled in traditional data applications. Initial results show that, owing to this decoupling, Fabric enables high audience scalability and scientific reproducibility and improves the control and protection of data products.
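To make the client-side role concrete, the sketch below keeps application state on the client, maps UI interactions to new states, and serializes each state into a structured, self-describing render request that a server could answer without knowing any application logic. The `ViewState` fields, the interaction names, and the JSON request shape are all hypothetical; they are not Fabric's actual request format.

```python
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ViewState:
    """Client-side application state (hypothetical fields for a 2-D slice viewer)."""
    dataset: str
    variable: str
    time_step: int = 0
    zoom: float = 1.0


def apply_interaction(state, event):
    """Map one UI interaction to a new immutable state; all app logic lives here."""
    kind = event["type"]
    if kind == "scroll_time":
        return replace(state, time_step=max(0, state.time_step + event["delta"]))
    if kind == "zoom":
        return replace(state, zoom=state.zoom * event["factor"])
    if kind == "select_variable":
        return replace(state, variable=event["name"])
    raise ValueError(f"unknown interaction: {kind}")


def to_request(state):
    """Serialize the state into a structured render request for a stateless server."""
    # sort_keys makes identical states produce byte-identical requests.
    return json.dumps({"op": "render", **asdict(state)}, sort_keys=True)


state = ViewState(dataset="ocean-sim", variable="temperature")
for event in [{"type": "scroll_time", "delta": 3}, {"type": "zoom", "factor": 2.0}]:
    state = apply_interaction(state, event)
print(to_request(state))
```

Because each request is a pure function of the client state, identical states yield identical requests that a server could cache or replay; this is one plausible way such a design could support the reproducibility the abstract mentions, though the abstract does not say how Fabric achieves it.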
Security considerations in the open source software ecosystem
Open source software plays an important role in the software supply chain, allowing stakeholders to
utilize open source components as building blocks in their software, tooling, and infrastructure. But
relying on the open source ecosystem introduces unique challenges, both in terms of security and trust,
as well as in terms of supply chain reliability.
In this dissertation, I investigate approaches, considerations, and encountered challenges of stakeholders in the context of security, privacy, and trustworthiness of the open source software supply
chain. Overall, my research aims to empower and support software experts with the knowledge and
resources necessary to achieve a more secure and trustworthy open source software ecosystem. In the
first part of this dissertation, I describe a research study investigating the security and trust practices
in open source projects by interviewing 27 owners, maintainers, and contributors from a diverse set
of projects to explore their behind-the-scenes processes, guidance and policies, incident handling, and
encountered challenges, finding that participants’ projects are highly diverse in terms of their deployed
security measures and trust processes, as well as their underlying motivations. More on the consumer
side of the open source software supply chain, I investigated the use of open source components in
industry projects by interviewing 25 software developers, architects, and engineers to understand their
projects’ processes, decisions, and considerations in the context of external open source code, finding
that open source components play an important role in many of the industry projects, and that most
projects have some form of company policy or best practice for including external code. On the side of
end-user focused software, I present a study investigating the use of software obfuscation in Android
applications, which is a recommended practice to protect against plagiarism and repackaging. The
study leveraged a multi-pronged approach including a large-scale measurement, a developer survey, and
a programming experiment, finding that only 24.92% of apps are obfuscated by their developer, that
developers do not fear theft of their own apps, and that they have difficulty obfuscating their own apps. Lastly,
to involve end users themselves, I describe a survey with 200 users of cloud office suites to investigate
their security and privacy perceptions and expectations, with findings suggesting that users are generally
aware of basic security implications, but lack technical knowledge for envisioning some threat models.
The key findings of this dissertation include that open source projects have highly diverse security
measures, trust processes, and underlying motivations; that the projects’ security and trust needs are
likely best met in ways that consider their individual strengths, limitations, and project stage, especially
for smaller projects with limited access to resources; and that open source components play an important
role in industry projects, and that those projects often have some form of company policy or best
practice for including external code, but developers wish for more resources to better audit included
components.
This dissertation emphasizes the importance of collaboration and shared responsibility in building and maintaining the open source software ecosystem, with developers, maintainers, end users,
researchers, and other stakeholders alike ensuring that the ecosystem remains a secure, trustworthy, and
healthy resource for everyone to rely on.
Talkin' 'Bout AI Generation: Copyright and the Generative-AI Supply Chain
"Does generative AI infringe copyright?" is an urgent question. It is also a
difficult question, for two reasons. First, "generative AI" is not just one
product from one company. It is a catch-all name for a massive ecosystem of
loosely related technologies, including conversational text chatbots like
ChatGPT, image generators like Midjourney and DALL-E, coding assistants like
GitHub Copilot, and systems that compose music and create videos. These systems
behave differently and raise different legal issues. The second problem is that
copyright law is notoriously complicated, and generative-AI systems manage to
touch on a great many corners of it: authorship, similarity, direct and
indirect liability, fair use, and licensing, among much else. These issues
cannot be analyzed in isolation, because there are connections everywhere.
In this Article, we aim to bring order to the chaos. To do so, we introduce
the generative-AI supply chain: an interconnected set of stages that transform
training data (millions of pictures of cats) into generations (a new,
potentially never-seen-before picture of a cat that has never existed).
Breaking down generative AI into these constituent stages reveals all of the
places at which companies and users make choices that have copyright
consequences. It enables us to trace the effects of upstream technical designs
on downstream uses, and to assess who in these complicated sociotechnical
systems bears responsibility for infringement when it happens. Because we
engage so closely with the technology of generative AI, we are able to shed
more light on the copyright questions. We do not give definitive answers as to
who should and should not be held liable. Instead, we identify the key
decisions that courts will need to make as they grapple with these issues, and
point out the consequences that would likely flow from different liability
regimes.
Comment: Forthcoming, Journal of the Copyright Society of the USA '2
Methodology for Implementing Information Management Models within Enterprise Resource Planning Systems. Application in Small and Medium-Sized Enterprises
The Next Generation of Manufacturing Systems (NGMS) seeks to respond to the requirements of new business models, in contexts of intelligence, agility, and adaptability in a global and virtual environment. Enterprise Resource Planning (ERP), supported by product data management (PDM) and product lifecycle management (PLM), provides business management solutions based on a coherent use of information technologies for implementation in CIM (Computer-Integrated Manufacturing) systems, with a high degree of adaptability to the desired organizational structure. In general, this kind of implementation has long been carried out in large companies, while its extension to SMEs has been limited (almost nonexistent).
This Doctoral Thesis defines and develops a new implementation methodology for the automatic generation of information in the business processes that take place in companies whose requirements are adapted to the needs of NGMS, within enterprise resource planning (ERP) systems, taking into account the influence of the human factor. The validity of the theoretical model of this methodology has been verified by implementing it in an SME in the engineering sector.
To establish the state of the art on this topic, a specific methodology based on the Shewhart/Deming continuous improvement cycle was designed and applied, using the bibliographic search and analysis tools available online with access to the corresponding databases.