Learning from, Understanding, and Supporting DevOps Artifacts for Docker
With the growing use of DevOps tools and frameworks, there is an increased
need for tools and techniques that support more than code. The current
state-of-the-art in static developer assistance for tools like Docker is
limited to shallow syntactic validation. We identify three core challenges in
the realm of learning from, understanding, and supporting developers writing
DevOps artifacts: (i) nested languages in DevOps artifacts, (ii) rule mining,
and (iii) the lack of semantic rule-based analysis. To address these challenges
we introduce a toolset, binnacle, that enabled us to ingest 900,000 GitHub
repositories.
Focusing on Docker, we extracted approximately 178,000 unique Dockerfiles,
and also identified a Gold Set of Dockerfiles written by Docker experts. We
addressed challenge (i) by reducing the number of effectively uninterpretable
nodes in our ASTs by over 80% via a technique we call phased parsing. To
address challenge (ii), we introduced a novel rule-mining technique capable of
recovering two-thirds of the rules in a benchmark we curated. Through this
automated mining, we were able to recover 16 new rules that were not found
during manual rule collection. To address challenge (iii), we manually
collected a set of rules for Dockerfiles from commits to the files in the Gold
Set. These rules encapsulate best practices, avoid docker build failures, and
improve image size and build latency. We created an analyzer that used these
rules, and found that, on average, Dockerfiles on GitHub violated the rules
five times more frequently than the Dockerfiles in our Gold Set. We also found
that industrial Dockerfiles fared no better than those sourced from GitHub.
The learned rules and analyzer in binnacle can be used to aid developers in
the IDE when creating Dockerfiles, and in a post-hoc fashion to identify issues
in, and to improve, existing Dockerfiles. Comment: Published in ICSE'2020
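The semantic, rule-based analysis described above can be illustrated with a minimal sketch. The two rules shown (pin base-image tags; pass -y to apt-get install so builds cannot hang on a prompt) are well-known Dockerfile best practices chosen for illustration, not binnacle's actual mined rule set, and the function names are invented.

```python
import re

def check_dockerfile(text):
    """Return (line number, message) pairs for rule violations in a Dockerfile string."""
    violations = []
    for i, line in enumerate(text.splitlines(), 1):
        stripped = line.strip()
        # Rule: unpinned or 'latest' base images hurt build reproducibility.
        if stripped.startswith("FROM") and (":" not in stripped or ":latest" in stripped):
            violations.append((i, "unpinned or 'latest' base image"))
        # Rule: `apt-get install` without -y can stall or fail `docker build`.
        if re.search(r"apt-get\s+install\b", stripped) and "-y" not in stripped:
            violations.append((i, "apt-get install without -y"))
    return violations

bad = "FROM ubuntu\nRUN apt-get update && apt-get install curl\n"
print(check_dockerfile(bad))  # flags both lines
```

A real analyzer works over an AST rather than raw lines (hence the paper's phased parsing of the shell language nested inside RUN instructions), but the rule shape, a pattern plus a required co-occurring element, is the same.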
Bio-Docklets: virtualization containers for single-step execution of NGS pipelines
Processing of next-generation sequencing (NGS) data requires significant technical skills, involving the installation, configuration, and execution of bioinformatics data pipelines, in addition to specialized post-analysis visualization and data-mining software. To address some of these challenges, developers have leveraged virtualization containers for seamless deployment of preconfigured bioinformatics software and pipelines on any computational platform. We present an approach for abstracting the complex data operations of multistep bioinformatics pipelines for NGS data analysis. As examples, we have deployed two pipelines, for RNA sequencing and chromatin immunoprecipitation sequencing, preconfigured within Docker virtualization containers that we call Bio-Docklets. Each Bio-Docklet exposes a single data input and output endpoint and, from a user perspective, makes running a pipeline as simple as running a single bioinformatics tool. This is achieved using a "meta-script" that automatically starts the Bio-Docklets and controls pipeline execution through the BioBlend software library and the Galaxy Application Programming Interface. The pipeline output is postprocessed by integration with the Visual Omics Explorer framework, providing interactive data visualizations that users can access through a web browser. Our goal is to enable easy access to NGS data analysis pipelines for non-bioinformatics experts on any computing environment, whether a laboratory workstation, university computer cluster, or cloud service provider. Beyond end users, Bio-Docklets also enable developers to programmatically deploy and run a large number of pipeline instances for concurrent analysis of multiple datasets.
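The "single data input and output endpoint" idea can be sketched as a thin wrapper around the container runtime. This is a hypothetical illustration, not the Bio-Docklets meta-script itself (which drives pipelines through BioBlend and the Galaxy API); the image name and the mount paths inside the container are invented.

```python
import subprocess

def build_command(image, input_dir, output_dir):
    """Assemble a `docker run` invocation with one read-only input mount and one output mount."""
    return [
        "docker", "run", "--rm",
        "-v", f"{input_dir}:/pipeline/input:ro",  # single data input endpoint
        "-v", f"{output_dir}:/pipeline/output",   # single data output endpoint
        image,
    ]

def run_pipeline(image, input_dir, output_dir):
    """Launch one pipeline instance; several can run concurrently on different datasets."""
    return subprocess.run(build_command(image, input_dir, output_dir), check=True)

cmd = build_command("example/bio-docklet-rnaseq", "/data/in", "/data/out")
print(" ".join(cmd))
```

Because each container sees only the two mounted directories, a caller never needs to know which tools run inside, which is what makes a multistep pipeline feel like a single tool.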
Secrets Revealed in Container Images: An Internet-wide Study on Occurrence and Impact
Containerization allows bundling applications and their dependencies into a
single image. The containerization framework Docker eases the use of this
concept and enables sharing images publicly, gaining high momentum. However, it
can lead to users creating and sharing images that include private keys or API
secrets-either by mistake or out of negligence. This leakage impairs the
creator's security and that of everyone using the image. Yet, the extent of
this practice and how to counteract it remains unclear.
In this paper, we analyze 337,171 images from Docker Hub and 8,076 other
private registries unveiling that 8.5% of images indeed include secrets.
Specifically, we find 52,107 private keys and 3,158 leaked API secrets, both
of which open a large attack surface, i.e., put authentication and the
confidentiality of privacy-sensitive data at stake and even enable active
attacks. We further document that those leaked keys are used in the wild: While
we discovered 1,060 certificates relying on compromised keys being issued by
public certificate authorities, based on further active Internet measurements,
we find 275,269 TLS and SSH hosts using leaked private keys for authentication.
To counteract this issue, we discuss how our methodology can be used to prevent
secret leakage and reuse. Comment: 15 pages, 7 figures
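A minimal sketch of the kind of scanning involved, assuming image layers have already been pulled and unpacked into (path, text) pairs; the two token patterns (AWS access key IDs and GitHub personal access tokens) are common illustrative examples, not the paper's full detection rule set.

```python
import re

# PEM header for private keys; optional algorithm prefix covers common variants.
KEY_HEADER = re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----")
# Two well-known API-token shapes, used here purely as examples.
API_TOKEN = re.compile(r"\b(?:AKIA[0-9A-Z]{16}|ghp_[A-Za-z0-9]{36})\b")

def scan_files(files):
    """Return (path, kind) findings for leaked private keys and API tokens."""
    findings = []
    for path, text in files:
        if KEY_HEADER.search(text):
            findings.append((path, "private-key"))
        if API_TOKEN.search(text):
            findings.append((path, "api-token"))
    return findings
```

Running such checks before an image is pushed (e.g., in CI) is one way the measurement methodology can be repurposed for prevention, as the paper's discussion suggests.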
Towards Semantic Detection of Smells in Cloud Infrastructure Code
Automated deployment and management of Cloud applications relies on
descriptions of their deployment topologies, often referred to as
Infrastructure Code. As the complexity of applications and their deployment
models increases, developers inadvertently introduce software smells to such
code specifications, for instance, violations of good coding practices, modular
structure, and more. This paper presents a knowledge-driven approach enabling
developers to identify the aforementioned smells in deployment descriptions. We
detect smells with SPARQL-based rules over pattern-based OWL 2 knowledge graphs
capturing deployment models. We show the feasibility of our approach with a
prototype and three case studies. Comment: 5 pages, 6 figures. The 10th
International Conference on Web Intelligence, Mining and Semantics (WIMS 2020)
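The SPARQL-rule idea can be mimicked in a few lines of plain Python over a toy triple set (shown this way to keep the sketch dependency-free rather than using a real RDF stack); the vocabulary (ex:SoftwareComponent, ex:version) and the single smell rule are invented for illustration and far simpler than the paper's pattern-based OWL 2 deployment graphs.

```python
# Toy knowledge graph: (subject, predicate, object) triples of a deployment model.
TRIPLES = {
    ("ex:web", "rdf:type", "ex:SoftwareComponent"),
    ("ex:web", "ex:version", "1.4.2"),
    ("ex:db", "rdf:type", "ex:SoftwareComponent"),
}

def unversioned_components(triples):
    """Smell: a component with no pinned version (a SPARQL `FILTER NOT EXISTS` pattern)."""
    components = {s for s, p, o in triples if p == "rdf:type" and o == "ex:SoftwareComponent"}
    versioned = {s for s, p, o in triples if p == "ex:version"}
    return sorted(components - versioned)

print(unversioned_components(TRIPLES))  # ['ex:db']
```

Encoding deployment models as graphs lets each smell be one declarative query over the graph, rather than ad hoc traversal code per smell.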
Understanding the Issues, Their Causes and Solutions in Microservices Systems: An Empirical Study
Many small to large organizations have adopted the Microservices Architecture
(MSA) style to develop and deliver their core businesses. Despite the
popularity of MSA in the software industry, there is a limited evidence-based
and thorough understanding of the types of issues (e.g., errors, faults,
failures, and bugs) that microservices system developers experience, the causes
of the issues, and the solutions as potential fixing strategies to address the
issues. To ameliorate this gap, we conducted a mixed-methods empirical study
that collected data from 2,641 issues from the issue tracking systems of 15
open-source microservices systems on GitHub, 15 interviews, and an online
survey completed by 150 practitioners from 42 countries across 6 continents.
Our analysis led to comprehensive taxonomies for the issues, causes, and
solutions. The findings of this study inform that Technical Debt, Continuous
Integration and Delivery, Exception Handling, Service Execution and
Communication, and Security are the most dominant issues in microservices
systems. Furthermore, General Programming Errors, Missing Features and
Artifacts, and Invalid Configuration and Communication are the main causes
behind the issues. Finally, we found 177 types of solutions that can be applied
to fix the identified issues. Based on our study results, we formulated future
research directions that could help researchers and practitioners to engineer
emergent and next-generation microservices systems. Comment: 35 pages, 5
images, 7 tables. Manuscript submitted to a journal (2023)
Refactorings and Technical Debt for Docker Projects
http://deepblue.lib.umich.edu/bitstream/2027.42/170138/1/ASE2021_DockerRefactoring__Copy_.pdf