Youtube for software security?: Youtube Videos Provide Pointers for Microservice Security
Microservice applications are software applications composed of services that interact with one another, where the failure of one service does not impact the execution of another. Microservice-oriented design has become a popular software application design paradigm among software companies such as Uber, Netflix, and Amazon, as well as small startup companies, due to delivery speed, reliability, and greater flexibility. However, any insecure coding pattern introduced while developing a microservice application can make the entire system vulnerable to hackers. The goal of this abstract is to help software developers build secure microservice applications. We conducted a qualitative analysis of 6 YouTube videos on microservice design antipatterns and an empirical study of open source microservice repositories, in which we observed insecure coding patterns. We defined 9 categories, each with an associated pattern, namely HTTP without TLS, authentication vs authorization, hard-coded secret, weak encryption algorithm, use of default ports, violation of the least privilege principle, insufficient logging, poor orchestration layer configuration, API service sharing and distributed deadlock. We advocate for future research that will create a taxonomy of insecure coding patterns so that developers can find and resolve insecure coding patterns during code review.
Towards Semantic Detection of Smells in Cloud Infrastructure Code
Automated deployment and management of Cloud applications relies on
descriptions of their deployment topologies, often referred to as
Infrastructure Code. As the complexity of applications and their deployment
models increases, developers inadvertently introduce software smells to such
code specifications, for instance, violations of good coding practices, modular
structure, and more. This paper presents a knowledge-driven approach enabling
developers to identify the aforementioned smells in deployment descriptions. We
detect smells with SPARQL-based rules over pattern-based OWL 2 knowledge graphs
capturing deployment models. We show the feasibility of our approach with a
prototype and three case studies. Comment: 5 pages, 6 figures. The 10th International Conference on Web
Intelligence, Mining and Semantics (WIMS 2020)
Detecting and Characterizing Propagation of Security Weaknesses in Puppet-based Infrastructure Management
Despite being beneficial for managing computing infrastructure automatically,
Puppet manifests are susceptible to security weaknesses, e.g., hard-coded
secrets and use of weak cryptography algorithms. Adequate mitigation of
security weaknesses in Puppet manifests is thus necessary to secure computing
infrastructure that is managed with Puppet manifests. A characterization of
how security weaknesses propagate and affect Puppet-based infrastructure
management can inform practitioners on the relevance of the detected security
weaknesses, as well as help them take necessary actions for mitigation. To that
end, we conduct an empirical study with 17,629 Puppet manifests mined from 336
open source repositories. We construct Taint Tracker for Puppet Manifests
(TaintPup), for which we observe 2.4 times higher precision than that of a
state-of-the-art security static analysis tool. TaintPup leverages
Puppet-specific information flow analysis, which we use to characterize the
propagation of security weaknesses. From our empirical study, we observe
security weaknesses to propagate into 4,457 resources, i.e., Puppet-specific
code elements used to manage infrastructure. A single instance of a security
weakness can propagate into as many as 35 distinct resources. We observe
security weaknesses to propagate into 7 categories of resources, which include
resources used to manage continuous integration servers and network
controllers. According to our survey with 24 practitioners, propagation of
security weaknesses into data storage-related resources is rated to have the
most severe impact for Puppet-based infrastructure management. Comment: 14 pages, currently under review
Detecting Missing Dependencies and Notifiers in Puppet Programs
Puppet is a popular computer system configuration management tool. It
provides abstractions that enable administrators to set up their computer
systems declaratively. Its use suffers from two potential pitfalls. First, if
ordering constraints are not specified whenever an abstraction depends on
another, the non-deterministic application of abstractions can lead to race
conditions. Second, if a service is not tied to its resources through
notification constructs, the system may operate in a stale state whenever a
resource gets modified. Such faults can degrade a computing infrastructure's
availability and functionality.
We have developed an approach that identifies these issues through the
analysis of a Puppet program and its system call trace. Specifically, we
present a formal model for traces, which allows us to capture the interactions
of Puppet abstractions with the file system. By analyzing these interactions we
identify (1) abstractions that are related to each other (e.g., operate on the
same file), and (2) abstractions that should act as notifiers so that changes
are correctly propagated. We then check the relationships from the trace's
analysis against the program's dependency graph: a representation containing
all the ordering constraints and notifications declared in the program. If a
mismatch is detected, our system reports a potential fault.
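The comparison described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the trace has already been reduced to hypothetical `(resource, path)` pairs, treats any two resources touching the same file as related, and covers only the missing-ordering check, not notifiers. The names `reachable`, `missing_orderings`, and the example resources are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def reachable(graph, src, dst):
    """Return True if dst can be reached from src along declared ordering edges."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def missing_orderings(file_accesses, declared_edges):
    """Report resource pairs that touch the same file but have no declared order.

    file_accesses: iterable of (resource, path) pairs (assumed trace summary).
    declared_edges: iterable of (before, after) pairs from the dependency graph.
    """
    by_path = defaultdict(set)
    for resource, path in file_accesses:
        by_path[path].add(resource)

    graph = defaultdict(set)
    for before, after in declared_edges:
        graph[before].add(after)

    faults = set()
    for path, resources in by_path.items():
        for a, b in combinations(sorted(resources), 2):
            # A pair is suspicious when neither direction is constrained.
            if not reachable(graph, a, b) and not reachable(graph, b, a):
                faults.add((a, b, path))
    return sorted(faults)


# Hypothetical example data, not taken from the paper.
CONF = "/etc/nginx/nginx.conf"
accesses = [
    ("Package[nginx]", CONF),
    ("File[nginx.conf]", CONF),
    ("Service[nginx]", CONF),
]
declared = [("Package[nginx]", "File[nginx.conf]")]
potential_faults = missing_orderings(accesses, declared)
```

In this example the service has no declared ordering relative to the package or the file, so both pairs are reported; declaring an edge from `File[nginx.conf]` to `Service[nginx]` would resolve both through transitivity.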
We have evaluated our method on a large set of Puppet modules, and discovered
57 previously unknown issues in 30 of them. Benchmarking further shows that our
approach can analyze, in minutes, real-world configurations whose size is
measured in thousands of lines and millions of system calls.
A Dataset for GitHub Repository Deduplication: Extended Description
GitHub projects can be easily replicated through the site's fork process or
through a Git clone-push sequence. This is a problem for empirical software
engineering, because it can lead to skewed results or mistrained machine
learning models. We provide a dataset of 10.6 million GitHub projects that are
copies of others, and link each record with the project's ultimate parent. The
ultimate parents were derived from a ranking along six metrics. The related
projects were calculated as the connected components of an 18.2 million node
and 12 million edge denoised graph created by directing edges to ultimate
parents. The graph was created by filtering out more than 30 hand-picked and
2.3 million pattern-matched clumping projects. Projects that introduced
unwanted clumping were identified by repeatedly visualizing shortest path
distances between unrelated important projects. Our dataset identified 30
thousand duplicate projects in an existing popular reference dataset of 1.8
million projects. An evaluation of our dataset against another created
independently with different methods found a significant overlap, but also
differences attributed to the operational definition of what projects are
considered as related. Comment: 33 pages, 33 figures, 17 listings
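As a rough illustration of grouping related projects into connected components and linking each copy to one representative, the following Python sketch uses a union-find structure. It is a simplification under stated assumptions, not the dataset's actual pipeline: a single hypothetical `score` per project stands in for the ranking along six metrics, the denoising step is omitted, and the project names are invented.

```python
def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_duplicates(edges, score):
    """Map each project to the top-scoring project of its connected component.

    edges: iterable of (project, related_project) pairs (assumed input).
    score: dict of project -> number, a stand-in for the paper's ranking.
    """
    parent = {}
    for a, b in edges:
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components

    components = {}
    for node in parent:
        components.setdefault(find(parent, node), []).append(node)

    ultimate = {}
    for members in components.values():
        # Ties are broken by name so the result is deterministic.
        best = max(members, key=lambda p: (score.get(p, 0), p))
        for member in members:
            if member != best:
                ultimate[member] = best
    return ultimate


# Hypothetical example data.
edges = [
    ("alice/app", "bob/app"),
    ("carol/app", "bob/app"),
    ("dave/lib", "erin/lib"),
]
score = {"bob/app": 90, "alice/app": 5, "carol/app": 7, "dave/lib": 40, "erin/lib": 3}
duplicates = group_duplicates(edges, score)
```

Here `duplicates` maps every copy to the highest-scoring member of its component, while the representatives themselves do not appear as keys.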