Performance-Aware Management of Cloud Resources: A Taxonomy and Future Directions
The dynamic nature of the cloud environment has made distributed resource
management a challenge for cloud service providers. The importance of
maintaining the quality of service in accordance with customer expectations as
well as the highly dynamic nature of cloud-hosted applications add new levels
of complexity to the process. Advances to the big data learning approaches have
shifted conventional static capacity planning solutions to complex
performance-aware resource management methods. It is shown that the process of
decision making for resource adjustment is closely related to the behaviour of
the system, including the utilization of resources and application components.
Therefore, continuous monitoring of system attributes and performance metrics
provides the raw data for the analysis of problems affecting the performance of
the application. Data analytic methods such as statistical and machine learning
approaches offer the required concepts, models and tools to dig into the data,
find general rules, patterns and characteristics that define the functionality
of the system. Obtained knowledge form the data analysis process helps to find
out about the changes in the workloads, faulty components or problems that can
cause system performance to degrade. A timely reaction to performance
degradations can avoid violations of the service level agreements by performing
proper corrective actions including auto-scaling or other resource adjustment
solutions. In this paper, we investigate the main requirements and limitations
in cloud resource management including a study of the approaches in workload
and anomaly analysis in the context of the performance management in the cloud.
A taxonomy of the works on this problem is presented which identifies the main
approaches in existing research, from data analysis methods to resource
adjustment techniques.
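The corrective actions the abstract mentions can be illustrated with a minimal sketch of a reactive, threshold-based auto-scaling rule; the thresholds, window size, and function names here are illustrative assumptions, not the paper's method:

```python
# Minimal sketch of a reactive auto-scaling policy: compare a smoothed
# CPU-utilization metric against upper/lower thresholds and decide whether
# to add or remove instances. Thresholds and window size are illustrative.

def smoothed_utilization(samples, window=5):
    """Average the most recent `window` utilization samples (0.0-1.0)."""
    recent = samples[-window:]
    return sum(recent) / len(recent)

def scaling_decision(samples, upper=0.80, lower=0.30):
    """Return +1 (scale out), -1 (scale in), or 0 (no action)."""
    util = smoothed_utilization(samples)
    if util > upper:
        return +1   # sustained high load: add an instance
    if util < lower:
        return -1   # sustained low load: remove an instance
    return 0

# Example: a workload spike pushes the smoothed utilization above 80%.
print(scaling_decision([0.7, 0.8, 0.9, 0.95, 0.9]))  # -> 1
```

Smoothing over a window is one simple way to avoid oscillating on transient spikes; the performance-aware methods surveyed in the paper replace such static thresholds with learned models.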
Performance-oriented DevOps: A Research Agenda
DevOps is a trend towards a tighter integration between development (Dev) and
operations (Ops) teams. The need for such an integration is driven by the
requirement to continuously adapt enterprise applications (EAs) to changes in
the business environment. As of today, DevOps concepts have been primarily
introduced to ensure a constant flow of features and bug fixes into new
releases from a functional perspective. In order to integrate a non-functional
perspective into these DevOps concepts this report focuses on tools,
activities, and processes to ensure one of the most important quality
attributes of a software system, namely performance.
Performance describes system properties concerning timeliness and use of
resources. Common metrics are response time, throughput, and resource
utilization. Performance goals for EAs are typically defined by setting upper
and/or lower bounds for these metrics and specific business transactions. In
order to ensure that such performance goals can be met, several activities are
required during development and operation of these systems as well as during
the transition from Dev to Ops. Activities during development are typically
summarized by the term Software Performance Engineering (SPE), whereas
activities during operations are called Application Performance Management
(APM). SPE and APM were historically tackled independently from each other, but
the newly emerging DevOps concepts require and enable a tighter integration
between both activity streams. This report presents existing solutions to
support this integration as well as open research challenges in this area.
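A performance goal of the kind described above, an upper bound on a metric for a specific business transaction, can be sketched as a simple check; the 95th-percentile choice, the 200 ms bound, and the sample data are illustrative assumptions:

```python
# Sketch of checking a performance goal: an upper bound on the
# 95th-percentile response time of one business transaction.
# The bound (200 ms) and the sample measurements are illustrative.

def percentile(values, p):
    """p-th percentile via nearest-rank on a sorted copy."""
    ordered = sorted(values)
    rank = max(0, int(round(p / 100.0 * len(ordered))) - 1)
    return ordered[rank]

def goal_met(response_times_ms, bound_ms, p=95):
    """True if the p-th percentile response time stays within the bound."""
    return percentile(response_times_ms, p) <= bound_ms

samples = [120, 130, 110, 180, 150, 140, 135, 125, 190, 145]
print(goal_met(samples, bound_ms=200))  # -> True
```

In SPE such a check would run against load-test results before release; in APM the same bound would be monitored continuously in production, which is exactly the Dev-to-Ops handover the report discusses.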
Configuration Testing: Testing Configuration Values as Code and with Code
This paper proposes configuration testing--evaluating configuration values
(to be deployed) by exercising the code that uses the values and assessing the
corresponding program behavior. We advocate that configuration values should be
systematically tested like software code and that configuration testing should
be a key reliability engineering practice for preventing misconfigurations from
production deployment.
The essential advantage of configuration testing is to put the configuration
values (to be deployed) in the context of the target software program under
test. In this way, the dynamic effects of configuration values and the impact
of configuration changes can be observed during testing. Configuration testing
overcomes the fundamental limitations of de facto approaches to combatting
misconfigurations, namely configuration validation and software testing--the
former is disconnected from code logic and semantics, while the latter can
hardly cover all possible configuration values and their combinations. Our
preliminary results show the effectiveness of configuration testing in
capturing real-world misconfigurations.
We present the principles of writing new configuration tests and the promises
of retrofitting existing software tests to be configuration tests. We discuss
new adequacy and quality metrics for configuration testing. We also explore
regression testing techniques to enable incremental configuration testing
during continuous integration and deployment in modern software systems.
From 4G to 5G: Self-organized Network Management meets Machine Learning
In this paper, we provide an analysis of self-organized network management,
with an end-to-end perspective of the network. Self-organization as applied to
cellular networks is usually referred to as Self-organizing Networks (SONs), and
it is a key driver for improving Operations, Administration, and Maintenance
(OAM) activities. SON aims at reducing the cost of installation and management
of 4G and future 5G networks, by simplifying operational tasks through the
capability to configure, optimize, and heal itself. To satisfy 5G network
management requirements, this autonomous management vision has to be extended
to the end-to-end network. In the literature, and also in some products
available in the market, Machine Learning (ML) has been identified as the key
tool to implement autonomous adaptability and take advantage of experience when
making decisions. In this paper, we survey how network management can
significantly benefit from ML solutions. We review and provide the basic
concepts and taxonomy for SON, network management and ML. We analyse the
available state of the art in the literature, standardization, and in the
market. We pay special attention to 3rd Generation Partnership Project (3GPP)
evolution in the area of network management and to the data that can be
extracted from 3GPP networks, in order to gain knowledge and experience in how
the network is working, and improve network performance in a proactive way.
Finally, we go through the main challenges associated with this line of
research, in both 4G and in the 5G networks now being designed, while
identifying new directions for research.
Comment: 23 pages, 3 figures, Survey
A Roadmap Towards Resilient Internet of Things for Cyber-Physical Systems
The Internet of Things (IoT) is a ubiquitous system connecting many different
devices - the things - which can be accessed remotely. Cyber-physical systems
(CPS) monitor and control the things from a distance.
As a result, the concepts of dependability and security get deeply intertwined.
The increasing level of dynamicity, heterogeneity, and complexity adds to the
system's vulnerability, and challenges its ability to react to faults. This
paper summarizes state-of-the-art of existing work on anomaly detection,
fault-tolerance and self-healing, and adds a number of other methods applicable
to achieve resilience in an IoT. We particularly focus on non-intrusive methods
ensuring data integrity in the network. Furthermore, this paper presents the
main challenges in building a resilient IoT for CPS, which is crucial in the era
of smart CPS with enhanced connectivity (an excellent example of such a system
is connected autonomous vehicles). It further summarizes our solutions,
work-in-progress and future work to this topic to enable "Trustworthy IoT for
CPS". Finally, this framework is illustrated on a selected use case: A smart
sensor infrastructure in the transport domain.
Comment: preprint (2018-10-29)
An HCI View of Configuration Problems
In recent years, configuration problems have drawn tremendous attention
because of their increasing prevalence and their big impact on system
availability. We believe that many of these problems are attributable to
today's configuration interfaces that have not evolved to accommodate the
enormous shift of the system administrator group. Plain text files, as the de
facto configuration interfaces, assume administrators' understanding of the
system under configuration. They ask administrators to directly edit the
corresponding entries with little guidance or assistance. However, this
assumption no longer holds for today's administrator group, which has expanded
greatly to include non- and semi-professional administrators. In this paper, we
provide an HCI view of today's configuration problems, and articulate system
configuration as a new HCI problem. Moreover, we present the top obstacles to
correctly and efficiently configuring software systems, and most importantly
their implications on the design and implementation of new-generation
configuration interfaces.
Comment: 9 pages of exploratory research on understanding system configuration
problems using Human-Computer Interaction principles
Data Management in Industry 4.0: State of the Art and Open Challenges
Information and communication technologies are permeating all aspects of
industrial and manufacturing systems, expediting the generation of large
volumes of industrial data. This article surveys the recent literature on data
management as it applies to networked industrial environments and identifies
several open research challenges for the future. As a first step, we extract
important data properties (volume, variety, traffic, criticality) and identify
the corresponding data enabling technologies of diverse fundamental industrial
use cases, based on practical applications. Secondly, we provide a detailed
outline of recent industrial architectural designs with respect to their data
management philosophy (data presence, data coordination, data computation) and
the extent of their distributiveness. Then, we conduct a holistic survey of the
recent literature from which we derive a taxonomy of the latest advances on
industrial data enabling technologies and data centric services, spanning all
the way from the field level deep in the physical deployments, up to the cloud
and applications level. Finally, motivated by the rich conclusions of this
critical analysis, we identify interesting open challenges for future research.
The concepts presented in this article thematically cover the largest part of
the industrial automation pyramid layers. Our approach is multidisciplinary, as
the selected publications were drawn from two fields: the communications,
networking, and computation field as well as the industrial, manufacturing and
automation field. The article can help the readers to deeply understand how
data management is currently applied in networked industrial environments, and
select interesting open research opportunities to pursue.
Assuring the Machine Learning Lifecycle: Desiderata, Methods, and Challenges
Machine learning has evolved into an enabling technology for a wide range of
highly successful applications. The potential for this success to continue and
accelerate has placed machine learning (ML) at the top of research, economic
and political agendas. Such unprecedented interest is fuelled by a vision of ML
applicability extending to healthcare, transportation, defence and other
domains of great societal importance. Achieving this vision requires the use of
ML in safety-critical applications that demand levels of assurance beyond those
needed for current ML applications. Our paper provides a comprehensive survey
of the state-of-the-art in the assurance of ML, i.e. in the generation of
evidence that ML is sufficiently safe for its intended use. The survey covers
the methods capable of providing such evidence at different stages of the
machine learning lifecycle, i.e. of the complex, iterative process that starts
with the collection of the data used to train an ML component for a system, and
ends with the deployment of that component within the system. The paper begins
with a systematic presentation of the ML lifecycle and its stages. We then
define assurance desiderata for each stage, review existing methods that
contribute to achieving these desiderata, and identify open challenges that
require further research.
Mobile Cloud Business Process Management System for the Internet of Things: A Survey
The Internet of Things (IoT) represents a comprehensive environment that
consists of a large number of smart devices interconnecting heterogeneous
physical objects to the Internet. Many domains such as logistics,
manufacturing, agriculture, urban computing, home automation, ambient assisted
living and various ubiquitous computing applications have utilised IoT
technologies. Meanwhile, Business Process Management Systems (BPMS) have become
a successful and efficient solution for coordinated management and optimised
utilisation of resources/entities. However, past BPMS have not considered many
of the issues they will face in managing large-scale connected heterogeneous IoT
entities. Without fully understanding the behaviour, capability and state of
the IoT entities, the BPMS can fail to manage the IoT integrated information
systems. In this paper, we analyse existing BPMS for IoT and identify the
limitations and their drawbacks from a Mobile Cloud Computing perspective.
Later, we discuss a number of open challenges in BPMS for IoT.
Comment: 56 pages, 10 figures, 5 tables
Statically Verifying Continuous Integration Configurations
Continuous Integration (CI) testing is a popular software development
technique that allows developers to easily check that their code can build
successfully and pass tests across various system environments. In order to use
a CI platform, a developer must add a set of configuration files to a code
repository for specifying build conditions. Incorrect configuration settings
lead to CI build failures, which can take hours to run, wasting valuable
developer time and delaying product release dates. Debugging CI configurations
is challenging because users must manage configurations for the build across
many system environments, to which they may not have local access. Thus, the
only way to check a CI configuration is to push a commit and wait for the build
result. To address this problem, we present the first approach, VeriCI, for
statically checking for errors in a given CI configuration before the developer
pushes a commit to build on the CI server. Our key insight is that the
repositories in a CI environment contain lists of build histories which offer
the time-aware repository build status. Driven by this insight, we introduce
the Misclassification Guided Abstraction Refinement (MiGAR) loop that automates
part of the learning process across the heterogeneous build environments in CI.
We then use decision tree learning to generate constraints on the CI
configuration that must hold for a build to succeed by training on a large
history of continuous integration repository build results. We evaluate VeriCI
on real-world data from GitHub and find that we have 83% accuracy of predicting
a build failure.
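The learning step described above, deriving constraints on a CI configuration from a history of build outcomes, can be illustrated with a deliberately tiny stand-in: a one-feature decision stump over boolean configuration features. This is far simpler than VeriCI's MiGAR loop, and the feature names are invented for the example:

```python
# Sketch of the learning idea: from a history of (configuration features,
# build outcome) pairs, learn a simple rule that predicts failure.
# A one-feature decision stump stands in for VeriCI's decision-tree
# learning; feature names are illustrative.

def best_stump(history):
    """Pick the boolean feature whose value best separates pass/fail."""
    features = history[0][0].keys()
    best, best_acc = None, 0.0
    for f in features:
        for predict_fail_when in (True, False):
            correct = sum(
                1 for feats, failed in history
                if (feats[f] == predict_fail_when) == failed
            )
            acc = correct / len(history)
            if acc > best_acc:
                best_acc, best = acc, (f, predict_fail_when)
    return best, best_acc

# Toy build history: (configuration features, did the build fail?)
history = [
    ({"caches_deps": True,  "pins_versions": True},  False),
    ({"caches_deps": False, "pins_versions": True},  False),
    ({"caches_deps": True,  "pins_versions": False}, True),
    ({"caches_deps": False, "pins_versions": False}, True),
]

rule, acc = best_stump(history)
print(rule, acc)  # -> ('pins_versions', False) 1.0
```

The learned rule ("builds fail when versions are not pinned") is the kind of constraint that, once extracted, can be checked statically against a new configuration before any commit is pushed, which is the point of the approach.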