Scalable and responsive real time event processing using cloud computing
PhD Thesis. Cloud computing provides the potential for scalability and adaptability in a cost-effective manner. However, achieving scalability for real-time applications means response time cannot be allowed to grow: many applications require good performance and low response time, which must be matched with dynamic resource allocation. Real-time processing requirements are also characterized by unpredictable rates of incoming data streams and dynamic bursts of data. This raises the issue of processing the data streams across multiple cloud computing nodes. This research analyzes possible methodologies for processing real-time data in which applications are structured as multiple event processing networks and partitioned over the set of available cloud nodes. The approach is based on queueing theory principles applied to cloud computing. The transformation of raw data into useful outputs occurs in various stages of the processing networks, which are distributed across multiple computing nodes in a cloud. A set of valid options is created to capture the response time requirements of each application. Under a given valid set of conditions that meet the response time criteria, multiple instances of event processing networks are distributed over the cloud nodes. A generic methodology to scale up and scale down the event processing networks in accordance with the response time criteria is defined. Real-time applications that support sophisticated decision support mechanisms must comply with response time criteria while consisting of interdependent dataflow paradigms, which makes it harder to improve their performance. Consideration is given to ways of reducing the latency and improving the response time and throughput of real-time applications by distributing the event processing networks over multiple computing nodes.
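A minimal queueing sketch of this scale-up/scale-down logic, assuming each cloud node behaves as an independent M/M/1 queue with a fair split of the arrival stream (a deliberate simplification; the thesis's model of event processing networks is richer):

```python
def min_nodes_for_response_time(arrival_rate, service_rate, target_response,
                                max_nodes=1024):
    """Smallest number of identical nodes, each modelled as an M/M/1 queue,
    whose per-node mean response time 1/(mu - lambda/n) meets the target.

    arrival_rate    -- total event arrival rate (events/s)
    service_rate    -- per-node service rate (events/s)
    target_response -- response time criterion (s)
    Returns None if the criterion is unreachable within max_nodes.
    """
    for n in range(1, max_nodes + 1):
        per_node_load = arrival_rate / n
        if per_node_load < service_rate:          # queue must be stable
            mean_response = 1.0 / (service_rate - per_node_load)
            if mean_response <= target_response:
                return n
    return None
```

Running the same function as load varies yields both scale-up and scale-down decisions: for 900 events/s, 100 events/s per node, and a 50 ms criterion it returns 12 nodes.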
Performance management of event processing systems
This thesis is a study of performance management of Complex Event Processing (CEP) systems. CEP systems have distinct characteristics from other well-studied computer systems, such as batch and online transaction processing systems and database-centric applications; these characteristics introduce new challenges and opportunities for the performance management of CEP systems. Methodologies used in benchmarking CEP systems in many performance studies focus on scaling the load injection but do not consider the impact of the functional capabilities of CEP systems. This thesis proposes the approach of evaluating the performance of CEP engines' functional behaviours on events and develops a benchmark platform for CEP systems: CEPBen. The CEPBen benchmark platform is developed to explore the fundamental functional performance of event processing systems: filtering, transformation and event pattern detection. It is also designed to provide a flexible environment for exploring new metrics and influential factors for CEP systems and evaluating their performance. Studies on factors and new metrics are carried out using the CEPBen benchmark platform on Esper. Different measurement points of response time in performance management of CEP systems are discussed, and the response time of a targeted event is proposed as a metric for quality-of-service evaluation, in combination with the traditional response time in CEP systems. Maximum query load is proposed as a capacity indicator with regard to the complexity of queries, and the number of live objects in memory as a performance indicator with regard to memory management. Query depth is studied as a factor that influences CEP system performance.
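The three functional behaviours that CEPBen exercises, filtering, transformation and event pattern detection, can be illustrated with a toy engine. All class and field names below are invented for illustration and are not CEPBen's or Esper's API:

```python
class MiniCEP:
    """Toy engine showing the three functional behaviours a CEP benchmark
    can measure: filtering, transformation, and event pattern detection."""

    def __init__(self, predicate, transform, pattern):
        self.predicate = predicate   # filtering: keep only matching events
        self.transform = transform   # transformation: derive a new event
        self.pattern = pattern       # ordered list of event types to detect
        self._progress = 0           # position within the pattern
        self.matches = 0             # completed pattern detections

    def process(self, event):
        if not self.predicate(event):                # filtering
            return None
        event = self.transform(event)                # transformation
        if event["type"] == self.pattern[self._progress]:  # pattern detection
            self._progress += 1
            if self._progress == len(self.pattern):
                self.matches += 1
                self._progress = 0
        return event
```

A benchmark in this spirit would drive many such operator instances with a controlled event stream and measure per-behaviour latency and throughput.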
On the cloud deployment of a session abstraction for service/data aggregation
Dissertation for the degree of Master in Computer Science (Engenharia Informática). The global cyber-infrastructure comprises a growing number of resources, spanning several abstraction layers. These resources, which can include wireless sensor devices or mobile networks, share common requirements such as richer inter-connection capabilities and increasing data consumption demands.
Additionally, the service model is now widely spread, supporting the development
and execution of distributed applications. In this context, new challenges are emerging around the “big data” topic. These challenges include service access optimizations, such as data-access context sharing, more efficient data filtering/
aggregation mechanisms, and adaptable service access models that can respond to context changes. The service access characteristics can be aggregated
to capture specific interaction models. Moreover, ubiquitous service access is a
growing requirement, particularly regarding mobile clients such as tablets and
smartphones.
The Session concept aggregates the service access characteristics, creating specific
interaction models, which can then be re-used in similar contexts. Existing
Session abstraction implementations also allow dynamic reconfigurations of these interaction models, so that the model can adapt to context changes, based on service, client or underlying communication medium variables. Cloud computing on the other hand, provides ubiquitous access, along with large data persistence and processing services.
This thesis proposes a Session abstraction implementation, deployed on a Cloud
platform, in the form of a middleware. This middleware captures rich/dynamic
interaction models between users with similar interests, and provides a generic
mechanism for interacting with datasources based on multiple protocols. Such an abstraction contextualizes service/user interactions and can be reused by other users in similar contexts. This Session implementation also permits data persistence by saving all data in transit to a Cloud-based repository.
The aforementioned middleware delivers richer datasource-access interaction models and dynamic reconfigurations, and allows the integration of heterogeneous datasources. The solution also provides ubiquitous access, allowing client connections from standard Web browsers or Android-based mobile devices.
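A hypothetical sketch of the Session abstraction described above, with invented names throughout: a Session bundles protocol handlers and interaction-model parameters, persists data in transit, and supports dynamic reconfiguration.

```python
class Session:
    """Illustrative Session abstraction (names are invented, not the
    middleware's API): aggregates service-access characteristics into a
    reusable, dynamically reconfigurable interaction model."""

    def __init__(self, protocol_handlers, config):
        self.handlers = protocol_handlers  # e.g. {"http": fn, "mqtt": fn}
        self.config = dict(config)         # interaction-model parameters
        self.log = []                      # data in transit, to be persisted

    def fetch(self, protocol, resource):
        """Access a datasource through the handler for its protocol."""
        data = self.handlers[protocol](resource, self.config)
        self.log.append(data)              # persist all data in transit
        return data

    def reconfigure(self, **changes):
        """Dynamic reconfiguration of the interaction model."""
        self.config.update(changes)
```

In the middleware, `self.log` would be backed by a Cloud repository and the handlers would wrap real protocols; here they are plain callables.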
Workload Management for Data-Intensive Services
<p>Data-intensive web services are typically composed of three tiers: i) a display tier that interacts with users and serves rich content to them, ii) a storage tier that stores the user-generated or machine-generated data used to create this content, and iii) an analytics tier that runs data analysis tasks in order to create and optimize new content. Each tier has different workloads and requirements that result in a diverse set of systems being used in modern data-intensive web services.</p><p>Servers are provisioned dynamically in the display tier to ensure that interactive client requests are served as per the latency and throughput requirements. The challenge is not only deciding automatically how many servers to provision but also when to provision them, while ensuring stable system performance and high resource utilization. To address these challenges, we have developed a new control policy for provisioning resources dynamically in coarse-grained units (e.g., adding or removing servers or virtual machines in cloud platforms). Our new policy, called proportional thresholding, converts a user-specified performance target value into a target range in order to account for the relative effect of provisioning a server on the overall workload performance.</p><p>The storage tier is similar to the display tier in some respects, but poses the additional challenge of needing redistribution of stored data when new storage nodes are added or removed. Thus, there will be some delay before the effects of changing a resource allocation will appear. Moreover, redistributing data can cause some interference to the current workload because it uses resources that can otherwise be used for processing requests. We have developed a system, called Elastore, that addresses the new challenges found in the storage tier. 
Elastore not only coordinates resource allocation and data redistribution to preserve stability during dynamic resource provisioning, but also finds the best tradeoff between workload interference and data redistribution time.</p><p>The workload in the analytics tier consists of data-parallel workflows that can either be run in a batch fashion or continuously as new data becomes available. Each workflow is composed of smaller units that have producer-consumer relationships based on data. These workflows are often generated from declarative specifications in languages like SQL, so there is a need for a cost-based optimizer that can generate an efficient execution plan for a given workflow. There are a number of challenges when building a cost-based optimizer for data-parallel workflows, which include characterizing the large execution-plan space, developing cost models to estimate execution costs, and efficiently searching for the best execution plan. We have built two cost-based optimizers: Stubby for batch data-parallel workflows running on MapReduce systems, and Cyclops for continuous data-parallel workflows where the choice of execution system is made a part of the execution-plan space.</p><p>We have conducted a comprehensive evaluation that shows the effectiveness of each tier's automated workload management solution.</p>
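One plausible reading of the proportional-thresholding policy for the display tier can be sketched as follows, assuming the controlled metric is latency-like (higher is worse). The band around the target widens as the cluster shrinks, since one server then represents a larger share of total capacity. This is illustrative only, not the dissertation's actual controller:

```python
def target_range(target, n_servers):
    """Convert a single performance target into a target range whose width
    reflects the relative effect of provisioning one of n servers."""
    delta = target / n_servers        # rough effect of +/- one server
    return (target - delta, target + delta)

def control_step(measured, target, n_servers):
    """One coarse-grained provisioning decision."""
    low, high = target_range(target, n_servers)
    if measured > high:
        return n_servers + 1          # performance too poor: add a server
    if measured < low and n_servers > 1:
        return n_servers - 1          # over-provisioned: remove a server
    return n_servers                  # inside the band: stay put
```

With a fixed threshold instead of a range, small clusters would oscillate, because removing one of four servers shifts performance far more than removing one of forty; the proportional band absorbs that.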
Real-Time QoS Monitoring and Anomaly Detection on Microservice-based Applications in Cloud-Edge Infrastructure
PhD Thesis. Microservices have emerged as a new approach for developing and deploying cloud applications that require higher levels of agility, scale, and reliability. A microservice-based cloud application architecture advocates decomposition of monolithic application components into independent software components called "microservices". As the individual microservices can be developed, deployed, and updated independently of each other, this leads to complex run-time performance monitoring and management challenges. The deployment environment for microservices in multi-cloud settings is very complex, as numerous components run in heterogeneous environments (VMs/containers) and communicate frequently with each other using REST-based/REST-less APIs. In some cases, multiple components can also be executed inside a single VM/container, making failure or anomaly detection very complicated. It is necessary to monitor the performance variation of all the service components to detect any cause of failure.
Microservice and container architectures allow loosely coupled services to be designed and run in a lightweight runtime environment for more efficient scaling. Thus, container-based microservice deployment is now the standard model for hosting cloud applications across industries. Although the strong scalability of this model opens the door to further optimizations in both application structure and performance, it adds an additional level of complexity to monitoring application performance. A performance monitoring system that cannot quickly and reliably detect failures and localize their causes can lead to severe application outages. Machine learning-based techniques have been applied to detect anomalies in microservice-based cloud applications, and existing research works have used different tracing algorithms to search for the root cause of observed anomalous behaviour. However, linking the observed failures of an application to their root causes using these techniques is still an open research problem.
Osmotic computing is a new IoT application programming paradigm driven by the significant increase in resource capacity/capability at the network edge, along with support for data transfer protocols that enable such resources to interact more seamlessly with cloud-based services. Much of the difficulty in Quality of Service (QoS) and performance monitoring of IoT applications in an osmotic computing environment is due to the massive scale and heterogeneity (IoT + edge + cloud) of the computing environments.
To handle monitoring and anomaly detection of microservices in cloud and edge datacenters, this thesis presents multilateral research towards monitoring and anomaly detection of microservice-based application performance in cloud-edge infrastructure.
The key contributions of this thesis are as follows:
• It introduces a novel system, Multi-microservices Multi-virtualization Multi-cloud monitoring (M3), that provides a holistic approach to monitoring the performance of microservice-based application stacks deployed across multiple cloud data centers.
• A framework for a Monitoring, Anomaly Detection and Localization System (MADLS), which utilizes a simplified approach that depends on commonly available metrics, offering a simplified deployment environment for the developer.
• Developing a unified monitoring model for cloud-edge that provides an IoT application administrator with detailed QoS information related to microservices deployed across cloud and edge datacenters.
Royal Embassy of Saudi Arabia Cultural Bureau in London, government of Saudi Arabia
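A minimal stand-in for detection based on commonly available metrics, using response time as the single metric: flag samples that deviate strongly from a trailing baseline. The window and threshold values are arbitrary, and the thesis's detectors are considerably more sophisticated; this only illustrates the metric-driven idea.

```python
from statistics import mean, stdev

def detect_anomalies(samples, window=30, threshold=3.0):
    """Return indices of response-time samples that lie more than
    `threshold` standard deviations from the mean of the preceding
    `window` samples (a simple trailing-baseline z-score test)."""
    anomalies = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(samples[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies
```

In a microservice setting one such detector would run per service metric, and localization would then correlate which services flagged anomalies at the same time.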
Automating Computational Placement for the Internet of Things
PhD Thesis. The PATH2iot platform presents a new approach to distributed data analytics for Internet of
Things applications. It automatically partitions and deploys stream-processing computations
over the available infrastructure (e.g. sensors, field gateways, clouds and the networks that
connect them) so as to meet non-functional requirements including network limitations and
energy. To enable this, the user gives a high-level declarative description of the computation as
a set of Event Processing Language queries. These are compiled, optimised, and partitioned
to meet the non-functional requirements using a combination of distributed query processing
techniques that optimise the computation, and cost models that enable PATH2iot to select the
best deployment plan given the non-functional requirements. This thesis describes the resulting
PATH2iot system, illustrated with two real-world use cases. First, a digital healthcare analytics
system in which sensor battery life is the main non-functional requirement to be optimized.
This shows that the tool can automatically partition and distribute the computation across a
healthcare wearable, a mobile phone and the cloud - increasing the battery life of the smart watch
by 453% when compared to other possible allocations. The energy cost of sending messages over
a wireless network is a key component of the cost model, and we show how this can be modelled.
Furthermore, the uncertainty of the model is addressed with two alternative approaches: one frequentist and one Bayesian. The second use case is one in which acoustic data analytics for transport monitoring is automatically distributed so as to enable it to run over a low-bandwidth LoRa network connecting the sensor to the cloud. Overall, the thesis shows how the PATH2iot system can automatically bring the benefits of edge computing to the increasing set of IoT applications that perform distributed data analytics.
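The cost-model idea, trading device compute energy against the radio energy of messages crossing the network, can be sketched for a linear operator pipeline. The function and its parameters are invented for illustration and do not reproduce PATH2iot's actual cost models:

```python
def best_split(msg_rates, compute_energy, radio_energy_per_msg):
    """Choose where to cut a linear operator pipeline between a wearable
    and the cloud so that total device energy is minimized.

    msg_rates[i]      -- messages/s leaving operator i (msg_rates[0] is the
                         raw sensor rate crossing the link if nothing runs
                         on the device)
    compute_energy[i] -- J/s to run operator i on the device
    Splitting after operator k runs operators 1..k locally and transmits
    msg_rates[k] messages/s over the radio.
    Returns (best split index, its energy cost in J/s).
    """
    best_k, best_cost = 0, None
    for k in range(len(msg_rates)):
        local = sum(compute_energy[1:k + 1])          # on-device compute
        radio = msg_rates[k] * radio_energy_per_msg   # transmission cost
        cost = local + radio
        if best_cost is None or cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

When early operators (e.g. filtering or windowed aggregation) shrink the message rate sharply, pushing them onto the device wins, which matches the intuition behind the battery-life gains reported for the healthcare use case.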
Intelligent IoT and Dynamic Network Semantic Maps for more Trustworthy Systems
As technology evolves, the Internet of Things (IoT) concept is gaining importance, constituting a foundation for reaching optimum connectivity between people and things. For this to happen, and to allow easier integration of sensors and other devices into these technological environments (or networks), configuration is a key process, promoting interoperability between heterogeneous devices and providing strategies and processes to enhance the network's capabilities. The optimization of this important process of creating a truly dynamic network must be based on models that provide a standardization of communication patterns, protocols and technologies between the sensors. Despite standing as a major tendency today, many obstacles still arise when implementing an intelligent dynamic network. Existing models are not as widely adopted as expected, and semantics are often not properly represented, resulting in complex and unsuitably long configuration times. Thus, this work aims to understand the ideal models and ontologies for achieving proper architectures and semantic maps, which allow management and redundancy based on the information of the whole network without compromising performance, and to develop a competent configuration of sensors for integration into a typical contemporary industrial dynamic network.
Semantic IoT for reasoning and BigData analytics
Recent developments in the IoT industries have led to an increase in data availability that is starting to weigh heavily on the traditional idea of pushing data to the Cloud. This study focuses on identifying tasks that can be pulled from the Cloud in a semantic stream processing context.
Innovative techniques for deployment of microservices in cloud-edge environment
PhD Thesis. The evolution of microservice architecture allows complex applications to be structured into independent modular components (microservices), making them easier to develop and manage. Complemented with containers, microservices can be deployed across any cloud and edge environment. Although containerized microservices are gaining popularity in industry, little research is available, especially in the areas of performance characterization and optimized deployment of microservices.
Depending on the application type (e.g. web, streaming) and the provided functionalities (e.g. filtering, encryption/decryption, storage), microservices are heterogeneous, with specific functional and Quality of Service (QoS) requirements. Further, cloud and edge environments are themselves complex, with a huge number of cloud providers and edge devices along with their host configurations. Due to these complexities, finding a suitable deployment solution for microservices becomes challenging.
To handle the deployment of microservices in cloud and edge environments, this thesis
presents multilateral research towards microservice performance characterization,
run-time evaluation and system orchestration. Considering a variety of applications,
numerous algorithms and policies have been proposed, implemented and prototyped.
The main contributions of this thesis are given below:
• Characterizes the performance of containerized microservices considering various types of interference in the cloud environment.
• Proposes and models an orchestrator, SDBO, for benchmarking simple web-application microservices in a multi-cloud environment. SDBO is validated using an e-commerce test web-application.
• Proposes and models an advanced orchestrator, GeoBench, for the deployment of complex web-application microservices in a multi-cloud environment. GeoBench is validated using a geo-distributed test web-application.
• Proposes and models a run-time deployment framework for distributed streaming-application microservices in a hybrid cloud-edge environment. The model is validated using a real-world healthcare analytics use case for human activity recognition.
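A greedy toy heuristic in the spirit of the deployment problem above. The orchestrators SDBO and GeoBench use their own algorithms and cost models; the names and logic here are purely illustrative, with each microservice reduced to a single scalar resource demand:

```python
def place_microservices(services, hosts):
    """Place each microservice (name -> resource demand) onto the feasible
    host (name -> capacity) with the most remaining capacity, largest
    demands first. Raises ValueError if some service cannot be placed."""
    placement = {}
    free = {h: cap for h, cap in hosts.items()}
    for name, demand in sorted(services.items(), key=lambda s: -s[1]):
        candidates = [h for h, cap in free.items() if cap >= demand]
        if not candidates:
            raise ValueError(f"no host can fit {name}")
        chosen = max(candidates, key=lambda h: free[h])  # most free capacity
        free[chosen] -= demand
        placement[name] = chosen
    return placement
```

A real cloud-edge orchestrator would consider multiple resource dimensions, QoS requirements, and network locality rather than a single capacity number; this sketch only conveys why the search space grows quickly with hosts and services.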