MACHS: Mitigating the Achilles Heel of the Cloud through High Availability and Performance-aware Solutions
Cloud computing is continuously growing as a business model for hosting information and communication technology applications. However, many concerns arise regarding the quality of service (QoS) offered by the cloud. One major challenge is the high availability (HA) of cloud-based applications. The key to achieving availability requirements is to develop an approach that is immune to cloud failures while minimizing service level agreement (SLA) violations. To this end, this thesis addresses the HA of cloud-based applications from different perspectives. First, the thesis proposes a component HA-aware scheduler (CHASE) to manage the deployments of carrier-grade cloud applications while maximizing their HA and satisfying the QoS requirements. Second, a Stochastic Petri Net (SPN) model is proposed to capture the stochastic characteristics of cloud services and quantify the expected availability offered by an application deployment. The SPN model is then associated with an extensible policy-driven cloud scoring system that integrates other cloud challenges (i.e., green and cost concerns) with HA objectives. The proposed HA-aware solutions are extended with a live virtual machine migration model that provides a trade-off between migration time and downtime while maintaining the HA objective. Furthermore, the thesis proposes a generic input template for cloud simulators, GITS, to facilitate the creation of cloud scenarios while ensuring reusability, simplicity, and portability. Finally, an availability-aware CloudSim extension, ACE, is proposed. ACE extends the CloudSim simulator with failure injection, computational paths, repair, failover, load balancing, and other availability-based modules.
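The kind of availability quantification this abstract describes can be illustrated, at a much coarser level, with the textbook series/parallel availability formulas for a redundant deployment. This is a generic sketch under an independence assumption, not the thesis's SPN model; all availability figures below are invented for illustration.

```python
# Sketch: expected availability of a redundant cloud deployment,
# assuming independent component failures. Numbers are illustrative.

def series(*avails):
    """All tiers must be up (e.g., app tier AND database tier)."""
    a = 1.0
    for x in avails:
        a *= x
    return a

def parallel(*avails):
    """At least one redundant replica must be up."""
    unavail = 1.0
    for x in avails:
        unavail *= (1.0 - x)
    return 1.0 - unavail

# Two app-server replicas in front of a two-node database cluster.
app_tier = parallel(0.99, 0.99)      # 1 - 0.01**2 = 0.9999
db_tier = parallel(0.995, 0.995)     # 1 - 0.005**2 = 0.999975
deployment = series(app_tier, db_tier)
print(deployment)
```

Adding a replica to the weaker tier raises the parallel term and hence the overall product, which is the intuition behind HA-aware placement decisions.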
Website Performance Evaluation and Estimation in an E-business Environment
This thesis introduces a new Predictus-model for performance evaluation and estimation in a multi-layer website environment. The model is based on soft computing ideas, i.e., simulation and statistical analysis. The aim is to reduce the energy consumption of the website's hardware, improve investment efficiency, and avoid loss of availability. Optimised exploitation reduces energy and maintenance costs on the one hand and increases end-user satisfaction, due to robust and stable web services, on the other.
A method based on simulation of user requests is described. Instead of an ordinary static parameter set, parameters are extracted dynamically from previous log files. The distribution of past requests is exploited to generate a natural load based on actual usage. Because the server system is loaded with valid and well-known requests, its behaviour remains natural. A feedback loop on workload generation ensures the validity of the workload in the long term.
A method for identifying the actual performance of the website is described. Simulating usage by a large number of virtual users with the well-known load, while observing the utilisation rate of server resources, yields the best available information about the internal state of the system. Disturbance of live website usage can be avoided by using mathematical extrapolation to estimate the saturation point of a single server resource.
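The log-driven load generation described above can be sketched roughly as follows. The log format, field positions, and function names are assumptions for illustration, not the Predictus-model's actual implementation.

```python
# Sketch: build an empirical request distribution from an access log
# and sample a synthetic workload proportional to historical usage.
# Assumes a common-log-format line where the path is the 6th field.
import random
from collections import Counter

def request_distribution(log_lines):
    """Count how often each URL appears in previous log files."""
    urls = [line.split()[5] for line in log_lines if len(line.split()) > 5]
    return Counter(urls)

def generate_workload(dist, n, rng=random.Random(42)):
    """Sample n requests weighted by their historical frequency."""
    urls = list(dist)
    weights = [dist[u] for u in urls]
    return rng.choices(urls, weights=weights, k=n)

log = ['1.2.3.4 - - [..] "GET /index.html HTTP/1.1" 200 512',
       '1.2.3.4 - - [..] "GET /index.html HTTP/1.1" 200 512',
       '5.6.7.8 - - [..] "GET /shop HTTP/1.1" 200 2048']
dist = request_distribution(log)
load = generate_workload(dist, 1000)
```

Because the synthetic workload mirrors the empirical distribution, the server sees "natural" requests, which is the property the abstract's feedback loop is meant to preserve over time.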
A Game-Theoretic Approach to Strategic Resource Allocation Mechanisms in Edge and Fog Computing
With the rapid growth of the Internet of Things (IoT), cloud-centric application management raises
questions related to quality of service for real-time applications. Fog and edge computing
(FEC) provide a complement to the cloud by filling the gap between cloud and IoT. Resource
management on multiple resources from distributed and administrative FEC nodes is a key
challenge to ensure the quality of the end-user's experience. To improve resource utilisation and
system performance, researchers have proposed many fair allocation mechanisms for
resource management. Dominant Resource Fairness (DRF), a resource allocation policy for
multiple resource types, meets most of the required fair allocation characteristics. However,
DRF is designed for centralised resource allocation and does not consider the effects (or
feedback) of large-scale distributed environments such as multi-controller software-defined
networking (SDN). Nash bargaining from micro-economic theory and competitive equilibrium
from equal incomes (CEEI) are well suited to solving dynamic optimisation problems that
'proportionately' share resources among distributed participants. Although CEEI's
decentralised policy guarantees load balancing for performance isolation, it is not fault-proof
for computation offloading.
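DRF itself is a concrete, well-documented algorithm; a minimal sketch of its progressive-filling allocation, using the canonical two-user example from the DRF literature, may help make the thesis's starting point tangible. This is a simplified sketch (it stops as soon as the poorest user's next task no longer fits), not the thesis's extended mechanism.

```python
# Minimal sketch of Dominant Resource Fairness (DRF): repeatedly grant
# one task to the user with the smallest dominant share.

def drf(capacity, demands, max_rounds=1000):
    """capacity: per-resource totals; demands: per-user per-task vectors."""
    n_res = len(capacity)
    used = [0.0] * n_res
    alloc = {u: 0 for u in demands}       # tasks granted per user
    shares = {u: 0.0 for u in demands}    # dominant shares
    for _ in range(max_rounds):
        u = min(shares, key=shares.get)   # poorest user goes next
        d = demands[u]
        if any(used[r] + d[r] > capacity[r] for r in range(n_res)):
            break                         # simplification: stop here
        for r in range(n_res):
            used[r] += d[r]
        alloc[u] += 1
        shares[u] = max(alloc[u] * d[r] / capacity[r] for r in range(n_res))
    return alloc

# Canonical example: 9 CPUs and 18 GB; user A needs <1 CPU, 4 GB> per
# task, user B needs <3 CPU, 1 GB>. Both end with dominant share 2/3.
print(drf([9, 18], {"A": [1, 4], "B": [3, 1]}))  # → {'A': 3, 'B': 2}
```

The equalised dominant shares (memory for A, CPU for B) are what gives DRF its fairness properties; the thesis's weighted and feedback-driven generalisations build on this loop.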
This thesis proposes a hybrid and fair allocation mechanism for the rejuvenation of
decentralised SDN controller deployments. We apply multi-agent reinforcement learning
(MARL) with robustness against adversarial controllers to enable efficient priority scheduling
for FEC. Motivated by software cybernetics and homeostasis, weighted DRF is generalised by
applying the principles of feedback (positive and/or negative network effects) in reverse game
theory (GT) to design hybrid scheduling schemes for joint multi-resource and multi-task
offloading/forwarding in FEC environments.
In the first study, monotonic scheduling for joint offloading at the federated edge is
addressed by proposing a truthful (algorithmic) mechanism to neutralise harmful negative and
positive distributive bargaining externalities. The IP-DRF scheme is a MARL
approach applying partition form games (PFG) to guarantee second-best Pareto optimality
(SBPO) in the allocation of multiple resources from a deterministic policy in both population and
resource non-monotonicity settings. In the second study, we propose the DFog-DRF scheme to
address truthful fog scheduling with bottleneck fairness in fault-probable wireless hierarchical
networks by applying constrained coalition formation (CCF) games to implement MARL. The
multi-objective optimisation problem for fog throughput maximisation is solved via a
constraint dimensionality reduction methodology using fairness constraints for efficient
gateway and low-level controller placement.
For evaluation, we develop an agent-based framework to implement fair allocation policies in
distributed data centre environments. Empirical results show that the deterministic policy of the
IP-DRF scheme provides SBPO and reduces the average execution and turnaround times by 19% and
11.52%, respectively, as compared to the Nash bargaining or CEEI deterministic policy for 57,445
cloudlets in population non-monotonic settings. The processing cost of tasks shows significant
improvement (6.89% and 9.03% for fixed and variable pricing) for the resource non-monotonic
setting using 38,000 cloudlets. The DFog-DRF scheme, when benchmarked against the asset-fair
(MIP) policy, shows superior performance (less than 1% difference in time complexity) for up to 30 FEC
nodes. Furthermore, empirical results using 210 mobiles and 420 applications demonstrate the
efficacy of our hybrid scheduling scheme for hierarchical clustering considering latency and
network usage for throughput maximisation.
Abubakar Tafawa Balewa University, Bauchi (Tetfund, Nigeria)
Inoculation and amendment strategies influence switchgrass establishment in degraded soil
Bioenergy feedstock production on degraded land can serve as a means for modulating land competition for food versus energy. Because degraded soil has little or no agricultural value, fortification of the soil with an organic amendment or inoculum can improve biomass productivity. However, as farmers struggle to rejuvenate their degraded land, there is a need for a quick screening strategy to select the best method of enhancing cellulosic (switchgrass, SG) biomass production in degraded soil. The goal of this study is to evaluate the effects of soil amendment and inoculation strategies on the biomass productivity of SG in a reclaimed surface-mined soil (RMS). Experiments were conducted in the greenhouse using moisture replacement microcosms (MRM) to screen strategies for enhancing the biomass productivity of SG in an RMS. Strategies included soil amendment with organic by-products (poultry litter, paper mill sludge, and vermicompost), inorganic nutrients (nitrogen and phosphorus fertilizers), or a commercial preparation of endomycorrhizal fungi (AMF, BioVam). Experiments were implemented with ten (10) treatments with six replicates each. After eight weeks of incubation in MRM systems, inoculation of RMS with AMF produced the highest aboveground and total biomass (0.9 g and 1.77 g per microcosm container) at p < 0.05. The total biomass of commercial AMF significantly (p < 0.05) outperformed all other treatments in the order AMF > AMF + VC > PMS + N > VC = PMS = PL > PMS + AMF > N + P > ASL > Control. This microcosm experiment served as a quick screen to establish that soil enhancement and inoculation strategies can enhance the biomass productivity of SG in degraded soil.
Computational solutions for addressing heterogeneity in DNA methylation data
DNA methylation, a reversible epigenetic modification, has been implicated in various biological processes including gene regulation. Due to the multitude of datasets available, it is a premier candidate for computational tool development, especially for investigating heterogeneity within and across samples. We differentiate between three levels of heterogeneity in DNA methylation data: between-group, between-sample, and within-sample heterogeneity. Here, we separately address these three levels and present new computational approaches to quantify and systematically investigate heterogeneity. Epigenome-wide association studies relate a DNA methylation aberration to a phenotype and therefore address between-group heterogeneity. To facilitate such studies, which necessarily include data processing, exploratory data analysis, and differential analysis of DNA methylation, we extended the R-package RnBeads. We implemented novel methods for calculating the epigenetic age of individuals, novel imputation methods, and differential variability analysis. A use case of the new features is presented using samples from Ewing sarcoma patients. As an important driver of epigenetic differences between phenotypes, we systematically investigated associations between donor genotypes and DNA methylation states in methylation quantitative trait loci (methQTL). To that end, we developed a novel computational framework, MAGAR, for determining statistically significant associations between genetic and epigenetic variation. We applied the new pipeline to samples obtained from sorted blood cells and complex bowel tissues of healthy individuals and found that tissue-specific and common methQTLs have distinct genomic locations and biological properties.
To investigate cell-type-specific DNA methylation profiles, which are the main drivers of within-group heterogeneity, computational deconvolution methods can be used to dissect DNA methylation patterns into latent methylation components. Deconvolution methods require profiles of high technical quality, and the identified components need to be biologically interpreted. We developed a computational pipeline to perform deconvolution of complex DNA methylation data, which implements crucial data processing steps and facilitates result interpretation. We applied the protocol to lung adenocarcinoma samples and found indications of tumor infiltration by immune cells and associations of the detected components with patient survival. Within-sample heterogeneity (WSH), i.e., heterogeneous DNA methylation patterns at a genomic locus within a biological sample, is often neglected in epigenomic studies. We present the first systematic benchmark of scores quantifying WSH genome-wide using simulated and experimental data. Additionally, we created two novel scores that quantify DNA methylation heterogeneity at single-CpG resolution with improved robustness toward technical biases. WSH scores describe different types of WSH in simulated data, quantify differential heterogeneity, and serve as a reliable estimator of tumor purity. Due to the broad availability of DNA methylation data, the levels of heterogeneity in DNA methylation data can be comprehensively investigated. We contribute novel computational frameworks for analyzing DNA methylation data with respect to different levels of heterogeneity. We envision that this toolbox will be indispensable for understanding the functional implications of DNA methylation patterns in health and disease.
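One established family of within-sample heterogeneity scores is the methylation entropy of read patterns at a locus; a short sketch of that generic score may clarify what WSH scores measure. This is a standard score from the literature, not one of the thesis's two novel scores, and the pattern encoding is an assumption for illustration.

```python
# Sketch: per-locus methylation entropy over read patterns.
# Each read is a string over {'M', 'U'} covering the same CpGs,
# e.g. 'MMU' = methylated, methylated, unmethylated.
from collections import Counter
from math import log2

def methylation_entropy(read_patterns):
    """Shannon entropy of the pattern distribution, normalised per CpG."""
    counts = Counter(read_patterns)
    total = len(read_patterns)
    h = sum(-(c / total) * log2(c / total) for c in counts.values())
    n_cpgs = len(read_patterns[0])
    return h / n_cpgs

homogeneous = ["MMM"] * 8                  # one pattern: no WSH
mixed = ["MMM", "UUU", "MUM", "UMU"] * 2   # four equally frequent patterns
print(methylation_entropy(homogeneous))    # → 0.0
print(methylation_entropy(mixed))
```

A homogeneous locus scores zero while a locus with many distinct read patterns scores high, which is why such scores can track phenomena like tumor purity.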
Bayesian models of category acquisition and meaning development
The ability to organize concepts (e.g., dog, chair) into efficient mental representations,
i.e., categories (e.g., animal, furniture) is a fundamental mechanism which allows humans
to perceive, organize, and adapt to their world. Much research has been dedicated
to the questions of how categories emerge and how they are represented. Experimental
evidence suggests that (i) concepts and categories are represented through sets of
features (e.g., dogs bark, chairs are made of wood) which are structured into different
types (e.g., behavior, material); (ii) categories and their featural representations are
learnt jointly and incrementally; and (iii) categories are dynamic and their representations
adapt to changing environments.
This thesis investigates the mechanisms underlying the incremental and dynamic formation
of categories and their featural representations through cognitively motivated
Bayesian computational models. Models of category acquisition have been extensively
studied in cognitive science and primarily tested on perceptual abstractions or artificial
stimuli. In this thesis, we focus on categories acquired from natural language stimuli,
using nouns as a stand-in for their reference concepts, and their linguistic contexts as
a representation of the concepts’ features. The use of text corpora allows us to (i) develop
large-scale unsupervised models thus simulating human learning, and (ii) model
child category acquisition, leveraging the linguistic input available to children in the
form of transcribed child-directed language.
In the first part of this thesis we investigate the incremental process of category acquisition.
We present a Bayesian model and an incremental learning algorithm which
sequentially integrates newly observed data. We evaluate our model output against
gold standard categories (elicited experimentally from human participants), and show
that high-quality categories are learnt both from child-directed data and from large,
thematically unrestricted text corpora. We find that the model performs well even under
constrained memory resources, resembling human cognitive limitations. While
lists of representative features for categories emerge from this model, they are neither
structured nor jointly optimized with the categories.
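The incremental assignment step described above can be sketched, in a highly simplified form, as a Chinese-restaurant-process-style choice between joining the best-fitting existing category and opening a new one. All hyperparameters, feature sets, and thresholds below are invented for illustration and do not reproduce the thesis's actual model.

```python
# Sketch: incremental Bayesian category formation. Each noun (with
# context-word "features") joins the category maximising a CRP-style
# prior times a smoothed feature likelihood, or starts a new category.
from collections import Counter

ALPHA, BETA, VOCAB = 5.0, 0.01, 1000   # assumed hyperparameters

def likelihood(counts, total, features):
    """Smoothed probability of the features under a category."""
    p = 1.0
    for f in features:
        p *= (counts[f] + BETA) / (total + BETA * VOCAB)
    return p

def assign(categories, features):
    """Pick the best existing category or open a new one."""
    best, best_score = None, ALPHA * likelihood(Counter(), 0, features)
    for cat in categories:
        s = cat["n"] * likelihood(cat["counts"], cat["total"], features)
        if s > best_score:
            best, best_score = cat, s
    if best is None:
        best = {"n": 0, "counts": Counter(), "total": 0, "members": []}
        categories.append(best)
    best["n"] += 1
    best["total"] += len(features)
    best["counts"].update(features)
    return best

stream = [("dog", ["barks", "furry"]), ("cat", ["furry", "meows"]),
          ("chair", ["wooden", "legs"]), ("puppy", ["barks", "furry"])]
cats = []
for noun, feats in stream:
    assign(cats, feats)["members"].append(noun)
print([c["members"] for c in cats])  # → [['dog', 'cat', 'puppy'], ['chair']]
```

Because assignment happens one observation at a time, the sketch mirrors the sequential integration of newly observed data that the model's incremental learning algorithm performs at scale.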
We address these shortcomings in the second part of the thesis, and present a Bayesian
model which jointly learns categories and structured featural representations. We
present both batch and incremental learning algorithms, and demonstrate the model’s
effectiveness on both encyclopedic and child-directed data. We show that high-quality
categories and features emerge in the joint learning process, and that the structured
features are intuitively interpretable through human plausibility judgment evaluation.
In the third part of the thesis we turn to the dynamic nature of meaning: categories and
their featural representations change over time, e.g., children distinguish some types
of features (such as size and shade) less clearly than adults, and word meanings adapt
to our ever changing environment and its structure. We present a dynamic Bayesian
model of meaning change, which infers time-specific concept representations as a set
of feature types and their prevalence, and captures their development as a smooth process.
We analyze the development of concept representations in their complexity over
time from child-directed data, and show that our model captures established patterns of
child concept learning. We also apply our model to diachronic change of word meaning,
modeling how word senses change internally and in prevalence over centuries.
The contributions of this thesis are threefold. Firstly, we show that a variety of experimental
results on the acquisition and representation of categories can be captured
with computational models within the framework of Bayesian modeling. Secondly,
we show that natural language text is an appropriate source of information for modeling
categorization-related phenomena suggesting that the environmental structure that
drives category formation is encoded in this data. Thirdly, we show that the experimental
findings hold on a larger scale. Our models are trained and tested on a larger
set of concepts and categories than is common in behavioral experiments and the categories
and featural representations they can learn from linguistic text are in principle
unrestricted.
Bayesian Prognostic Framework for High-Availability Clusters
Critical services from domains as diverse as finance, manufacturing and healthcare are often delivered by complex enterprise applications (EAs). High-availability clusters (HACs) are software-managed IT infrastructures that enable these EAs to operate with minimum downtime. To that end, HACs monitor the health of EA layers (e.g., application servers and databases) and resources (i.e., components), and attempt to reinitialise or restart failed resources swiftly. When this is unsuccessful, HACs try to fail over (i.e., relocate) to another server the resource group to which the failed resource belongs. If the resource group failover is also unsuccessful, or when a system-wide critical failure occurs, HACs initiate a complete system failover.
Despite the availability of multiple commercial and open-source HAC solutions, these HACs (i) disregard important sources of historical and runtime information, and (ii) have limited reasoning capabilities. Therefore, they may conservatively perform unnecessary resource group or system failovers or delay justified failovers for longer than necessary.
This thesis introduces the first HAC taxonomy, uses it to carry out an extensive survey of current HAC solutions, and develops a novel Bayesian prognostic (BP) framework that addresses the significant HAC limitations mentioned above and identified by the survey. The BP framework comprises four modules. The first module is a technique for modelling high availability using a combination of established and new HAC characteristics. The second is a suite of methods for obtaining and maintaining the information required by the other modules. The third is a HAC-independent Bayesian decision network (BDN) that predicts whether resource failures can be managed locally (i.e., without failovers). The fourth is a method for constructing a HAC-specific Bayesian network for the fast prediction of resource group and system failures. Used together, these modules reduce the downtime of HAC-protected EAs significantly. The experiments presented in this thesis show that the BP framework can deliver downtimes 5.5 to 7.9 times smaller than those obtained with an established open-source HAC.
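The core question the BDN answers — is this failure locally manageable, or should a failover be triggered? — can be illustrated with a toy Bayesian update over a couple of evidence variables. All priors and likelihoods below are invented for illustration; the thesis's BDN is a richer, HAC-independent network.

```python
# Toy sketch of the Bayesian reasoning behind the BP framework's
# decision network: estimate whether a resource failure can be handled
# locally (e.g., by a restart), given runtime evidence. All probability
# values are invented for illustration.

P_LOCAL = 0.8  # prior: most failures are locally manageable

# P(evidence | locally manageable), P(evidence | not manageable)
P_EVIDENCE = {
    "repeated_restart_failures": (0.05, 0.60),
    "high_host_load":            (0.30, 0.70),
}

def posterior_local(observed):
    """Naive-Bayes update: P(locally manageable | observed evidence)."""
    num = P_LOCAL
    den = 1.0 - P_LOCAL
    for e in observed:
        p_given_local, p_given_not = P_EVIDENCE[e]
        num *= p_given_local
        den *= p_given_not
    return num / (num + den)

print(posterior_local([]))                             # → 0.8
print(posterior_local(["repeated_restart_failures"]))  # → 0.25
```

A HAC could compare this posterior against a threshold: retry locally while it stays high, escalate to a resource-group failover once the evidence drives it low, avoiding both premature and overdue failovers.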
Automated Extraction of Behaviour Model of Applications
Highly replicated cloud applications are deployed only when they are deemed to be functional.
That is, they generally perform their task and their failure rate is relatively low.
However, even though failure is rare, it does occur and is very difficult to diagnose. We
devise a tool for failure diagnosis which learns the normal behaviour of an application in
terms of the statistical properties of variables used throughout its execution, and then
monitors it for deviation from these statistical properties. Our study reveals that many
variables have unique statistical characteristics that amount to an invariant of the
program. Therefore, any significant deviation from these characteristics reflects an abnormal
behaviour of the application which may be caused by a program error.
It is difficult to derive such an invariant from static analysis of the application's code alone.
For example, the name of a person usually does not include a semicolon; however, an intruder
may attempt a SQL injection (which will include a semicolon) through the 'name' field
while entering their information, and succeed if there is no check for this case. This
scenario can only be captured at runtime and may not be tested by the application
developer. The character range of the 'name' variable is one of its statistical properties; by
learning this range from the execution of the application it is possible to detect the above
described abnormal input. Hence, monitoring the statistics of values taken by the different
variables of an application is an effective way to detect anomalies that can help to diagnose
the failure of the application.
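The character-range invariant in the example above can be sketched directly; the class name and training values are assumptions, and the real tool learns from heap snapshots rather than explicit training lists.

```python
# Sketch: learn each variable's observed character set from normal
# runs, then flag values containing characters never seen before.

class CharSetInvariant:
    def __init__(self):
        self.chars = set()

    def learn(self, value):
        """Accumulate characters observed during normal execution."""
        self.chars |= set(value)

    def check(self, value):
        """Return the characters never seen during normal execution."""
        return set(value) - self.chars

name_inv = CharSetInvariant()
for v in ["Alice", "Bob", "Mary Jane", "O'Brien"]:
    name_inv.learn(v)

# A SQL-injection attempt introduces characters outside the invariant.
suspicious = name_inv.check("x'; DROP TABLE users; --")
print(sorted(suspicious))   # the semicolon and dashes are flagged
```

Any non-empty `check` result is a deviation from the learned invariant and would be logged, which is what gives the tool its anomaly alerts and its variable-level fault localization.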
We build a tool that collects frequent snapshots of the application's heap and builds a
statistical model solely from the extensional knowledge of the application. The extensional
knowledge is obtainable only from runtime data of the application, without any
description or explanation of the application's execution flow. The model characterizes
the application's normal behaviour. Collecting snapshots in the form of memory dumps and
determining the application's behaviour model from them without code instrumentation
makes our tool applicable in cases where instrumentation is computationally expensive.
Our approach allows a behaviour model to be automatically and efficiently built using
the monitoring data alone. We evaluate the utility of our approach by applying it to
an e-commerce application and an online bidding system, and then derive different statistical
properties of variables from their runtime-exhibited values. Our experimental results
demonstrate 96% accuracy in the generated statistical model with a maximum 1% performance
overhead. Accuracy is measured on the basis of the false-positive alerts generated
while the application runs without any anomaly. The high accuracy and low
performance overhead indicate that our tool can successfully determine the application's
normal behaviour without affecting its performance, and that it can be used to
monitor the application in production. Moreover, our tool correctly detected two anomalous
conditions while monitoring the application with a small number of injected faults. In addition
to anomaly detection, our tool logs all the variables of the application that violate
the learned model. The log file can help to diagnose any failure caused by these variables
and gives our tool source-code granularity in fault localization.
- …