Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution
Cloud controllers aim at responding to application demands by automatically
scaling the compute resources at runtime to meet performance guarantees and
minimize resource costs. Existing cloud controllers often resort to scaling
strategies that are codified as a set of adaptation rules. However, for a cloud
provider, applications running on top of the cloud infrastructure are more or
less black-boxes, making it difficult at design time to define optimal or
pre-emptive adaptation rules. Thus, the burden of taking adaptation decisions
is often delegated to the cloud application. Yet, in most cases, application
developers in turn have limited knowledge of the cloud infrastructure. In this
paper, we propose learning adaptation rules during runtime. To this end, we
introduce FQL4KE, a self-learning fuzzy cloud controller. In particular, FQL4KE
learns and modifies fuzzy rules at runtime. The benefit is that for designing
cloud controllers, we do not have to rely solely on precise design-time
knowledge, which may be difficult to acquire. FQL4KE empowers users to specify
cloud controllers by simply adjusting weights representing priorities in system
goals instead of specifying complex adaptation rules. The applicability of
FQL4KE has been experimentally assessed as part of the cloud application
framework ElasticBench. The experimental results indicate that FQL4KE
outperforms our previously developed fuzzy controller without learning
mechanisms and the native Azure auto-scaling.
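The abstract does not spell out FQL4KE's learning rule, but the fuzzy Q-learning it builds on centers on a temporal-difference update of quality values attached to rule/action pairs. A rough, hypothetical sketch (all names, actions, and constants here are illustrative, not taken from the paper):

```python
# Hypothetical state/action space: each "state" indexes a fuzzy rule,
# each action is a scaling decision (remove a VM, do nothing, add a VM).
ACTIONS = [-1, 0, +1]

def fql_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One temporal-difference update of the rule-consequent quality values."""
    best_next = max(q[next_state].values())
    td_error = reward + gamma * best_next - q[state][action]
    q[state][action] += alpha * td_error
    return q

# Two fuzzy rules (states), all quality values initially zero.
q = {s: {a: 0.0 for a in ACTIONS} for s in range(2)}
q = fql_update(q, state=0, action=+1, reward=1.0, next_state=1)
print(q[0][1])  # 0.1 = alpha * (reward + gamma * 0 - 0)
```

In a full fuzzy Q-learning controller the update is weighted by each rule's firing strength; this sketch keeps a single active rule per step for brevity.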
Scalable Statistical Modeling and Query Processing over Large Scale Uncertain Databases
The past decade has witnessed a large number of novel applications that generate imprecise, uncertain and incomplete data. Examples include monitoring infrastructures such as RFIDs, sensor networks and web-based applications such as information extraction, data integration, social networking and so on. In my dissertation, I addressed several challenges in managing such data and developed algorithms for efficiently executing queries over large volumes of such data. Specifically, I focused on the following challenges.
First, for meaningful analysis of such data, we need the ability to remove noise and infer useful information from uncertain data. To address this challenge, I first developed a declarative system for applying dynamic probabilistic models to databases and data streams. The output of such probabilistic modeling is probabilistic data, i.e., data annotated with probabilities of correctness/existence. Often, the data also exhibits strong correlations. Although there is prior work in managing and querying such probabilistic data using probabilistic databases, those approaches largely assume independence and cannot handle probabilistic data with rich correlation structures. Hence, I built a probabilistic database system that can manage large-scale correlations and developed algorithms for efficient query evaluation. Our system allows users to provide uncertain data as input and to specify arbitrary correlations among the entries in the database. In the back end, we represent correlations as a forest of junction trees, an alternative representation for probabilistic graphical models (PGM). We execute queries over the probabilistic database by transforming them into message passing algorithms (inference) over the junction tree. However, traditional algorithms over junction trees typically require accessing the entire tree, even for small queries. Hence, I developed an index data structure over the junction tree called INDSEP that allows us to circumvent this process and thereby scalably evaluate inference queries, aggregation queries and SQL queries over the probabilistic database.
Finally, query evaluation in probabilistic databases typically returns output tuples along with their probability values. However, the existing query evaluation model provides very little intuition to the users: for instance, a user might ask "Why is this tuple in my result?", "Why does this output tuple have such high probability?", or "Which are the most influential input tuples for my query?". Hence, I designed a query evaluation model, and a suite of algorithms, that provide users with explanations for query results and enable users to perform sensitivity analysis to better understand the query results.
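The dissertation's query evaluation rests on message passing (sum-product inference) over junction trees. INDSEP itself is not reproducible from the abstract, but the underlying principle, marginalizing one clique at a time instead of materializing the full joint, can be sketched on a toy three-variable chain (all distributions below are made up for illustration):

```python
import numpy as np

# A three-node chain X1 - X2 - X3 of binary variables, given as factors.
p_x1 = np.array([0.6, 0.4])            # P(X1)
p_x2_given_x1 = np.array([[0.7, 0.3],  # P(X2 | X1), rows indexed by X1
                          [0.2, 0.8]])
p_x3_given_x2 = np.array([[0.9, 0.1],  # P(X3 | X2), rows indexed by X2
                          [0.5, 0.5]])

# Forward message passing: each matrix-vector product sums out one variable,
# so the full 2x2x2 joint is never built.
m12 = p_x1 @ p_x2_given_x1   # message into X2's clique: P(X2)
m23 = m12 @ p_x3_given_x2    # message into X3's clique: P(X3)
print(m23)                   # [0.7 0.3]
```

An index such as INDSEP speeds this up further by letting a query touch only the relevant part of the junction tree rather than the whole structure.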
Using features for automated problem solving
We motivate and present an architecture for problem solving where an abstraction
layer of "features" plays the key role in determining methods to apply. The system
is presented in the context of theorem proving with Isabelle, and we demonstrate
how this approach to encoding control knowledge is expressively different to
other common techniques. We look closely at two areas where the feature
layer may offer benefits to theorem proving, namely semi-automation and
learning, and find strong evidence that in these particular domains, the
approach shows
compelling promise. The system includes a graphical theorem-proving user
interface for Eclipse ProofGeneral and is available from the project web page,
http://feasch.heneveld.org
Learning Models over Relational Data using Sparse Tensors and Functional Dependencies
Integrated solutions for analytics over relational databases are of great
practical importance as they avoid the costly repeated loop data scientists
have to deal with on a daily basis: select features from data residing in
relational databases using feature extraction queries involving joins,
projections, and aggregations; export the training dataset defined by such
queries; convert this dataset into the format of an external learning tool; and
train the desired model using this tool. These integrated solutions are also a
fertile ground for theoretically fundamental and challenging problems at the
intersection of relational and statistical data models.
This article introduces a unified framework for training and evaluating a
class of statistical learning models over relational databases. This class
includes ridge linear regression, polynomial regression, factorization
machines, and principal component analysis. We show that, by synergizing key
tools from database theory such as schema information, query structure,
functional dependencies, recent advances in query evaluation algorithms, and
from linear algebra such as tensor and matrix operations, one can formulate
relational analytics problems and design efficient (query and data)
structure-aware algorithms to solve them.
This theoretical development informed the design and implementation of the
AC/DC system for structure-aware learning. We benchmark the performance of
AC/DC against R, MADlib, libFM, and TensorFlow. For typical retail forecasting
and advertisement planning applications, AC/DC can learn polynomial regression
models and factorization machines with at least the same accuracy as its
competitors and up to three orders of magnitude faster than its competitors
whenever they do not run out of memory, exceed 24-hour timeout, or encounter
internal design limitations.
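The key idea behind structure-aware systems of this kind is that the aggregates a learner needs (e.g. entries of a gram matrix) can be pushed through the join instead of being computed over the materialized join result. A minimal sketch of that factorization, on hypothetical relations R(key, x) and S(key, y) (this illustrates the general principle, not AC/DC's actual algorithm):

```python
from collections import defaultdict

# Hypothetical relations: R(key, x) and S(key, y).
R = [(1, 2.0), (1, 3.0), (2, 1.0)]
S = [(1, 10.0), (2, 4.0), (2, 6.0)]

def gram_entry_materialized(R, S):
    """Naive approach: enumerate the join, then sum x*y over every result tuple."""
    return sum(x * y for (k1, x) in R for (k2, y) in S if k1 == k2)

def gram_entry_factorized(R, S):
    """Push the sums through the join: per-key partial sums, then combine."""
    sx, sy = defaultdict(float), defaultdict(float)
    for k, x in R:
        sx[k] += x
    for k, y in S:
        sy[k] += y
    return sum(sx[k] * sy[k] for k in sx.keys() & sy.keys())

assert gram_entry_materialized(R, S) == gram_entry_factorized(R, S)  # both 60.0
```

The factorized form runs in time linear in the input relations, while the materialized join can be asymptotically larger, which is where the orders-of-magnitude speedups come from.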
Investigation into the use of evolutionary algorithms for fully automated planning
This thesis presents a new approach to the Artificial Intelligence (AI) problem of fully
automated planning. Planning is the act of deliberation before acting that guides rational
behaviour and is a core area of AI. Many practical real-world problems can be
classed as planning problems, therefore practical and theoretical developments in AI
planning are well motivated. Unfortunately, planning for even toy domains is hard;
many different search algorithms have been proposed, and new approaches are actively
encouraged.
The approach taken in this thesis is to adopt ideas from Evolutionary Algorithms
(EAs) and apply the techniques to fully automated plan synthesis. EA methods have
enjoyed great success in many problem areas of AI. They are a class of search
techniques founded on the principles of natural evolution. Previous attempts to
apply EAs to plan synthesis have shown encouraging results, but have been ad hoc
and piecemeal.
This thesis thoroughly investigates the approach of applying evolutionary search
to the fully automated planning problem. This is achieved by developing and modifying
a proof-of-concept planner called GENPLAN. Before EA-based systems can be
used, a thorough examination of various parameter settings must be carried out.
Once this was completed, the performance of GENPLAN was evaluated using a
selection of benchmark domains and other competition-style planners. The
difficulties raised by the benchmark domains and the extent to which they cause
problems for the approach are highlighted, along with problems associated with
EA search. Modifications are proposed and experimented with in an attempt to
alleviate some of the identified problems.
EAs offer a flexible framework for fully automated planning, but demonstrate a clear
weakness across a range of currently used benchmark domains for plan synthesis.
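GENPLAN's internals are not given in the abstract, but the general scheme of evolutionary plan synthesis, evolving fixed-length action sequences against a fitness function measuring goal distance, can be sketched in a toy domain. Everything below (the counter domain, the operators, the parameters) is invented for illustration:

```python
import random

random.seed(0)

# Toy domain: a counter starting at 0; the goal is to reach the value 3.
ACTIONS = ["inc", "dec", "noop"]
GOAL, LENGTH, POP, GENS = 3, 6, 30, 40

def execute(plan):
    """Simulate a candidate plan from the initial state."""
    state = 0
    for a in plan:
        state += {"inc": 1, "dec": -1, "noop": 0}[a]
    return state

def fitness(plan):
    # Negative distance to the goal; a perfect plan scores 0.
    return -abs(GOAL - execute(plan))

def mutate(plan):
    i = random.randrange(LENGTH)
    return plan[:i] + [random.choice(ACTIONS)] + plan[i + 1:]

def crossover(p1, p2):
    cut = random.randrange(1, LENGTH)
    return p1[:cut] + p2[cut:]

# Truncation selection: keep the best half, refill by crossover + mutation.
pop = [[random.choice(ACTIONS) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[: POP // 2]
    pop = survivors + [mutate(crossover(random.choice(survivors),
                                        random.choice(survivors)))
                       for _ in range(POP - len(survivors))]

best = max(pop, key=fitness)
print(fitness(best))
```

Unlike heuristic-search planners, nothing here guarantees a valid plan is found; the thesis's benchmark results reflect exactly this kind of unreliability on harder domains.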
Automaton Meets Algebra: A Hybrid Paradigm for Efficiently Processing XQuery over XML Stream
XML stream applications bring the challenge of efficiently processing queries on sequentially accessible token-based data streams. The automaton paradigm is naturally suited for pattern retrieval on tokenized XML streams, but requires patches for implementing the filtering or restructuring functionalities common for the XML query languages. In contrast, the algebraic paradigm is well-established for processing self-contained tuples. However, it does not traditionally support token inputs. This dissertation proposes a framework called Raindrop, which accommodates both the automaton and algebra paradigms to take advantage of both. First, we propose an architecture for Raindrop. Raindrop is an algebra framework that models queries at different abstraction levels. We represent the token-based automaton computations as an algebraic subplan at the high level while exposing the automaton details at the low level. The algebraic subplan modeling automaton computations can thus be integrated with the algebraic subplan modeling the non-automaton computations. Second, we explore a novel optimization opportunity. Other XML stream processing systems always retrieve all the patterns in a query in the automaton. In contrast, Raindrop allows a plan to perform some of the pattern retrieval in the automaton and the rest outside of it. This opens up an automaton-in-or-out optimization opportunity. We study this optimization in two types of run-time environments, one with stable data characteristics and one with fluctuating data characteristics. We provide search strategies catering to each environment. We also describe how to migrate from a currently running plan to a new plan at run-time. Third, we optimize the automaton computations using schema knowledge. A set of criteria is established to decide what schema constraints are useful to a given query. Optimization rules utilizing different types of schema constraints are proposed based on the criteria.
We design a rule application algorithm which ensures both completeness (i.e., no optimization is missed) and minimality (i.e., no redundant optimization is introduced). Experiments on both real and synthetic data illustrate that these techniques bring significant performance improvements with little overhead.
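The automaton side of such a system is, at its core, a stack-driven matcher over start/end/text tokens. A toy sketch of path-pattern retrieval on a tokenized stream (the token encoding and the `match_path` helper are invented for illustration; Raindrop's actual automata are far richer):

```python
# Tokens: ("start", tag), ("end", tag), ("text", value) -- a toy tokenized stream
# standing in for the XML document <a><b>hit</b><c>miss</c></a>.
STREAM = [
    ("start", "a"), ("start", "b"), ("text", "hit"), ("end", "b"),
    ("start", "c"), ("text", "miss"), ("end", "c"), ("end", "a"),
]

def match_path(stream, path):
    """Stack-driven automaton: emit text nodes whose element path equals `path`."""
    stack, out = [], []
    for kind, value in stream:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            stack.pop()
        elif kind == "text" and stack == path:
            out.append(value)
    return out

print(match_path(STREAM, ["a", "b"]))  # ['hit']
```

The automaton-in-or-out question is then: which of a query's patterns should be matched token-by-token like this, and which should be deferred to tuple-based algebra operators downstream.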
On the Completeness of Replacing Primitive Actions with Macro-actions and its Generalization to Planning Operators and Macro-operators
Automated planning, which deals with the problem of generating sequences of actions, is an emerging research topic due to its potentially wide range of real-world application domains. As well as developing and improving planning engines, the acquisition of domain-specific knowledge is a promising way to improve the planning process. Domain-specific knowledge can be encoded into the modelling language that a range of planning engines can accept. This makes encoding domain-specific knowledge planner-independent, and entails reformulating the domain models and/or problem specifications. While many encouraging practical results have been derived from such reformulation methods (e.g. learning macro-actions), little attention has been paid to theoretical properties such as completeness (preserving solvability of reformulated problems). In this paper, we focus on a special case: removing primitive actions that are replaced by macro-actions. We provide a theoretical study and come up with conditions under which it is safe to remove primitive actions, so that completeness of the reformulation is preserved. We also extend this study to planning operators (actions are instances of operators).
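Why removing a replaced primitive action can break completeness is easy to see in a tiny STRIPS-style example. The domain below is invented for illustration (it is not from the paper): a macro `a;b` can reach any goal its constituent sequence reaches, yet deleting the primitive `a` makes some goals unreachable because `b` deletes the fact that only `a` provides.

```python
from collections import deque

# STRIPS-style toy domain: an action is
# (name, preconditions, add_list, delete_list) over sets of facts.
A = ("a", set(), {"p"}, set())            # primitive: makes p true
B = ("b", {"p"}, {"q"}, {"p"})            # primitive: consumes p, makes q true
MACRO_AB = ("a;b", set(), {"q"}, set())   # macro for a-then-b: net effect is just q

def solvable(actions, goal, init=frozenset()):
    """Breadth-first search over reachable fact sets."""
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        if goal <= state:
            return True
        for _, pre, add, dele in actions:
            if pre <= state:
                nxt = frozenset((state - dele) | add)
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
    return False

assert solvable([A, B], {"p"})             # primitives alone reach p
assert solvable([MACRO_AB, B], {"q"})      # the macro still reaches q
assert not solvable([MACRO_AB, B], {"p"})  # but dropping a loses p: incomplete
```

The paper's contribution is precisely a set of conditions ruling out situations like the third case, so that the removal is provably safe.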
Modeling of systems
The handbook covers the fundamentals of modeling complex systems. A classification of mathematical models is presented and methods for their construction are given. Analytical modeling of the basic types of processes in complex systems is considered. The principles of simulation, statistical, and business-process modeling are described. The handbook is oriented toward students of higher education establishments pursuing degrees in “Software engineering” and “Computer science”, as well as toward lecturers and specialists in the domain of computer modeling.
Mining Query Plans for Finding Candidate Queries and Sub-Queries for Materialized Views in BI Systems Without Cube Generation
Materialized views are important for optimizing Business Intelligence (BI) systems when they are designed without data cubes. Selecting candidate queries for materialized views from a large number of queries is a challenging task. Most past work involves finding frequent queries in the past workload and creating materialized views from them, either by manually analyzing the workload or by using approximate string-matching algorithms over query text. Most existing methods suggest complete queries but ignore query components, such as sub-queries, for the creation of materialized views. This paper presents a novel method to determine on which queries and query components materialized views can be created to optimize aggregate and join queries, by mining a database of query execution plans represented as binary trees. The proposed algorithm showed a significant improvement in the number of optimized queries because it uses the execution plan tree of a query, rather than the query text used by traditional methods, as the basis for selecting queries to optimize with materialized views. For selecting a correct set of queries to be optimized using materialized views, the paper proposes an efficient specialized frequent-tree-component mining algorithm with novel heuristics to prune the search space. These frequent components are used to determine the possible set of candidate queries for the creation of materialized views. Experimentation on standard, real, and synthetic data sets, together with the theoretical analysis, showed that the proposed method optimizes a large number of queries with fewer materialized views and achieves a significant performance improvement compared to traditional methods.
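The core mining step, counting how often plan subtrees recur across a workload so that frequent ones become materialized-view candidates, can be sketched simply. The plan encoding and workload below are hypothetical, and the sketch omits the paper's pruning heuristics:

```python
from collections import Counter

# Hypothetical query plans as trees: (operator, child, ...); leaves are tables.
plans = [
    ("JOIN", ("SCAN", "orders"), ("SCAN", "items")),
    ("AGG", ("JOIN", ("SCAN", "orders"), ("SCAN", "items")), None),
    ("JOIN", ("SCAN", "orders"), ("SCAN", "users")),
]

def subtrees(node, acc):
    """Count every subtree (candidate sub-query) of a plan tree."""
    if node is None or isinstance(node, str):
        return
    acc[node] += 1
    for child in node[1:]:
        subtrees(child, acc)

counts = Counter()
for plan in plans:
    subtrees(plan, counts)

MINSUP = 2
frequent = [t for t, c in counts.items() if c >= MINSUP]
# The orders-items join occurs twice (once nested under AGG), so both that
# whole subtree and its scans clear the support threshold.
```

Real plan mining must additionally canonicalize plans (join order, predicate normalization) so that equivalent sub-queries hash to the same tree; that step is elided here.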
Management Information Sources and Corporate Intelligence Systems
In this book the word “intelligence” is used in several different contexts. Intelligence can refer to the process of gathering data; it can refer to the data itself; and it can refer to the application of knowledge to produce useful information from the data. We will see in this chapter how the computer can be used in business to further all three aspects of intelligence: capturing the data, storing the data in an accessible form, and adding value to the data by transforming it into useful information for decision making. This chapter is organized according to these three areas of computer support for business intelligence: 1. Transaction processing and intelligence capture; 2. Data-base management and intelligence storage and retrieval; 3. Decision support systems and intelligence processing. We provide an overview of the concepts in transaction processing, data-base management, and decision support systems. References are listed in each of these areas for further details. Our purpose is to provide the perspective for the executive to determine the use of these concepts for his or her company and to understand the choices open to him or her in today’s technology.