Search CORE

Efficient Scalable Accurate Regression Queries in In-DBMS Analytics

Authors: Anagnostopoulos Christos; Triantafillou Peter
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2017

Recent trends aim to incorporate advanced data analytics capabilities within DBMSs. Linear regression queries are fundamental to exploratory analytics and predictive modeling. However, computing their exact answers leaves a lot to be desired in terms of efficiency and scalability. We contribute a novel predictive analytics model and associated regression query processing algorithms, which are efficient, scalable and accurate. We focus on predicting the answers to two key query types that reveal dependencies between the values of different attributes: (i) mean-value queries and (ii) multivariate linear regression queries, both within specific data subspaces defined by the values of other attributes. Our algorithms achieve improvements of many orders of magnitude in query processing efficiency and near-perfect approximations of the underlying relationships among data attributes.

Crossref

Warwick Research Archives Portal Repository

Enlighten
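
The abstract above contrasts the learned model with exact computation of two query types. As a point of reference only, here is a minimal Python sketch of that exact, scan-based baseline, assuming rows are dicts of numeric attributes and a subspace is a hyper-rectangle given as per-attribute [lo, hi] ranges (both assumptions; the paper's own subspace definition and prediction algorithms are not reproduced). Every answer requires scanning the selected rows, which is the cost the paper's predictive approach is designed to avoid. The helper names (in_subspace, mean_value_query, regression_query, solve) are hypothetical.

```python
# Exact (scan-based) answers to the two query types, used here only as a reference baseline.
# Assumptions: rows are dicts of numeric attributes; a subspace is a hyper-rectangle
# given as {attribute: (lo, hi)} with closed bounds.

Row = dict[str, float]
Bounds = dict[str, tuple[float, float]]


def in_subspace(row: Row, bounds: Bounds) -> bool:
    """True if every bounded attribute of the row lies within its [lo, hi] range."""
    return all(lo <= row[attr] <= hi for attr, (lo, hi) in bounds.items())


def mean_value_query(rows: list[Row], bounds: Bounds, target: str) -> float:
    """Mean of `target` over the rows that fall inside the subspace."""
    selected = [r[target] for r in rows if in_subspace(r, bounds)]
    if not selected:
        raise ValueError("the subspace contains no rows")
    return sum(selected) / len(selected)


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear in this subspace")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regression_query(rows: list[Row], bounds: Bounds, features: list[str], target: str) -> list[float]:
    """Least-squares coefficients [intercept, *slopes] of target on features inside the subspace."""
    selected = [r for r in rows if in_subspace(r, bounds)]
    X = [[1.0] + [r[f] for f in features] for r in selected]
    y = [r[target] for r in selected]
    k = len(features) + 1
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)  # normal equations (X^T X) beta = X^T y


rows = [{"x": float(i), "y": 2.0 * i + 1.0} for i in range(10)]
print(mean_value_query(rows, {"x": (2, 7)}, "y"))  # 10.0
print(regression_query(rows, {"x": (2, 7)}, ["x"], "y"))  # approximately [1.0, 2.0]
```

Solving the normal equations directly keeps the sketch dependency-free; a production version would use a numerically sturdier least-squares routine.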

Leveraging Edge Computing through Collaborative Machine Learning

Authors: Anagnostopoulos Christos; Portelli Kurt
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2017

The Internet of Things (IoT) offers the ability to analyze and predict our surroundings through sensor networks at the network edge. To facilitate this predictive functionality, Edge Computing (EC) applications are developed with power consumption, network lifetime, and quality of context inference in mind. The huge volume of contextual data from sensors gives data scientists better knowledge extraction, but transferring all of it threatens the network's feasibility and lifetime. To cope with this, collaborative machine learning is applied to EC devices to (i) extract the statistical relationships and (ii) construct regression (predictive) models that maximize communication efficiency. In this paper, we propose a learning methodology that improves prediction accuracy by quantizing the input space and leveraging the local knowledge of the EC devices.

Crossref

Enlighten
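
The abstract above improves prediction accuracy by quantizing the input space. The general idea can be sketched as follows: split the input range into cells and fit a separate local linear model in each cell, so a relationship that no single line captures is approximated piecewise. This is a minimal illustration under hypothetical assumptions (one scalar input, uniform fixed-width cells, ordinary least squares per cell); it is not the paper's quantization scheme, and the class and function names are made up for the example.

```python
# Input-space quantization with one local linear model per cell.
# Assumptions (hypothetical, for illustration): a single scalar input, uniform
# fixed-width cells over [lo, hi), and ordinary least squares within each cell.


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0  # all inputs identical: fall back to the cell mean
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


class QuantizedRegressor:
    def __init__(self, lo: float, hi: float, cells: int):
        self.lo, self.cells = lo, cells
        self.width = (hi - lo) / cells
        self.models: dict[int, tuple[float, float]] = {}

    def cell(self, x: float) -> int:
        """Index of the quantization cell for x; out-of-range inputs go to the edge cells."""
        idx = int((x - self.lo) // self.width)
        return min(max(idx, 0), self.cells - 1)

    def fit(self, xs: list[float], ys: list[float]) -> "QuantizedRegressor":
        buckets: dict[int, tuple[list[float], list[float]]] = {}
        for x, y in zip(xs, ys):
            bx, by = buckets.setdefault(self.cell(x), ([], []))
            bx.append(x)
            by.append(y)
        self.models = {c: fit_line(bx, by) for c, (bx, by) in buckets.items()}
        return self

    def predict(self, x: float) -> float:
        c = self.cell(x)
        if c not in self.models:
            raise KeyError(f"no training data fell into cell {c}")
        intercept, slope = self.models[c]
        return intercept + slope * x


# A relationship that no single global line fits well: rises up to x = 5, then falls.
xs = [float(i) for i in range(10)]
ys = [x if x < 5 else 10.0 - x for x in xs]
model = QuantizedRegressor(0.0, 10.0, cells=2).fit(xs, ys)
print(model.predict(2.5), model.predict(7.0))  # 2.5 3.0
```

In an edge setting, each cell's (intercept, slope) pair is far smaller than the raw readings it summarizes, which is why sending models instead of data can save communication.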

Scalable aggregation predictive analytics: a query-driven machine learning approach

Authors: Anagnostopoulos Christos; Savva Fotis; Triantafillou Peter
Publication venue: Springer Science and Business Media LLC
Publication date: 01/09/2018

We introduce a predictive modeling solution that provides high-quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and may allow only aggregation operators like COUNT to be executed over it. In this context, our methodology uses historical queries and their answers to accurately predict the answers to ad-hoc queries. We focus on the widely used set-cardinality (COUNT) aggregation query, as COUNT is a fundamental operator both for internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from previously issued queries, (ii) associate the query space with local linear regression and associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to as the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms to ensure high-quality prediction results. The significance of our contribution lies in the fact that the model (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) superior to that of data-centric approaches. We provide a comprehensive performance evaluation of our model, assessing its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark, showing its superior performance compared to Spark's COUNT method.

Warwick Research Archives Portal Repository

Enlighten
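
The abstract above predicts COUNT answers from past queries and their answers, using a notion of query similarity, without touching the data. A minimal sketch of that query-driven idea follows. It assumes, hypothetically, that a radius query is encoded as the vector (center coordinates, radius), that similarity is Euclidean distance in that query space, and that the estimator is an inverse-distance-weighted k-nearest-neighbour average. That estimator is a deliberately simpler stand-in for the paper's local linear regression and associative function estimators, and the class name is made up.

```python
import math

# Query-driven COUNT prediction from past (query, answer) pairs.
# Assumptions (hypothetical): a radius query is encoded as the vector
# (center_1, ..., center_d, radius); similarity is Euclidean distance in that
# query space; the estimator is an inverse-distance-weighted k-nearest-neighbour
# average, a simpler stand-in for the paper's local regression estimators.


class QueryDrivenCountPredictor:
    def __init__(self, k: int = 3):
        self.k = k
        self.history: list[tuple[tuple[float, ...], float]] = []

    def observe(self, query, count: float) -> None:
        """Incrementally learn from an executed query and its true COUNT answer."""
        self.history.append((tuple(query), float(count)))

    def predict(self, query) -> float:
        """Estimate COUNT for an unseen query without touching the underlying data."""
        if not self.history:
            raise ValueError("no past queries observed yet")
        query = tuple(query)
        nearest = sorted(self.history, key=lambda qa: math.dist(qa[0], query))[: self.k]
        weight_sum = weighted_total = 0.0
        for past, count in nearest:
            d = math.dist(past, query)
            if d == 0.0:
                return count  # an exact repeat of a past query
            weight_sum += 1.0 / d
            weighted_total += count / d
        return weighted_total / weight_sum


predictor = QueryDrivenCountPredictor(k=2)
predictor.observe((0.0, 0.0, 1.0), 10)
predictor.observe((0.0, 0.0, 2.0), 40)
predictor.observe((5.0, 5.0, 1.0), 3)
print(predictor.predict((0.0, 0.0, 1.5)))  # 25.0
```

Learning is incremental in the simplest sense: each executed query is appended to the history and immediately informs later predictions. The linear scan in predict is fine for a sketch; a large query log would call for a spatial index.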

Explaining Aggregates for Exploratory Analytics

Authors: Anagnostopoulos Christos; Savva Fotis; Triantafillou Peter
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/01/2019

Analysts wishing to explore multivariate data spaces typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest, and then apply aggregation functions whose results determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and, as such, convey limited information about the queried subspaces for exploratory analysis. We address this shortcoming by helping analysts explore and understand data subspaces through a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA's novel AQ explanations are represented using functions obtained by solving a three-fold joint optimization problem. Explanations take the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed on-line, by only monitoring AQs and their answers. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads.

arXiv.org e-Print Archive

Crossref

Enlighten
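
The abstract above represents explanations as piecewise-linear functions that describe how an aggregate changes across a queried subspace, computed without DB access. The shape of such an explanation can be sketched under strong simplifying assumptions: past radius queries share one center and differ only in radius, radii are distinct, and the function simply interpolates the observed (radius, answer) points. XAXA instead learns its functions through a joint optimization, so this is an illustration of the output form, not of the method. The function names explain and estimate are hypothetical.

```python
from bisect import bisect_left

# Piecewise-linear "explanation" of an aggregate as a function of the query radius.
# Assumptions (hypothetical): past aggregate queries share one center and differ only
# in radius, radii are distinct, and the function simply interpolates the observed
# (radius, answer) points; XAXA instead learns its functions via joint optimization.


def explain(points: list[tuple[float, float]]) -> list[tuple[float, float, float, float]]:
    """Per-segment (r_start, r_end, slope, intercept): how the answer changes per unit radius."""
    pts = sorted(points)
    segs = []
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        slope = (y1 - y0) / (x1 - x0)
        segs.append((x0, x1, slope, y0 - slope * x0))
    return segs


def estimate(points: list[tuple[float, float]], r: float) -> float:
    """Answer for radius r from the piecewise-linear fit, with no database access.
    Outside the observed radii, the nearest endpoint's answer is returned."""
    pts = sorted(points)
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    if r <= xs[0]:
        return ys[0]
    if r >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, r)  # guarantees xs[i - 1] < r <= xs[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (r - x0) / (x1 - x0)


observed = [(1.0, 10.0), (2.0, 30.0), (4.0, 50.0)]  # (radius, COUNT) pairs
print(explain(observed))  # [(1.0, 2.0, 20.0, -10.0), (2.0, 4.0, 10.0, 10.0)]
print(estimate(observed, 3.0))  # 40.0
```

Read as an explanation, the segment slopes say that the count grows by about 20 per unit radius between radii 1 and 2, and by about 10 per unit radius between 2 and 4; a single scalar answer conveys none of this.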

Towards Integrated Data Analytics: Time Series Forecasting in DBMS

Authors: Boehm Matthias; Dannecker Lars; Fischer Ulrike; Lehner Wolfgang; Rosenthal Frank; Siksnys Laurynas
Publication venue: Springer Science and Business Media LLC
Publication date: 27/01/2023

Integrating sophisticated statistical methods into database management systems is gaining increasing attention in research and industry as a way to cope with growing data volumes and increasingly complex analytical algorithms. One important statistical method is time series forecasting, which is crucial for decision-making processes in many domains. The deep integration of time series forecasting offers additional advanced functionality within a DBMS. More importantly, however, it allows for optimizations that improve the efficiency, consistency, and transparency of the overall forecasting process. To enable efficient integrated forecasting, we propose to enhance the traditional 3-layer ANSI/SPARC architecture of a DBMS with forecasting functionality. This article gives a general overview of our proposed enhancements and presents how forecast queries can be processed, using an example from the energy data management domain. We conclude with open research topics and challenges that arise in this area.

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa