
    On transformation of query scheduling strategies in distributed and heterogeneous database systems

    This work considers the problem of optimal query processing in heterogeneous and distributed database systems. A global query submitted at a local site is decomposed into a number of queries processed at the remote sites, and the partial results returned by these queries are integrated at the local site. The paper addresses the problem of optimally scheduling the queries so as to minimize the time spent on integrating the partial results into the final answer. A global data model defined in this work provides a unified view of the heterogeneous data structures located at the remote sites, and a system of operations is defined to express the complex data integration procedures. The work shows that transforming an entirely simultaneous query processing strategy into a hybrid (simultaneous/sequential) strategy may in some cases lead to significantly faster data integration. We show how to detect such cases, what conditions must be satisfied to transform the schedules, and how to transform the schedules into more efficient ones.
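    The abstract does not reproduce the paper's cost model, but the intuition behind the simultaneous-to-hybrid transformation can be illustrated with a toy calculation. The sketch below is purely hypothetical (the arrival times, merge costs, and the assumption of pairwise-sequential integration are ours, not the paper's): a hybrid schedule wins when merging early partial results can be overlapped with waiting for the slowest remote site.

```python
# Toy model of the two scheduling strategies; all numbers are hypothetical.
# Each remote query i delivers its partial result at time arrivals[i];
# merging one more partial result into the running answer costs merges[i].

arrivals = [2.0, 3.0, 10.0]   # hypothetical delivery times of partial results
merges   = [4.0, 4.0, 4.0]    # hypothetical per-result integration costs

# Entirely simultaneous strategy: wait for every partial result,
# then perform all integrations at the local site.
simultaneous = max(arrivals) + sum(merges)

# Hybrid (simultaneous/sequential) strategy: results are still produced
# in parallel remotely, but each one is merged as soon as it arrives,
# overlapping integration work with waiting for slower sites.
t = 0.0
for arrival, cost in sorted(zip(arrivals, merges)):
    t = max(t, arrival) + cost

print(f"simultaneous: {simultaneous}")  # 10 + 12 = 22
print(f"hybrid:       {t}")             # early merges hide behind the slow site: 14
```

    In this toy instance the hybrid schedule finishes at time 14 instead of 22, which is the kind of case the paper's detection conditions are meant to identify.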

    Predicting query execution time: Are optimizer cost models really unusable?


    10381 Summary and Abstracts Collection -- Robust Query Processing

    Dagstuhl seminar 10381 on robust query processing (held 19.09.10–24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers, contributed by some participants.

    Efficiency modelling in collaborative filtering-based recommendation systems

    In the past decade, Machine Learning (ML) models have become a critical part of large-scale analytics frameworks for solving different problems, such as identifying trends and patterns in data, manipulating images, classifying text, and producing recommendations. For the latter (i.e., producing recommendations), ML frameworks have been extended to incorporate both specific recommendation algorithms (e.g., SlopeOne [1]) and more generalised models (e.g., K-Nearest Neighbours (KNN) [2]) that can be applied not only to recommendation tasks, such as rating prediction or item ranking, but also to other classes of ML problems. This thesis examines an important and popular area of the Recommendation Systems (RS) design space, focusing both on algorithms specifically designed for producing recommendations and on other types of algorithms also found in the wider ML field; the latter are showcased only in RS-based use cases to allow comparison with specific RS models.

    Over the past years, there has been increased interest in RS from both academia and industry, which has led to the development of numerous recommendation algorithms [3]. While there are different families of recommendation models (e.g., Matrix Factorisation (MF)-based, K-Nearest Neighbours (KNN)-based), they can be grouped into three classes: Collaborative Filtering (CF), Content-based Filtering (CBF), and Hybrid Approaches (HA). This thesis investigates the most popular class of RS, namely Collaborative Filtering (CF) recommendation algorithms, which recommend items to a user based on similar users' preferences.

    One of the current challenges in building CF engines is the selection of the algorithms to be used for producing recommendations. It is often the case that a one-CF-model-fits-all solution becomes infeasible due to the dynamic relationship between users and items and the rate at which new algorithms are proposed in the literature. This challenge is exacerbated by the constant growth of the input data, which in turn impacts the efficiency of these models, as more computational resources are required to train the algorithms on large collections to attain a predefined/desired quality of recommendations. In CF, these challenges have also impacted the way providers deliver content to users, as they need to strike a balance between revenue maximisation (i.e., how many resources are spent on training the CF models) and user satisfaction (i.e., producing relevant recommendations for the users). In addition, CF models need to be periodically retrained to capture the latest user preferences and interactions with the items; hence, content providers have to decide whether and when to retrain their CF algorithms such that the high training times and resource utilisation costs are kept within the operational and monetary budget. Therefore, the problem of estimating resource consumption for CF becomes of critical importance.

    In this thesis, we address the pressing challenge of predicting the efficiency (i.e., the computational resources spent during training) of traditional and neural CF for a number of popular representatives, including algorithms based on Matrix Factorisation (MF), K-Nearest Neighbours (KNN), Co-clustering, and Slope One schemes, as well as well-known types of Deep Learning (DL) architectures, such as the Variational Autoencoder (VAE), Multilayer Perceptron (MLP), and Convolutional Neural Network (CNN).
To this end, we first study the computational complexity of the training phase of said CF models and derive time and space complexity equations. Then, using characteristics of the input and the aforementioned equations, we contribute a methodology for predicting the processing time, memory overhead, and GPU utilisation of the CF training phase. Our contributions further include an adaptive sampling strategy that addresses the trade-off between the computational cost of sampling the dataset and training the CF models on said samples, and the accuracy of the estimated resource consumption of the CF trained on the full collection. Furthermore, we provide a framework which quantifies both the training efficiency (i.e., resource consumption) of CF and the quality of the recommendations produced by said CF once it has been trained. Finally, systematic experimental evaluations demonstrate that our methodology outperforms state-of-the-art regression schemes (i.e., BB/GBM) by a considerable margin (e.g., for predicting the processing time of CF, the accuracy of WB/LR is 160% higher than that of BB/GBM), with an overhead that is a small fraction (e.g., 3-4 times smaller) of the overall requirements of CF training.
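    The abstract describes the white-box methodology only at a high level; the complexity equations and the exact WB/LR features are derived in the thesis itself. As a rough, hypothetical illustration of the idea (the feature choice, the assumed O(|R|·k) dominant term for MF training, and all numbers below are our assumptions, not results from the thesis), one can fit a linear model on complexity-derived features measured on small samples and extrapolate to the full collection:

```python
# A minimal sketch of a white-box efficiency predictor, assuming (our
# assumption, not the thesis's exact formulation) that MF training time
# per epoch scales with the O(|R| * k) term of its time-complexity
# equation, for |R| ratings and k latent factors.
import numpy as np

def mf_features(n_ratings: int, k: int) -> np.ndarray:
    # Complexity-derived features: an intercept plus the dominant |R|*k term.
    return np.array([1.0, n_ratings * k])

# Hypothetical measurements: (n_ratings, k) -> observed seconds per epoch,
# e.g. collected by training on small samples of the dataset.
observations = [
    (100_000, 32, 1.1),
    (200_000, 32, 2.0),
    (100_000, 64, 2.1),
    (400_000, 16, 2.2),
]

X = np.stack([mf_features(r, k) for r, k, _ in observations])
y = np.array([t for _, _, t in observations])

# Fit the white-box linear regression (WB/LR-style) by least squares.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Extrapolate to the full collection without ever training on it.
pred = mf_features(25_000_000, 64) @ coef
print(f"predicted seconds/epoch on full data: {pred:.1f}")
```

    The appeal of this style of predictor is that the regression targets only the coefficients of terms the complexity analysis already justifies, so a handful of cheap sample runs can stand in for training on the full collection.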