4,411 research outputs found

    F-IVM: Analytics over Relational Databases under Updates

    Get PDF
    This article describes F-IVM, a unified approach for maintaining analytics over changing relational data. We exemplify its versatility in four disciplines: processing queries with group-by aggregates and joins; learning linear regression models using the covariance matrix of the input features; building Chow-Liu trees using pairwise mutual information of the input features; and matrix chain multiplication. F-IVM has three main ingredients: higher-order incremental view maintenance; factorized computation; and ring abstraction. F-IVM reduces the maintenance of a task to that of a hierarchy of simple views. Such views are functions mapping keys, which are tuples of input values, to payloads, which are elements from a ring. F-IVM also supports efficient factorized computation over keys, payloads, and updates. Finally, F-IVM treats uniformly seemingly disparate tasks. In the key space, all tasks require joins and variable marginalization. In the payload space, tasks differ in the definition of the sum and product ring operations. We implemented F-IVM on top of DBToaster and show that it can outperform classical first-order and fully recursive higher-order incremental view maintenance by orders of magnitude while using less memory

    Machine Learning over Static and Dynamic Relational Data

    Get PDF
    This tutorial overviews principles behind recent works on training and maintaining machine learning models over relational data, with an emphasis on the exploitation of the relational data structure to improve the runtime performance of the learning task. The tutorial has the following parts: 1) Database research for data science 2) Three main ideas to achieve performance improvements 2.1) Turn the ML problem into a DB problem 2.2) Exploit structure of the data and problem 2.3) Exploit engineering tools of a DB researcher 3) Avenues for future researchComment: arXiv admin note: text overlap with arXiv:2008.0786

    F-IVM: Learning over Fast-Evolving Relational Data

    Get PDF
    F-IVM is a system for real-time analytics such as machine learning applications over training datasets defined by queries over fast-evolving relational databases. We will demonstrate F-IVM for three such applications: model selection, Chow-Liu trees, and ridge linear regression.Comment: SIGMOD DEMO 2020, 5 page

    Scalable aggregation predictive analytics: a query-driven machine learning approach

    Get PDF
    We introduce a predictive modeling solution that provides high quality predictive analytics over aggregation queries in Big Data environments. Our predictive methodology is generally applicable in environments in which large-scale data owners may or may not restrict access to their data and allow only aggregation operators like COUNT to be executed over their data. In this context, our methodology is based on historical queries and their answers to accurately predict ad-hoc queries’ answers. We focus on the widely used set-cardinality, i.e., COUNT, aggregation query, as COUNT is a fundamental operator for both internal data system optimizations and for aggregation-oriented data exploration and predictive analytics. We contribute a novel, query-driven Machine Learning (ML) model whose goals are to: (i) learn the query-answer space from past issued queries, (ii) associate the query space with local linear regression & associative function estimators, (iii) define query similarity, and (iv) predict the cardinality of the answer set of unseen incoming queries, referred to the Set Cardinality Prediction (SCP) problem. Our ML model incorporates incremental ML algorithms for ensuring high quality prediction results. The significance of contribution lies in that it (i) is the only query-driven solution applicable over general Big Data environments, which include restricted-access data, (ii) offers incremental learning adjusted for arriving ad-hoc queries, which is well suited for query-driven data exploration, and (iii) offers a performance (in terms of scalability, SCP accuracy, processing time, and memory requirements) that is superior to data-centric approaches. We provide a comprehensive performance evaluation of our model evaluating its sensitivity, scalability and efficiency for quality predictive analytics. In addition, we report on the development and incorporation of our ML model in Spark showing its superior performance compared to the Spark’s COUNT method

    Indexing forecast models for matching and maintenance

    Get PDF
    Forecasts are important to decision-making and risk assessment in many domains. There has been recent interest in integrating forecast queries inside a DBMS. Answering a forecast query requires the creation of forecast models. Creating a forecast model is an expensive process and may require several scans over the base data as well as expensive operations to estimate model parameters. However, if forecast queries are issued repeatedly, answer times can be reduced significantly if forecast models are reused. Due to the possibly high number of forecast queries, existing models need to be found quickly. Therefore, we propose a model index that efficiently stores forecast models and allows for the efficient reuse of existing ones. Our experiments illustrate that the model index shows a negligible overhead for update transactions, but it yields significant improvements during query execution

    Testes incrementais em um desenvolvimento guiado por testes baseados em modelo

    Get PDF
    Orientador: Eliane MartinsDissertação (mestrado) - Universidade Estadual de Campinas, Instituto de ComputaçãoResumo: O desenvolvimento de sistemas pode ser realizado seguindo diversos modelos de processo. Os métodos ágeis propõem realizar implementações iterativas e incrementais e testes antecipados, buscando uma validação antecipada do sistema. Algumas técnicas ágeis adicionam a característica de um desenvolvimento de sistema baseado em testes, como as técnicas de Desenvolvimento Baseado em Teste (do inglês Test Driven Development (TDD)) e Desenvolvimento Baseado em Comportamento (do inglês Behaviour Driven Development (BDD)). Recentemente algumas técnicas propõem a união de técnicas ágeis de desenvolvimento baseado em testes com técnicas consolidadas da área de testes, com o objetivo principal de auxiliar na etapa de criação de testes, que serão utilizados para guiar o desenvolvimento do sistema. Um exemplo é a técnica de Desenvolvimento Guiado por Testes Baseados em Modelo (do inglês Model-Based Test Driven Development (MBTDD)) que une os conceitos de Testes Baseados em Modelo (do inglês Model-Based Testing (MBT)) e Desenvolvimento Baseado em Teste (TDD). Portanto em MBTDD, testes são derivados de modelos que representam os comportamentos esperados do sistema, e baseado nesses testes, o desenvolvimento iterativo e incremental ocorre. Entretanto quando lidamos com processos iterativos e incrementais, surgem problemas decorrente da evolução do sistema, como por exemplo: como reutilizar os artefatos de testes, e como selecionar os testes relevantes para a codificação da nova versão do sistema. Nesse contexto, este trabalho explora um processo no qual o desenvolvimento ágil de sistema é guiado por testes baseados em modelos, com o enfoque no auxílio do reúso dos artefatos de testes e no processo de identificação de testes relevantes para o desenvolvimento de uma nova versão do sistema. Para tanto, características do processo de MBTDD são unidas com características de uma técnica que busca o reúso de artefatos de testes baseado em princípios de testes de regressão, denominada Testes de Regressão SPL Baseados em Modelo Delta (do inglês Delta-Oriented Model-Based SPL Regression Testing). Para realizar a avaliação da solução proposta, ela foi aplicada em exemplos existentes e comparada com a abordagem no qual nenhum caso de teste é reutilizadoAbstract: Systems can be developed following different process models. Agile methods propose iterative and incremental implementations and anticipating tests, in order to anticipate system validation. Some agile techniques add the characteristic of development based on tests, like in Test Driven Development (TDD) and Behaviour Driven Development (BDD). Recently some techniques proposed joining the agile techniques of development based on tests with techniques consolidated in the field of testing, with the main purpose of aiding in the test creation stage, which are used to guide the development of the system. An example is Model-Based Test Driven Development (MBTDD) which joins the concepts of Model-Based Testing (MBT) and Test Driven Development (TDD). Therefore in MBTDD, tests are derived from models that represent the expected behaviour of the system, and based on those tests, iterative and incremental development is performed. However, when iterative and incremental processes are used, problems appear as the consequence of the evolution of the system, such as: how to reuse the test artefacts, and how to select the relevant tests for implementing the new version of the system. In this context, this work proposes a process in which the agile development of a system is guided by model based tests, focusing on helping with the reuse of test artefacts and on the process of identifying tests relevant to development. To achieve this goal, MBTDD process characteristics are joined with characteristics from a technique that aims to find reusability of test artefacts based on principles of regression tests, called Delta-Oriented Model-Based SPL Regression Testing. To evaluate the proposed solution, it was applied to existing examples and compared to the approach without any test case reuseMestradoCiência da ComputaçãoMestra em Ciência da Computação151647/2013-5CNP
    corecore