15,967 research outputs found

    Maintaining Triangle Queries under Updates

    We consider the problem of incrementally maintaining the triangle queries with arbitrary free variables under single-tuple updates to the input relations. We introduce an approach called IVM^ε that exhibits a trade-off between the update time, the space, and the delay for the enumeration of the query result, such that the update time ranges from the square root to linear in the database size while the delay ranges from constant to linear time. IVM^ε achieves Pareto worst-case optimality in the update-delay space conditioned on the Online Matrix-Vector Multiplication conjecture. It is strongly Pareto optimal for the triangle queries with zero or three free variables and weakly Pareto optimal for the triangle queries with one or two free variables. Comment: 47 pages, 18 figures

    A unified view of data-intensive flows in business intelligence systems : a survey

    Data-intensive flows are central processes in today’s business intelligence (BI) systems, deploying different technologies to deliver data, from a multitude of data sources, in user-preferred and analysis-ready formats. To meet the complex requirements of next generation BI systems, we often need an effective combination of the traditionally batched extract-transform-load (ETL) processes that populate a data warehouse (DW) from integrated data sources, and more real-time and operational data flows that integrate source data at runtime. Both academia and industry thus must have a clear understanding of the foundations of data-intensive flows and the challenges of moving towards next generation BI environments. In this paper we present a survey of today’s research on data-intensive flows and the related fundamental fields of database theory. The study is based on a proposed set of dimensions describing the important challenges of data-intensive flows in the next generation BI setting. As a result of this survey, we envision an architecture of a system for managing the lifecycle of data-intensive flows. The results further provide a comprehensive understanding of data-intensive flows, recognizing challenges that are still to be addressed, and how the current solutions can be applied to address these challenges. Peer Reviewed. Postprint (author's final draft)

    Query Rewriting Using Multitier Materialized Views For Cyber Manufacturing Reporting

    Within the cyber manufacturing context, Internet of Data (IoD) technology has enabled the manufacturing sector to store and transfer mass data rapidly for processing. Data growth, driven by advances in the way data are produced and interconnected, has made data volume a crucial issue to address. In monitoring delicate wafer processing in semiconductor manufacturing, reporting delays caused by high-volume databases are intolerable, because various reports that require access to large databases need to be generated frequently and in the shortest time possible. Reporting delay is usually handled through SQL query rewriting. In this paper, the results of experiments on SQL query rewriting using a multitier materialized view structure are presented. In particular, we define the concept of sub-materialized views (SMVs) and implement it using real data sets from SilTerra (a semiconductor company). The outcome of the experiment supports the hypothesis that SQL query rewriting using SMVs outperforms classic rewriting. The results reveal that performance with SMVs is far better than without them for complex queries against large data sets. The benefits of SMVs are not limited to the cyber manufacturing domain, as their use can benefit other industries with similar problems.