164 research outputs found
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Current and Future Challenges in Knowledge Representation and Reasoning
Knowledge Representation and Reasoning is a central, longstanding, and active
area of Artificial Intelligence. Over the years it has evolved significantly;
more recently it has been challenged and complemented by research in areas such
as machine learning and reasoning under uncertainty. In July 2022 a Dagstuhl
Perspectives workshop was held on Knowledge Representation and Reasoning. The
goal of the workshop was to describe the state of the art in the field,
including its relation with other areas, its shortcomings and strengths,
together with recommendations for future progress. We developed this manifesto
based on the presentations, panels, working groups, and discussions that took
place at the Dagstuhl Workshop. It is a declaration of our views on Knowledge
Representation: its origins, goals, milestones, and current foci; its relation
to other disciplines, especially to Artificial Intelligence; and on its
challenges, along with key priorities for the next decade.
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Linear Programs with Conjunctive Database Queries
In this paper, we study the problem of optimizing a linear program whose
variables are the answers to a conjunctive query. For this we propose the
language LP(CQ) for specifying linear programs whose constraints and objective
functions depend on the answer sets of conjunctive queries. We contribute an
efficient algorithm for solving programs in a fragment of LP(CQ). The natural
approach constructs a linear program having as many variables as there are
elements in the answer set of the queries. Our approach constructs a linear
program having the same optimal value but fewer variables. This is done by
exploiting the structure of the conjunctive queries using generalized hypertree
decompositions of small width to factorize elements of the answer set together.
We illustrate the various applications of LP(CQ) programs on three examples:
optimizing deliveries of resources, minimizing noise for differential privacy,
and computing the s-measure of patterns in graphs as needed for data mining.
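To make the variable-count argument concrete, here is a minimal sketch, not the paper's construction. It assumes a hypothetical two-atom query Q(a, b, c) :- R(a, b), S(b, c) and made-up relations R and S. It counts the variables of the natural linear program (one per answer) and compares that with the number of tuples needed once answers are grouped by the join value b, which is the general idea behind factorizing an answer set.

```python
from collections import defaultdict


def join_answers(r, s):
    """Answers of Q(a, b, c) :- R(a, b), S(b, c), computed with a hash join on b."""
    s_by_b = defaultdict(list)
    for b, c in s:
        s_by_b[b].append(c)
    return [(a, b, c) for a, b in r for c in s_by_b.get(b, [])]


def grouped_size(r, s):
    """Tuples needed when answers are stored per join value b instead of flat."""
    r_by_b, s_by_b = defaultdict(set), defaultdict(set)
    for a, b in r:
        r_by_b[b].add(a)
    for b, c in s:
        s_by_b[b].add(c)
    # Each shared b contributes |R_b| + |S_b| tuples instead of |R_b| * |S_b| answers.
    return sum(len(r_by_b[b]) + len(s_by_b[b]) for b in r_by_b if b in s_by_b)


# Hypothetical data: ten sources and ten targets that all meet at one hub.
R = [(f"a{i}", "hub") for i in range(10)]
S = [("hub", f"c{j}") for j in range(10)]

naive_variable_count = len(join_answers(R, S))  # one LP variable per answer
factorized_tuple_count = grouped_size(R, S)
print(naive_variable_count, factorized_tuple_count)
```

On this toy instance the flat answer set is quadratic in the input while the grouped form stays linear. How the paper turns a decomposition into an equivalent smaller linear program is not shown here.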
Efficiently computing the Shapley value of connectivity games in low-treewidth graphs
The Shapley value is the solution concept in cooperative game theory that is most used in both theoretical and practical settings. Unfortunately, in general, computing the Shapley value is computationally intractable. This paper focuses on computing the Shapley value of (weighted) connectivity games. For these connectivity games, which are defined on an underlying (weighted) graph, computing the Shapley value is #P-hard, and thus (likely) intractable even for graphs with a moderate number of vertices. We present an algorithm that can efficiently compute the Shapley value if the underlying graph has bounded treewidth. Next, we apply our algorithm to several real-world (covert) networks. We show that our algorithm can quickly compute exact Shapley values for these networks, whereas in prior work these values could only be approximated using a heuristic method. Finally, we demonstrate that our algorithm can also efficiently compute the Shapley value for several larger (artificial) benchmark graphs from the PACE 2018 challenge.
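For orientation, the following is a brute-force sketch of the quantity being computed, not the paper's treewidth-based algorithm. It assumes the unweighted game in which a coalition has value 1 when it induces a connected subgraph and 0 otherwise, with single vertices counted as connected and the empty coalition worth 0. These conventions are assumptions and may differ from the paper's exact definition. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def is_connected(coalition, edges):
    """True if the subgraph induced by `coalition` is connected (depth-first search)."""
    members = set(coalition)
    if not members:
        return False  # assumption: the empty coalition has value 0
    start = next(iter(members))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for a, b in edges:
            if a == u:
                v = b
            elif b == u:
                v = a
            else:
                continue
            if v in members and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == members


def shapley_values(vertices, edges):
    """Average marginal contribution of each vertex over all arrival orders."""
    totals = {v: Fraction(0) for v in vertices}
    orders = list(permutations(vertices))
    for order in orders:
        coalition, before = [], 0
        for v in order:
            coalition.append(v)
            after = 1 if is_connected(coalition, edges) else 0
            totals[v] += after - before
            before = after
    return {v: total / len(orders) for v, total in totals.items()}


# Path graph a - b - c: the middle vertex holds the graph together.
path_values = shapley_values(["a", "b", "c"], [("a", "b"), ("b", "c")])
print(path_values)
```

Enumerating all n! orders is only feasible for a handful of vertices, which is exactly the blow-up that motivates an algorithm exploiting bounded treewidth.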
Robust Query Optimization for Analytical Database Systems
Querying and efficiently analyzing complex data is required to gain valuable business insights, to support machine learning applications, and to make up-to-date information available. Therefore, this thesis investigates opportunities and challenges of selecting the most efficient execution strategy for analytical queries. These challenges include hard-to-capture data characteristics such as skew and correlation, the support of arbitrary data types, and the optimization time overhead of complex queries. Existing approaches often rely on optimistic assumptions about the data distribution, which can result in significant response time delays when these assumptions are not met. In contrast, we focus on robust query optimization, emphasizing consistent query performance and applicability. Our presentation follows the general select-project-join query pattern, representing the fundamental stages of analytical query processing. To support arbitrary data types and complex filter expressions in the select stage, a novel sampling-based selectivity estimator is developed. Our approach exploits information from filter subexpressions and estimates correlations that are not captured by existing sampling-based methods. We demonstrate improved estimation accuracy and query execution time. Further, to minimize the runtime overhead of sampling, we propose new techniques that exploit access patterns and auxiliary database objects such as indices. For the join stage, we introduce a robust optimization approach by developing an upper-bound join enumeration strategy that connects accurate filter selectivity estimates (e.g., from our sampling-based approach) to join ordering. We demonstrate that join orders based on our upper-bound join ordering strategy achieve more consistent performance and faster workload execution on state-of-the-art database systems.
However, besides identifying good logical join orders, it is crucial to determine appropriate physical join operators before query plan execution. To understand the importance of fine-grained physical operator selections, we exhaustively execute fixed join orders with all possible operator combinations. This analysis reveals that none of the investigated query optimizers fully reaches the potential of optimal operator decisions. Based on these insights and to achieve fine-grained operator selections for the previously determined join orders, the thesis presents a lightweight learning-based physical execution plan refinement component called TONIC. We show that this refinement component consistently outperforms existing approaches for physical operator selection while enabling a novel two-stage optimizer design. We conclude the thesis by providing a framework for the two-stage optimizer design that allows users to modify, replicate, and further analyze the concepts discussed throughout this thesis.
1 INTRODUCTION
1.1 Analytical Query Processing
1.2 Select-Project-Join Queries
1.3 Basics of SPJ Query Optimization
1.3.1 Plan Enumeration
1.3.2 Cost Model
1.3.3 Cardinality Estimation
1.4 Robust SPJ Query Optimization
1.4.1 Tail Latency Root Cause Analysis
1.4.2 Tenets of Robust Query Optimization
1.5 Contribution
1.6 Outline
2 SELECT (-PROJECT) STAGE
2.1 Sampling for Selectivity Estimation
2.2 Related Work
2.2.1 Combined Selectivity Estimation (CSE)
2.2.2 Kernel Density Estimator
2.2.3 Machine Learning
2.3 Beta Estimator for 0-Tuple-Situations
2.3.1 Methodology
2.3.2 Beta Distribution in Non-0-TS
2.3.3 Parameter Estimation in 0-TS
2.3.4 Selectivity Estimation and Predicate Ordering
2.3.5 Evaluation
2.4 Customized Sampling Techniques
2.4.1 Focused Sampling
2.4.2 Conditional Sampling
2.4.3 Zone Pruning
2.4.4 Discussion
2.5 Summary
3 JOIN STAGE: LOGICAL ENUMERATION
3.1 Related Work
3.1.1 Point Estimates
3.1.2 Join Cardinality Upper Bound
3.2 Upper Bound Join Enumeration with Synopsis (UES)
3.2.1 U-Block: Simple Upper Bound for Joins
3.2.2 E-Block: Customized Enumeration Scheme
3.2.3 UES Algorithm
3.3 Evaluation
3.3.1 General Performance
3.3.2 Discussion
3.4 Summary
4 JOIN STAGE: PHYSICAL OPERATOR SELECTION
4.1 Operator Selection vs Join Ordering
4.2 Related Work
4.2.1 Adaptive Query Processing
4.2.2 Bandit Optimizer (Bao)
4.3 TONIC: Learned Physical Join Operator Selection
4.3.1 Query Execution Plan Synopsis (QEP-S)
4.3.2 QEP-S Life-Cycle
4.3.3 QEP-S Design Considerations
4.4 Evaluation
4.4.1 Performance Factors
4.4.2 Rate of Improvement
4.4.3 Data Shift
4.4.4 TONIC - Runtime Traits
4.4.5 Discussion
4.5 Summary
5 TWO-STAGE OPTIMIZER FRAMEWORK
5.1 Upper-Bound-Driven Join Ordering Component
5.2 Physical Operator Selection Component
5.3 Example Query Optimization
6 CONCLUSION
BIBLIOGRAPHY
LIST OF FIGURES
LIST OF TABLES
A APPENDIX
A.1 Basics of Query Execution
A.2 Why Q?
A.3 0-TS Proof of Unbiased Estimate
A.4 UES Upper Bound Property
A.5 TONIC – Selectivity-Aware Branching
A.6 TONIC – Sequences of Query Execution
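As background for the sampling-based selectivity estimation mentioned in this abstract, here is a minimal sketch of the textbook baseline, not the thesis's estimator. It evaluates the whole filter on a uniform random sample, so correlation between the two predicates is reflected in the estimate, whereas multiplying per-column selectivities would assume independence. The table contents, the predicate, and the function name are all hypothetical.

```python
import random


def estimate_selectivity(rows, predicate, sample_size, seed=0):
    """Fraction of a uniform random sample of `rows` that satisfies `predicate`."""
    if not rows:
        raise ValueError("cannot sample from an empty table")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample = rng.sample(rows, min(sample_size, len(rows)))
    hits = sum(1 for row in sample if predicate(row))
    return hits / len(sample)


# Hypothetical table in which `year` and `price` are correlated by construction.
rows = [{"year": 2000 + i % 20, "price": i % 100} for i in range(10_000)]


def recent_and_cheap(row):
    return row["year"] >= 2010 and row["price"] < 50


exact = sum(1 for row in rows if recent_and_cheap(row)) / len(rows)
estimate = estimate_selectivity(rows, recent_and_cheap, sample_size=1_000)
independence_guess = 0.5 * 0.5  # product of the two single-column selectivities
print(exact, estimate, independence_guess)
```

Here the exact selectivity is 0.2 while the independence assumption yields 0.25. Note that this baseline returns 0 whenever no sampled row qualifies; the outline above lists a beta estimator for such 0-tuple situations, which this sketch does not attempt to reproduce.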
Geometric Inhomogeneous Random Graphs for Algorithm Engineering
The design and analysis of graph algorithms is heavily based on the worst case.
In practice, however, many algorithms perform much better than the worst case would suggest.
Furthermore, various problems can be tackled more efficiently if one assumes the input to be, in a sense, realistic.
The field of network science, which studies the structure and emergence of real-world networks, identifies locality and heterogeneity as two frequently occurring properties.
A popular model that captures these properties is the geometric inhomogeneous random graph (GIRG), which generalizes the hyperbolic random graph (HRG).
Aside from their importance to network science, GIRGs can be an immensely valuable tool in algorithm engineering.
Since they convincingly mimic real-world networks, guarantees about quality and performance of an algorithm on instances of the model can be transferred to real-world applications.
They have model parameters to control the amount of heterogeneity and locality, which makes it possible to evaluate those properties in isolation while keeping the rest fixed.
Moreover, they can be generated efficiently, which allows for experimental analysis.
While realistic instances are often rare, generated instances are readily available.
Furthermore, the underlying geometry of GIRGs helps to visualize the network, e.g., for debugging or to improve understanding of its structure.
The aim of this work is to demonstrate the capabilities of geometric inhomogeneous random graphs in algorithm engineering and establish them as routine tools to replace previous models like the Erdős-Rényi model, where each edge exists with equal probability.
We utilize geometric inhomogeneous random graphs to design, evaluate, and optimize efficient algorithms for realistic inputs.
In detail, we provide the currently fastest sequential generator for GIRGs and HRGs and describe algorithms for maximum flow, directed spanning arborescence, cluster editing, and hitting set.
For all four problems, our implementations beat the state of the art on realistic inputs.
On top of providing crucial benchmark instances, GIRGs allow us to obtain valuable insights.
Most notably, our efficient generator allows us to
experimentally show sublinear running time of our flow algorithm,
investigate the solution structure of cluster editing,
complement our benchmark set of arborescence instances with a density for which there are no real-world networks available,
and generate networks with adjustable locality and heterogeneity to reveal the effects of these properties on our algorithms.
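To illustrate how heterogeneity and locality enter the model, here is a naive quadratic-time sketch of a threshold GIRG on a one-dimensional torus. It is not the efficient generator described in the abstract. The parameterization is an assumption: vertex weights follow a Pareto distribution with power-law exponent `ple`, and two vertices are joined when their torus distance is at most `scale * w_u * w_v / W`, where W is the sum of all weights. The function and parameter names are hypothetical.

```python
import random


def torus_distance(x, y):
    """Distance between two points on the unit circle [0, 1)."""
    d = abs(x - y)
    return min(d, 1.0 - d)


def sample_threshold_girg(n, ple=2.5, scale=1.0, seed=0):
    """Naive O(n^2) sampler; returns (weights, positions, edges)."""
    rng = random.Random(seed)
    # Heterogeneity: Pareto weights >= 1 whose density has power-law exponent `ple`.
    weights = [(1.0 - rng.random()) ** (-1.0 / (ple - 1.0)) for _ in range(n)]
    # Locality: each vertex gets a uniform random position on the torus.
    positions = [rng.random() for _ in range(n)]
    total_weight = sum(weights)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            threshold = scale * weights[u] * weights[v] / total_weight
            if torus_distance(positions[u], positions[v]) <= threshold:
                edges.append((u, v))
    return weights, positions, edges


weights, positions, edges = sample_threshold_girg(200)
print(len(edges))
```

A smaller `ple` produces heavier-tailed weights and therefore more high-degree vertices. The general model also has a temperature-like parameter that softens the threshold into an edge probability, which this sketch omits.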
LIPIcs, Volume 274, ESA 2023, Complete Volume
- …