Indexer++: Workload-Aware Online Index Tuning With Transformers and Reinforcement Learning
With the increasing workload complexity in modern databases, manual index selection is a challenging task. There is a growing need for a database that can learn and adapt to evolving workloads. This paper proposes Indexer++, an autonomous, workload-aware, online index tuner. Unlike existing approaches, Indexer++ imposes low overhead on the DBMS, is responsive to changes in query workloads, and swiftly selects indexes. Our approach combines text analytic techniques with reinforcement learning. Indexer++ consists of two phases. Phase (i) learns workload trends using a novel trend detection technique based on a pre-trained transformer model. Phase (ii) performs online index selection, i.e., selection that runs continuously while the DBMS is processing workloads, using a novel online deep reinforcement learning technique that relies on our proposed priority experience sweeping. This paper provides an experimental evaluation of Indexer++ in multiple scenarios using a benchmark (TPC-H) and a real-world dataset (IMDB). In our experiments, Indexer++ effectively identifies changes in workload trends and selects the set of optimal indexes.
On the Semantics of "Now" in Databases
While "now" is expressed in SQL as CURRENT_TIMESTAMP within queries, this value cannot be
stored in the database. However, this notion of an ever-increasing current-time value has been
reflected in some temporal data models by inclusion of database-resident variables, such as
"now," "until-changed," "∞," "@" and "-." Time variables are very desirable, but their use
also leads to a new type of database, consisting of tuples with variables, termed a variable
database.
This paper proposes a framework for defining the semantics of the variable databases of temporal
relational data models. A framework is presented because several reasonable meanings
may be given to databases that use some of the specific temporal variables that have appeared
in the literature. Using the framework, the paper defines a useful semantics for such databases.
Because situations occur where the existing time variables are inadequate, two new types of
modeling entities that address these shortcomings, timestamps which we call now-relative and
now-relative indeterminate, are introduced and defined within the framework. Moreover, the paper
provides a foundation, using algebraic bind operators, for the querying of variable databases
via existing query languages. The transition to variable databases presented here requires minimal
change to the query processor. Finally, to underline the practical feasibility of variable
databases, we show that database variables can be precisely specified and efficiently implemented
in conventional query languages, such as SQL, and in temporal query languages, such
as TSQL2.
Information Systems Working Papers Series
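As an illustration of the bind idea in the abstract above, the following is a minimal sketch, not the paper's definition. It assumes a now-relative timestamp can be modeled as "now" plus a fixed offset in days, and that binding means substituting a reference date for "now". The names `NowRelative`, `bind`, and `bind_tuple` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Union


@dataclass(frozen=True)
class NowRelative:
    """Hypothetical variable timestamp: 'now' shifted by a fixed number of days."""
    offset_days: int = 0


Timestamp = Union[date, NowRelative]


def bind(value: Timestamp, reference: date) -> date:
    """Replace a variable timestamp with a ground date at the reference time.

    Ground dates pass through unchanged (assumed behaviour for this sketch).
    """
    if isinstance(value, NowRelative):
        return reference + timedelta(days=value.offset_days)
    return value


def bind_tuple(row: dict, reference: date) -> dict:
    """Bind every timestamp attribute of a tuple, leaving other attributes alone."""
    return {
        key: bind(val, reference) if isinstance(val, (date, NowRelative)) else val
        for key, val in row.items()
    }


# A tuple whose validity ends at "now"; binding yields a conventional tuple
# that an existing query processor could evaluate.
employment = {"name": "Ann", "start": date(1994, 1, 1), "end": NowRelative(0)}
bound = bind_tuple(employment, reference=date(1995, 6, 1))
```

Binding at query time is what would let the stored tuple stay unchanged while its meaning tracks the current date.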
Valid-time indeterminacy.
In valid-time indeterminacy, it is known that an event stored in a temporal database did in fact occur, but it is not known exactly when the event occurred. We extend a tuple-timestamped temporal data model to support valid-time indeterminacy and outline its implementation. This work is novel in that previous research, although quite extensive, has not studied this particular kind of incomplete information. To model the occurrence time of an event, we introduce a new data type called an indeterminate instant. Our thesis is that by representing an indeterminate instant with a set of contiguous chronons and a probability distribution over that set, it is possible to characterize a large number of (possibly weighted) alternatives, to devise intuitive query language constructs, including schema specification, temporal constants, temporal predicates and constructors, and aggregates, and to implement these constructs efficiently. We extend the TQuel and TSQL2 query languages with constructs to retrieve information in the presence of indeterminacy. Although the extended data model and query language provide needed modeling capabilities, these extensions appear to carry a significant execution cost. The cost of support for indeterminacy is empirically measured, and is shown to be modest. We then show how indeterminacy can provide a much richer modeling of granularity and now. Granularity is the unit of measure of a temporal datum (e.g., days, months, weeks). Indeterminacy and granularity are two sides of the same coin insofar as a time at a given granularity is indeterminate at all finer granularities. Now is a distinguished temporal value. We describe a new kind of instant, a now-relative indeterminate instant, which has the same storage requirements as other instants, but can be used to model situations such as that an employee is currently employed but will not work beyond the year 1995. 
In summary, support for indeterminacy dramatically increases the modeling capabilities of a temporal database without adversely impacting performance.
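The representation described above, a set of contiguous chronons with a probability distribution over it, can be sketched roughly as follows. This is an illustrative reading, not the authors' implementation: chronons are assumed to be integers, the two instants in a comparison are assumed to be independent, and the names `IndeterminateInstant`, `uniform`, and `prob_before` are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Tuple


@dataclass(frozen=True)
class IndeterminateInstant:
    """An event time known only to lie within a contiguous run of chronons."""
    first: int                    # first chronon in the run
    probs: Tuple[Fraction, ...]   # probability mass for each consecutive chronon

    def __post_init__(self):
        if not self.probs or sum(self.probs) != 1:
            raise ValueError("probabilities must be non-empty and sum to 1")

    @property
    def last(self) -> int:
        return self.first + len(self.probs) - 1


def uniform(first: int, last: int) -> IndeterminateInstant:
    """Instant spread evenly over the chronons first..last (inclusive)."""
    n = last - first + 1
    return IndeterminateInstant(first, (Fraction(1, n),) * n)


def prob_before(a: IndeterminateInstant, b: IndeterminateInstant) -> Fraction:
    """Probability that a occurred strictly before b, assuming independence."""
    total = Fraction(0)
    for i, pa in enumerate(a.probs):
        for j, pb in enumerate(b.probs):
            if a.first + i < b.first + j:
                total += pa * pb
    return total
```

Under these assumptions a determinate instant is simply `uniform(c, c)`, and a temporal predicate such as "before" returns a probability that a query could compare against a chosen threshold.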
Content-based Navigation in a Mini-World Web
Several database query languages have recently been developed to locate and retrieve documents in the vast network of World-Wide Web pages. These languages combine path expressions, which specify the structure of a path through the network to the desired information, with content predicates, which force the path to pass through pages with particular content. The straightforward implementation of these languages is based on breadth-first search of the network, with heavy reliance placed on the user's understanding of network topology to both direct and constrain the search via the appropriate use of the path expressions. In this paper we describe a system that removes the reliance on path expressions to safeguard the search during a query and enables the user to navigate by refining content rather than by specifying structure. Our system uses a cost-constrained model for query evaluation. Links between pages are assigned costs. The user controls how far a query can navigate by specify..
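A cost-constrained evaluation of the kind described above might look like the sketch below. Because the abstract is truncated, the details are assumptions: the user is taken to supply a numeric cost budget, link costs are taken to be non-negative, and the search is a standard uniform-cost (Dijkstra-style) traversal. The function `navigate` and its parameters are hypothetical names.

```python
import heapq


def navigate(links, content, start, predicate, budget):
    """Return {page: cost} for pages within `budget` whose content matches.

    links:     {page: [(target_page, link_cost), ...]}, costs assumed >= 0
    content:   {page: text}
    predicate: function(text) -> bool, the content condition
    budget:    assumed user-supplied bound on total path cost
    """
    best = {start: 0}
    heap = [(0, start)]
    while heap:
        cost, page = heapq.heappop(heap)
        if cost > best.get(page, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for target, link_cost in links.get(page, []):
            new_cost = cost + link_cost
            # Never follow a path beyond the budget, so the search stays bounded.
            if new_cost <= budget and new_cost < best.get(target, float("inf")):
                best[target] = new_cost
                heapq.heappush(heap, (new_cost, target))
    return {p: c for p, c in best.items() if predicate(content.get(p, ""))}
```

The budget takes over the role that path expressions play in constraining the search, so the user only has to state what content is wanted.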
Automatic Filtering of Now-centric Data
A now-centric collection of data is characterised by the property that as data in the collection ages, each datum individually becomes less relevant, but remains relevant in aggregate. Such data can be filtered by materialising an aggregate view on the data and then compressing, moving to backup, or deleting the data from which that view was materialised, yielding a smaller collection of data. This paper describes a tool to automatically filter data by building a statistical database from the now-centric collection of data. To build the statistical database, the user supplies a list of filters. Each filter consists of a filter unit and a filter measure. The filter unit specifies a pattern (a regular expression) to match as the now-centric data is filtered. The filter measure is the system of measurement in which occurrences of that pattern are counted. A key feature of the tool is that users may define their own units and measures. Queries on the filtered data are analysed to determine..
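The filter mechanism described above could be sketched as follows. This is one possible reading, not the tool's actual interface: a filter unit is taken to be a regular expression, a filter measure is taken to be a function that maps a record's date to the bucket in which matches are counted, and records are assumed to be `(iso_date, text)` pairs. All names are hypothetical.

```python
import re
from collections import Counter


def per_month(iso_date: str) -> str:
    """Example user-defined measure: count occurrences per calendar month."""
    return iso_date[:7]


def build_statistics(records, filters):
    """Aggregate pattern occurrences into a small statistical summary.

    records: iterable of (iso_date, text) pairs
    filters: list of (name, unit, measure), where unit is a regular expression
             and measure maps a date to a bucket key
    """
    stats = Counter()
    for iso_date, text in records:
        for name, unit, measure in filters:
            hits = len(re.findall(unit, text))
            if hits:
                stats[(name, measure(iso_date))] += hits
    return stats


log = [
    ("1999-01-03", "GET /a GET /b"),
    ("1999-01-20", "GET /a"),
    ("1999-02-01", "POST /a"),
]
summary = build_statistics(log, [("gets", r"GET", per_month)])
```

Once a summary like this exists, the raw records it was built from are the data that could be compressed, moved to backup, or deleted.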