From Cooperative Scans to Predictive Buffer Management
In analytical applications, database systems often need to sustain workloads
with multiple concurrent scans hitting the same table. The Cooperative Scans
(CScans) framework, which introduces an Active Buffer Manager (ABM) component
into the database architecture, has been the most effective and elaborate
response to this problem, and was initially developed in the X100 research
prototype. We now report on the experiences of integrating Cooperative
Scans into its industrial-strength successor, the Vectorwise database product.
During this implementation we invented a simpler optimization of concurrent
scan buffer management, called Predictive Buffer Management (PBM). PBM is based
on the observation that in a workload with long-running scans, the buffer
manager has quite a bit of information on the workload in the immediate future,
such that an approximation of the ideal OPT algorithm becomes feasible. In the
evaluation on both synthetic benchmarks and a TPC-H throughput run, we
compare the benefits of naive buffer management (LRU) versus CScans, PBM and
OPT, showing that PBM achieves benefits close to Cooperative Scans, while
incurring much lower architectural impact. Comment: VLDB201
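The gap between LRU and the ideal OPT policy that the abstract above refers to can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the PBM algorithm or the Vectorwise implementation: it assumes equal-speed sequential scans that start a fixed number of steps apart, and it gives OPT the complete future access trace (Belady's rule: evict the page whose next use is furthest away). The function names `scan_trace`, `lru_misses` and `opt_misses` are hypothetical.

```python
from collections import OrderedDict

INF = float("inf")

def scan_trace(num_pages, start_delays):
    """Page accesses of equal-speed sequential scans (assumed workload).

    Each scan reads pages 0..num_pages-1 in order; a scan with delay d
    reads page t - d at time step t.
    """
    trace = []
    last_step = num_pages + max(start_delays)
    for t in range(last_step):
        for d in start_delays:
            if 0 <= t - d < num_pages:
                trace.append(t - d)
    return trace

def lru_misses(trace, capacity):
    """Count buffer misses under least-recently-used eviction."""
    cache = OrderedDict()
    misses = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)        # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict least recently used
            cache[page] = True
    return misses

def opt_misses(trace, capacity):
    """Count buffer misses under Belady's OPT, using the full future trace."""
    # next_use[i] = position of the next access to trace[i], or INF if none.
    next_use = [INF] * len(trace)
    seen_at = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = seen_at.get(trace[i], INF)
        seen_at[trace[i]] = i
    cache = {}                             # page -> position of its next use
    misses = 0
    for i, page in enumerate(trace):
        if page not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # furthest next use
                del cache[victim]
        cache[page] = next_use[i]
    return misses
```

For example, calling both functions on `scan_trace(100, [0, 10])` with a buffer of 5 pages shows LRU missing on every access, while OPT retains some pages for the trailing scan. A real buffer manager does not know the full trace; the abstract's point is that with long-running scans the near future can be estimated well enough to approximate this behaviour.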
Cooperative scans
Data mining, information retrieval and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or large index scan. The implementation of this access path in most database systems is straightforward. The Scan operator issues next page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and it focuses on keeping the most-recently-used pages in the buffer pool. This paper introduces cooperative scans -- a new algorithm, based on a better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves performance of concurrent scan queries. In this approach, queries share the buffer content, and progress of the scans is optimized by the buffer manager by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of the various disk-access scheduling policies, and implementation of the cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that with a little effort the performance of existing database systems on concurrent scan queries can be strongly improve
High Throughput Push Based Storage Manager
The storage manager, as a key component of the database system, is
responsible for organizing, reading, and delivering data to the execution
engine for processing. According to the data serving mechanism, existing
storage managers are either pull-based, incurring high latency, or push-based,
leading to a high number of I/O requests when the CPU is busy. To address these
shortcomings, this thesis proposes a push-based prefetching strategy in a
column-wise storage manager. The proposed strategy implements an efficient
cache layer to store shared data among queries to reduce the number of I/O
requests. The capacity of the cache is maintained by a time access-aware
eviction mechanism. Our strategy enables the storage manager to coordinate
multiple queries by merging their requests and dynamically generate an optimal
read order that maximizes the overall I/O throughput. We evaluated our storage
manager both over a disk-based redundant array of independent disks (RAID) and
an NVM Express (NVMe) solid-state drive (SSD). With the high read performance
of the SSD, we successfully minimized the total read time and number of I/O
accesses
Landscape Visibility and Prehistoric Artifact Distribution at Pea Ridge National Military Park
Pea Ridge National Military Park, in the northeast corner of Benton County, Arkansas, is the 4,300-acre site of a crucial Civil War battle. Human occupation of the Ozark Highland landscape, however, extends far into prehistory. A 2005 report to the National Park Service details the findings of a four-year cultural resource survey of the park. The sampling strategy employed in the research design (random sample site selection and 2.5% park coverage) provides an excellent dataset to assess prehistoric land use. This dataset is not dependent on artificially defined sites, representing singular activity in a limited geographical space. Instead it allows for interpretation of patterns of land use; while artifacts may not be spatially or temporally associated, their provenience on the landscape can be assessed in relationship to various landscape elements and environmental variables. Trends in artifact location can be seen with this representative sample distribution.
The 2005 report examines artifact distribution with respect to permanent and intermittent streams. The predictive models produced from the analysis closely relate the availability of water and caloric expenditure required to travel across the landscape to a majority of the prehistoric material at the park. The report also explores seasonal expressions of land use at Pea Ridge. The goal of this project is to explore the relationship between another landscape variable, visibility, and prehistoric locations that do not conform to the models of the original study, those with higher travel costs to water. Economic models like cost-to-water are meaningful interpretations of land use, but I feel that such models preclude other elements of landscape experience. By comparing the distributions of conforming and aberrant prehistoric artifact groups against three different measurements of visibility, I hope to show that landscape perception could be a reliable predictor of prehistoric material in high cost areas
Characterizing and Improving the Reliability of Broadband Internet Access
In this paper, we empirically demonstrate the growing importance of
reliability by measuring its effect on user behavior. We present an approach
for broadband reliability characterization using data collected by many
emerging national initiatives to study broadband and apply it to the data
gathered by the Federal Communications Commission's Measuring Broadband America
project. Motivated by our findings, we present the design, implementation, and
evaluation of a practical approach for improving the reliability of broadband
Internet access with multihoming. Comment: 15 pages, 14 figures, 6 table
Personal intelligence
A model of personal intelligence is developed. Personal intelligence is defined as the capacity to reason about personality and to use personality and personal information to enhance one's thoughts, plans, and life experience. Approaches to related concepts such as intrapersonal intelligence and psychological-mindedness are reviewed. Next, a model of personal intelligence is proposed that emphasizes the capacity to: a) recognize personally-relevant information; b) form accurate models of personality; c) guide one's choices by using personality information; and d) systematize one's goals, plans, and life stories. A discussion examines the possible contributions and limitations of the personal intelligence concept