
    Cooperative scans

    Data mining, information retrieval and other application areas exhibit a query load with multiple concurrent queries touching a large fraction of a relation. This leads to individual query plans based on a table scan or a large index scan. The implementation of this access path in most database systems is straightforward: the Scan operator issues next-page requests to the buffer manager without concern for the system state. Conversely, the buffer manager is not aware of the work ahead and focuses on keeping the most-recently-used pages in the buffer pool. This paper introduces cooperative scans -- a new algorithm, based on a better sharing of knowledge and responsibility between the Scan operator and the buffer manager, which significantly improves the performance of concurrent scan queries. In this approach, queries share the buffer content, and the buffer manager optimizes the progress of the scans by minimizing the number of disk transfers in light of the total workload ahead. The experimental results are based on a simulation of the various disk-access scheduling policies and on an implementation of cooperative scans within PostgreSQL and MonetDB/X100. These real-life experiments show that, with a little effort, the performance of existing database systems on concurrent scan queries can be strongly improved.
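    As a concrete illustration of the sharing idea described in this abstract, the sketch below models a buffer manager that picks the next page to fetch based on how many active scans still need it, so one disk transfer serves several queries. This is a minimal sketch under assumed names (CooperativeScanBuffer, next_page_to_load, load_page are hypothetical), with a deliberately simplified eviction policy; it is not the paper's relevance-based algorithm or the PostgreSQL/MonetDB/X100 implementation.

```python
from collections import defaultdict

class CooperativeScanBuffer:
    """Toy model of a buffer manager cooperating with concurrent table scans
    (hypothetical names; not the paper's or any real system's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffered = set()               # page ids currently in the pool
        self.pending = defaultdict(set)     # query id -> pages it still needs

    def register_scan(self, query_id, pages):
        self.pending[query_id] = set(pages)

    def next_page_to_load(self):
        """Pick the not-yet-buffered page wanted by the most active scans,
        so a single disk transfer benefits as many queries as possible."""
        demand = defaultdict(int)
        for pages in self.pending.values():
            for page in pages - self.buffered:
                demand[page] += 1
        return max(demand, key=demand.get) if demand else None

    def load_page(self, page):
        if self.buffered and len(self.buffered) >= self.capacity:
            # Prefer evicting a page that no active scan still needs (simplified).
            needed = set().union(*self.pending.values()) if self.pending else set()
            victim = next(iter(self.buffered - needed or self.buffered))
            self.buffered.discard(victim)
        self.buffered.add(page)
        # Every scan waiting for this page consumes it from the shared buffer.
        for pages in self.pending.values():
            pages.discard(page)

# usage sketch
buf = CooperativeScanBuffer(capacity=2)
buf.register_scan("q1", pages=[1, 2, 3])
buf.register_scan("q2", pages=[2, 3, 4])
buf.load_page(buf.next_page_to_load())   # fetches page 2 or 3, wanted by both scans
```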

    A distributed Web document database and its supporting environment

    We propose a new Web documentation database as a supporting environment for the Multimedia Micro-University project. The design of this database facilitates a Web documentation development paradigm that we have proposed earlier. From a script description to its implementation and testing records, the database and its interface allow the user to design Web documents as virtual courses to be used in a Web-savvy virtual library. The database supports object reuse and sharing, as well as referential integrity and concurrency. In order to allow real-time course demonstration, we also propose a simple course distribution mechanism, which allows the pre-broadcast of course materials. The system is implemented as a three-tier architecture which runs under MS Windows and other platforms. (International conference, July 6-8, 1999, Red Sea, Egypt)

    A HARD REAL-TIME SCHEDULER ALGORITHM FOR SOLID STATE DEVICE

    This paper presents an approach to using solid state devices in hard real-time applications, where a delay in reading data from or writing data to them can result in a catastrophe. The proposed algorithm schedules requests by considering the deadlines associated with the data and supports multiple synchronous read or write requests, together with a mechanism for overcoming the I/O bottleneck caused by performing new block writes.
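    To make the deadline-driven scheduling idea concrete, here is a minimal earliest-deadline-first request queue in Python. The class name, method names, and miss-handling policy are assumptions for illustration only; the paper's actual algorithm additionally addresses batching of new block writes, which this sketch omits.

```python
import heapq
import itertools

class DeadlineIOScheduler:
    """Toy earliest-deadline-first queue for SSD read/write requests
    (illustrative only; names and structure are not the paper's)."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()   # stable ordering for equal deadlines

    def submit(self, deadline, op, page, data=None):
        heapq.heappush(self._heap, (deadline, next(self._tie), op, page, data))

    def dispatch(self, now):
        """Return the most urgent pending request; requests whose deadline
        has already passed are reported as misses instead of being serviced."""
        while self._heap:
            deadline, _, op, page, data = heapq.heappop(self._heap)
            if deadline < now:
                print(f"deadline miss: {op} page {page}")
                continue
            return op, page, data
        return None

# usage sketch
sched = DeadlineIOScheduler()
sched.submit(deadline=5.0, op="read", page=42)
sched.submit(deadline=2.0, op="write", page=7, data=b"...")
print(sched.dispatch(now=1.0))   # -> ('write', 7, b'...'), the earliest deadline
```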

    IO-Top-k: index-access optimized top-k query processing

    Top-k query processing is an important building block for ranked retrieval, with applications ranging from text and data integration to distributed aggregation of network logs and sensor data. Top-k queries operate on index lists for a query's elementary conditions and aggregate scores for result candidates. One of the best implementation methods in this setting is the family of threshold algorithms, which aim to terminate the index scans as early as possible based on lower and upper bounds for the final scores of result candidates. This procedure performs sequential disk accesses for sorted index scans, but also has the option of performing random accesses to resolve score uncertainty. This entails scheduling for the two kinds of accesses: 1) the prioritization of different index lists in the sequential accesses, and 2) the decision on when to perform random accesses and for which candidates. The prior literature has studied some of these scheduling issues, but only for each of the two access types in isolation. The current paper takes an integrated view of the scheduling issues and develops novel strategies that outperform prior proposals by a large margin. Our main contributions are new, principled scheduling methods based on a Knapsack-related optimization for sequential accesses and a cost model for random accesses. The methods can be further boosted by harnessing probabilistic estimators for scores, selectivities, and index list correlations. We also discuss efficient implementation techniques for the underlying data structures. In performance experiments with three different datasets (TREC Terabyte, HTTP server logs, and IMDB), our methods achieved significant performance gains compared to the best previously known methods: a factor of up to 3 in terms of execution costs, and a factor of 5 in terms of absolute run-times of our implementation. Our best techniques are close to a lower bound for the execution cost of the considered class of threshold algorithms.
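    The following Python sketch shows the basic threshold-algorithm pattern this abstract builds on: sorted accesses over score-ordered index lists, random accesses to complete candidate scores, and a stopping test against an upper-bound threshold. It deliberately omits the paper's Knapsack-based sequential-access scheduling and its cost model for random accesses; function and variable names are illustrative assumptions, not the authors' code.

```python
import heapq

def threshold_topk(index_lists, k):
    """Minimal Fagin-style threshold algorithm.
    index_lists: lists of (doc_id, score) pairs sorted by descending score."""
    positions = [0] * len(index_lists)
    seen = {}                                   # doc_id -> exact aggregated score
    while True:
        last_scores = []
        for i, lst in enumerate(index_lists):
            if positions[i] >= len(lst):
                last_scores.append(0.0)
                continue
            doc, score = lst[positions[i]]      # one sorted access per list
            positions[i] += 1
            last_scores.append(score)
            # Random access: look the document up in every list so its final
            # score is known exactly (a simplification of the real policy).
            seen[doc] = sum(s for l in index_lists for d, s in l if d == doc)
        threshold = sum(last_scores)            # upper bound for any unseen doc
        top = heapq.nlargest(k, seen.values())
        if (len(top) == k and top[-1] >= threshold) or \
           all(p >= len(l) for p, l in zip(positions, index_lists)):
            return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])

# usage sketch
lists = [[("d1", 0.9), ("d2", 0.6), ("d3", 0.1)],
         [("d2", 0.8), ("d1", 0.3), ("d3", 0.2)]]
print(threshold_topk(lists, k=2))   # -> [('d2', 1.4), ('d1', 1.2)]
```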

    Dynamic Routing Algorithms and Methods for Controlling Traffic Flows of Cloud Applications and Services

    Nowadays, we see a steady growth in the use of cloud computing in modern business. This makes it possible to reduce the cost of owning and operating IT infrastructure; however, there are some issues related to the management of data processing centers. One of these issues is the effective use of companies' computing and network resources. The goal of optimization is to manage the traffic of cloud applications and services within data centers. Taking into account the multitier architecture of modern data centers, this task requires special attention. The advantage of modern infrastructure virtualization is the possibility of using software-defined networks and software-defined data storage. However, existing algorithmic optimization solutions do not take into account the specific features of network traffic formation with multiple application types. The task of optimizing traffic distribution for cloud applications and services can be solved by using the software-defined infrastructure of virtual data centers. We have developed a simulation model for the traffic in the software-defined network segments of data centers involved in processing user requests to cloud applications and services within a network environment. Our model makes it possible to implement a traffic management algorithm for cloud applications and to optimize access to storage systems through the effective use of data transmission channels. During the experimental studies, we found that the use of our algorithm decreases the response time of cloud applications and services and, therefore, increases the throughput of user request processing and reduces the number of rejected requests.
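    As a rough illustration of load-aware traffic distribution in a software-defined network, the sketch below picks, among candidate paths, the one whose bottleneck link remains least utilized after admitting a new flow. The function name, link identifiers, and capacity model are assumptions for illustration and do not reproduce the authors' simulation model or routing algorithm.

```python
def pick_path(paths, link_load, demand):
    """Choose the candidate path with the least loaded bottleneck link after
    adding `demand` (a toy load-aware routing rule, not the paper's method).

    paths: list of candidate paths, each a list of link ids.
    link_load: dict mapping link id -> current utilization in [0, 1].
    demand: extra utilization the new flow adds to every link on its path.
    """
    def bottleneck(path):
        return max(link_load[link] + demand for link in path)

    best = min(paths, key=bottleneck)
    return best if bottleneck(best) <= 1.0 else None   # None: no path has room

# usage sketch with hypothetical link ids
load = {"a": 0.4, "b": 0.7, "c": 0.2, "d": 0.5}
print(pick_path([["a", "b"], ["c", "d"]], load, demand=0.2))  # -> ['c', 'd']
```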

    Location-Dependent Query Processing Under Soft Real-Time Constraints
