
    Consecutive retrieval with redundancy: an optimal linear and an optimal cyclic arrangement and their storage space requirements

    Keywords: Information retrieval, file organization, consecutive retrieval property, consecutive retrieval with redundancy, storage space requirements

    Pinwheel Scheduling for Fault-tolerant Broadcast Disks in Real-time Database Systems

    The design of programs for broadcast disks which incorporate real-time and fault-tolerance requirements is considered. A generalized model for real-time fault-tolerant broadcast disks is defined. It is shown that designing programs for broadcast disks specified in this model is closely related to the scheduling of pinwheel task systems. Some new results in pinwheel scheduling theory are derived, which facilitate the efficient generation of real-time fault-tolerant broadcast disk programs. Supported by the National Science Foundation (CCR-9308344, CCR-9596282).
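
    A pinwheel task system asks for a schedule in which task i appears in every window of a_i consecutive slots. As a minimal illustration of that constraint (not the paper's construction; the task names and periods below are invented), the following sketch checks whether a finite cycle, repeated forever, satisfies given pinwheel periods.

# Hypothetical sketch: check that a cyclic broadcast schedule meets pinwheel
# constraints, i.e. task i occurs in every window of periods[i] consecutive
# slots of the infinite repetition of the cycle.
def satisfies_pinwheel(cycle, periods):
    n = len(cycle)
    for task, a in periods.items():
        slots = [i for i, t in enumerate(cycle) if t == task]
        if not slots:
            return False
        # Gaps between consecutive occurrences, wrapping around the cycle.
        gaps = [(slots[(k + 1) % len(slots)] - slots[k]) % n or n
                for k in range(len(slots))]
        if max(gaps) > a:
            return False
    return True

# "A" must recur at least every 2 slots, "B" at least every 4.
print(satisfies_pinwheel(["A", "B", "A", "B"], {"A": 2, "B": 4}))  # True
print(satisfies_pinwheel(["A", "A", "B", "A"], {"A": 2, "B": 3}))  # False: B recurs only every 4 slots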

    Multicluster interleaving on paths and cycles

    Interleaving codewords is an important method not only for combatting burst errors, but also for distributed data retrieval. This paper introduces the concept of multicluster interleaving (MCI), a generalization of traditional interleaving problems. MCI problems for paths and cycles are studied. The following problem is solved: how to interleave integers on a path or cycle such that any m (m ≥ 2) nonoverlapping clusters of order 2 in the path or cycle have at least three distinct integers. We then present a scheme using a "hierarchical-chain structure" to solve the following more general problem for paths: how to interleave integers on a path such that any m (m ≥ 2) nonoverlapping clusters of order L (L ≥ 2) in the path have at least L + 1 distinct integers. It is shown that the scheme solves the second interleaving problem for paths that are asymptotically as long as the longest path on which an MCI exists, and clearly, for shorter paths as well.
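
    As a concrete illustration of the property being constructed (a brute-force check, not the paper's hierarchical-chain scheme; the example labellings are made up), the sketch below tests whether a labelling of a path is a multicluster interleaving: every choice of m nonoverlapping clusters of L consecutive vertices must contain at least d distinct integers (d = 3 in the first problem above, d = L + 1 in the second).

from itertools import combinations

# Hypothetical brute-force MCI check for a path whose vertices carry `labels`.
def is_mci(labels, m, L, d):
    n = len(labels)
    starts = range(n - L + 1)                  # possible cluster start positions
    for chosen in combinations(starts, m):
        # Clusters are nonoverlapping when consecutive starts are at least L apart.
        if all(b - a >= L for a, b in zip(chosen, chosen[1:])):
            distinct = {labels[i] for s in chosen for i in range(s, s + L)}
            if len(distinct) < d:
                return False
    return True

# Any two nonoverlapping clusters of order 2 must show at least 3 distinct integers.
print(is_mci([1, 2, 3, 1, 3], m=2, L=2, d=3))  # True
print(is_mci([1, 2, 3, 1, 2], m=2, L=2, d=3))  # False: clusters (1,2) and (1,2)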

    Exploring heterogeneity of unreliable machines for p2p backup

    P2P architecture is a viable option for enterprise backup. In contrast to dedicated backup servers, nowadays the standard solution, making backups directly on an organization's workstations should be cheaper (as existing hardware is used), more efficient (as there is no single bottleneck server) and more reliable (as the machines are geographically dispersed). We present the architecture of a p2p backup system that uses pairwise replication contracts between a data owner and a replicator. In contrast to standard p2p storage systems that use a DHT directly, the contracts allow our system to optimize replica placement according to a specific optimization strategy, and thus to take advantage of the heterogeneity of the machines and the network. Such optimization is particularly appealing in the context of backup: replicas can be geographically dispersed, the load sent over the network can be minimized, or the optimization goal can be to minimize the backup/restore time. However, managing the contracts, keeping them consistent and adjusting them in response to a dynamically changing environment is challenging. We built a scientific prototype and ran experiments on 150 workstations in the university's computer laboratories and, separately, on 50 PlanetLab nodes. We found that the main factor affecting the quality of the system is the availability of the machines. Yet our main conclusion is that it is possible to build an efficient and reliable backup system on highly unreliable machines (our computers had just 13% average availability).
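
    The placement decision itself is strategy-dependent. Purely as an illustration of the kind of optimization such contracts enable (this is not the paper's algorithm, and all machine names, availabilities and locations below are invented), here is a greedy choice of replicators that prefers highly available machines while penalizing locations that already hold a replica.

from dataclasses import dataclass

@dataclass
class Machine:
    name: str
    availability: float   # fraction of time the machine is online
    location: str         # e.g. a laboratory or site identifier

# Hypothetical greedy placement: trade availability against geographic dispersion.
def choose_replicators(owner, candidates, k, dispersion_penalty=0.5):
    chosen, used_locations = [], {owner.location}
    pool = [m for m in candidates if m.name != owner.name]
    for _ in range(k):
        def score(m):
            penalty = dispersion_penalty if m.location in used_locations else 0.0
            return m.availability - penalty
        best = max(pool, key=score, default=None)
        if best is None:
            break
        chosen.append(best)
        used_locations.add(best.location)
        pool.remove(best)
    return chosen

owner = Machine("w01", 0.13, "lab-A")
candidates = [Machine("w02", 0.30, "lab-A"), Machine("w03", 0.25, "lab-B"),
              Machine("w04", 0.10, "lab-C"), Machine("w05", 0.28, "lab-B")]
print([m.name for m in choose_replicators(owner, candidates, k=2)])  # ['w05', 'w04']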

    Stochastic Query Covering for Fast Approximate Document Retrieval

    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that, even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving performance close to optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
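
    A natural baseline for this selection problem (a standard greedy weighted-cover heuristic, not necessarily one of the algorithms analysed in the paper; the query distribution and relevance sets below are invented) picks, within a document budget, the document covering the largest remaining probability mass of queries, where a query counts as answered once at least one of its relevant documents has been selected.

# Hypothetical greedy sketch of query covering under a document budget.
def greedy_query_cover(relevant, prob, budget):
    """relevant: query -> set of relevant documents; prob: query -> probability."""
    docs = {d for ds in relevant.values() for d in ds}
    selected, uncovered = set(), set(relevant)
    for _ in range(budget):
        def gain(d):
            return sum(prob[q] for q in uncovered if d in relevant[q])
        best = max(docs - selected, key=gain, default=None)
        if best is None or gain(best) == 0:
            break
        selected.add(best)
        uncovered -= {q for q in uncovered if best in relevant[q]}
    return selected

relevant = {"q1": {"d1", "d2"}, "q2": {"d2"}, "q3": {"d3"}, "q4": {"d1", "d3"}}
prob = {"q1": 0.5, "q2": 0.2, "q3": 0.2, "q4": 0.1}
print(greedy_query_cover(relevant, prob, budget=2))  # {'d2', 'd3'}: covers all query mass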