
    On the selection of secondary indices in relational databases

    An important problem in the physical design of databases is the selection of secondary indices. In general, this problem cannot be solved optimally because of the complexity of the selection process; heuristics such as the well-known ADD and DROP algorithms are therefore often used. In this paper it is shown that frequently used cost functions can be classified as super- or submodular functions. For these functions, several mathematical properties have been derived that reduce the complexity of the index selection problem. These properties are used to develop a tool for physical database design and also give a mathematical foundation for the success of the aforementioned ADD and DROP algorithms.
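    The ADD heuristic mentioned in the abstract can be sketched as a greedy loop that keeps adding the index with the largest cost reduction. The cost function below is purely illustrative (the paper's actual cost models are super- or submodular functions over index sets):

```python
def add_heuristic(candidates, cost):
    """Greedily add the index with the largest cost reduction
    until no single addition improves the cost."""
    selected = set()
    current = cost(selected)
    improved = True
    while improved:
        improved = False
        best, best_cost = None, current
        for idx in candidates - selected:
            c = cost(selected | {idx})
            if c < best_cost:
                best, best_cost = idx, c
        if best is not None:
            selected.add(best)
            current = best_cost
            improved = True
    return selected

def toy_cost(indices):
    """Hypothetical cost: a base query cost, diminishing savings per
    index, and a fixed maintenance cost per index (numbers invented)."""
    savings = {"A": 40, "B": 25, "C": 10}
    return 100 - sum(savings[i] for i in indices) + 12 * len(indices)
```

    With these toy numbers the heuristic adds "A" and "B" but stops before "C", whose savings no longer cover its maintenance cost.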

    Approximating Block Accesses in Random Files: The Case of Blocking Factors Lower than One

    Expressions available in the current literature for estimating the number of blocks accessed in a random file fail when the blocking factor is lower than one. A new expression is developed in this article to estimate the number of blocks accessed; this expression is valid for blocking factors both higher and lower than one. Simulation experiments show that this expression is quite accurate in most situations.
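    A classical expression of the kind the abstract refers to (a Yao-style estimate) can be sketched as follows; note how it assumes a blocking factor of at least one, which is exactly the case where it breaks down:

```python
from math import prod

def yao_blocks(n, m, k):
    """Classical (Yao-style) estimate of the expected number of blocks
    accessed when k of n records are retrieved at random from a file of
    m blocks holding n/m records each. It assumes an integer blocking
    factor n/m >= 1 and is undefined for blocking factors below one,
    which motivates the new expression in the article."""
    p = n // m  # records per block (blocking factor)
    return m * (1 - prod((n - p - i) / (n - i) for i in range(k)))
```

    For example, retrieving one of 100 records spread over 10 blocks touches one block on average, and retrieving all 100 touches all 10.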

    The Effect of Buffer Size on Pages Accessed in Random Files

    Prior works on estimating the number of pages (blocks) accessed from secondary memory to retrieve a certain number of records for a query have ignored the effect of main memory buffer size. While this may not cause any adverse impact in special cases, in most cases the impact of a limited buffer is to increase the number of page accesses. This paper explains why a limited buffer size has this effect and develops new expressions for the number of pages accessed. The accuracy of the expressions is evaluated by simulation modeling, and the effects of limited buffer size are discussed. Analytical works in database analysis and design should use the new expressions, especially when the effect of the buffer size is significant.
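    The effect can be illustrated with a small Monte Carlo model: with a limited buffer, a page evicted before all of its requested records are seen must be fetched again. The random access order and LRU replacement policy here are assumptions for illustration, not the paper's exact model:

```python
import random
from collections import OrderedDict

def simulate_page_accesses(n_pages, records_per_page, k, buffer_pages,
                           trials=500):
    """Estimate page fetches needed to retrieve k random records when
    only buffer_pages pages fit in memory (LRU replacement)."""
    n = n_pages * records_per_page
    total = 0
    for _ in range(trials):
        records = random.sample(range(n), k)  # unsorted access order
        buf = OrderedDict()                   # LRU buffer of page ids
        fetches = 0
        for r in records:
            page = r // records_per_page
            if page in buf:
                buf.move_to_end(page)         # hit: refresh recency
            else:
                fetches += 1                  # miss: fetch from disk
                buf[page] = True
                if len(buf) > buffer_pages:
                    buf.popitem(last=False)   # evict least recently used
        total += fetches
    return total / trials
```

    With a buffer large enough to hold every page, the fetch count equals the number of distinct pages touched; shrinking the buffer can only increase it.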

    Database Optimizing Services

    Almost every organization has a database at its centre. The database supports different activities, whether production, sales and marketing, or internal operations. Every day, a database is accessed for help in strategic decisions. Meeting such needs therefore requires high-quality security and availability, which can be provided by a DBMS (Database Management System): the software that manages a database. Technically speaking, it is software that uses a standard method of cataloguing, recovering, and running different data queries. A DBMS manages the input data, organizes it, and provides ways for its users or other programs to modify or extract the data. Managing a database is an operation that requires periodical updates, optimizing, and monitoring.
    Keywords: database, database management system (DBMS), indexing, optimizing, cost for optimized databases

    Expressions for Batched Searching of Sequential and Hierarchical Files

    Batching yields significant savings in access costs in sequential, tree-structured, and random files. A direct and simple expression is developed for computing the average number of records/pages accessed to satisfy a batched query of a sequential file. The advantages of batching for sequential and random files are discussed. A direct equation is provided for the number of nodes accessed in unbatched queries of hierarchical files. An exact recursive expression is developed for node accesses in batched queries of hierarchical files. In addition to the recursive relationship, good, closed-form upper- and lower-bound approximations are provided for the case of batched queries of hierarchical files.
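    For the sequential-file case, the flavor of such a direct expression can be sketched as follows (an illustrative stand-in, not necessarily the paper's exact formula): a single sequential pass answering a batch of k distinct, uniformly random keys stops at the largest requested position, whose exact expectation is k(n+1)/(k+1).

```python
import random

def expected_batched_scan(n, k):
    """Expected number of records scanned when one sequential pass over
    an n-record file answers a batch of k distinct random keys: the scan
    stops at the largest requested position."""
    return k * (n + 1) / (k + 1)

def simulate_batched_scan(n, k, trials=20000):
    """Monte Carlo check of the expression above."""
    random.seed(0)
    return sum(max(random.sample(range(1, n + 1), k))
               for _ in range(trials)) / trials
```

    This is where the batching advantage shows up: answering the same k keys with k independent scans would touch about k*n/2 records on average, versus fewer than n for the whole batch.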

    On the Selection of Optimal Index Configuration in OO Databases

    An operation in object-oriented databases gives rise to the processing of a path, and several database operations may result in the same path. The authors address the problem of optimal index configuration for a single path. As is shown, an optimal index configuration for a path can be achieved by splitting the path into subpaths and indexing each subpath with the optimal index organization. The authors present an algorithm that selects an optimal index configuration for a given path. The authors consider a limited number of existing indexing techniques (simple index, inherited index, nested inherited index, multi-index, and multi-inherited index), but the principles of the algorithm remain the same when more indexing techniques are added.
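    The split-and-index idea can be sketched as a small dynamic program over all ways of cutting the path into consecutive subpaths. The subpath cost function below is hypothetical; in the paper, each subpath's cost would come from the best of the index organizations the authors evaluate:

```python
from functools import lru_cache

def best_single_index_cost(i, j):
    """Hypothetical cost of indexing the subpath from class i to class j
    with its best single index organization (illustrative numbers:
    a fixed overhead plus a superlinear length penalty)."""
    length = j - i
    return 3 + length * length

def optimal_split(n):
    """Minimal total cost of splitting a path of n classes into
    consecutively indexed subpaths, with the chosen split points."""
    @lru_cache(maxsize=None)
    def solve(i):
        if i == n:
            return 0, []
        best = None
        for j in range(i + 1, n + 1):  # try every next cut point
            cost, rest = solve(j)
            total = best_single_index_cost(i, j) + cost
            if best is None or total < best[0]:
                best = (total, [(i, j)] + rest)
        return best
    return solve(0)
```

    With this toy cost, a path of four classes is best split into two subpaths of length two, rather than indexed whole or class by class.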

    Dynamic Signature File Partitioning Based on Term Characteristics

    Signature files act as a filter on retrieval to discard a large number of non-qualifying data items. Linear hashing with superimposed signatures (LHSS) provides an effective retrieval filter to process queries in dynamic databases. This study analyzes the effects of reflecting term query and occurrence characteristics in signatures in LHSS. This approach relaxes the unrealistic uniform frequency assumption and lets the terms with high discriminatory power set more bits in signatures. The simulation experiments based on the derived formulas show that incorporating the term characteristics in LHSS improves retrieval efficiency. The paper also discusses the further benefits of this approach in alleviating the potential imbalance between the levels of efficiency and relevancy.

    Signature File Hashing Using Term Occurrence and Query Frequencies

    Signature files act as a filter on retrieval to discard a large number of non-qualifying data items. Linear hashing with superimposed signatures (LHSS) provides an effective retrieval filter to process queries in dynamic databases. This study analyzes the effects of reflecting term occurrence and query frequencies in signatures in LHSS. This approach relaxes the unrealistic uniform frequency assumption and lets the terms with high discriminatory power set more bits in signatures. The simulation experiments based on the derived formulas explore the amount of page savings with different occurrence and query frequency combinations at different hashing levels. The results show that the performance of LHSS improves with the hashing level, and that the larger the difference between the discriminatory power values of the terms, the higher the retrieval efficiency. The paper also discusses the benefits of this approach in alleviating the imbalance between the levels of efficiency and relevancy that arises under the unrealistic uniform frequency assumption.
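    The superimposed-coding mechanism behind both of these LHSS studies can be sketched as follows. The hash scheme, signature width, and bit counts are illustrative assumptions; the key idea from the abstracts is that bits_for assigns more bits to terms with higher discriminatory power:

```python
import hashlib

SIG_BITS = 64  # signature width in bits (illustrative)

def term_bits(term, n_bits):
    """Superimposed coding: set n_bits pseudo-random bit positions
    derived deterministically from the term."""
    sig = 0
    for i in range(n_bits):
        digest = hashlib.sha256(f"{term}:{i}".encode()).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig

def record_signature(terms, bits_for):
    """OR together the signatures of all terms in a record."""
    sig = 0
    for t in terms:
        sig |= term_bits(t, bits_for(t))
    return sig

def may_contain(record_sig, query_terms, bits_for):
    """Filter test: the record can match only if every query bit is set.
    False positives are possible (non-qualifying items slip through);
    false negatives are not."""
    q = record_signature(query_terms, bits_for)
    return record_sig & q == q
```

    Giving a discriminating term more bits makes a chance superset of its bits less likely, lowering its false-positive rate at the cost of denser signatures for the records that contain it.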