2,161 research outputs found

    UPI: A Primary Index for Uncertain Databases

    Uncertain data management has received growing attention from industry and academia. Many efforts have been made to optimize uncertain databases, including the development of special index data structures. However, none of these efforts have explored primary (clustered) indexes for uncertain databases, even though clustering has the potential to offer substantial speedups for non-selective analytic queries on large uncertain databases. In this paper, we propose a new index called a UPI (Uncertain Primary Index) that clusters heap files according to uncertain attributes with both discrete and continuous uncertainty distributions. Because uncertain attributes may have several possible values, a UPI on an uncertain attribute duplicates tuple data once for each possible value. To keep the UPI from growing unmanageably large, low-probability tuples are placed in a special Cutoff Index that is consulted only when queries for low-probability values are run. We also propose several other optimizations, including techniques to improve secondary index performance and techniques to reduce maintenance costs and fragmentation by buffering changes to the table and writing updates in sequential batches. Finally, we develop cost models for UPIs that estimate query performance in various settings and help automatically select a UPI's tuning parameters. We have implemented a prototype UPI and experimented on two real datasets. Our results show that UPIs can significantly (by up to two orders of magnitude) improve the performance of uncertain queries over both clustered and unclustered attributes. We also show that our buffering techniques mitigate table fragmentation and keep the maintenance cost as low as, or even lower than, that of an unclustered heap file. Funding: National Science Foundation (U.S.) (Grant IIS-0448124); National Science Foundation (U.S.) (Grant IIS-0905553); National Science Foundation (U.S.) (Grant IIS-0916691).
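    The duplication and cutoff idea in this abstract can be illustrated with a small in-memory sketch. This is not the authors' implementation: the function names `build_upi` and `query`, the dictionary layout, and the rule that an alternative goes to the cutoff index when its probability is below a fixed `cutoff` are all assumptions made for illustration, and only discrete distributions are handled.

```python
from collections import defaultdict


def build_upi(tuples, cutoff):
    """Build a toy UPI. `tuples` maps tuple_id -> (payload, {value: probability}).

    Hypothetical layout: alternatives with probability >= cutoff are stored in
    the main index together with a copy of the payload (the duplication the
    abstract describes); the rest go to a cutoff index that keeps only ids.
    """
    main = defaultdict(list)          # value -> [(prob, tuple_id, payload)]
    cutoff_index = defaultdict(list)  # value -> [(prob, tuple_id)]
    for tid, (payload, dist) in tuples.items():
        for value, prob in dist.items():
            if prob >= cutoff:
                main[value].append((prob, tid, payload))
            else:
                cutoff_index[value].append((prob, tid))
    return main, cutoff_index


def query(tuples, main, cutoff_index, cutoff, value, threshold):
    """Return {tuple_id: payload} for tuples whose attribute equals `value`
    with probability >= threshold."""
    hits = {tid: payload
            for prob, tid, payload in main.get(value, [])
            if prob >= threshold}
    # The cutoff index is consulted only for low-probability queries.
    if threshold < cutoff:
        for prob, tid in cutoff_index.get(value, []):
            if prob >= threshold:
                hits[tid] = tuples[tid][0]  # fetch the payload from the base table
    return hits
```

    With a cutoff of 0.15, a query with threshold 0.5 never touches the cutoff index, while a query with threshold 0.05 reads both structures.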

    Analyzing the Impact of Visitors on Page Views with Google Analytics

    This paper develops a flexible methodology for analyzing the effect of different variables on various dependent variables, all of which are time series. In particular, it shows how to apply time series regression to one of the most important primary indexes in Google Analytics (page views per visit) and how to choose the most suitable data to obtain more accurate results. Search engine visitors have a variety of impacts on page views that cannot be described by a single regression. On the one hand, referral visitors are well fitted by linear regression, with low impact. On the other hand, direct visitors have a large impact on page views. A higher connection speed does not simply imply a higher impact on page views; the content of the web page and the territory of visitors can help connection speed describe user behavior. Returning visitors have some similarities with direct visitors. Comment: 32 pages, 16 tables, 10 figures

    Correlating fissure occurrence to rice quality for various drying and tempering treatments

    When a rice kernel fissures, it can break in subsequent food processing operations and lose its commercial value. Head rice yield (HRY) is a measure of the percent of kernels that remain whole (at least three-fourths of original length) after rice has been milled. Our experiment was designed to test the effect of a rapid state transition during drying and tempering processes using cultivars Bengal and Cypress. ‘Bengal’ is a medium-size kernel and ‘Cypress’ is a long-size, thinner-grained cultivar. Immediately after drying, the rice samples were separated into four sub-samples and tempered for 0, 80, 160, or 240 minutes at the temperature of the drying air. Tempering is a process that allows kernel moisture content gradients to decrease, thereby reducing the stress within the kernel. From each sample, 400 kernels were randomly selected and visually observed, and the percentage of fissured kernels was determined. Results showed that the percentage of fissured kernels generally decreased with tempering. However, some samples still showed many fissures even after extended tempering, yet had a high HRY. While HRY is currently the primary index of rice quality, it is known that fissured kernels can severely and detrimentally affect end-use processing operations such as cooking or puffing. Thus, the tempering duration required for preventing kernel fissuring might be longer than the tempering duration required for maintaining a high HRY.

    On the Selection of Optimal Index Configuration in OO Databases

    An operation in object-oriented databases gives rise to the processing of a path. Several database operations may result in the same path. The authors address the problem of optimal index configuration for a single path. They show that an optimal index configuration for a path can be achieved by splitting the path into subpaths and indexing each subpath with the optimal index organization. The authors present an algorithm that selects an optimal index configuration for a given path. They consider a limited number of existing indexing techniques (simple index, inherited index, nested inherited index, multi-index, and multi-inherited index), but the principles of the algorithm remain the same when more indexing techniques are added.
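    The idea of splitting a path into subpaths and choosing an index organization for each one can be sketched as a standard dynamic program. This is not necessarily the authors' algorithm: the function `best_configuration`, the cost callback `cost(i, j, org)`, and the assumption that subpath costs are independent and simply add up are all hypothetical choices made for illustration.

```python
def best_configuration(n, organizations, cost):
    """Split a path of n steps into contiguous subpaths and pick the cheapest
    index organization for each subpath.

    cost(i, j, org) is a hypothetical cost of indexing steps i..j-1 with `org`;
    the total cost of a configuration is assumed to be the sum over subpaths.
    Returns (total_cost, [(start, end, organization), ...]).
    """
    best = [0.0] + [float("inf")] * n  # best[j]: cheapest cost for steps 0..j-1
    choice = [None] * (n + 1)          # choice[j]: (start of last subpath, org)
    for j in range(1, n + 1):
        for i in range(j):
            org = min(organizations, key=lambda o: cost(i, j, o))
            total = best[i] + cost(i, j, org)
            if total < best[j]:
                best[j] = total
                choice[j] = (i, org)
    # Walk the recorded choices backwards to recover the subpaths.
    config, j = [], n
    while j > 0:
        i, org = choice[j]
        config.append((i, j, org))
        j = i
    return best[n], config[::-1]
```

    The sketch assumes every subpath has at least one organization with finite cost. Supporting a further indexing technique only means adding another entry to `organizations`, which mirrors the abstract's remark that the principles stay the same when more techniques are considered.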

    AsterixDB: A Scalable, Open Source BDMS

    AsterixDB is a new, full-function BDMS (Big Data Management System) with a feature set that distinguishes it from other platforms in today's open source Big Data ecosystem. Its features make it well-suited to applications like web data warehousing, social data storage and analysis, and other use cases related to Big Data. AsterixDB has a flexible NoSQL-style data model; a query language that supports a wide range of queries; a scalable runtime; partitioned, LSM-based data storage and indexing (including B+-tree, R-tree, and text indexes); support for external as well as natively stored data; a rich set of built-in types; support for fuzzy, spatial, and temporal types and queries; a built-in notion of data feeds for ingestion of data; and transaction support akin to that of a NoSQL store. Development of AsterixDB began in 2009 and led to a mid-2013 initial open source release. This paper is the first complete description of the resulting open source AsterixDB system. Covered herein are the system's data model, its query language, and its software architecture. Also included are a summary of the current status of the project and a first glimpse into how AsterixDB performs when compared to alternative technologies, including a parallel relational DBMS, a popular NoSQL store, and a popular Hadoop-based SQL data analytics platform, for things that both technologies can do. Also included is a brief description of some initial trials that the system has undergone and the lessons learned (and plans laid) based on those early "customer" engagements.

    A storage and access architecture for efficient query processing in spatial database systems

    Due to the high complexity of objects and queries and also due to extremely large data volumes, geographic database systems impose stringent requirements on their storage and access architecture with respect to efficient query processing. Performance-improving concepts such as spatial storage and access structures, approximations, object decompositions, and multi-phase query processing have been suggested and analyzed as single building blocks. In this paper, we describe a storage and access architecture that is composed of the above building blocks in a modular fashion. Additionally, we incorporate into our architecture a new ingredient, the scene organization, for efficiently supporting set-oriented access of large-area region queries. An experimental performance comparison demonstrates that the concept of scene organization leads to considerable performance improvements for large-area region queries, by a factor of up to 150.