3 research outputs found

    Probabilistic Page Replacement Policy in Buffer Cache Management for Flash-Based Cloud Databases

    In the fast evolution of storage systems, the newly emerged flash memory-based Solid State Drives (SSDs) are becoming an important part of the computer storage hierarchy. Among the several advantages of flash-based SSDs, high read performance and low power consumption are of primary importance. Among its few disadvantages, its asymmetric I/O latencies for read, write, and erase operations are the most crucial for overall performance. In this paper, we propose two novel probabilistic adaptive algorithms that compute the future probability of reference based on the recency, frequency, and periodicity of past page references. Page replacement is performed by considering the probability of reference of cached pages as well as the asymmetric read-write-erase properties of flash devices. The experimental results show that our proposed method is successful in minimizing the performance overheads of flash-based systems as well as in maintaining a good hit ratio. The results also justify the utility of a genetic algorithm in maximizing the overall performance gains.
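
    The replacement idea can be illustrated with a minimal sketch. Assuming an exponential recency decay, a count-based frequency term, and a fixed write-to-read cost ratio (all illustrative choices, not the paper's algorithm), a flash-aware cache might pick victims by weighing an estimated probability of future reference against the cost of evicting a clean versus a dirty page:

```python
import math
import time
from dataclasses import dataclass, field

# Assumed cost ratio: evicting a dirty page forces a flash write (and possibly
# an erase), which is far slower than the read needed to reload a clean page.
WRITE_COST = 4.0
READ_COST = 1.0

@dataclass
class Page:
    key: int
    dirty: bool = False
    ref_count: int = 0
    last_ref: float = field(default_factory=time.monotonic)

class ProbabilisticFlashCache:
    """Toy flash-aware buffer cache: victims are chosen by combining an
    estimated probability of future reference (recency + frequency) with the
    asymmetric cost of evicting clean vs. dirty pages."""

    def __init__(self, capacity, decay=0.01):
        self.capacity = capacity
        self.decay = decay              # assumed exponential recency decay rate
        self.pages = {}

    def _reference_probability(self, page, now):
        # Illustrative estimate: access frequency scaled by recency decay.
        return page.ref_count * math.exp(-self.decay * (now - page.last_ref))

    def access(self, key, write=False):
        now = time.monotonic()
        page = self.pages.get(key)
        if page is None:
            if len(self.pages) >= self.capacity:
                self._evict(now)
            page = Page(key)
            self.pages[key] = page
        page.ref_count += 1
        page.last_ref = now
        page.dirty = page.dirty or write

    def _evict(self, now):
        # Keep pages whose (re-reference probability x eviction cost) is high;
        # evict the page where that product is lowest.
        def score(p):
            cost = WRITE_COST if p.dirty else READ_COST
            return self._reference_probability(p, now) * cost
        victim = min(self.pages.values(), key=score)
        del self.pages[victim.key]
```

    Scoring by probability times cost is what lets a low-probability clean page be evicted before a slightly lower-probability dirty one, which is the flash-awareness the abstract describes.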

    Probability density estimation over evolving data streams using tilted Parzen window

    Probability density estimation is a very important technique which has been widely used in data mining and data analysis. In this paper, we generalize the traditional Parzen window method to data streams and propose a new method of tilted Parzen window (TPW) for probability density estimation. To adapt to the evolution of the data streams, we use a tilted window size that is proportional to the data's arrival time instead of a fixed window size. Theoretical analysis shows that the tilted Parzen window method is a valid method for estimating the probability density function (pdf) for data streams. We also propose a new strategy for discarding the historical data in data streams. We prove that this strategy can describe the probability density changes more accurately than the conventional discarding strategy. Empirical results on a synthetic data set demonstrate the effectiveness and efficiency of this method.
    Shen Hong & Yan Xiao-Long. http://www.ieee-iscc.org/2008
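
    A minimal sketch of the streaming kernel-density idea, assuming a Gaussian kernel, a per-sample bandwidth that widens with the sample's age, and a fixed retention horizon; the exact tilting and discarding rules of the TPW method are not reproduced here:

```python
import math

class TiltedParzenWindow:
    """Toy streaming KDE: each sample carries a bandwidth that grows with its
    age (the 'tilt'), and samples beyond a fixed horizon are discarded."""

    def __init__(self, base_bandwidth=0.5, tilt=0.01, horizon=1000):
        self.base_bandwidth = base_bandwidth
        self.tilt = tilt            # how strongly bandwidth grows with sample age
        self.horizon = horizon      # number of recent samples to keep
        self.samples = []           # list of (arrival_index, value)
        self.t = 0

    def update(self, x):
        self.t += 1
        self.samples.append((self.t, x))
        # Discard the oldest sample once the horizon is exceeded.
        if len(self.samples) > self.horizon:
            self.samples.pop(0)

    def _bandwidth(self, arrival_index):
        # Bandwidth tilted by arrival time: older samples get wider kernels,
        # so their influence is smoothed out relative to recent samples.
        age = self.t - arrival_index
        return self.base_bandwidth * (1.0 + self.tilt * age)

    def pdf(self, x):
        if not self.samples:
            return 0.0
        total = 0.0
        for arrival, xi in self.samples:
            h = self._bandwidth(arrival)
            total += math.exp(-0.5 * ((x - xi) / h) ** 2) / (h * math.sqrt(2 * math.pi))
        return total / len(self.samples)
```

    Widening the kernels of older samples is one simple way to let recent data dominate the estimate as the stream evolves, while still retaining some information from history until it is discarded.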

    Offline and Online Density Estimation for Large High-Dimensional Data

    Density estimation has wide applications in machine learning and data analysis techniques including clustering, classification, multimodality analysis, bump hunting, and anomaly detection. In high-dimensional space, the sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods inefficient. This work presents the development of computationally efficient algorithms for high-dimensional density estimation, based on Bayesian sequential partitioning (BSP). A copula transform is used to separate the estimation of marginal and joint densities, with the purpose of reducing the computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is presented. Also, some example applications of high-dimensional density estimation in density-based classification and clustering are presented. Another challenge in the area of density estimation arises in dealing with online sources of data, where data is arriving over an open-ended and non-stationary stream. This calls for efficient algorithms for online density estimation. An online density estimator needs to be capable of providing up-to-date estimates of the density, bound to the available computing resources and requirements of the application. In response to this, the BBSP method for online density estimation is introduced. It works by collecting and processing the data in blocks of fixed size, followed by a weighted averaging over block-wise estimates of the density. Proper choice of block size is discussed via simulations for streams of synthetic and real datasets. Further, with the purpose of efficiency improvement in offline and online density estimation, a progressive update of the binary partitions in BBSP is proposed, which, as simulation results show, leads to improved accuracy as well as speed-up for various block sizes.
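
    As a rough illustration of the block-wise averaging step only, the sketch below buffers the stream into fixed-size blocks, fits a per-block estimate (a plain 1-D histogram standing in for a Bayesian sequential partition), and averages the blocks with geometrically decaying weights; the block estimator, the support, and the decay factor are all assumptions rather than the paper's BBSP method:

```python
import numpy as np

class BlockwiseOnlineDensity:
    """Toy online density estimator: data arrive in fixed-size blocks, each
    block gets its own density estimate (here a histogram instead of a
    Bayesian sequential partition), and the overall estimate is a weighted
    average that down-weights older blocks."""

    def __init__(self, block_size=500, bins=50, support=(0.0, 1.0), decay=0.8):
        self.block_size = block_size
        self.bins = bins
        self.support = support
        self.decay = decay          # weight multiplier applied to older blocks
        self.buffer = []
        self.block_densities = []   # per-block histogram densities, oldest first

    def update(self, x):
        self.buffer.append(x)
        if len(self.buffer) >= self.block_size:
            self._close_block()

    def _close_block(self):
        # Fit a normalized histogram to the completed block, then reset the buffer.
        density, _ = np.histogram(self.buffer, bins=self.bins,
                                  range=self.support, density=True)
        self.block_densities.append(density)
        self.buffer = []

    def pdf(self, x):
        if not self.block_densities:
            return 0.0
        lo, hi = self.support
        idx = max(0, min(int((x - lo) / (hi - lo) * self.bins), self.bins - 1))
        # Newest block gets weight 1, the one before it decay, then decay**2, ...
        weights = [self.decay ** k for k in range(len(self.block_densities))][::-1]
        total_weight = sum(weights)
        return sum(w * d[idx] for w, d in zip(weights, self.block_densities)) / total_weight
```

    Keeping only per-block summaries bounds memory regardless of stream length, and the decaying weights are one way to keep the estimate up to date when the stream is non-stationary.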