13,885 research outputs found

    Increasing Availability in Distributed Storage Systems via Clustering

    Full text link
    We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture for Distributed Storage Systems (DSS), achieving a small repair bandwidth while guaranteeing a high availability. Specifically we partition the set of servers in a DSS into ss clusters and allow a failed server to choose any cluster other than its own as its repair group. Thereby, we guarantee an availability of sβˆ’1s-1. We characterize the repair bandwidth vs. storage trade-off for the FCRS under functional repair and show that the minimum repair bandwidth can be improved by an asymptotic multiplicative factor of 2/32/3 compared to the state of the art coding techniques that guarantee the same availability. We further introduce Cubic Codes designed to minimize the repair bandwidth of the FCRS under the exact repair model. We prove an asymptotic multiplicative improvement of 0.790.79 in the minimum repair bandwidth compared to the existing exact repair coding techniques that achieve the same availability. We show that Cubic Codes are information-theoretically optimal for the FCRS with 22 and 33 complete clusters. Furthermore, under the repair-by-transfer model, Cubic Codes are optimal irrespective of the number of clusters

    Integrating E-Commerce and Data Mining: Architecture and Challenges

    Full text link
    We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our expe-rience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We con-clude with a set of challenges.Comment: KDD workshop: WebKDD 200
    • …
    corecore