Increasing Availability in Distributed Storage Systems via Clustering
We introduce the Fixed Cluster Repair System (FCRS) as a novel architecture
for Distributed Storage Systems (DSS), achieving a small repair bandwidth while
guaranteeing a high availability. Specifically, we partition the set of servers
in a DSS into clusters and allow a failed server to choose any cluster
other than its own as its repair group. Thereby, we guarantee an availability
equal to the number of clusters minus one. We characterize the repair bandwidth vs. storage trade-off for the
FCRS under functional repair and show that the minimum repair bandwidth can be
improved by an asymptotic multiplicative factor of compared to the state-of-the-art
coding techniques that guarantee the same availability. We further
introduce Cubic Codes designed to minimize the repair bandwidth of the FCRS
under the exact repair model. We prove an asymptotic multiplicative improvement
of in the minimum repair bandwidth compared to the existing exact repair
coding techniques that achieve the same availability. We show that Cubic Codes
are information-theoretically optimal for the FCRS with and complete
clusters. Furthermore, under the repair-by-transfer model, Cubic Codes are
optimal irrespective of the number of clusters.
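The clustering step described in the abstract can be illustrated with a minimal sketch. It only models which servers may serve as a repair group; it does not implement Cubic Codes or any repair-bandwidth computation. The function names, the contiguous assignment of server ids, and the requirement that all clusters have equal size are simplifying assumptions made for illustration, not details taken from the paper.

```python
from typing import List


def partition_into_clusters(num_servers: int, num_clusters: int) -> List[List[int]]:
    """Split server ids 0..num_servers-1 into equal-sized, contiguous clusters.

    Assumption (for this sketch only): clusters are equal-sized, so
    num_servers must be divisible by num_clusters.
    """
    if num_clusters < 2 or num_servers % num_clusters != 0:
        raise ValueError("need at least 2 clusters of equal size")
    size = num_servers // num_clusters
    return [list(range(c * size, (c + 1) * size)) for c in range(num_clusters)]


def repair_groups(failed: int, clusters: List[List[int]]) -> List[List[int]]:
    """Every cluster other than the failed server's own is a candidate repair group."""
    return [cluster for cluster in clusters if failed not in cluster]


def availability(failed: int, clusters: List[List[int]]) -> int:
    """Number of disjoint repair groups the failed server can choose from."""
    return len(repair_groups(failed, clusters))


clusters = partition_into_clusters(num_servers=12, num_clusters=4)
print(clusters)                                # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
print(availability(failed=4, clusters=clusters))  # 3
```

Because the clusters partition the servers, the candidate repair groups are automatically disjoint, which is why the availability in this toy model is simply the number of clusters minus one.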
Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges.
Comment: KDD workshop: WebKDD 200
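A data transformation bridge from a customer event stream to the data warehouse might, in its simplest form, roll raw click events up into one row per session. The sketch below is a hypothetical illustration, not the Blue Martini architecture: the `ClickEvent` and `SessionRow` schemas, the action names, and `sessions_to_rows` are all invented for this example. It assumes events are logged at the application server layer, where a session id and (once the user logs in) a customer id are already attached to each event.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClickEvent:
    # Hypothetical event schema logged at the application server layer.
    session_id: str
    customer_id: Optional[str]   # None until the visitor identifies themselves
    timestamp: float
    action: str                  # e.g. "view", "add_to_cart", "purchase"
    product_id: Optional[str] = None


@dataclass
class SessionRow:
    # One row of a hypothetical session-level table in the warehouse.
    session_id: str
    customer_id: Optional[str]
    num_events: int
    products_viewed: List[str] = field(default_factory=list)
    purchased: bool = False


def sessions_to_rows(events: List[ClickEvent]) -> List[SessionRow]:
    """Group events by session and summarize each session as one row."""
    by_session: Dict[str, List[ClickEvent]] = defaultdict(list)
    for event in events:
        by_session[event.session_id].append(event)

    rows = []
    for session_id, session_events in sorted(by_session.items()):
        session_events.sort(key=lambda e: e.timestamp)
        # Attribute the whole session to the first customer id seen in it,
        # so anonymous events before login are linked to the customer.
        customer = next((e.customer_id for e in session_events if e.customer_id), None)
        viewed = [e.product_id for e in session_events
                  if e.action == "view" and e.product_id]
        rows.append(SessionRow(
            session_id=session_id,
            customer_id=customer,
            num_events=len(session_events),
            products_viewed=viewed,
            purchased=any(e.action == "purchase" for e in session_events),
        ))
    return rows


events = [
    ClickEvent("s1", None, 1.0, "view", "p1"),
    ClickEvent("s1", "c42", 2.0, "add_to_cart", "p1"),
    ClickEvent("s1", "c42", 3.0, "purchase", "p1"),
    ClickEvent("s2", None, 1.5, "view", "p2"),
]
rows = sessions_to_rows(events)
print(rows[0].customer_id, rows[0].purchased)  # c42 True
```

The back-filling of `customer_id` is the kind of step that is straightforward when events carry application-level identifiers, and much harder when only web server logs are available.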