Grid-based Approaches for Distributed Data Mining Applications
Data mining is an important source of large-scale applications and datasets,
which are becoming increasingly common. In this paper, we present
grid-based approaches for two basic data mining applications, and a performance
evaluation on an experimental grid environment that provides interesting
monitoring capabilities and configuration tools. We propose a new distributed
clustering approach and a distributed frequent-itemset generation method well
adapted to grid environments. Performance is evaluated using the Condor system
and its workflow manager DAGMan. We also compare the measured performance
against a simple analytical model to evaluate the overheads introduced by the
workflow engine and the underlying grid system. The comparison shows, in
particular, that realistic performance expectations are currently difficult to
achieve on the grid.