36,312 research outputs found
i2MapReduce: Incremental MapReduce for Mining Evolving Big Data
As new data and updates are constantly arriving, the results of data mining
applications become stale and obsolete over time. Incremental processing is a
promising approach to refreshing mining results. It utilizes previously saved
states to avoid the expense of re-computation from scratch.
In this paper, we propose i2MapReduce, a novel incremental processing
extension to MapReduce, the most widely used framework for mining big data.
Compared with the state-of-the-art work on Incoop, i2MapReduce (i) performs
key-value pair level incremental processing rather than task level
re-computation, (ii) supports not only one-step computation but also more
sophisticated iterative computation, which is widely used in data mining
applications, and (iii) incorporates a set of novel techniques to reduce I/O
overhead for accessing preserved fine-grain computation states. We evaluate
i2MapReduce using a one-step algorithm and three iterative algorithms with
diverse computation characteristics. Experimental results on Amazon EC2 show
significant performance improvements of i2MapReduce compared to both plain and
iterative MapReduce performing re-computation
- …