Search CORE

4,643 research outputs found

A Survey of Parallel Data Mining

Author: Freitas Alex A.
Publication venue
Publication date
Field of study

With the fast, continuous increase in the number and size of databases, parallel data mining is a natural and cost-effective approach to tackle the problem of scalability in data mining. Recently there has been a considerable research on parallel data mining. However, most projects focus on the parallelization of a single kind of data mining algorithm/paradigm. This paper surveys parallel data mining with a broader perspective. More precisely, we discuss the parallelization of data mining algorithms of four knowledge discovery paradigms, namely rule induction, instance-based learning, genetic algorithms and neural networks. Using the lessons learned from this discussion, we also derive a set of heuristic principles for designing efficient parallel data mining algorithms

Kent Academic Repository

EPiK-a Workflow for Electron Tomography in Kepler.

Author: Altintas Ilkay
Chen Ruijuan
Crawl Daniel
Ellisman Mark
Lawrence Albert
Phan Sébastien
Wan Xiaohua
Wang Jianwu
Publication venue: eScholarship, University of California
Publication date: 01/01/2014
Field of study

Scientific workflows integrate data and computing interfaces as configurable, semi-automatic graphs to solve a scientific problem. Kepler is such a software system for designing, executing, reusing, evolving, archiving and sharing scientific workflows. Electron tomography (ET) enables high-resolution views of complex cellular structures, such as cytoskeletons, organelles, viruses and chromosomes. Imaging investigations produce large datasets. For instance, in Electron Tomography, the size of a 16 fold image tilt series is about 65 Gigabytes with each projection image including 4096 by 4096 pixels. When we use serial sections or montage technique for large field ET, the dataset will be even larger. For higher resolution images with multiple tilt series, the data size may be in terabyte range. Demands of mass data processing and complex algorithms require the integration of diverse codes into flexible software structures. This paper describes a workflow for Electron Tomography Programs in Kepler (EPiK). This EPiK workflow embeds the tracking process of IMOD, and realizes the main algorithms including filtered backprojection (FBP) from TxBR and iterative reconstruction methods. We have tested the three dimensional (3D) reconstruction process using EPiK on ET data. EPiK can be a potential toolkit for biology researchers with the advantage of logical viewing, easy handling, convenient sharing and future extensibility

PubMed Central

eScholarship - University of California

Efficient classification using parallel and scalable compressed model and Its application on intrusion detection

Author: Chen Tieming
Jin Shichao
Kim Okhee
Zhang Xu
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014
Field of study

In order to achieve high efficiency of classification in intrusion detection, a compressed model is proposed in this paper which combines horizontal compression with vertical compression. OneR is utilized as horizontal com-pression for attribute reduction, and affinity propagation is employed as vertical compression to select small representative exemplars from large training data. As to be able to computationally compress the larger volume of training data with scalability, MapReduce based parallelization approach is then implemented and evaluated for each step of the model compression process abovementioned, on which common but efficient classification methods can be directly used. Experimental application study on two publicly available datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the classification using the compressed model proposed can effectively speed up the detection procedure at up to 184 times, most importantly at the cost of a minimal accuracy difference with less than 1% on average

arXiv.org e-Print Archive

The Hipeac Vision, 2010

Author: Cohen Albert
De Bosschere Koen
De Sutter Bjorn
Duranton Marc
Falsafi Babak
Gaydadjiev Georgi
Katevenis Manolis
Maebe Jonas
Munk Harm
Navarro Nacho
Ramirez Alex
Temam Olivier
Valero Matero
Yehia Sami
Publication venue: HiPEAC
Publication date: 01/01/2010
Field of study

Ghent University Academic Bibliography

Archivsystem Ask23