87,560 research outputs found

    Efficient classification using parallel and scalable compressed model and its application on intrusion detection

    In order to achieve high classification efficiency in intrusion detection, this paper proposes a compressed model that combines horizontal compression with vertical compression: OneR is utilized as horizontal compression for attribute reduction, and affinity propagation is employed as vertical compression to select a small set of representative exemplars from large training data. To compress large volumes of training data scalably, a MapReduce-based parallelization approach is then implemented and evaluated for each step of the model compression process, on top of which common but efficient classification methods can be used directly. An experimental study on two publicly available intrusion detection datasets, KDD99 and CMDC2012, demonstrates that classification using the proposed compressed model can speed up the detection procedure by up to 184 times, at the cost of a minimal accuracy loss of less than 1% on average.
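    The two compression stages are easy to prototype on a single machine. Below is a minimal sketch, assuming scikit-learn's AffinityPropagation and a hand-rolled OneR-style scorer; the function names (oner_error, compress) and the keep_features parameter are illustrative, and the paper's MapReduce parallelization is not reproduced here.

        # Minimal single-machine sketch of the two-stage compression:
        # OneR-style attribute ranking (horizontal), then affinity
        # propagation exemplar selection (vertical). Labels are assumed
        # to be integer-coded. The MapReduce layer is omitted.
        import numpy as np
        from sklearn.cluster import AffinityPropagation

        def oner_error(feature, labels, n_bins=10):
            """Error rate of a one-rule classifier built on one feature."""
            bins = np.digitize(feature, np.histogram_bin_edges(feature, bins=n_bins))
            errors = 0
            for b in np.unique(bins):
                y = labels[bins == b]
                # The one-rule classifier predicts each bin's majority class.
                errors += len(y) - np.bincount(y).max()
            return errors / len(labels)

        def compress(X, y, keep_features=10):
            # Horizontal compression: keep the attributes with the lowest
            # one-rule error, i.e. the most individually predictive ones.
            scores = [oner_error(X[:, j], y) for j in range(X.shape[1])]
            cols = np.argsort(scores)[:keep_features]
            Xr = X[:, cols]
            # Vertical compression: affinity propagation selects a small
            # set of exemplars to represent the full training set.
            ap = AffinityPropagation(random_state=0).fit(Xr)
            idx = ap.cluster_centers_indices_
            return Xr[idx], y[idx], cols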

    Distributed Correlation-Based Feature Selection in Spark

    CFS (Correlation-Based Feature Selection) is a feature selection algorithm that has been successfully applied to classification problems in many domains. We describe Distributed CFS (DiCFS), a completely redesigned, scalable, parallel and distributed version of the CFS algorithm, capable of dealing with the large volumes of data typical of big data applications. Two versions of the algorithm were implemented and compared using the Apache Spark cluster computing model, which is currently gaining popularity due to its much faster processing times than Hadoop's MapReduce model. We tested our algorithms on four publicly available datasets, each consisting of a large number of instances, two of which also have a large number of features. The results show that our algorithms are superior to the non-distributed WEKA version in terms of both time efficiency and scalability: by leveraging a computer cluster, they can handle larger datasets while maintaining the quality of the results, i.e., they return exactly the same features as the original algorithm available in WEKA.
    Comment: 25 pages, 5 figures
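    The heuristic that DiCFS distributes is the standard CFS merit score: for a subset S of k features, merit(S) = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. The single-machine sketch below substitutes Pearson correlation for the symmetric uncertainty measure CFS proper uses, and the names cfs_merit and cfs_forward are illustrative, not from the paper.

        # Sketch of the CFS merit heuristic plus a greedy forward search.
        # Pearson correlation stands in for symmetric uncertainty.
        import numpy as np

        def cfs_merit(X, y, subset):
            """merit = k*r_cf / sqrt(k + k*(k-1)*r_ff) for a feature subset."""
            k = len(subset)
            r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
            pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
            r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1])
                            for a, b in pairs]) if pairs else 0.0
            return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)

        def cfs_forward(X, y, max_features=10):
            """Greedily add the feature that most improves the merit score."""
            selected, remaining = [], list(range(X.shape[1]))
            while remaining and len(selected) < max_features:
                best_merit, best_j = max((cfs_merit(X, y, selected + [j]), j)
                                         for j in remaining)
                if selected and best_merit <= cfs_merit(X, y, selected):
                    break  # no candidate improves the current subset
                selected.append(best_j)
                remaining.remove(best_j)
            return selected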

    Overview of Swallow --- A Scalable 480-core System for Investigating the Performance and Energy Efficiency of Many-core Applications and Operating Systems

    We present Swallow, a scalable many-core architecture, currently configured with 480 32-bit processors. Swallow is an open-source architecture, designed from the ground up to deliver scalable increases in usable computational power, allowing experimentation with many-core applications and the operating systems that support them. Scalability is enabled by a tileable system with a low-latency interconnect, an attractive communication-to-computation ratio, and a distributed memory configuration. We analyse the energy, computational, and communication performance of Swallow. The system provides 240 GIPS, with each core consuming 71--193 mW depending on workload. Power consumption per instruction is lower than that of almost all systems of comparable scale. We also show how a distributed operating system (nOS) allows the easy creation of scalable software that exploits Swallow's potential. Finally, we present two case studies: modelling neurons and the overlay of shared memory on a distributed memory system.
    Comment: An open source release of the Swallow system design and code will follow, and references to these will be added at a later date.
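    The headline figures can be sanity-checked with back-of-envelope arithmetic; the even per-core split below is an assumption, as the abstract does not state how throughput is distributed across cores.

        # Back-of-envelope check of Swallow's headline numbers,
        # assuming throughput is spread evenly across the cores.
        cores = 480
        total_ips = 240e9                       # 240 GIPS in instructions/second
        per_core_ips = total_ips / cores        # 500 MIPS per core
        for watts in (0.071, 0.193):            # 71--193 mW per core
            pj = watts / per_core_ips * 1e12    # joules -> picojoules
            print(f"{watts * 1e3:.0f} mW per core -> {pj:.0f} pJ/instruction")
        # 71 mW -> 142 pJ/instruction; 193 mW -> 386 pJ/instruction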