5,409 research outputs found

    Efficient classification using parallel and scalable compressed model and Its application on intrusion detection

    To achieve highly efficient classification in intrusion detection, this paper proposes a compressed model that combines horizontal compression with vertical compression. OneR is used as the horizontal compression for attribute reduction, and affinity propagation is employed as the vertical compression to select a small set of representative exemplars from large training data. To compress larger volumes of training data scalably, a MapReduce-based parallelization approach is implemented and evaluated for each step of the model compression process, so that common but efficient classification methods can be applied directly to the compressed model. An experimental study on two publicly available intrusion detection datasets, KDD99 and CMDC2012, demonstrates that classification using the proposed compressed model can speed up detection by up to 184 times, at the cost of an accuracy difference of less than 1% on average.
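
As a rough illustration of the horizontal (attribute-reduction) step, the sketch below ranks attributes by the training error of a one-rule (OneR) classifier and keeps the best ones. It is a minimal sketch of the standard OneR idea, not the paper's implementation: the function names, the toy records, and the choice to keep two attributes are assumptions made here, and continuous attributes would first need discretizing.

```python
from collections import Counter, defaultdict


def one_r_error(rows, labels, attr):
    """Training error of the one-rule classifier built on a single attribute."""
    # Count the class labels observed for each value of the attribute.
    by_value = defaultdict(Counter)
    for row, label in zip(rows, labels):
        by_value[row[attr]][label] += 1
    # Each value predicts its majority class; all other records are errors.
    errors = sum(sum(c.values()) - max(c.values()) for c in by_value.values())
    return errors / len(rows)


def rank_attributes(rows, labels):
    """Attribute indices sorted from lowest to highest OneR error."""
    n_attrs = len(rows[0])
    return sorted(range(n_attrs), key=lambda a: one_r_error(rows, labels, a))


# Hypothetical toy records: (protocol, flag, service).
rows = [
    ("tcp", "SF", "http"),
    ("tcp", "S0", "http"),
    ("udp", "SF", "dns"),
    ("tcp", "S0", "smtp"),
    ("udp", "SF", "http"),
    ("tcp", "SF", "smtp"),
]
labels = ["normal", "attack", "normal", "attack", "normal", "normal"]

ranking = rank_attributes(rows, labels)
kept = ranking[:2]  # assumed cut-off: keep the two most predictive attributes
```

In this toy data the flag attribute separates the classes perfectly, so it is ranked first; the vertical step (exemplar selection with affinity propagation) would then run on the reduced records.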

    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. In addition to treating each section in depth, the two books present useful hints and strategies for solving problems in the following chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client's Perspective

    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay fees based on service usage. For such a new storage model, our prior wisdom and optimization schemes for conventional storage may no longer be valid or applicable. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve the end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer.
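
The abstract does not describe how GDS-LC works internally. As background only, the sketch below shows the classic GreedyDual-Size replacement policy, a well-known cost-aware caching scheme; treating it as related to GDS-LC is an assumption based on the name. The class name, the `access` method, and the single numeric `cost` argument (standing in for some combination of fetch latency and monetary price) are hypothetical.

```python
class GreedyDualSizeCache:
    """Classic GreedyDual-Size cache: evicts the object with the lowest
    priority H = L + cost / size, where L is a running inflation value."""

    def __init__(self, capacity):
        self.capacity = capacity   # total bytes (or units) the cache may hold
        self.used = 0
        self.inflation = 0.0       # L: priority of the most recent victim
        self.entries = {}          # key -> (priority, size, cost)

    def _priority(self, size, cost):
        return self.inflation + cost / size

    def access(self, key, size, cost):
        """Return True on a hit; on a miss, admit the object if it fits."""
        if key in self.entries:
            # Hit: refresh the priority using the current inflation value.
            _, size, cost = self.entries[key]
            self.entries[key] = (self._priority(size, cost), size, cost)
            return True
        if size > self.capacity:
            return False           # object can never fit; do not cache it
        while self.used + size > self.capacity:
            # Evict the lowest-priority object and raise L to its priority.
            victim = min(self.entries, key=lambda k: self.entries[k][0])
            priority, victim_size, _ = self.entries.pop(victim)
            self.inflation = priority
            self.used -= victim_size
        self.entries[key] = (self._priority(size, cost), size, cost)
        self.used += size
        return False


cache = GreedyDualSizeCache(capacity=10)
cache.access("a", size=5, cost=1.0)    # cheap to re-fetch
cache.access("b", size=5, cost=10.0)   # expensive to re-fetch
cache.access("c", size=5, cost=5.0)    # forces one eviction
```

With these toy values the cheap object `a` is evicted first, while the expensive object `b` stays cached. How latency and price should be weighted into `cost` is exactly the kind of design decision the abstract leaves unspecified.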

    Optimization of Real-World MapReduce Applications With Flame-MR: Practical Use Cases

    Apache Hadoop is a widely used MapReduce framework for storing and processing large amounts of data. However, it presents some performance issues that hinder its utilization in many practical use cases. Although existing alternatives like Spark or Hama can outperform Hadoop, they require rewriting the source code of the applications due to API incompatibilities. This paper studies the use of Flame-MR, an in-memory processing architecture for MapReduce applications, to improve the performance of real-world use cases in a transparent way while keeping application compatibility. Flame-MR adapts to the characteristics of the workloads, efficiently managing the use of custom data formats and iterative computations while also reducing workload imbalance. The experimental evaluation, conducted on high-performance clusters and the Microsoft Azure cloud, shows that Flame-MR clearly outperforms Hadoop. In most cases, Flame-MR reduces execution times by more than half.

    Energy Efficient Designs for Collaborative Signal and Information Processing in Wireless Sensor Networks

    Collaborative signal and information processing (CSIP) plays an important role in the deployment of wireless sensor networks. Since each sensor has limited computing capability, constrained power usage, and limited sensing range, collaboration among sensor nodes is important in order to compensate for each other’s limitations as well as to improve the degree of fault tolerance. To support the execution of CSIP algorithms, a distributed computing paradigm and clustering protocols are needed, and these are the major concentrations of this dissertation. To facilitate collaboration among sensor nodes, we present a mobile-agent computing paradigm in which, instead of each sensor node sending local information to a processing center, as is typical in client/server-based computing, the processing code is moved to the sensor nodes through mobile agents. We further conduct an extensive performance evaluation against traditional client/server-based computing. Experimental results show that the mobile agent paradigm performs much better when the number of nodes is large, while the client/server paradigm is advantageous when the number of nodes is small. Based on this result, we propose a hybrid computing paradigm that adopts different computing models within different clusters of sensor nodes. Either the client/server or the mobile agent paradigm can be employed within clusters or between clusters, according to the cluster configuration. This new computing paradigm can take full advantage of both the client/server and mobile agent computing paradigms. Simulations show that the hybrid computing paradigm performs better than either client/server or mobile agent computing alone. The mobile agent itinerary has a significant impact on the overall performance of the sensor network. We thus formulate both static mobile agent planning and dynamic mobile agent planning as optimization problems.
Based on the models, we present three itinerary planning algorithms. We have shown, through simulation, that the predictive dynamic itinerary performs best under a wide range of conditions, making it particularly suitable for CSIP in wireless sensor networks. To facilitate the deployment of the hybrid computing paradigm, we propose a decentralized reactive clustering (DRC) protocol to cluster the sensor network in an energy-efficient way. The clustering process is invoked only by events that occur in the sensor network. Nodes that do not detect the events are put into the sleep state to save energy. In addition, a power control technique is used to minimize the transmission power needed. The advantages of the DRC protocol are demonstrated through simulations.

    Cognition-Based Networks: A New Perspective on Network Optimization Using Learning and Distributed Intelligence

    IEEE Access, Volume 3, 2015, Article number 7217798, Pages 1512-1530. Zorzi, M. (a), Zanella, A. (a), Testolin, A. (b), De Filippo De Grazia, M. (b), Zorzi, M. (b, c). Affiliations: (a) Department of Information Engineering, University of Padua, Padua, Italy; (b) Department of General Psychology, University of Padua, Padua, Italy; (c) IRCCS San Camillo Foundation, Venice-Lido, Italy. Abstract: In response to the new challenges in the design and operation of communication networks, and taking inspiration from how living beings deal with complexity and scalability, in this paper we introduce an innovative system concept called COgnition-BAsed NETworkS (COBANETS). The proposed approach develops around the systematic application of advanced machine learning techniques and, in particular, unsupervised deep learning and probabilistic generative models for system-wide learning, modeling, optimization, and data representation. Moreover, in COBANETS, we propose to combine this learning architecture with the emerging network virtualization paradigms, which make it possible to actuate automatic optimization and reconfiguration strategies at the system level, thus fully unleashing the potential of the learning approach. Compared with the past and current research efforts in this area, the technical approach outlined in this paper is deeply interdisciplinary and more comprehensive, calling for the synergic combination of expertise of computer scientists, communications and networking engineers, and cognitive scientists, with the ultimate aim of breaking new ground through a profound rethinking of how the modern understanding of cognition can be used in the management and optimization of telecommunication networks.

    Machine Learning Based Applications for Data Visualization, Modeling, Control, and Optimization for Chemical and Biological Systems

    This dissertation report covers Yan Ma’s Ph.D. research on applied studies of machine learning in manufacturing and biological systems. The research mainly focuses on reaction modeling, optimization, and control using deep learning-based approaches, concentrating on deep reinforcement learning (DRL). Yan Ma’s research also involves data mining in bioinformatics. Large-scale data obtained from RNA-seq is analyzed using dimensionality reduction with Principal Component Analysis (PCA), t-Distributed Stochastic Neighbor Embedding (t-SNE), and Uniform Manifold Approximation and Projection (UMAP), followed by clustering analysis using k-Means and Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN). This report focuses on three case studies of DRL optimization and control: a polymerization reaction control with deep reinforcement learning, a bioreactor optimization, and a fed-batch reaction optimization for a reactor at Dow Inc. In the first study, a data-driven controller based on DRL is developed for a fed-batch polymerization reaction with multiple continuous manipulated variables under continuous control. The second case study is the modeling and optimization of a bioreactor. In this study, a data-driven reaction model is developed using an Artificial Neural Network (ANN) to simulate the growth curve and bio-product accumulation of the cyanobacterium Plectonema. A DRL control agent that optimizes the daily nutrient input is then applied to maximize the yield of the valuable bio-product C-phycocyanin. In experimental validation, C-phycocyanin yield is increased by 52.1% compared to a control group with the same total nutrient content. The third case study employs the data-driven control scheme to optimize a reactor from Dow Inc., where a DRL-based optimization framework is established for the Multi-Input, Multi-Output (MIMO) reaction system with reaction surrogate modeling.
Overall, Yan Ma’s research shows promising directions for employing the emerging technologies of data-driven methods and deep learning in manufacturing and biological systems. It demonstrates that DRL is an efficient algorithm across three different reaction systems, with both stochastic and deterministic policies. The use of data-driven models in reaction simulation also shows promising results, owing to the non-linear nature and fast computation of neural network models.