Search CORE

QCBA: Postoptimization of Quantitative Attributes in Classifiers based on Association Rules

Author: Kliegr Tomas
Publication date: 18/10/2019

The need to prediscretize numeric attributes before they can be used in association rule learning is a source of inefficiencies in the resulting classifier. This paper describes several new rule tuning steps that aim to recover information lost in the discretization of numeric (quantitative) attributes, and a new rule pruning strategy that further reduces the size of the classification models. We demonstrate the effectiveness of the proposed methods on the postoptimization of models generated by three state-of-the-art association rule classification algorithms: Classification based on Associations (Liu, 1998), Interpretable Decision Sets (Lakkaraju et al., 2016), and Scalable Bayesian Rule Lists (Yang, 2017). Benchmarks on 22 datasets from the UCI repository show that the postoptimized models are consistently smaller (typically by about 50%) and have better classification performance on most datasets.
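
The abstract does not spell out the individual tuning steps. As an illustration only, the sketch below shows one plausible step of this kind: tightening a rule's prediscretized interval to the smallest and largest training values that the rule covers and classifies correctly, so that the bin edges chosen by discretization no longer dictate the rule's boundaries. The `Rule` class, the `trim_interval` function and the toy data are hypothetical; this is not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rule:
    """Hypothetical single-condition rule: IF lo <= x <= hi THEN label."""
    lo: float
    hi: float
    label: str


def trim_interval(rule: Rule, xs: List[float], ys: List[str]) -> Rule:
    """Shrink [lo, hi] to the span of training values the rule classifies correctly."""
    # Training values that fall inside the rule's interval and match its class.
    correct = [x for x, y in zip(xs, ys) if rule.lo <= x <= rule.hi and y == rule.label]
    if not correct:
        return rule  # nothing to anchor new bounds on; keep the rule unchanged
    return Rule(min(correct), max(correct), rule.label)


# Toy data (hypothetical): a prediscretized bin [0, 5] for class "a".
xs = [1.0, 2.5, 3.2, 4.8, 6.0]
ys = ["a", "a", "b", "a", "a"]
trimmed = trim_interval(Rule(0.0, 5.0, "a"), xs, ys)
print(trimmed)  # Rule(lo=1.0, hi=4.8, label='a')
```

The trimmed rule still covers exactly the same training instances as before, but its boundaries now reflect observed values rather than arbitrary bin edges.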

arXiv.org e-Print Archive

A survey of intrusion detection techniques in Cloud

Author: Arshad
Avi Patel
Beg
Bhavesh Borisaniya
Botha
Chen
Chen
Chirag Modi
Dhanalakshmi
Dhiren Patel
Garfinkel
Grediaga
Hamad
Han
Hiren Patel
Ibrahim
Jia
Katar
Lei
Leu
Li
Li
Lu
Muttukrishnan Rajarajan
Ram
Sandar
Su
Tillapart
Vieira
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

Cloud computing provides scalable, virtualized, on-demand services to end users with greater flexibility and lower infrastructure investment. These services are delivered over the Internet using known networking protocols, standards and formats, under the supervision of different management entities. Existing bugs and vulnerabilities in the underlying technologies and legacy protocols tend to open doors for intrusion. This paper surveys different intrusions affecting the availability, confidentiality and integrity of Cloud resources and services. It examines proposals for incorporating Intrusion Detection Systems (IDS) in the Cloud, discusses various types and techniques of IDS and Intrusion Prevention Systems (IPS), and recommends where to position IDS/IPS in the Cloud architecture to achieve the desired security in next-generation networks.

City Research Online

Crossref

Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS

Author: Pardede Eric
Pratama Mahardhika
Za'in Choiru
Publication date: 18/07/2018

Many distributed machine learning frameworks have recently been built to speed up large-scale data learning. However, most of the machine learning algorithms used in these frameworks are still offline models that cannot cope with data streams. In fact, large-scale data are mostly generated by non-stationary data streams whose patterns evolve over time. To address this problem, we propose a novel Evolving Large-scale Data Stream Analytics framework based on a Scalable Parsimonious Network based on Fuzzy Inference System (Scalable PANFIS), in which the PANFIS evolving algorithm is distributed over worker nodes in the cloud to learn from large-scale data streams. The Scalable PANFIS framework incorporates an active learning (AL) strategy and two model fusion methods. AL accelerates the distributed learning process that generates an initial evolving large-scale data stream model (the initial model), while the two model fusion methods aggregate the initial model to generate the final model. The final model represents updated knowledge of the current large-scale data and can be used to make inferences on future data. The framework is validated through extensive experiments that measure the accuracy and running time of four combinations of Scalable PANFIS and other built-in Spark algorithms. The results indicate that Scalable PANFIS with AL trains almost twice as fast as Scalable PANFIS without AL. They also show that the rule-merging and voting mechanisms generally yield similar accuracy across the Scalable PANFIS variants and generally outperform the Spark-based algorithms. In terms of running time, Scalable PANFIS trains faster than all Spark-based algorithms on numerous benchmark datasets. Comment: 20 pages, 5 figures
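
The abstract names voting as one of the two fusion methods but does not describe it. A minimal sketch, assuming each worker node returns a trained model that maps a feature vector to a single class label and that fusion is plain majority voting over their predictions. The `vote` function, the `Model` type and the toy workers are hypothetical, and this does not reproduce the PANFIS rule-merging method.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A worker model maps a feature vector to a class label (hypothetical interface).
Model = Callable[[Sequence[float]], str]


def vote(models: List[Model], x: Sequence[float]) -> str:
    """Majority vote over worker predictions; ties go to the label seen first."""
    predictions = [model(x) for model in models]
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(predictions).most_common(1)[0][0]


# Three toy workers standing in for models trained on different data partitions.
workers: List[Model] = [
    lambda x: "pos" if x[0] > 0 else "neg",
    lambda x: "pos" if x[1] > 0 else "neg",
    lambda x: "neg",
]

print(vote(workers, [1.0, 2.0]))   # pos
print(vote(workers, [1.0, -1.0]))  # neg
```

Voting leaves each worker's model intact and only combines outputs at prediction time, which is why it can be applied without any knowledge of the models' internal rule structure.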

arXiv.org e-Print Archive

DR-NTU (Digital Repository of NTU)

Microservices and Machine Learning Algorithms for Adaptive Green Buildings

Author: Alonso Montesinos Joaquín Blas
Ayala Palenzuela Rosa María
Capobianco Uriarte María De Las Mercedes
Criado Rodríguez Javier
Iribarne Martínez Luis Fernando
Piedra Fernández José Antonio
Rodríguez Gracia Diego
Publication venue: 'MDPI AG'
Publication date: 01/01/2019

In recent years, the use of services for Open Systems development has consolidated and strengthened. Advances in the Service Science and Engineering (SSE) community, driven by the reinforcement of Web Services and Semantic Web technologies and by new Cloud computing techniques such as the proliferation of microservices solutions, have allowed software architects to experiment with and develop new ways of building open computer systems that can adapt at runtime. Home automation, intelligent buildings, robotics and graphical user interfaces are some of the social environments in which these innovative trends can be applied. This paper presents a schema for the adaptation of Dynamic Computer Systems (DCS) that combines interdisciplinary techniques from model-driven engineering, service engineering and soft computing. The proposal manages an orchestrated microservices schema for adapting component-based software architectural systems at runtime. The schema has been developed as a three-layer adaptive transformation process supported by a rule-based decision-making service implemented using Machine Learning (ML) algorithms. The experimental development was carried out at the Solar Energy Research Center (CIESOL), applying the proposed microservices schema to adapt home architectural atmosphere systems in Green Buildings.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositorio Institucional de la Universidad de Almería (Spain)

A multilabel fuzzy relevance clustering system for malware attack attribution in the edge layer of cyber-physical networks

Author: Alaeiyan M
Conti M
Dargahi T
Dehghantanha A
Parsa S
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 12/03/2020

The rapid increase in the number of malicious programs has made malware forensics a daunting task and has put users’ systems at risk. Timely identification of malware characteristics, including its origin and the malware sample family, would significantly limit the potential damage of malware. The risk is even greater in Cyber-Physical Systems (CPSs), where a malware attack may cause significant physical damage to the infrastructure. Because CPS devices have limited on-device memory and processing power, most efforts to protect CPS networks focus on the edge layer, where the majority of security mechanisms are deployed. Since most advanced and sophisticated malware programs combine features from different families, they are not similar enough to any existing malware family and easily evade detection by binary classifiers. Therefore, in this article, we propose a novel multilabel fuzzy clustering system for malware attack attribution. Our system is deployed on the edge layer to provide insight into the malware threats applicable to the CPS network. We leverage static analysis, using opcode frequencies as the feature space to classify malware families. We observed that a multilabel classifier leaves some samples unclassified; we call this the instance coverage problem. To overcome it, we developed an ensemble-based multilabel fuzzy classification method that estimates the relevance of a malware instance to the affected families. This classifier identified samples of VirusShare, RansomwareTracker, and BIG2015 with accuracies of 94.66%, 94.26%, and 97.56%, respectively.
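
The abstract combines two ideas: opcode frequencies as static features, and graded (fuzzy) relevance to several families, so that a sample mixing traits of different families is not forced into a single class. A minimal sketch of those two ideas, with standard fuzzy c-means memberships to hypothetical family centroids swapped in for the paper's ensemble-based classifier. The function names, the opcode vocabulary, the centroids and the 0.3 threshold are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def opcode_frequencies(opcodes: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary opcode in a disassembled sample."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero for an empty sample
    return [counts[op] / total for op in vocab]


# Swapped-in technique: fuzzy c-means memberships, not the paper's ensemble method.
def fuzzy_memberships(x: Sequence[float], centroids: Sequence[Sequence[float]],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of x in each centroid (memberships sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    zeros = [d == 0 for d in dists]
    if any(zeros):  # x coincides with one or more centroids: split membership among them
        k = sum(zeros)
        return [1.0 / k if z else 0.0 for z in zeros]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** p for dj in dists) for dk in dists]


def relevant_families(x: Sequence[float], centroids: Dict[str, Sequence[float]],
                      threshold: float = 0.3) -> List[str]:
    """Multilabel attribution: every family whose membership reaches the threshold."""
    names = list(centroids)
    u = fuzzy_memberships(x, [centroids[n] for n in names])
    return [n for n, un in zip(names, u) if un >= threshold]


vocab = ["mov", "push", "call"]  # hypothetical opcode vocabulary
sample = opcode_frequencies(["mov", "mov", "push", "call"], vocab)  # [0.5, 0.25, 0.25]
families = {  # hypothetical family centroids in opcode-frequency space
    "FamilyA": [0.6, 0.2, 0.2],
    "FamilyB": [0.4, 0.3, 0.3],
    "FamilyC": [0.0, 0.0, 1.0],
}
print(relevant_families(sample, families))  # ['FamilyA', 'FamilyB']
```

Because every family whose membership clears the threshold is reported, a sample lying between two families is attributed to both, which is the multilabel behavior the abstract motivates.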

University of Salford Institutional Repository

A survey on utilization of data mining approaches for dermatological (skin) diseases prediction

Author: Adibi N
Ahmadzadeh MR
Barati E
Mohammadi A
Saraee MH
Publication venue: Cyber Journals
Publication date: 01/03/2011

Due to recent advances in technology, large volumes of medical data are being collected. These data contain valuable information, so data mining techniques can be used to extract useful patterns from them. This paper introduces data mining and its various techniques and surveys the available literature on medical data mining, with an emphasis on the application of data mining to skin diseases. A categorization based on the different data mining techniques is provided, and the utility of the various data mining methodologies is highlighted. In general, association mining is suitable for extracting rules; it has been used especially in cancer diagnosis. Classification is a robust method in medical data mining, and we summarize its different uses in dermatology, where it is one of the most important methods for diagnosing erythemato-squamous diseases. Methods applied to this problem include neural networks, genetic algorithms and fuzzy classification. Clustering is a useful method in medical image mining. Clustering techniques aim to find structure in the given data by grouping records according to the similarity of their characteristics. Clustering has some applications in dermatology. Besides introducing different mining methods, we investigate some of the challenges in mining skin data.
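
The abstract notes that association mining is suited to extracting rules. Rules of the form A => C are conventionally scored by support (how often A and C occur together) and confidence (how often C holds when A does). A minimal sketch computing both over a few records; the finding names, the records and the function names are invented for illustration and do not come from the surveyed studies.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Set[str]]) -> float:
    """Estimated P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    a = set(antecedent)
    s_a = support(a, transactions)
    if s_a == 0:
        return 0.0  # antecedent never occurs; treat the rule as having no confidence
    return support(a | set(consequent), transactions) / s_a


# Toy records (hypothetical, not clinical data): findings plus a diagnosis label.
records = [
    {"scaling", "erythema", "psoriasis"},
    {"scaling", "erythema", "psoriasis"},
    {"erythema", "itching", "eczema"},
    {"scaling", "itching", "eczema"},
]
print(support({"scaling", "erythema"}, records))                    # 0.5
print(confidence({"scaling", "erythema"}, {"psoriasis"}, records))  # 1.0
```

Here the rule {scaling, erythema} => {psoriasis} holds in every record where its antecedent appears, whereas {scaling} => {psoriasis} alone would reach a confidence of only 2/3.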

University of Salford Institutional Repository