840 research outputs found

    MACOC: a medoid-based ACO clustering algorithm

    The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent works concerning unsupervised learning have focused on clustering, showing the great potential of ACO-based techniques. This work presents an ACO-based clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach restructures ACOC from a centroid-based technique to a medoid-based technique, where the properties of the search space are not necessarily known. Instead, it relies only on information about the distances among the data. The new algorithm, called MACOC, has been compared against well-known algorithms (K-means and Partition Around Medoids) and with ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
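The point that a medoid-based method needs only pairwise distances can be illustrated with a minimal sketch. This is not the MACOC algorithm (it contains no ant colony search); it only shows the assignment step and the cost that any medoid-based method evaluates. The function name `assign_to_medoids` and the distance matrix are hypothetical.

```python
def assign_to_medoids(dist, medoids):
    """Assign each point to its nearest medoid using only pairwise distances.

    dist    -- square matrix (list of lists) of pairwise distances
    medoids -- indices of the points currently chosen as medoids
    Returns (labels, cost): the medoid index chosen for each point and the
    total distance from all points to their medoids.
    """
    labels = []
    cost = 0.0
    for i in range(len(dist)):
        # No coordinates are needed: only the row of distances for point i.
        nearest = min(medoids, key=lambda m: dist[i][m])
        labels.append(nearest)
        cost += dist[i][nearest]
    return labels, cost


# Hypothetical 4-point distance matrix with two obvious groups.
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 4.0],
    [4.0, 3.0, 0.0, 1.0],
    [5.0, 4.0, 1.0, 0.0],
]
labels, cost = assign_to_medoids(dist, medoids=[0, 3])
```

Because the medoids are actual data points, the same routine works for any dissimilarity measure, including ones for which a centroid is not defined.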

    A Hybrid Heuristic for the k-medoids Clustering Problem

    Clustering is an important tool for data analysis, since it allows the exploration of datasets with little or no prior information. Its main goal is to group a set of data based on their similarity (or dissimilarity). A well-known mathematical formulation for clustering is the k-medoids problem. Current versions of k-medoids rely on heuristics, with good results reported in the literature. However, few methods that analyze the quality of the partitions found by the heuristics have been proposed. In this paper, we propose a hybrid Lagrangian heuristic for the k-medoids problem. We compare the performance of the proposed Lagrangian heuristic with other heuristics for the k-medoids problem found in the literature. Experimental results showed that the proposed Lagrangian heuristic outperformed the other algorithms. UNIFESP, Inst Ciencia & Tecnol, BR-12230280 Sao Jose Dos Campos, SP, Brazil. Web of Science
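For reference, the k-medoids problem is commonly written as the integer program below. The notation is an assumption, not taken from the paper: $d_{ij}$ is the dissimilarity between points $i$ and $j$, $y_j = 1$ if point $j$ is chosen as a medoid, and $x_{ij} = 1$ if point $i$ is assigned to medoid $j$.

```latex
\begin{aligned}
\min_{x,\,y}\quad & \sum_{i=1}^{n}\sum_{j=1}^{n} d_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{n} x_{ij} = 1, && i = 1,\dots,n \\
& x_{ij} \le y_{j}, && i, j = 1,\dots,n \\
& \sum_{j=1}^{n} y_{j} = k \\
& x_{ij},\, y_{j} \in \{0, 1\}, && i, j = 1,\dots,n
\end{aligned}
```

One standard Lagrangian relaxation dualizes the assignment constraints with multipliers $\lambda_i$; whether the paper relaxes these particular constraints is not stated in the abstract.

```latex
L(\lambda) = \sum_{i=1}^{n} \lambda_i
  + \min_{x,\,y} \sum_{i=1}^{n}\sum_{j=1}^{n} (d_{ij} - \lambda_i)\,x_{ij}
\quad \text{s.t.}\quad x_{ij} \le y_j,\;\; \sum_{j=1}^{n} y_j = k,\;\; x_{ij},\, y_j \in \{0,1\}
```

For any $\lambda$, $L(\lambda)$ is a lower bound on the optimal cost, so the gap between it and a heuristic solution gives a measure of that partition's quality.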

    Finding groups in data: Cluster analysis with ants

    We present in this paper a modification of Lumer and Faieta’s algorithm for data clustering. This approach mimics the clustering behavior observed in real ant colonies. The algorithm automatically discovers clusters in numerical data without prior knowledge of the possible number of clusters. In this paper we focus on ant-based clustering algorithms, a particular kind of swarm intelligence system, and on the effects on the final clustering of using different dissimilarity metrics during classification: the Euclidean, Cosine, and Gower measures. Clustering with swarm-based algorithms is emerging as an alternative to more conventional clustering methods, such as k-means. Among the many bio-inspired techniques, ant clustering algorithms have received special attention, especially because they still require much investigation to improve performance, stability and other key features that would make such algorithms mature tools for data mining. As a case study, this paper focuses on the behavior of clustering procedures in those new approaches. The proposed algorithm and its modifications are evaluated on a number of well-known benchmark datasets. Empirical results clearly show that ant-based clustering algorithms perform well when compared to other techniques.
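The three dissimilarity measures named in the abstract can be sketched as follows. This is a minimal illustration for numerical vectors, not the paper's implementation: the function names are hypothetical, the cosine measure is taken as one minus the cosine similarity, and the Gower measure is shown only in its numeric-feature form (mean range-normalized absolute difference), with the feature ranges supplied by the caller.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_dissimilarity(x, y):
    """One minus cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)


def gower_numeric(x, y, ranges):
    """Gower dissimilarity for numeric features only.

    ranges -- per-feature range (max - min) over the dataset, all non-zero.
    """
    return sum(abs(a - b) / r for a, b, r in zip(x, y, ranges)) / len(x)
```

The measures can rank the same pairs differently: two vectors pointing in the same direction have a cosine dissimilarity near zero regardless of their Euclidean distance, which is one reason the choice of metric can change the final clustering.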

    Security Analytics: Using Deep Learning to Detect Cyber Attacks

    Security attacks are becoming more prevalent as cyber attackers exploit system vulnerabilities for financial gain. The resulting loss of revenue and reputation can have deleterious effects on governments and businesses alike. Signature recognition and anomaly detection are the most common security detection techniques in use today. These techniques provide a strong defense. However, they fall short of detecting complicated or sophisticated attacks. Recent literature suggests using security analytics to differentiate between normal and malicious user activities. The goal of this research is to develop a repeatable process to detect cyber attacks that is fast, accurate, comprehensive, and scalable. A model was developed and evaluated using several production log files provided by the University of North Florida Information Technology Security department. This model uses security analytics to complement existing security controls to detect suspicious user activity occurring in real time by applying machine learning algorithms to multiple heterogeneous server-side log files. The process is linearly scalable and comprehensive; as such it can be applied to any enterprise environment. The process is composed of three steps. The first step is data collection and transformation, which involves identifying the source log files and selecting a feature set from those files. The resulting feature set is then transformed into a time series dataset using a sliding time window representation. Each instance of the dataset is labeled as green, yellow, or red using three different unsupervised learning methods, one of which is Partitioning Around Medoids (PAM). The final step uses Deep Learning to train and evaluate the model that will be used for detecting abnormal or suspicious activities. Experiments using datasets of varying time granularity resulted in very high accuracy and performance. The time required to train and test the model was surprisingly short even for large datasets. This is the first research paper that develops a model to detect cyber attacks using security analytics; hence this research builds a foundation on which future research in this subject area can expand.
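The sliding time window representation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the function name `sliding_windows` is hypothetical, timestamps are plain numbers (for example, seconds), and the only feature computed per window is an event count.

```python
def sliding_windows(timestamps, window, step):
    """Turn raw event timestamps into a time series of per-window counts.

    Each row is (window_start, count), where count is the number of events
    falling in the half-open interval [window_start, window_start + window).
    Windows overlap whenever step < window.
    """
    if not timestamps:
        return []
    ts = sorted(timestamps)
    start, last = ts[0], ts[-1]
    rows = []
    while start <= last:
        count = sum(1 for t in ts if start <= t < start + window)
        rows.append((start, count))
        start += step
    return rows


# Hypothetical log event times, in seconds.
rows = sliding_windows([0, 1, 2, 10, 11, 30], window=10, step=5)
```

Each resulting row would then be one instance of the dataset, ready to be labeled by a clustering method such as PAM.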

    Hybrid of K-Means and partitioning around medoids for predicting COVID-19 cases: Iraq case study

    COVID-19 was discovered near the end of 2019 in Wuhan, China. In a short period, the virus spread throughout the entire world. One of the primary concerns of managers and decision-makers in all types of hospitals nowadays is to implement plans for detecting patient status (Negative, Positive) in order to provide adequate care at the proper moment. Improving health care quality could help to contain the COVID-19 pandemic. Forming clusters of patients with similar features and symptoms provides an overview of the quality of care given to similar patients. In medical machine learning, the K-means and Partitioning Around Medoids (PAM) clustering algorithms are commonly used to produce clusters based on similarity and to detect useful patterns in data. In this paper, we propose a hybrid algorithm of K-Means and Partitioning Around Medoids (PAM), called K-MP, that takes advantage of both PAM and K-Means to construct an efficient model for predicting patient status. The suggested model was applied to a real dataset collected from 400 patients in many Iraqi clinics using a questionnaire. We evaluated the proposed K-MP using true negative rate, balanced accuracy, precision, accuracy, recall, mean absolute error, F1 score, and root mean square error. From these performance measures, we found that K-MP is more efficient in discovering patient status compared to K-Means and PAM.
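The performance measures listed above can all be computed from predicted and true binary labels. The sketch below uses the standard definitions; the function name `binary_metrics` and the example labels are hypothetical, with 1 standing for Positive and 0 for Negative. It assumes both classes occur in the true labels and at least one positive is predicted, so no denominator is zero.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute common evaluation measures for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    recall = tp / (tp + fn)        # true positive rate
    tnr = tn / (tn + fp)           # true negative rate
    precision = tp / (tp + fp)
    return {
        "true_negative_rate": tnr,
        "balanced_accuracy": (recall + tnr) / 2,
        "precision": precision,
        "accuracy": (tp + tn) / n,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # On 0/1 labels both error measures depend only on the error count.
        "mae": sum(abs(t - p) for t, p in pairs) / n,
        "rmse": math.sqrt(sum((t - p) ** 2 for t, p in pairs) / n),
    }


# Hypothetical labels for eight patients.
m = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
```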