4,258 research outputs found

    Data mining based cyber-attack detection

    Get PDF

    Non-convex clustering using expectation maximization algorithm with rough set initialization

    Get PDF
    An integration of a minimal spanning tree (MST) based graph-theoretic technique and expectation maximization (EM) algorithm with rough set initialization is described for non-convex clustering. EM provides the statistical model of the data and handles the associated uncertainties. Rough set theory helps in faster convergence and avoidance of the local minima problem, thereby enhancing the performance of EM. MST helps in determining non-convex clusters. Since it is applied on Gaussians rather than the original data points, time required is very low. These features are demonstrated on real life datasets. Comparison with related methods is made in terms of a cluster quality measure and computation time

    Hybrid self-organizing feature map (SOM) for anomaly detection in cloud infrastructures using granular clustering based upon value-difference metrics

    Get PDF
    We have witnessed an increase in the availability of data from diverse sources over the past few years. Cloud computing, big data and Internet-of-Things (IoT) are distinctive cases of such an increase which demand novel approaches for data analytics in order to process and analyze huge volumes of data for security and business use. Cloud computing has been becoming popular for critical structure IT mainly due to cost savings and dynamic scalability. Current offerings, however, are not mature enough with respect to stringent security and resilience requirements. Mechanisms such as anomaly detection hybrid systems are required in order to protect against various challenges that include network based attacks, performance issues and operational anomalies. Such hybrid AI systems include Neural Networks, blackboard systems, belief (Bayesian) networks, case-based reasoning and rule-based systems and can be implemented in a variety of ways. Traffic in the cloud comes from multiple heterogeneous domains and changes rapidly due to the variety of operational characteristics of the tenants using the cloud and the elasticity of the provided services. The underlying detection mechanisms rely upon measurements drawn from multiple sources. However, the characteristics of the distribution of measurements within specific subspaces might be unknown. We argue in this paper that there is a need to cluster the observed data during normal network operation into multiple subspaces each one of them featuring specific local attributes, i.e. granules of information. Clustering is implemented by the inference engine of a model hybrid NN system. Several variations of the so-called value-difference metric (VDM) are investigated like local histograms and the Canberra distance for scalar attributes, the Jaccard distance for binary word attributes, rough sets as well as local histograms over an aggregate ordering distance and the Canberra measure for vectorial attributes. Low-dimensional subspace representations of each group of points (measurements) in the context of anomaly detection in critical cloud implementations is based upon VD metrics and can be either parametric or non-parametric. A novel application of a Self-Organizing-Feature Map (SOFM) of reduced/aggregate ordered sets of objects featuring VD metrics (as obtained from distributed network measurements) is proposed. Each node of the SOFM stands for a structured local distribution of such objects within the input space. The so-called Neighborhood-based Outlier Factor (NOOF) is defined for such reduced/aggregate ordered sets of objects as a value-difference metric of histogrammes. Measurements that do not belong to local distributions are detected as anomalies, i.e. outliers of the trained SOFM. Several methods of subspace clustering using Expectation-Maximization Gaussian Mixture Models (a parametric approach) as well as local data densities (a non-parametric approach) are outlined and compared against the proposed method using data that are obtained from our cloud testbed in emulated anomalous traffic conditions. The results—which are obtained from a model NN system—indicate that the proposed method performs well in comparison with conventional techniques

    A Miniature Robot for Isolating and Tracking Neurons in Extracellular Cortical Recordings

    Get PDF
    This paper presents a miniature robot device and control algorithm that can autonomously position electrodes in cortical tissue for isolation and tracking of extracellular signals of individual neurons. Autonomous electrode positioning can significantly enhance the efficiency and quality of acute electrophysiolgical experiments aimed at basic understanding of the nervous system. Future miniaturized systems of this sort could also overcome some of the inherent difficulties in estabilishing long-lasting neural interfaces that are needed for practical realization of neural prostheses. The paper describes the robot's design and summarizes the overall structure of the control system that governs the electrode positioning process. We present a new sequential clustering algorithm that is key to improving our system's performance, and which may have other applications in robotics. Experimental results in macaque cortex demonstrate the validity of our approach

    GOGGLES: Automatic Image Labeling with Affinity Coding

    Full text link
    Generating large labeled training data is becoming the biggest bottleneck in building and deploying supervised machine learning models. Recently, the data programming paradigm has been proposed to reduce the human cost in labeling training data. However, data programming relies on designing labeling functions which still requires significant domain expertise. Also, it is prohibitively difficult to write labeling functions for image datasets as it is hard to express domain knowledge using raw features for images (pixels). We propose affinity coding, a new domain-agnostic paradigm for automated training data labeling. The core premise of affinity coding is that the affinity scores of instance pairs belonging to the same class on average should be higher than those of pairs belonging to different classes, according to some affinity functions. We build the GOGGLES system that implements affinity coding for labeling image datasets by designing a novel set of reusable affinity functions for images, and propose a novel hierarchical generative model for class inference using a small development set. We compare GOGGLES with existing data programming systems on 5 image labeling tasks from diverse domains. GOGGLES achieves labeling accuracies ranging from a minimum of 71% to a maximum of 98% without requiring any extensive human annotation. In terms of end-to-end performance, GOGGLES outperforms the state-of-the-art data programming system Snuba by 21% and a state-of-the-art few-shot learning technique by 5%, and is only 7% away from the fully supervised upper bound.Comment: Published at 2020 ACM SIGMOD International Conference on Management of Dat
    corecore