    Real-time Unsupervised Clustering

    In our research program, we are developing machine learning algorithms to enable a mobile robot to build a compact representation of its environment. This requires each new input to be processed in constant time. Existing machine learning algorithms either cannot meet this constraint or deliver problematic results. In this paper, we describe a new algorithm for real-time unsupervised clustering, Bounded Self-Organizing Clustering. It executes in constant time for each input, and it produces clusterings that are significantly better than those created by the Self-Organizing Map, its closest competitor, on sensor data acquired from a physically embodied mobile robot.
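The abstract does not give the update rule of Bounded Self-Organizing Clustering, so the sketch below is not that algorithm. It only illustrates the constant-time-per-input property the abstract emphasizes, assuming a simple scheme: keep at most `max_nodes` centroids, seed them with the first inputs, and afterwards move the nearest centroid part of the way toward each new input. The class `BoundedOnlineClusterer` and its parameters `max_nodes` and `learning_rate` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence


class BoundedOnlineClusterer:
    """Hypothetical illustration, NOT the paper's Bounded Self-Organizing Clustering.

    Keeps at most `max_nodes` centroids, so each update costs O(max_nodes * dim),
    independent of how many inputs have been seen so far.
    """

    def __init__(self, max_nodes: int, learning_rate: float = 0.1) -> None:
        if max_nodes < 1:
            raise ValueError("max_nodes must be at least 1")
        self.max_nodes = max_nodes
        self.learning_rate = learning_rate
        self.centroids: List[List[float]] = []

    def _nearest(self, x: Sequence[float]) -> int:
        # Linear scan over a bounded number of centroids.
        return min(range(len(self.centroids)),
                   key=lambda i: math.dist(self.centroids[i], x))

    def update(self, x: Sequence[float]) -> int:
        """Process one input and return the index of the centroid it was assigned to."""
        if len(self.centroids) < self.max_nodes:
            # Seeding phase: the first inputs become centroids directly.
            self.centroids.append([float(v) for v in x])
            return len(self.centroids) - 1
        i = self._nearest(x)
        c = self.centroids[i]
        # Move the winning centroid a fraction of the way toward the input.
        for j in range(len(c)):
            c[j] += self.learning_rate * (x[j] - c[j])
        return i


# Usage: a two-node clusterer on a short stream of 2-D readings.
clusterer = BoundedOnlineClusterer(max_nodes=2, learning_rate=0.5)
stream = [(0.0, 0.0), (10.0, 10.0), (1.0, 1.0), (9.0, 9.0)]
assignments = [clusterer.update(p) for p in stream]
```

Because every update scans at most `max_nodes` centroids, the per-input cost is bounded by a constant that does not grow with the length of the sensor stream.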

    SACOC: A spectral-based ACO clustering algorithm

    The application of ACO-based algorithms in data mining has grown over the last few years, and several supervised and unsupervised learning algorithms have been developed using this bio-inspired approach. Most recent work on unsupervised learning has focused on clustering, where ACO-based techniques have shown great potential. At the same time, new clustering techniques that seek the continuity of data, especially spectral-based approaches as opposed to classical centroid-based approaches, have attracted increasing research interest; this is an area that ACO clustering techniques are still exploring. This work presents a hybrid spectral-based ACO clustering algorithm inspired by the ACO Clustering (ACOC) algorithm. The proposed approach combines ACOC with the spectral Laplacian to generate a new search space for the algorithm in order to obtain more promising solutions. The new algorithm, called SACOC, has been compared against well-known algorithms (K-means and Spectral Clustering) and against ACOC. The experiments measure the accuracy of the algorithm on both synthetic datasets and real-world datasets extracted from the UCI Machine Learning Repository.
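The abstract states that SACOC uses the spectral Laplacian to build a new search space but does not specify the affinity function or the normalization. As a hedged illustration only, the sketch below assumes a Gaussian (RBF) affinity with a hypothetical bandwidth `sigma` and builds the symmetric normalized Laplacian L = I - D^(-1/2) W D^(-1/2), where W is the affinity matrix and D the diagonal degree matrix.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def gaussian_affinity(points: Sequence[Sequence[float]], sigma: float = 1.0) -> Matrix:
    """W[i][j] = exp(-||p_i - p_j||^2 / (2 sigma^2)) for i != j, with a zero diagonal.

    The Gaussian kernel and `sigma` are assumptions, not taken from the SACOC abstract.
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                sq = math.dist(points[i], points[j]) ** 2
                w[i][j] = math.exp(-sq / (2.0 * sigma ** 2))
    return w


def normalized_laplacian(w: Matrix) -> Matrix:
    """Symmetric normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    n = len(w)
    degrees = [sum(row) for row in w]
    # Isolated points (degree 0) get a zero scaling factor instead of dividing by zero.
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in degrees]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * w[i][j] * inv_sqrt[j]
             for j in range(n)]
            for i in range(n)]


# Usage on three toy points.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0)]
lap = normalized_laplacian(gaussian_affinity(points, sigma=1.0))
```

In standard spectral clustering, the eigenvectors of `lap` with the smallest eigenvalues would then form the embedding in which the search runs; that eigen-decomposition step (for example with `numpy.linalg.eigh`) is omitted here to keep the sketch dependency-free.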

    Intrusion Signature Creation via Clustering Anomalies

    Current practices for combating cyber attacks typically use Intrusion Detection Systems (IDSs) to detect and block multistage attacks. Because of the speed and impact of new types of cyber attacks, current IDSs are limited in their ability to detect attacks accurately while reliably adapting to new ones. In signature-based IDSs, this limitation is apparent in the latency between day zero of an attack and the creation of an appropriate signature. This work hypothesizes that this latency can be shortened by creating signatures via anomaly-based algorithms. A hybrid supervised and unsupervised clustering algorithm is proposed for new signature creation. Signatures created in real time would take effect immediately, ideally detecting new attacks. This work first investigates a modified density-based clustering algorithm as an IDS and identifies its strengths and weaknesses. It then investigates a signature creation algorithm that leverages the summarizing abilities of clustering. Lessons learned from supervised signature creation are subsequently applied to the development of unsupervised real-time signature classification. Automating signature creation and classification via clustering is shown to be satisfactory, but with limitations.
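The abstract does not describe how the density-based algorithm was modified. For orientation only, the sketch below is plain DBSCAN in which points labelled as noise are treated as anomaly candidates, the kind of output a signature-creation step could then summarize. `eps` and `min_pts` are the usual DBSCAN parameters; the values in the usage lines are arbitrary, and the toy points stand in for hypothetical traffic feature vectors.

```python
import math
from typing import List, Optional, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN (O(n^2) neighbor search); returns a cluster id or NOISE per point."""
    labels: List[Optional[int]] = [None] * len(points)

    def neighbors(i: int) -> List[int]:
        # The neighborhood includes the point itself, as in the original DBSCAN definition.
        return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return [lab if lab is not None else NOISE for lab in labels]


# Usage: four dense points and one isolated outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
labels = dbscan(points, eps=1.5, min_pts=3)
anomalies = [i for i, lab in enumerate(labels) if lab == NOISE]
```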

    Integration Of Unsupervised Clustering Algorithm And Supervised Classifier For Pattern Recognition

    In the real world, pattern recognition problems in diverse forms are ubiquitous and are critical in most human decision-making tasks. In a pattern recognition system, achieving high accuracy in pattern classification is crucial. There are two general paradigms for pattern classification: supervised and unsupervised learning. The problem with unsupervised learning (clustering) is that it has no teacher during the classification process and must learn independently, which may lead to poor classification. Supervised learning, by contrast, requires a teacher or prior data (i.e. a large and costly labelled training set) during the classification process, and in real life the cost of obtaining sufficient labelled training data is high. In addition, labelling is time-consuming and done manually. To solve these problems, an integration of an unsupervised clustering algorithm and a supervised classifier is proposed. The objective of this research is to study the performance and capability of this integration of unsupervised and supervised learning. To achieve this objective, the research is separated into two phases. Phase 1 evaluates the performance of the clustering algorithms (K-Means and FCM). Phase 2 studies the performance of the proposed integration system, which uses the clustered data as training data for a Naïve Bayes classifier. By adopting the proposed integration system, the limitation of the unsupervised clustering method can be overcome, and for supervised learning, labelling time is reduced and more labelled examples become available to train the supervised classifier. As a result, pattern classification accuracy also increases.
For example, after applying the proposed integration system, the classification accuracy on Fisher’s Iris, Wine and Bacteria18Class increased from 88.67% to 96.00%, from 78.33% to 83.45% and from 93.33% to 94.67% respectively, compared with using the unsupervised clustering algorithm alone. These results show that the proposed integration system can be applied to improve classification performance. However, further study is needed on the feature extraction and clustering components, as pattern classification performance still depends on the input data.
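A minimal sketch of the two-phase idea, assuming K-Means with deterministic farthest-first initialization for Phase 1 and a Gaussian Naïve Bayes model for Phase 2. The abstract does not say which Naïve Bayes variant or initialization was used, and FCM is not shown. Cluster indices stand in for class labels when training the classifier; the toy data is made up for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: List[Point], k: int, iters: int = 20) -> List[int]:
    """Lloyd's K-Means with farthest-first initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


class GaussianNaiveBayes:
    """Per-class independent Gaussians over each feature."""

    def fit(self, X: List[Point], y: List[int]) -> "GaussianNaiveBayes":
        groups: Dict[int, List[Point]] = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.priors = {c: len(rows) / len(X) for c, rows in groups.items()}
        self.stats: Dict[int, Tuple[List[float], List[float]]] = {}
        for c, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(col) / len(col) for col in cols]
            # A small variance floor avoids division by zero for constant features.
            variances = [sum((v - m) ** 2 for v in col) / len(col) + 1e-9
                         for col, m in zip(cols, means)]
            self.stats[c] = (means, variances)
        return self

    def predict(self, x: Point) -> int:
        def log_posterior(c: int) -> float:
            means, variances = self.stats[c]
            total = math.log(self.priors[c])
            for v, m, var in zip(x, means, variances):
                total -= 0.5 * math.log(2 * math.pi * var) + (v - m) ** 2 / (2 * var)
            return total
        return max(self.stats, key=log_posterior)


# Phase 1: cluster unlabelled data; Phase 2: train the classifier on the cluster labels.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
cluster_labels = kmeans(data, k=2)
model = GaussianNaiveBayes().fit(data, cluster_labels)
```

The trained `model` can then label new inputs without any manual labelling, which is the labelling-time saving the abstract describes; its accuracy is of course limited by how well the clusters match the true classes.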

    Unsupervised Multivariate Time Series Clustering

    Clustering is widely used in unsupervised machine learning to partition a given set of data into non-overlapping groups. Many real-world applications require processing more complex multivariate time series data characterized by more than one dependent variable. A few works in the literature have reported multivariate classification using Shapelet learning. However, the clustering of multivariate time series signals using Shapelet learning has not yet been explored. Shapelet learning is the process of discovering the Shapelets that contain the most informative features of a time series signal. Discovering suitable Shapelets among many candidate Shapelets has been broadly studied for the classification and clustering of univariate time series signals, where Shapelet learning has shown promising results. The analysis of multivariate time series signals has not been widely explored because of the dimensionality issue. This work proposes a generalized Shapelet learning method for unsupervised multivariate time series clustering. The proposed method uses spectral clustering and Shapelet similarity minimization with least-squares regularization to obtain the optimal Shapelets for unsupervised clustering. The proposed method is evaluated using an in-house multivariate time series dataset on the detection of radio frequency (RF) faults in Jefferson Lab's Continuous Electron Beam Accelerator Facility (CEBAF). The dataset consists of three-dimensional time series recordings of three RF fault types. The proposed method shows successful clustering performance, with an average precision of 0.732, recall of 0.717, F-score of 0.732, Rand index (RI) of 0.812 and normalized mutual information (NMI) of 0.56, with an overall standard deviation of less than 3% in a five-fold cross-validation evaluation.
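The shapelet distance that shapelet learning builds on is conventionally the smallest distance between the shapelet and any equal-length window of the series. The sketch below computes that quantity (as a mean squared difference) for a univariate series and, as an assumption not taken from the abstract, extends it to multivariate signals by summing per-channel distances with one shapelet per channel. It does not implement the learning step itself (spectral clustering with least-squares regularization); the function names are hypothetical.

```python
import math
from typing import Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum mean squared difference between `shapelet` and any same-length window of `series`."""
    length = len(shapelet)
    if length == 0 or length > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - length + 1):
        d = sum((series[start + i] - shapelet[i]) ** 2 for i in range(length)) / length
        best = min(best, d)
    return best


def multivariate_shapelet_distance(channels: Sequence[Sequence[float]],
                                   shapelets: Sequence[Sequence[float]]) -> float:
    """Hypothetical multivariate extension: sum of per-channel shapelet distances."""
    if len(channels) != len(shapelets):
        raise ValueError("need exactly one shapelet per channel")
    return sum(shapelet_distance(s, sh) for s, sh in zip(channels, shapelets))


# Usage: a two-channel signal in which both shapelets occur exactly.
signal = [[0.0, 1.0, 2.0, 3.0, 2.0, 1.0], [5.0, 5.0, 5.0]]
d = multivariate_shapelet_distance(signal, [[2.0, 3.0, 2.0], [5.0, 5.0]])
```

Distances like these, computed against a set of candidate shapelets, give each recording a fixed-length feature vector that a clustering step can then operate on.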

    Multi-instance graphical transfer clustering for traffic data learning

    © 2016 IEEE. To better model complex real-world data and to develop robust features that capture relevant information, we usually employ unsupervised feature learning to learn a layer of feature representations from unlabeled data. However, developing domain-specific features for each task is expensive, time-consuming and requires expertise in the data. In this paper, we introduce multi-instance clustering and graphical learning to unsupervised transfer learning. To improve clustering efficiency, we propose a set of algorithms for traffic data learning: instance feature representation, distance calculation for multi-instance clustering, multi-instance graphical cluster initialisation, multi-instance multi-cluster update, and graphical multi-instance transfer clustering (GMITC). Finally, we evaluate the proposed algorithms on the Eastwest datasets against several baselines. The experimental results indicate that the proposed algorithms achieve higher clustering accuracy and much higher computational speed.
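The abstract lists a distance calculation for multi-instance clustering without defining it. A common choice in multi-instance learning, assumed here rather than taken from the paper, is the Hausdorff distance between bags of instance feature vectors. The sketch shows the maximal (symmetric) and minimal variants; both function names are hypothetical.

```python
import math
from typing import Sequence

Bag = Sequence[Sequence[float]]


def hausdorff_bag_distance(bag_a: Bag, bag_b: Bag) -> float:
    """Symmetric (maximal) Hausdorff distance between two non-empty bags of instances."""
    def directed(a: Bag, b: Bag) -> float:
        # For each instance in a, find its nearest instance in b; keep the worst case.
        return max(min(math.dist(x, y) for y in b) for x in a)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))


def minimal_hausdorff_bag_distance(bag_a: Bag, bag_b: Bag) -> float:
    """Distance between the closest pair of instances across the two bags."""
    return min(math.dist(x, y) for x in bag_a for y in bag_b)


# Usage: two small bags of 2-D instance features.
bag_1 = [(0.0, 0.0), (1.0, 0.0)]
bag_2 = [(0.0, 0.0)]
d_max = hausdorff_bag_distance(bag_1, bag_2)
d_min = minimal_hausdorff_bag_distance(bag_1, bag_2)
```

The maximal variant is sensitive to a single outlying instance, while the minimal variant treats two bags as close if any pair of instances is close; which behaviour suits traffic data is a design choice the abstract leaves open.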

    Efficient Hardware Architecture for Correlation-Based Spike Detection and Unsupervised Clustering

    This chapter presents a novel hardware architecture for correlation-based spike detection and unsupervised clustering. The architecture uses information extracted from the spike clustering results for efficient spike detection. It supports fast computation of the normalized correlation and OSORT operations. The normalized correlation is used for template matching for accurate spike detection, and the OSORT algorithm is adopted for unsupervised classification of the detected spikes. The mean spike of each cluster produced by the OSORT algorithm is used as a template for subsequent detection. The architecture adopts a post-normalization technique to reduce area cost. Modified OSORT operations are also proposed to facilitate unsupervised clustering in hardware. The proposed architecture is implemented on a field programmable gate array (FPGA) for performance evaluation. In addition to attaining high detection and classification accuracy for spike sorting, experimental results reveal that the proposed architecture is an efficient design providing low area cost and high throughput for real-time offline spike sorting applications.
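Normalized-correlation template matching can be sketched in software as follows. In the described architecture the templates are the per-cluster mean spikes produced by OSORT; here the template and the `threshold` value are supplied by hand as assumptions, and the computation is done directly in floating point rather than with the post-normalization hardware technique.

```python
import math
from typing import List, Sequence


def normalized_correlation(window: Sequence[float], template: Sequence[float]) -> float:
    """Pearson-style normalized correlation in [-1, 1]; 0.0 if either input is flat."""
    n = len(template)
    mean_w = sum(window) / n
    mean_t = sum(template) / n
    num = sum((w - mean_w) * (t - mean_t) for w, t in zip(window, template))
    den = math.sqrt(sum((w - mean_w) ** 2 for w in window)
                    * sum((t - mean_t) ** 2 for t in template))
    return num / den if den > 0 else 0.0


def detect_spikes(signal: Sequence[float], template: Sequence[float],
                  threshold: float = 0.9) -> List[int]:
    """Return start indices of windows whose correlation with the template meets `threshold`."""
    n = len(template)
    return [start for start in range(len(signal) - n + 1)
            if normalized_correlation(signal[start:start + n], template) >= threshold]


# Usage: one spike-shaped event embedded in a flat recording.
template = [0.0, 1.0, 3.0, 1.0, 0.0]
signal = [0.0, 0.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0]
hits = detect_spikes(signal, template)
```

Because both the window and the template are mean-centred and scaled, a spike with the same shape but a different amplitude still scores close to 1, which is why normalized correlation suits template matching across recordings with varying gain.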

    Clustering of Time Series Data: Measures, Methods, and Applications

    Clustering is an essential branch of data mining and statistical analysis that can help us explore the distribution of data and extract knowledge. With the broad accumulation and application of time series data, the study of its clustering is a natural extension of existing unsupervised learning heuristics. We discuss the components that make up the clustering of time series data: the similarity measure, the clustering heuristic, the evaluation of cluster quality, and the applications of these heuristics. As the groundwork for data analysis, we propose a scalable and efficient time series similarity measure, segmented-Dynamic Time Warping. For time series clustering, we formulate the Distance Density Clustering heuristic, a deterministic clustering algorithm that adopts concepts from both density and distance separation. In addition, we explore the characteristics and discuss the limitations of existing cluster evaluation methods. Finally, all components lead toward the goal of real-world applications.
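The abstract does not define segmented-Dynamic Time Warping. For reference, the sketch below is the classic, unsegmented dynamic-programming DTW with an absolute-difference local cost; how the segmented variant partitions the series to gain scalability is not reproduced here.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) Dynamic Time Warping distance."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]; row/column 0 are boundary cells.
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: insertion, deletion, or match.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Usage: the second series is a time-stretched copy of the first.
stretched = dtw_distance([0.0, 1.0, 2.0], [0.0, 1.0, 1.0, 2.0])
shifted = dtw_distance([0.0, 0.0], [1.0, 1.0])
```

Unlike pointwise Euclidean distance, DTW lets one sample align with several samples of the other series, so the stretched copy scores a distance of zero; the quadratic cost of the full table is the scalability problem a segmented variant would presumably target.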