
    MOCF: A Multi-Objective Clustering Framework using an Improved Particle Swarm Optimization Algorithm

    Traditional clustering algorithms, such as K-Means, optimize a single objective. However, many real-world applications require several objective functions to be considered at the same time. Traditional clustering algorithms also have drawbacks related to centroid selection, local optima, and convergence. Particle Swarm Optimization (PSO)-based clustering approaches were developed to address these shortcomings. PSO is inspired by the social behaviour of animals, particularly bird flocking and fish schooling. This paper proposes the Multi-Objective Clustering Framework (MOCF), an improved PSO-based framework, together with a PSO-based Multi-Objective Clustering (PSO-MOC) algorithm that significantly improves clustering efficiency. The framework's performance is evaluated on a variety of real-world datasets, using a prototype application built on the Python data science platform. The empirical results showed that multi-objective clustering outperformed its single-objective counterparts.
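
The abstract above does not spell out the PSO-MOC update rules, so the following is only a minimal sketch of generic PSO-based clustering, not the authors' algorithm. It assumes two illustrative objectives (cluster compactness and centroid separation) combined by a weighted sum; the names `pso_cluster`, `objectives` and the weight `alpha` are hypothetical, and positions are clipped to the data's bounding box so the separation term stays bounded.

```python
import numpy as np

def objectives(centroids, X):
    """Return (compactness, separation) for one candidate set of centroids."""
    # Distance from every point to every centroid: shape (n_points, k)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    compactness = d.min(axis=1).mean()  # mean distance to nearest centroid (lower is better)
    k = len(centroids)
    pair = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    separation = pair[np.triu_indices(k, 1)].min()  # closest centroid pair (higher is better)
    return compactness, separation

def pso_cluster(X, k=2, n_particles=20, n_iter=100,
                w=0.7, c1=1.5, c2=1.5, alpha=0.1, seed=0):
    """Generic PSO over centroid positions with a weighted-sum fitness (assumed)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)

    def fitness(c):
        comp, sep = objectives(c, X)
        return comp - alpha * sep  # assumed scalarisation of the two objectives

    # Each particle encodes k centroids, initialised at random data points.
    pos = X[rng.integers(0, n, size=(n_particles, k))]
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_f = np.array([fitness(p) for p in pos])
    g = pbest_f.argmin()
    gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Standard velocity update: inertia + cognitive pull + social pull.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        f = np.array([fitness(p) for p in pos])
        improved = f < pbest_f
        pbest[improved] = pos[improved]
        pbest_f[improved] = f[improved]
        g = pbest_f.argmin()
        if pbest_f[g] < gbest_f:
            gbest, gbest_f = pbest[g].copy(), pbest_f[g]

    # Assign each point to its nearest centroid in the best solution found.
    labels = np.linalg.norm(X[:, None, :] - gbest[None, :, :], axis=2).argmin(axis=1)
    return gbest, labels
```

A weighted sum is the simplest way to fold several objectives into one fitness value; a Pareto-based scheme would keep the objectives separate, and the abstract does not say which approach MOCF takes.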

    Regionalization approaches for the spatial analysis of extremal dependence

    The impact of an extreme climate event depends strongly on its geographical scale. Max-stable processes can be used for the statistical investigation of climate extremes and their spatial dependencies on a continuous area. Most existing parametric models of max-stable processes assume spatial stationarity and are therefore not suitable for data that cover a large and heterogeneous area. For this reason, it has recently been proposed to use a clustering algorithm to divide the area of investigation into smaller regions and to fit parametric max-stable processes to the data within those regions. We investigate this clustering algorithm further and point out that there are cases in which it results in regions on which spatial stationarity is not a reasonable assumption. We propose an alternative clustering algorithm and demonstrate in a simulation study that it can lead to improved results. Comment: 25 pages, 7 figures

    Testing of Hybrid Quantum-Classical K-Means for Nonlinear Noise Mitigation

    Nearest-neighbour clustering is a simple yet powerful machine learning algorithm that finds natural application in the decoding of signals in classical optical-fibre communication systems. Quantum k-means clustering promises a speed-up over the classical k-means algorithm; however, it has been shown not to currently provide this speed-up for decoding optical-fibre signals due to the embedding of classical data, which introduces inaccuracies and slowdowns. Although still not achieving an exponential speed-up for NISQ implementations, this work proposes the generalised inverse stereographic projection as an improved embedding into the Bloch sphere for quantum distance estimation in k-nearest-neighbour clustering, which allows us to get closer to the classical performance. We also use the generalised inverse stereographic projection to develop an analogous classical clustering algorithm and benchmark its accuracy, runtime and convergence for decoding real-world experimental optical-fibre communication data. This proposed 'quantum-inspired' algorithm provides an improvement in both the accuracy and convergence rate with respect to the k-means algorithm. Hence, this work presents two main contributions. Firstly, we propose the general inverse stereographic projection into the Bloch sphere as a better embedding for quantum machine learning algorithms; here, we use the problem of clustering quadrature amplitude modulated optical-fibre signals as an example. Secondly, as a purely classical contribution inspired by the first contribution, we propose and benchmark the use of the general inverse stereographic projection and spherical centroid for clustering optical-fibre signals, showing that optimizing the radius yields a consistent improvement in accuracy and convergence rate. Comment: 2023 IEEE Global Communications Conference: Selected Areas in Communications: Quantum Communications and Computing
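
To make the embedding concrete, here is a minimal sketch of an inverse stereographic projection with an adjustable radius. It assumes each received symbol is a 2-D point (in-phase, quadrature) and projects from the north pole of a sphere centred at the origin; the paper's exact parametrisation may differ. The function names and the normalised-mean definition of the spherical centroid are assumptions chosen for illustration.

```python
import numpy as np

def inverse_stereographic(points, radius=1.0):
    """Map 2-D points in the plane z = 0 onto a sphere of the given radius.

    The projection runs along the line joining each point to the north pole
    (0, 0, radius); the sphere is centred at the origin.
    """
    p = np.asarray(points, dtype=float)
    s = np.sum(p ** 2, axis=1)          # squared distance from the origin
    denom = s + radius ** 2
    xy = 2.0 * radius ** 2 * p / denom[:, None]
    z = radius * (s - radius ** 2) / denom
    return np.column_stack([xy, z])

def spherical_centroid(sphere_points, radius=1.0):
    """Mean of the points, rescaled back onto the sphere (assumed definition)."""
    m = np.asarray(sphere_points, dtype=float).mean(axis=0)
    return radius * m / np.linalg.norm(m)
```

Under this parametrisation the origin maps to the south pole and points at distance `radius` from the origin land on the equator, so the radius controls how the constellation spreads over the sphere. That is one way to see why tuning it can change clustering behaviour.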

    Temporal - spatial recognizer for multi-label data

    Pattern recognition is an important artificial intelligence task with practical applications in many fields, such as medicine and species distribution. Such applications involve overlapping data points, as found in multi-label datasets. Hence, there is a need for a recognition algorithm that can separate the overlapping data points in order to recognize the correct pattern. Existing recognition methods are sensitive to noise and overlapping points, as they cannot recognize a pattern when the position of the data points shifts. Furthermore, these methods do not incorporate temporal information in the recognition process, which leads to low-quality data clustering. In this study, an improved pattern recognition method based on Hierarchical Temporal Memory (HTM) is proposed to resolve overlapping data points in multi-label datasets. The imHTM (Improved HTM) method improves two components: feature extraction and data clustering. The first improvement is the TS-Layer Neocognitron algorithm, which solves the position-shift problem in the feature extraction phase. The data clustering step has two improvements, TFCM and cFCM (TFCM with a limit-Chebyshev distance metric), which allow the overlapping data points that occur in patterns to be separated correctly into the relevant clusters by temporal clustering. Experiments on five datasets were conducted to compare the proposed method (imHTM) against statistical, template and structural pattern recognition methods. The results showed that the proposed method achieved a recognition success rate of 99% in comparison with template matching methods (Feature-Based Approach, Area-Based Approach), statistical methods (Principal Component Analysis, Linear Discriminant Analysis, Support Vector Machines and Neural Network) and the structural method (original HTM).
The findings indicate that the improved HTM can give an optimum pattern recognition accuracy, especially for multi-label datasets.

    Feedback-Driven Data Clustering

    The acquisition of data and its analysis has become a common yet critical task in many areas of modern economy and research. Unfortunately, the ever-increasing scale of datasets has long outgrown the capacities and abilities humans can muster to extract information from them and gain new knowledge. For this reason, research areas like data mining and knowledge discovery steadily gain importance. The algorithms they provide for the extraction of knowledge are mandatory prerequisites that enable people to analyze large amounts of information. Among the approaches offered by these areas, clustering is one of the most fundamental. By finding groups of similar objects inside the data, it aims to identify meaningful structures that constitute new knowledge. Clustering results are also often used as input for other analysis techniques like classification or forecasting. As clustering extracts new and unknown knowledge, it obviously has no access to any form of ground truth. For this reason, clustering results have a hypothetical character and must be interpreted with respect to the application domain. This makes clustering very challenging and leads to an extensive and diverse landscape of available algorithms. Most of these are expert tools that are tailored to a single narrowly defined application scenario. Over the years, this specialization has become a major trend that arose to counter the inherent uncertainty of clustering by including as much domain specifics as possible into algorithms. While customized methods often improve result quality, they become more and more complicated to handle and lose versatility. This creates a dilemma especially for amateur users whose numbers are increasing as clustering is applied in more and more domains. While an abundance of tools is offered, guidance is severely lacking and users are left alone with critical tasks like algorithm selection, parameter configuration and the interpretation and adjustment of results. 
This thesis aims to solve this dilemma by structuring and integrating the necessary steps of clustering into a guided and feedback-driven process. In doing so, users are provided with a default modus operandi for the application of clustering. Two main components constitute the core of said process: the algorithm management and the visual-interactive interface. Algorithm management handles all aspects of actual clustering creation and the involved methods. It employs a modular approach for algorithm description that allows users to understand, design, and compare clustering techniques with the help of building blocks. In addition, algorithm management offers facilities for the integration of multiple clusterings of the same dataset into an improved solution. New approaches based on ensemble clustering not only allow the utilization of different clustering techniques, but also ease their application by acting as an abstraction layer that unifies individual parameters. Finally, this component provides a multi-level interface that structures all available control options and provides the docking points for user interaction. The visual-interactive interface supports users during result interpretation and adjustment. For this, the defining characteristics of a clustering are communicated via a hybrid visualization. In contrast to traditional data-driven visualizations that tend to become overloaded and unusable with increasing volume/dimensionality of data, this novel approach communicates the abstract aspects of cluster composition and relations between clusters. This aspect orientation allows the use of easy-to-understand visual components and makes the visualization immune to scale related effects of the underlying data. This visual communication is attuned to a compact and universally valid set of high-level feedback that allows the modification of clustering results. 
Instead of technical parameters that indirectly cause changes in the whole clustering by influencing its creation process, users can employ simple commands like merge or split to directly adjust clusters. The orchestrated cooperation of these two main components creates a modus operandi in which clusterings are no longer created and discarded as a whole until a satisfying result is obtained. Instead, users apply the feedback-driven process to iteratively refine an initial solution. Performance and usability of the proposed approach were evaluated with a user study. Its results show that the feedback-driven process enabled amateur users to easily create satisfying clustering results, even from different and suboptimal starting situations.

    Mechanical System Topology Optimization for Better Maintenance

    Knowing the modular topology of a piece of mechanical equipment facilitates its maintenance. When a failure occurs, work is confined to the failed module (a product subsystem) rather than the whole product, which also saves time in detecting, diagnosing and correcting the failure. However, the clustering algorithm that has served as a reference for several works has several limitations. It generates modules that are more complex and more expensive in terms of coupling costs, which can require more resources, more intervention time and more maintenance work. This is detrimental to product maintenance, because the more complex the product modules are, the more expensive the maintenance is. We therefore propose an improved clustering algorithm that reduces maintenance costs by reducing the coupling and decoupling costs (disassembly and reassembly costs) of the modules generated by the reference algorithm, for good maintainability (disassemblability). The approach is applied to a soy roaster. The proposed algorithm first defines a DSM (Design Structure Matrix), which is used to define the correction coefficients of the coupling cost; it then formulates an objective function to reduce the coupling costs; finally, it takes the integrating elements into account to reduce the size of the modules. The result is a modular topology (modular architecture) that leads to a significant reduction in maintenance costs. The developed algorithm also yields an economy of scale by reducing the complexity of the modules, promoting good maintainability.

    Comparison of Various Improved-Partition Fuzzy c-Means Clustering Algorithms in Fast Color Reduction

    This paper provides a comparative study of several enhanced versions of the fuzzy c-means clustering algorithm in an application of histogram-based image color reduction. A common preprocessing is performed before clustering, consisting of a preliminary color quantization, histogram extraction and selection of frequently occurring colors of the image. These selected colors will be clustered by tested c-means algorithms. Clustering is followed by another common step, which creates the output image. Besides conventional hard (HCM) and fuzzy c-means (FCM) clustering, the so-called generalized improved partition FCM algorithm, and several versions of the suppressed FCM (s-FCM) in its conventional and generalized form, are included in this study. Accuracy is measured as the average color difference between pixels of the input and output image, while efficiency is mostly characterized by the total runtime of the performed color reduction. Numerical evaluation found all enhanced FCM algorithms more accurate, and four out of seven enhanced algorithms faster than FCM. All tested algorithms can create reduced color images of acceptable quality.
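
For reference, the conventional FCM baseline mentioned above can be sketched as follows. This is the standard alternating update of cluster centres and memberships, not any of the enhanced variants the paper compares. The function name, defaults and random initialisation are assumptions, and the histogram-based preprocessing is omitted: the input is simply an array of colors.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Conventional fuzzy c-means: returns (centers, membership matrix U)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Centres are membership-weighted means of the data.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)            # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In a color-reduction setting, each selected color would then be replaced by the centre with the highest membership, for example `centers[U.argmax(axis=1)]`.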

    Satellite image segmentation using RVM and Fuzzy clustering

    Image segmentation is a common but still very challenging problem in image processing, with applications in many industries and in medicine, for example target tracking, object recognition and medical image processing. The task of image segmentation is to divide an image into a number of meaningful pieces on the basis of image features such as color and texture. In this thesis, some recently developed fuzzy clustering algorithms and the supervised learning classifier Relevance Vector Machine (RVM) are used to obtain an improved solution. First, various fuzzy clustering algorithms such as FCM and DeFCM are used to produce different clustering solutions; each solution is then improved by reclassifying the remaining pixels of the satellite image with the RVM classifier. The results of different supervised learning classifiers, namely Support Vector Machine (SVM), Relevance Vector Machine (RVM) and K-nearest neighbors (KNN), are compared on the basis of error rate and time. A major drawback of any clustering algorithm is that it requires the number of clusters in the unlabelled data as an input argument. In this thesis, an attempt has been made to estimate the optimal number of clusters present in a satellite image using the Davies-Bouldin index.
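
The Davies-Bouldin index mentioned above can be computed directly from the data and a hard labelling. The sketch below follows the standard definition: for each cluster, take the worst ratio of summed within-cluster scatter to centroid distance, then average over clusters. It is not taken from the thesis; the function name is hypothetical and at least two clusters with distinct centroids are assumed.

```python
import numpy as np

def davies_bouldin(X, labels):
    """Davies-Bouldin index of a hard clustering (lower means better separated)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    ids = np.unique(labels)
    centroids = np.array([X[labels == i].mean(axis=0) for i in ids])
    # Scatter: mean distance of each cluster's points to its own centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == i] - c, axis=1).mean()
        for i, c in zip(ids, centroids)
    ])
    k = len(ids)
    worst = np.zeros(k)
    for i in range(k):
        for j in range(k):
            if i != j:
                dist = np.linalg.norm(centroids[i] - centroids[j])
                worst[i] = max(worst[i], (scatter[i] + scatter[j]) / dist)
    return worst.mean()
```

To estimate the number of clusters, one would typically run the clustering for several candidate values and keep the one with the lowest index. scikit-learn also provides `sklearn.metrics.davies_bouldin_score` for the same quantity.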