
    A Novel Approach to Fuzzy Clustering based on a Dissimilarity Relation extracted from Data using a TS System

    Clustering refers to the process of unsupervised partitioning of a data set based on a dissimilarity measure, which determines the cluster shape. Since cluster shapes may change from one cluster to another, it would be of the utmost importance to extract the dissimilarity measure directly from the data by means of a data model. On the other hand, building a model requires some kind of supervision of the data structure, which is exactly what we look for during clustering. Thus, the lower the degree of supervision needed to build the data model, the more it makes sense to resort to a data model for clustering purposes. Aware of this, we propose to exploit very few pairs of patterns with known dissimilarity to build a Takagi-Sugeno (TS) system which models the dissimilarity relation. Among other things, the rules of the TS system provide an intuitive description of the dissimilarity relation itself. We then use the TS system to build a dissimilarity matrix which is fed as input to an unsupervised fuzzy relational clustering algorithm, denoted Any-Relation Clustering Algorithm (ARCA), which partitions the data set based on the proximity of the vectors containing the dissimilarity values between each pattern and all the other patterns in the data set. We show that combining the TS system and the ARCA algorithm allows us to achieve high classification performance on a synthetic data set and on two real data sets. Further, we discuss how the rules of the TS system represent a sort of linguistic description of the dissimilarity relation.
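
    The relational scheme is easy to prototype: ARCA treats the i-th row of the dissimilarity matrix as the feature vector of pattern i and clusters those rows with a fuzzy objective-function algorithm. The sketch below is a simplification, substituting a plain Euclidean dissimilarity for the TS-modelled one and standard fuzzy c-means for ARCA's relational variant; all names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.datasets import make_blobs

def fuzzy_c_means(X, c, m=2.0, n_iter=100, seed=0):
    """Standard fuzzy c-means on the rows of X; returns the membership matrix U."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)             # memberships sum to 1 per pattern
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.maximum(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)  # classic FCM membership update
    return U

X, _ = make_blobs(n_samples=120, centers=3, random_state=0)
D = squareform(pdist(X))     # stand-in dissimilarity matrix (the paper derives it from a TS system)
U = fuzzy_c_means(D, c=3)    # relational trick: cluster the rows of D themselves
labels = U.argmax(axis=1)
```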

    Stochastic information granules extraction for graph embedding and classification

    Graphs are data structures able to efficiently describe real-world systems and, as such, have been extensively used in recent years by many branches of science and engineering, including machine learning. However, the design of efficient graph-based pattern recognition systems is bottlenecked by the intrinsic problem of how to properly match two graphs. In this paper, we investigate a granular computing approach for the design of a general-purpose graph-based classification system. The overall framework relies on the extraction of meaningful pivotal substructures, on top of which an embedding space can be built and in which the classification can be performed without limitations. Due to its importance, we address whether information can be preserved by performing stochastic extraction on the training data instead of an exhaustive extraction procedure, which is likely to be unfeasible for large datasets. Tests on benchmark datasets show that stochastic extraction can lead to a meaningful set of pivotal substructures with a much lower memory footprint and overall computational burden, making the proposed strategies suitable also for dealing with big datasets.
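
    A rough sketch of the stochastic-extraction idea: rather than enumerating every substructure, small subgraphs are sampled at random from the training graphs, the most frequent ones form the granule alphabet, and each graph is embedded as a symbolic histogram over that alphabet. Weisfeiler-Lehman hashing stands in here for the paper's substructure matching, and all function names are illustrative.

```python
import random
from collections import Counter
import networkx as nx

def sample_substructures(G, n_samples=50, radius=1):
    """Stochastically draw small ego-subgraphs rather than enumerating all of them."""
    nodes = list(G.nodes)
    subs = (nx.ego_graph(G, random.choice(nodes), radius=radius) for _ in range(n_samples))
    return [nx.weisfeiler_lehman_graph_hash(s) for s in subs]

def build_alphabet(graphs, top_k=20):
    """Keep the most frequent sampled substructures as the granule alphabet."""
    counts = Counter(h for G in graphs for h in sample_substructures(G))
    return [h for h, _ in counts.most_common(top_k)]

def embed(G, alphabet):
    """Symbolic histogram: how often each alphabet granule shows up in G's samples."""
    c = Counter(sample_substructures(G))
    return [c[h] for h in alphabet]

random.seed(0)
train = [nx.gnp_random_graph(20, p, seed=i) for i, p in enumerate([0.1, 0.1, 0.4, 0.4])]
alphabet = build_alphabet(train)
vectors = [embed(G, alphabet) for G in train]   # ready for any Euclidean classifier
```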

    Relational data clustering algorithms with biomedical applications


    A multi-objective optimization approach for the synthesis of granular computing-based classification systems in the graph domain

    The synthesis of a pattern recognition system usually aims at the optimization of a given performance index. However, in many real-world scenarios, there exist other desired facets to take into account. In this regard, multi-objective optimization acts as the main tool for the optimization of different (and possibly conflicting) objective functions in order to seek potential trade-offs among them. In this paper, we propose a three-objective optimization problem for the synthesis of a granular computing-based pattern recognition system in the graph domain. The core pattern recognition engine searches for suitable information granules (i.e., recurrent and/or meaningful subgraphs from the training data) on top of which the graph embedding procedure towards the Euclidean space is performed. In the latter, any classification system can be employed. The optimization problem aims at jointly optimizing the performance of the classifier, the number of information granules and the structural complexity of the classification model. Furthermore, we address the problem of selecting a suitable number of solutions from the resulting Pareto fronts in order to compose an ensemble of classifiers to be tested on previously unseen data. To perform this selection, we employed a multi-criteria decision-making routine, analyzing different case studies that differ in how much weight each objective function carries in the ranking process. Results on five open-access datasets of fully labeled graphs show that exploiting the ensemble is effective (especially when the structural complexity of the model plays a minor role in the decision-making process) compared against the baseline solution that solely aims at maximizing the performance.
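
    The selection step at the end can be pictured in a few lines of code: given the three objective values of each candidate model (error, number of granules, structural complexity, all to be minimized), the non-dominated solutions are extracted and ranked by a weighted sum of normalized objectives, with the weights defining the "case study". This is only a minimal stand-in for the paper's multi-criteria decision-making routine; names, data and weights are illustrative.

```python
import numpy as np

def pareto_front(F):
    """F: (n, k) objective values, all minimized. Returns indices of non-dominated rows."""
    n = F.shape[0]
    keep = []
    for i in range(n):
        dominated = any(np.all(F[j] <= F[i]) and np.any(F[j] < F[i]) for j in range(n))
        if not dominated:
            keep.append(i)
    return np.array(keep)

def select_for_ensemble(F, weights, k=3):
    """Rank the Pareto-optimal solutions by a weighted sum of normalized objectives."""
    front = pareto_front(F)
    lo, hi = F[front].min(axis=0), F[front].max(axis=0)
    G = (F[front] - lo) / (hi - lo + 1e-12)       # scale each objective to [0, 1]
    return front[np.argsort(G @ weights)[:k]]     # indices of the ensemble members

# toy objective values: [error rate, n. granules, structural complexity]
F = np.array([[0.10, 30, 5], [0.12, 12, 4], [0.09, 45, 9], [0.20, 8, 2], [0.15, 10, 8]])
# a case study where structural complexity plays a minor role in the decision
print(select_for_ensemble(F, weights=np.array([0.6, 0.3, 0.1])))
```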

    Modelling and recognition of protein contact networks by multiple kernel learning and dissimilarity representations

    Multiple kernel learning is a paradigm which employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insights into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins' functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple kernel-based system has also been benchmarked against a clustering-based classification system that is likewise able to exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
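
    The backbone of such a system, omitting the joint optimisation of weights and representatives, fits in a few lines: each representation is a dissimilarity space (distances to a set of representatives), each space gets its own kernel, and a linear combination of the kernels feeds a precomputed-kernel SVM. Weights, prototypes and metrics are hard-coded below purely for illustration; this is a sketch of the combination step, not the paper's training procedure.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=12, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

reps = Xtr[:15]                                 # representatives; selected during training in the paper
metrics = ("euclidean", "cityblock")
Dtr = [cdist(Xtr, reps, m) for m in metrics]    # one dissimilarity space per metric
Dte = [cdist(Xte, reps, m) for m in metrics]

w = np.array([0.6, 0.4])                        # kernel weights; jointly optimised in the paper
Ktr = sum(wi * rbf_kernel(D, D) for wi, D in zip(w, Dtr))
Kte = sum(wi * rbf_kernel(De, Dt) for wi, De, Dt in zip(w, Dte, Dtr))

clf = SVC(kernel="precomputed").fit(Ktr, ytr)
print("test accuracy:", clf.score(Kte, yte))
```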

    Analyzing the Optimal Performance of Pest Image Segmentation using Non Linear Objective Assessments

    In the modern agricultural field, pest detection plays a major role in plant cultivation. A major obstacle to increasing the production rate of agricultural fields is the presence of whitefly pests, which cause leaf discoloration. This emphasizes the necessity of image segmentation, which divides an image into parts that have strong correlations with objects, so as to reflect the actual information collected from the real world. Image processing is affected by illumination conditions, random noise and environmental disturbances due to atmospheric pressure or temperature fluctuation; the quality of pest images is directly affected by the atmospheric medium, pressure and temperature. Fuzzy c-means (FCM) has been proposed to identify the accurate location of whitefly pests. The watershed transform has interesting properties that make it useful for many different image segmentation applications: it is simple and intuitive, can be parallelized, and always produces a complete division of the image. However, when applied to pest image analysis, it has important drawbacks (over-segmentation, sensitivity to noise). In this paper, pest image segmentation using marker-controlled watershed segmentation is presented. The objective of this paper is to segment the pest image and compare the results of the fuzzy c-means algorithm and the marker-controlled watershed transformation. The performance of the image segmentation algorithms is compared using nonlinear objective assessment, i.e., quantitative measures such as structural content, peak signal-to-noise ratio, normalized correlation coefficient, average difference and normalized absolute error. The experimental results show that the fuzzy c-means algorithm performs better than the watershed transformation algorithm in processing pest images.
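
    The five quantitative measures mentioned above all compare a segmented image against a reference and have standard closed forms; a compact implementation using the common textbook definitions might look as follows (the function name is illustrative).

```python
import numpy as np

def segmentation_scores(ref, seg, peak=255.0):
    """Full-reference quality measures for a segmented image vs. a reference image."""
    ref, seg = ref.astype(float), seg.astype(float)
    mse = np.mean((ref - seg) ** 2)
    return {
        "structural_content":  np.sum(ref ** 2) / np.sum(seg ** 2),
        "psnr":                10 * np.log10(peak ** 2 / mse) if mse > 0 else np.inf,
        "norm_correlation":    np.sum(ref * seg) / np.sum(ref ** 2),
        "average_difference":  np.mean(ref - seg),
        "norm_absolute_error": np.sum(np.abs(ref - seg)) / np.sum(np.abs(ref)),
    }

rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64))
seg = np.clip(ref + rng.normal(0, 10, ref.shape), 0, 255)   # a noisy stand-in segmentation
print(segmentation_scores(ref, seg))
```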

    Fuzzy clustering of ordinal time series based on two novel distances with economic applications

    Time series clustering is a central machine learning task with applications in many fields. While the majority of methods focus on real-valued time series, very few works consider series with a discrete response. In this paper, the problem of clustering ordinal time series is addressed. To this aim, two novel distances between ordinal time series are introduced and used to construct fuzzy clustering procedures. Both metrics are functions of the estimated cumulative probabilities, thus automatically taking advantage of the ordering inherent to the series' range. The resulting clustering algorithms are computationally efficient and able to group series generated from similar stochastic processes, reaching accurate results even though the series come from a wide variety of models. Since the dynamics of the series may vary over time, we adopt a fuzzy approach, thus enabling the procedures to locate each series in several clusters with different membership degrees. An extensive simulation study shows that the proposed methods outperform several alternative procedures. Weighted versions of the clustering algorithms are also presented and their advantages with respect to the original methods are discussed. Two specific applications involving economic time series illustrate the usefulness of the proposed approaches.
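
    The key ingredient, a distance built from estimated cumulative probabilities, is easy to sketch: each ordinal series is summarized by its estimated marginal cumulative distribution over the ordered states, and the squared Euclidean distance between two such profiles serves as the dissimilarity fed to the fuzzy clustering routine. This is a simplified, marginal-only stand-in for the two distances introduced in the paper, with illustrative names.

```python
import numpy as np

def cumulative_profile(x, n_states):
    """Estimated cumulative probabilities P(X <= s) for each non-top ordinal state s."""
    x = np.asarray(x)
    return np.array([(x <= s).mean() for s in range(n_states - 1)])

def ordinal_distance(x, y, n_states):
    """Squared Euclidean distance between the two series' cumulative profiles."""
    return np.sum((cumulative_profile(x, n_states) - cumulative_profile(y, n_states)) ** 2)

rng = np.random.default_rng(0)
a = rng.choice(5, size=200, p=[0.4, 0.3, 0.2, 0.05, 0.05])   # mostly low states
b = rng.choice(5, size=200, p=[0.05, 0.05, 0.2, 0.3, 0.4])   # mostly high states
print(ordinal_distance(a, b, n_states=5))   # large value: very different processes
```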

    Optimised meta-clustering approach for clustering Time Series Matrices

    The prognostics (health state) of multiple components, represented as time series data stored in vectors and matrices, were processed and clustered more effectively and efficiently using the newly devised 'Meta-Clustering' approach. These time series data were gathered from large applications and systems in diverse fields such as communication, medicine, data mining, audio, visual applications, and sensors. Time series data were chosen as the domain of this research because meaningful information can be extracted regarding the characteristics of systems and components found in large applications. Moreover, when it comes to clustering, only time series data allow us to group these data according to their life cycle, i.e. from the time at which they were healthy until the time at which they start to develop faults and ultimately fail. A technique that can better process extracted time series data would therefore significantly cut down on space and time consumption, both crucial factors in data mining. This approach will, as a result, improve current state-of-the-art pattern recognition algorithms such as K-NM, as the clusters will be identified faster while consuming less space. The project also has practical implications: calculating the distance between similar components faster, while also consuming less space, means that the prognostics of multiple clustered components can be realised and understood more efficiently. This was achieved by using the Meta-Clustering approach to process and cluster the time series data, first extracting and storing the time series data as a two-dimensional matrix, and then implementing an enhanced K-NM clustering algorithm based on the notion of Meta-Clustering, using the Euclidean distance to measure the similarity between the different sets of failure patterns in space. This approach initially classifies and organises each component within its own refined individual cluster, providing the most relevant set of failure patterns, namely those showing the highest level of similarity, while discarding any unnecessary data that adds no value towards better understanding the failure/health state of the component. In the second stage, once these inner clusters have been obtained, they are grouped into one general cluster that represents the prognostics of all the processed components. The approach was tested on multivariate time series data extracted from IGBT components within Matlab, and the results showed that the proposed optimised Meta-Clustering approach does indeed consume less time and space to cluster the prognostics of IGBT components compared to existing data mining techniques.
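
    A bare-bones reading of the two-stage procedure: each component's pattern matrix is first clustered on its own, only the densest inner cluster's centre is kept as that component's refined failure signature, and those signatures are then clustered together into the general grouping. Plain k-means with Euclidean distance stands in below for the enhanced K-NM algorithm, and all names and data are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def meta_cluster(component_matrices, inner_k=3, outer_k=2):
    """Stage 1: refine each component's failure patterns; stage 2: group the components."""
    signatures = []
    for M in component_matrices:                 # M: (n_patterns, series_length)
        km = KMeans(n_clusters=inner_k, n_init=10, random_state=0).fit(M)
        labels, counts = np.unique(km.labels_, return_counts=True)
        signatures.append(km.cluster_centers_[labels[counts.argmax()]])  # densest inner cluster
    general = KMeans(n_clusters=outer_k, n_init=10, random_state=0).fit(np.vstack(signatures))
    return general.labels_                       # one general cluster label per component

rng = np.random.default_rng(0)
components = [rng.normal(loc=i % 2, scale=0.3, size=(40, 50)) for i in range(6)]
print(meta_cluster(components))                  # alternating components fall into two groups
```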

    Evolutionary Fuzzy Systems for Explainable Artificial Intelligence: Why, When, What for, and Where to?

    Evolutionary fuzzy systems are one of the greatest advances within the area of computational intelligence. They consist of evolutionary algorithms applied to the design of fuzzy systems. Thanks to this hybridization, superb abilities are provided to fuzzy modeling in many different data science scenarios. This contribution is a position paper developing a comprehensive analysis of the evolutionary fuzzy systems research field. To this end, the "4 W" questions are posed and addressed with the aim of understanding the current context of this topic and its significance. Specifically, it will be pointed out why evolutionary fuzzy systems are important from an explainable point of view, when they began, what they are used for, and where the attention of researchers should be directed in the near future in this area. They must play an important role in the emerging area of eXplainable Artificial Intelligence (XAI), which learns from data.