
    BigFCM: Fast, Precise and Scalable FCM on Hadoop

    Clustering plays an important role in mining big data, both as a modeling technique and as a preprocessing step in many data mining process implementations. Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing each data record to belong to more than one cluster to some degree. However, a serious challenge in fuzzy clustering is the lack of scalability. Massive datasets in emerging fields such as geosciences, biology and networking require high-performance parallel and distributed computation to solve real-world problems. Although some clustering methods have already been adapted to run on big data platforms, their execution time increases sharply for large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering method named BigFCM is proposed and designed for the Hadoop distributed data platform. Based on the map-reduce programming model, it exploits several mechanisms, including an efficient caching design, to achieve a reduction of several orders of magnitude in execution time. Extensive evaluation over multi-gigabyte datasets shows that BigFCM is scalable while preserving the quality of clustering.
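
As a rough illustration of how one FCM iteration can fit the map-reduce model mentioned in the abstract above, the sketch below has each "map" call compute partial weighted sums over one data partition and a "reduce" step add them up to get new cluster centers. This is not BigFCM's actual design: it runs in a single process, omits the caching mechanism, and does not use Hadoop. The function names (`memberships`, `map_partition`, `combine`, `mapreduce_iteration`) are hypothetical and chosen only for this example.

```python
import math
from functools import reduce


def memberships(x, centers, m=2.0):
    """Membership degrees of point x in each cluster (they sum to 1)."""
    dists = [math.dist(x, c) for c in centers]
    if 0.0 in dists:
        # x coincides with one or more centers: split full membership among them.
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def map_partition(partition, centers, m=2.0):
    """Map step: per-cluster partial sums of u^m * x and u^m for one partition."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    totals = [0.0] * len(centers)
    for x in partition:
        for j, u in enumerate(memberships(x, centers, m)):
            w = u ** m
            totals[j] += w
            for d in range(dim):
                sums[j][d] += w * x[d]
    return sums, totals


def combine(a, b):
    """Reduce step: add two partial results element-wise."""
    sums_a, totals_a = a
    sums_b, totals_b = b
    sums = [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(sums_a, sums_b)]
    totals = [p + q for p, q in zip(totals_a, totals_b)]
    return sums, totals


def mapreduce_iteration(partitions, centers, m=2.0):
    """One FCM center update computed from independent partitions."""
    partials = [map_partition(p, centers, m) for p in partitions]
    sums, totals = reduce(combine, partials)
    return [tuple(s / t for s in row) for row, t in zip(sums, totals)]
```

Because the center update only needs sums, the result does not depend on how the records are split across partitions, which is what makes the iteration easy to distribute.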

    Fuzzy c-Means Clustering for Pattern Recognition: A Case Study of Stock Data (Fuzzy c-Means Clustering untuk Pengenalan Pola Studi kasus Data Saham)

    Fuzzy clustering is one of the five roles used by data mining experts to transform large amounts of data into useful information, and one widely used method is Fuzzy c-Means (FCM) clustering. FCM is a data clustering technique in which each data point belongs to a cluster according to a degree of membership. This study aims to find patterns in data samples or data categories using FCM clustering. The data analyzed are stock data from the Jakarta Stock Exchange (BEJ) in the Property and Real Estate sector (issuer group). The data mining process follows the Cross-Industry Standard Process for Data Mining (CRISP-DM), which has several stages: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and finally Deployment. In the Modeling stage, the FCM model is used. Data mining with the FCM clustering model can analyze large databases with many, complicated variables, especially to extract patterns from the data. A Fuzzy Inference System (FIS) was then built from the discovered patterns to simulate the mapping of input data to output data based on fuzzy logic. Keywords: Fuzzy c-Means Clustering, Pattern Recognition
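
To make the "degree of membership" idea concrete, here is a minimal sketch of the standard FCM iteration in plain Python. It is not the implementation used in the study above; the fuzzifier `m = 2.0`, the stopping tolerance, and the function names are assumptions made for illustration, and the caller supplies the initial centers.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Return u[i][j], the degree to which points[i] belongs to centers[j]."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        if 0.0 in dists:
            # The point sits exactly on a center: give it full membership there.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in dists)
                      for dj in dists])
    return u


def fcm_centers(points, u, m=2.0):
    """Each new center is the mean of all points weighted by u^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        weights = [row[j] ** m for row in u]
        total = sum(weights)
        centers.append(tuple(
            sum(w * x[d] for w, x in zip(weights, points)) / total
            for d in range(dim)))
    return centers


def fuzzy_c_means(points, init_centers, m=2.0, iters=100, tol=1e-6):
    """Alternate membership and center updates until the centers stop moving."""
    centers = list(init_centers)
    for _ in range(iters):
        u = fcm_memberships(points, centers, m)
        new_centers = fcm_centers(points, u, m)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, fcm_memberships(points, centers, m)
```

Each row of the returned membership matrix sums to 1, so a point lying between two clusters receives a split such as 0.6 / 0.4 rather than a hard label.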

    Fuzzy Sets Use in Cluster Analysis with a Special Attention to a Fuzzy C-means Clustering Method

    This master's thesis deals with cluster analysis, and more specifically with clustering methods that use fuzzy sets. Basic clustering algorithms and the multivariate transformations needed for cluster analysis are described in the first chapter. In the practical part, in the third chapter, k-means and fuzzy c-means clustering are applied to real data. These data are the inputs of the chemical transport model CMAQ, which is used to compute concentrations of air pollutants in the atmosphere. For fuzzy c-means, two different approaches to selecting the optimal weighting exponent are compared, giving three clustering structures in total. The resulting clusters resembled each other, but fuzzy c-means with the higher weighting exponent produced clusters that bore no resemblance to the clustered variables. The end of the third chapter is dedicated to building a regression model that finds the relationship between the inputs and outputs of the CMAQ model.

    Data Selection and Fuzzy-Rules Generation for Short-Term Load Forecasting Using ANFIS

    Forecasting accuracy depends on data identification and model parameters. The volume of data and good analysis are the key factors that influence the accuracy of a forecasting algorithm. This paper focuses on data analysis with the aim of determining the actual variables that affect load consumption. Correlation analysis was used to determine how load consumption is related to the forecasting variables (model inputs), and a hypothesis test was used to justify the correlation coefficient of each variable. This produced three different scenarios, which were used to forecast the load within a short-term time frame. In addition, subtractive clustering and Fuzzy c-means (FCM) algorithms were compared for fuzzy rule generation in an Adaptive Neuro-Fuzzy Inference System (ANFIS) model for short-term electric load forecasting. Forecasting using the hypothesis-test data with the subtractive clustering algorithm gave better accuracy than the other two approaches, but the FCM algorithm was faster in all three approaches. In conclusion, a hypothesis test on the correlation coefficient of the data is a commendable practice for data selection and analysis in short-term load forecasting. Also, the subtractive clustering algorithm is good at generating an appropriate number of fuzzy rules, and that number depends on the number of input variables. The fuzzy c-means algorithm reduces the number of rules irrespective of the number of input variables.

    A Novel Evolutionary Swarm Fuzzy Clustering Approach for Hyperspectral Imagery

    In land cover assessment, classes often change gradually from one to another. Therefore, it is difficult to allocate sharp boundaries between different classes of interest. To overcome this issue and model such conditions, fuzzy techniques that resemble human reasoning have been proposed as alternatives. Fuzzy C-means is the most common fuzzy clustering technique, but its concept is based on a local search mechanism and its convergence rate is rather slow, especially for high-dimensional problems (e.g., in processing of hyperspectral images). Here, in order to address those shortcomings of hard approaches, a new approach is proposed, i.e., fuzzy C-means optimized by fractional order Darwinian particle swarm optimization. In addition, to speed up the clustering process, the histogram of image intensities is used during clustering instead of the raw image data. Furthermore, the proposed clustering approach is combined with support vector machine classification to accurately classify hyperspectral images. The new classification framework is applied to two well-known hyperspectral data sets: Indian Pines and Salinas. Experimental results confirm that the proposed swarm-based clustering approach can group hyperspectral images accurately and in a time-efficient manner compared to other existing clustering techniques.
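
The histogram speed-up mentioned in the abstract above can be sketched as follows: rather than iterating over every pixel, FCM iterates over the distinct intensity levels and weights each level by its pixel count. This sketch assumes a single-band image with integer intensities and plain FCM updates; it does not include the fractional order Darwinian particle swarm optimization or the SVM stage, and the function names are hypothetical.

```python
def level_memberships(level, centers, m=2.0):
    """Membership of one intensity level in each cluster center."""
    dists = [abs(level - c) for c in centers]
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]


def histogram_fcm(hist, init_centers, m=2.0, iters=100, tol=1e-6):
    """FCM over intensity levels, with hist[level] pixels at each level."""
    levels = [lv for lv, count in enumerate(hist) if count > 0]
    centers = [float(c) for c in init_centers]
    for _ in range(iters):
        num = [0.0] * len(centers)
        den = [0.0] * len(centers)
        for lv in levels:
            for j, u in enumerate(level_memberships(lv, centers, m)):
                w = hist[lv] * u ** m  # the pixel count scales the weight
                num[j] += w * lv
                den[j] += w
        new_centers = [n / d for n, d in zip(num, den)]
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers
```

The cost per iteration then depends on the number of occupied intensity levels (at most 256 for an 8-bit band) instead of the number of pixels.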

    A New Approach to Adaptive Neuro-fuzzy Modeling using Kernel based Clustering

    Data clustering is a well-known technique for fuzzy model identification, or fuzzy modelling, which captures system behavior in the form of fuzzy if-then rules based on experimental data. Fuzzy c-Means (FCM) clustering and subtractive clustering (SC) are efficient techniques for fuzzy rule extraction in fuzzy modeling with the Adaptive Neuro-fuzzy Inference System (ANFIS). In this paper, we have employed a novel technique to build the rule base of ANFIS based on the kernel-based variants of these two clustering techniques, which have shown better clustering accuracy. In the kernel-based clustering approach, kernel functions are used to calculate the distance measure between data points during clustering, which maps the data to a higher-dimensional space. This generalization makes the data set more distinctly separable, which results in more accurate cluster centers; a more precise rule base for the ANFIS can therefore be constructed, which increases the prediction performance of the system. The performance of ANFIS models built using kernel-based FCM and kernel-based SC has been analyzed on three business prediction problems, viz. sales forecasting, stock price prediction and qualitative bankruptcy prediction. A performance comparison with ANFIS models based on conventional SC and FCM clustering for each of these forecasting problems has been provided and discussed.
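
As a small illustration of a kernel-induced distance, the sketch below uses the identity ||phi(x) - phi(v)||^2 = K(x, x) - 2 K(x, v) + K(v, v), which gives the squared distance in the implicit feature space without ever computing the mapping phi. The abstract above does not say which kernel the authors use; the Gaussian kernel and the `sigma` parameter here are assumptions for the example.

```python
import math


def gaussian_kernel(x, v, sigma=1.0):
    """Gaussian (RBF) kernel; sigma is an assumed bandwidth parameter."""
    sq = sum((a - b) ** 2 for a, b in zip(x, v))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_sq_distance(x, v, kernel=gaussian_kernel):
    """Squared distance between x and v in the kernel's feature space."""
    return kernel(x, x) - 2.0 * kernel(x, v) + kernel(v, v)
```

For the Gaussian kernel, K(x, x) = 1, so the expression simplifies to 2 (1 - K(x, v)); the distance is 0 when x equals v and approaches 2 as the points move far apart. A kernel-based FCM variant would use a distance of this kind in place of the Euclidean distance when computing memberships.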

    The application of ANFIS prediction models for thermal error compensation on CNC machine tools

    Thermal errors can have significant effects on CNC machine tool accuracy. The errors come from thermal deformations of the machine elements caused by heat sources within the machine structure or from ambient temperature change. The effect of temperature can be reduced by error avoidance or numerical compensation. The performance of a thermal error compensation system essentially depends upon the accuracy and robustness of the thermal error model and its input measurements. This paper first reviews different methods of designing thermal error models, before concentrating on employing an adaptive neuro fuzzy inference system (ANFIS) to design two thermal prediction models: ANFIS by dividing the data space into rectangular sub-spaces (ANFIS-Grid model) and ANFIS by using the fuzzy c-means clustering method (ANFIS-FCM model). Grey system theory is used to obtain the influence ranking of all possible temperature sensors on the thermal response of the machine structure. All the influence weightings of the thermal sensors are clustered into groups using the fuzzy c-means (FCM) clustering method, the groups then being further reduced by correlation analysis. A study of a small CNC milling machine is used to provide training data for the proposed models and then to provide independent testing data sets. The results of the study show that the ANFIS-FCM model is superior in terms of the accuracy of its predictive ability with the benefit of fewer rules. The residual value of the proposed model is smaller than ±4 μm. This combined methodology can provide improved accuracy and robustness of a thermal error compensation system

    Development of c-means Clustering Based Adaptive Fuzzy Controller for A Flapping Wing Micro Air Vehicle

    Advanced and accurate modelling of a Flapping Wing Micro Air Vehicle (FW MAV) and its control is one of the recent research topics in the field of autonomous Unmanned Aerial Vehicles (UAVs). In this work, a four-wing Nature-inspired (NI) FW MAV is modeled and controlled, motivated by its advanced features such as quick flight, vertical take-off and landing, hovering, fast turns, and enhanced manoeuvrability when compared with comparable-sized fixed and rotary wing UAVs. The Fuzzy C-Means (FCM) clustering algorithm is used to build the NIFW MAV model, which has advantages over first-principles-based modelling since it does not depend on the system dynamics but is instead based on data and can incorporate various uncertainties such as sensor error. The same clustering strategy is used to develop an adaptive fuzzy controller. The controller is then used to control the altitude of the NIFW MAV, and it can adapt to environmental disturbances by tuning the antecedent and consequent parameters of the fuzzy system. Comment: this paper is currently under review in the Journal of Artificial Intelligence and Soft Computing Research

    Geoelectrical Data Inversion by Clustering Techniques of Fuzzy Logic to Estimate the Subsurface Layer Model

    Soft computing based geoelectrical data inversion differs from conventional computing in how it handles uncertainty problems. It is tractable, robust, efficient, and inexpensive. In this paper, fuzzy logic clustering methods are used in the inversion of geoelectrical resistivity data. To characterize the subsurface features of the earth, one should rely on validation against true field data. This paper works with field data obtained from published results and contributes to an interdisciplinary approach to solving complex problems. Three fuzzy logic clustering algorithms, namely fuzzy C-means clustering, fuzzy K-means clustering, and fuzzy subtractive clustering, were analyzed with the help of fuzzy inference system (FIS) training on synthetic data. In this approach, a graphical user interface (GUI) was developed that integrates the three algorithms; when the input data (AB/2 and apparent resistivity) are imported, it runs each algorithm and interprets the layer model parameters (true resistivity and depth). A complete overview of the three algorithms is presented in the text. The results indicate that the fuzzy logic subtractive clustering algorithm gives more reliable results and show the efficacy of soft computing tools in the inversion of geoelectrical resistivity data.