19 research outputs found

    Machine-Part cell formation through visual decipherable clustering of Self Organizing Map

    Machine-part cell formation is used in cellular manufacturing to process a large variety of parts while maintaining quality, lowering work-in-process levels, and reducing manufacturing lead time and customer response time, all while retaining flexibility for new products. The fundamental problem in cellular manufacturing is the formation of part families and machine cells, and this paper presents a new approach to obtaining them. It uses the Self-Organizing Map (SOM), an unsupervised learning algorithm from artificial intelligence, as a visually decipherable clustering tool for machine-part cell formation. The objective is to cluster the binary machine-part matrix through visually decipherable SOM clusters, using color-coding and labelling of the SOM map nodes, so that each part family is processed within its corresponding machine cell. The U-matrix, component planes, principal component projection, scatter plot and histogram of the SOM are reported to visualize the machine-part cell formation. Computational results for the proposed algorithm on a set of group technology problems from the literature are also presented. The proposed SOM approach produced solutions with a grouping efficacy at least as good as any result previously reported in the literature, improved the grouping efficacy for 70% of the problems, and was found useful to both industry practitioners and researchers. Comment: 18 pages, 3 tables, 4 figures
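
The abstract above reports results in terms of grouping efficacy without stating the formula. A minimal sketch, assuming the definition commonly used in the group technology literature: efficacy = (e - e_o) / (e + e_v), where e is the number of ones in the binary machine-part matrix, e_o the number of ones outside the diagonal blocks (exceptional elements) and e_v the number of zeros inside them (voids). The function name, argument names and the small matrix are hypothetical and are not taken from the paper.

```python
def grouping_efficacy(matrix, machine_cell, part_family):
    """Grouping efficacy of a cell assignment (assumed standard definition).

    matrix[i][j] is 1 if machine i processes part j, else 0.
    machine_cell[i] and part_family[j] are cell labels; an entry lies
    inside a diagonal block when the two labels match.
    """
    total_ones = 0   # e
    exceptions = 0   # e_o: ones outside the diagonal blocks
    voids = 0        # e_v: zeros inside the diagonal blocks
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            inside = machine_cell[i] == part_family[j]
            if value == 1:
                total_ones += 1
                if not inside:
                    exceptions += 1
            elif inside:
                voids += 1
    return (total_ones - exceptions) / (total_ones + voids)


# Hypothetical 4-machine, 4-part example with two cells.
example_matrix = [
    [1, 1, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
example_efficacy = grouping_efficacy(example_matrix, [0, 0, 1, 1], [0, 0, 1, 1])
```

Here the matrix has 8 ones, one exceptional element and one void, so the efficacy is 7/9.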

    Using Discriminant Analysis to Verify the Clustering of Self-Organizing Map

    Data on hot spots spreading in Indonesian forests usually come with a large feature space and heterogeneous distribution patterns. The complexity of this hot spot data structure is central to the present analysis. Clusters of hot spot regions that persist over time are good indicators of fire risk. Therefore, the self-organizing map (SOM) was implemented to cluster hot spot regions. This method is a nonlinear statistical technique that can be used for data problems involving classification and information visualization. The study finds that SOM classifies hot spots by region into several distinct clusters. However, the clusters need further specification when the SOM nodes do not clearly reveal the cluster borders. Under these circumstances, discriminant analysis (DA), a supervised learning method, is used to validate the SOM clusters. The main purpose of DA is to predict cluster membership from given prior cluster information, through distance measures and distinct coloring of the nodes in the SOM. DA gave highly accurate cluster discrimination, which shows that it can be a useful tool for verifying SOM clustering. The combination of the proposed methods is a reliable means of classifying and visualizing the data, and enables interpretation of regional disparities in forest fire risk on the basis of the hot spot data

    Clustering based on weighted ensemble

    Clustering is an ill-posed problem, and it has been proven that no algorithm can satisfy all the assumptions about good clustering. This is why numerous clustering algorithms exist, based on various theories and approaches, one of them being Kohonen’s well-known self-organizing map (SOM). Unfortunately, training the SOM yields no explicit information about clusters in the underlying data, so another technique for grouping SOM units has to be applied afterwards. In the thesis, a contribution towards two-level clustering of the SOM is presented, employing principles of the law of gravitation. The proposed algorithm for gravitational clustering of the SOM (gSOM) can discover complex cluster shapes, not only spherical ones, and can automatically determine the number of clusters. An experimental comparison with other clustering techniques is conducted on synthetic and real-world data. We show that gSOM achieves promising results, especially on gene-expression data. As no clustering algorithm can solve all problems, it turns out to be very beneficial to analyse the data using multiple partitions, that is, an ensemble of partitions. Cluster-ensemble methods have emerged recently as an effective approach to stabilize and boost the performance of single-clustering algorithms. Data clustering with an ensemble involves two steps: generating the ensemble with single-clustering methods, and combining the obtained solutions to produce a final consensus partition of the data. To ease the consensus step, the weighted cluster ensemble was proposed, which tries to assess the relevance of ensemble members. One way to achieve this is to employ internal cluster validity indices to perform partition relevance analysis (PRA).
    Our contribution here is two-fold: first, we propose a novel cluster validity index, DNs, that extends Dunn’s index and is based on the shortest paths between the data points in the Gabriel graph on the data; second, we propose an enhancement to the weighted cluster ensemble approach by introducing a reduction step after the ensemble partitions have been assessed. The developed partition relevance analysis with the reduction step (PRAr) yields promising results when plugged into three consensus functions based on the evidence accumulation principle. In the thesis we address all the major stages of data clustering: data generation, data analysis using single-clustering algorithms, cluster validity using internal and external indices, and finally the cluster ensemble approach with a focus on the weighted variants. All the contributions are compared to state-of-the-art methods using datasets from various problem domains. Results are positive and encourage the inclusion of the proposed algorithms in the machine-learning practitioner’s toolbox
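
The DNs index described above extends Dunn’s index. As background, here is a minimal sketch of the classic Dunn index only: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. It uses plain Euclidean distances and is not the thesis’s DNs index, which the abstract says relies on shortest paths in the Gabriel graph. The function name and sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Classic Dunn index: min between-cluster distance / max cluster diameter.

    Assumes at least two clusters, and at least one cluster containing
    two distinct points (otherwise the diameter is zero).
    """
    min_between = float("inf")
    max_within = 0.0
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        d = dist(p, q)
        if label_p == label_q:
            max_within = max(max_within, d)    # candidate cluster diameter
        else:
            min_between = min(min_between, d)  # candidate cluster separation
    return min_between / max_within


# Hypothetical data: two tight pairs of points, five units apart.
sample_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
sample_dunn = dunn_index(sample_points, [0, 0, 1, 1])
```

Higher values indicate compact, well-separated clusters; in this example the separation is 5 and the largest diameter is 1.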

    Exploration into The Effect of The Real Life Production Factors in The Assessment of Cellular Manufacturing System

    Cellular Manufacturing (CM) is a production philosophy based on the principles of Group Technology (GT). CM has a positive impact on quality and productivity. One of the early and essential stages of CM is the Feasibility Assessment (FA). FA is an evaluation stage, and its results serve as predictions for the subsequent design stage, Cell Formation (CF). The output of the FA comprises the predicted number of machine cells, the decision whether or not to apply CM, and the quality of the expected solution.
    Most previous studies focused on the influence of real-life production factors on the second stage (CF) and recorded significant results. The current paper instead studies the influence of these factors on the first stage (FA). For this purpose, 19 data sets and two Similarity Coefficients (SCs) based on real-life production factors, namely production volume and batch size, were selected. The results obtained with these two coefficients were compared with those of one well-known General Purpose Similarity Coefficient (GPSC), Jaccard, which takes only a (0,1) matrix as input. The results indicate that real-life production factors have no significant influence on the FA: 84% of the data sets produced the same number of machine cells with all three SCs, while only 16% produced a different number. Thus, data sets based on a (0,1) matrix together with the general-purpose Jaccard coefficient are sufficient for predicting the number of machine cells in the FA
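
To make the (0,1)-matrix input concrete, here is a minimal sketch of the Jaccard similarity coefficient between two machines, assuming its usual form: the number of parts processed by both machines divided by the number processed by at least one of them. The function name and the two example rows are hypothetical; the paper’s own data sets and the production-based coefficients are not reproduced here.

```python
def jaccard_similarity(machine_a, machine_b):
    """Jaccard coefficient between two rows of a binary machine-part matrix."""
    pairs = list(zip(machine_a, machine_b))
    both = sum(1 for x, y in pairs if x == 1 and y == 1)    # parts visiting both machines
    either = sum(1 for x, y in pairs if x == 1 or y == 1)   # parts visiting at least one
    return both / either if either else 0.0


# Hypothetical rows: each entry marks whether the machine processes that part.
row_m1 = [1, 1, 0, 0, 1]
row_m2 = [1, 0, 0, 1, 1]
similarity_m1_m2 = jaccard_similarity(row_m1, row_m2)
```

Two parts visit both machines and four visit at least one, so the coefficient is 0.5. Production volume and batch size play no role in this coefficient.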

    A Review on Data Clustering Algorithms for Mixed Data

    Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a method of dividing data into groups of similar objects. One significant research area in data mining is the development of methods that update knowledge using existing knowledge, since this can generally improve mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown, and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper describes the efficiency of the Self-Organized Map (SOM) algorithm in enhancing mixed data clustering

    Characterisation of extreme winter precipitation in Mediterranean coastal sites and associated anomalous atmospheric circulation patterns

    We present an analysis of daily extreme precipitation events for the extended winter season (October–March) at 20 Mediterranean coastal sites covering the period 1950–2006. The heavy-tailed behaviour of precipitation extremes and estimated return levels, including associated uncertainties, are derived by applying a procedure based on the Generalized Pareto Distribution, in combination with recently developed methods. Precipitation extremes make an important contribution to seasonal totals (approximately 60% for all series). Three stations (one in the western Mediterranean and the others in the eastern basin) have a 5-year return level above 100 mm, while the lowest value (estimated for two Italian series) is 58 mm. As for the 50-year return level, an Italian station (Genoa) has the highest value of 264 mm, while the other values range from 82 to 200 mm. Furthermore, six series (from stations located in France, Italy, Greece, and Cyprus) show a significant negative tendency in the probability of observing an extreme event. The relationship between extreme precipitation events and the large-scale atmospheric circulation in the upper, mid and low troposphere is investigated using NCEP/NCAR reanalysis data. A 2-step classification procedure identifies three significant anomaly patterns for both the western-central and the eastern part of the Mediterranean basin. In the western Mediterranean, the anomalous southwesterly surface to mid-tropospheric flow is connected with enhanced moisture transport from the Atlantic. During ≥5-year return level events, the subtropical jet stream axis is aligned with the African coastline and interacts with the eddy-driven jet stream. This is connected with enhanced large-scale ascending motions and instability, and leads to the development of severe precipitation events.
    For the eastern Mediterranean extreme precipitation events, the identified anomaly patterns suggest warm air advection connected with anomalous ascending motions and an increase in low- to mid-tropospheric moisture. Furthermore, the jet stream position (during ≥5-year return level events) places the eastern basin in a divergence area, where ascending motions are favoured. Our results contribute to an improved understanding of daily precipitation extremes in the cold season and the associated large-scale atmospheric features
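
The abstract above derives return levels from a Generalized Pareto Distribution fit. A minimal sketch of the textbook peaks-over-threshold return level, assuming a threshold u, scale σ, shape ξ, exceedance probability ζ and a fixed number of observations per year: the N-year level is u + (σ/ξ)((N · n · ζ)^ξ − 1), with a logarithmic form when ξ = 0. This may differ from the paper’s exact procedure, which also uses "recently developed methods" not described in the abstract. All parameter values below are made up for illustration and are not the paper’s estimates.

```python
import math


def gpd_return_level(years, threshold, scale, shape, exceed_prob, obs_per_year):
    """N-year return level for a Generalized Pareto fit above a threshold.

    exceed_prob is the probability that a single observation exceeds the
    threshold; obs_per_year is the number of observations in one season.
    """
    m = years * obs_per_year  # return period expressed in observations
    if abs(shape) < 1e-12:
        # Limiting case of a zero shape parameter (exponential tail).
        return threshold + scale * math.log(m * exceed_prob)
    return threshold + (scale / shape) * ((m * exceed_prob) ** shape - 1.0)


# Illustrative values only: 182 daily observations per October-March season.
level_5yr = gpd_return_level(5, 30.0, 12.0, 0.15, 0.05, 182)
level_50yr = gpd_return_level(50, 30.0, 12.0, 0.15, 0.05, 182)
```

A positive shape parameter corresponds to the heavy-tailed behaviour mentioned in the abstract, and it makes the 50-year level grow faster relative to the 5-year level than an exponential tail would.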