19 research outputs found
Machine-Part cell formation through visual decipherable clustering of Self Organizing Map
Machine-part cell formation is used in cellular manufacturing to process a large product variety at high quality, with lower work-in-process levels and reduced manufacturing lead time and customer response time, while retaining flexibility for new products. The fundamental problem in cellular manufacturing is the formation of part families and machine cells, and this paper presents a novel approach for obtaining them. The Self-Organising Map (SOM), an unsupervised learning algorithm from Artificial Intelligence, is used as a visually decipherable clustering tool for machine-part cell formation. The objective of the paper is to cluster the binary machine-part matrix through visually decipherable SOM clusters, using colour-coding and labelling of the SOM map nodes, so that each part family is processed within its own machine cell. The U-matrix, component planes, principal component projection, scatter plot and histogram of the SOM are reported in the present work for successful visualization of the machine-part cell formation. Computational results with the proposed algorithm on a set of group technology problems available in the literature are also presented. The proposed SOM approach produced solutions with a grouping efficacy at least as good as any result reported earlier in the literature, improved the grouping efficacy for 70% of the problems, and should prove immensely useful to both industry practitioners and researchers.
Comment: 18 pages, 3 tables, 4 figures
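As a hedged illustration of the evaluation measure named above (not the paper's own code), grouping efficacy for a block-diagonalized binary machine-part matrix is commonly computed as Gamma = (e - e_o) / (e + e_v), where e is the total number of 1s, e_o the exceptional elements (1s outside the diagonal blocks) and e_v the voids (0s inside the blocks). The cell and family assignments here stand in for labels read off a trained SOM:

```python
# Hedged sketch: grouping efficacy for a machine-part incidence matrix.
# The block assignment (machine_cells, part_families) is assumed given,
# e.g. read off a trained SOM's labelled map nodes.

def grouping_efficacy(matrix, machine_cells, part_families):
    """Gamma = (e - e_o) / (e + e_v).

    matrix        -- list of rows of 0/1 values (machines x parts)
    machine_cells -- cell id for each machine (row)
    part_families -- family id for each part (column)
    """
    e = e_o = e_v = 0
    for i, row in enumerate(matrix):
        for j, val in enumerate(row):
            inside = machine_cells[i] == part_families[j]
            if val == 1:
                e += 1
                if not inside:
                    e_o += 1          # exceptional element
            elif inside:
                e_v += 1              # void inside a diagonal block
    return (e - e_o) / (e + e_v)

# A perfectly block-diagonal matrix has efficacy 1.0
m = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]
print(grouping_efficacy(m, [0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
```

Any exceptional element or void pulls the value below 1, which is why the measure rewards clean block-diagonal structure.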
Using Discriminant Analysis to Verify the Clustering of Self-Organizing Map
Data on hot spots spreading in Indonesian forests typically come with a large feature space and heterogeneous distribution patterns. The complexity of this hot spot data structure is central to the present analysis. Clusters of hot spot regions that persist over time are good indicators of fire risk, and therefore the self-organizing map (SOM) was implemented for clustering hot spot regions. This nonlinear statistical technique can be used for solving data problems that involve classification and information visualization. The findings of the study show that SOM provides a classification of hot spots by region into distinct clusters. However, a specification of the clusters is needed when the SOM nodes do not clearly reveal the cluster borders. Under these circumstances, supervised learning with discriminant analysis (DA) is used to validate the SOM clusters. The main purpose of DA is to predict cluster membership from given prior cluster information, through distance measures and distinct coloring of the nodes in the SOM. DA gave highly accurate cluster discrimination, which shows that this method can be a useful tool for verifying SOM clustering. The combination of the proposed methods is a reliable means of classifying and visualizing the data, and enables interpretation of the disparities in forest fire risk by region on the basis of the hot spot data
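The verification idea can be sketched with a simple distance-based discriminant (nearest centroid) standing in for the paper's full discriminant analysis: cluster membership is re-predicted from the data, and high agreement with the original assignment suggests well-separated clusters. This is a minimal sketch, not the paper's method:

```python
import numpy as np

# Hedged sketch: verify a clustering by re-predicting cluster membership
# with a distance-based discriminant (nearest centroid). High agreement
# suggests the clusters are well separated; this stands in for the full
# discriminant analysis described in the abstract.

def nearest_centroid_agreement(X, labels):
    centroids = {c: X[labels == c].mean(axis=0) for c in np.unique(labels)}
    classes = sorted(centroids)
    # distance from every point to every class centroid
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes])
    predicted = np.array(classes)[d.argmin(axis=0)]
    return float((predicted == labels).mean())

# Two well-separated blobs are re-discriminated perfectly
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = np.array([0] * 20 + [1] * 20)
print(nearest_centroid_agreement(X, labels))  # -> 1.0
```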
Clustering based on weighted ensemble
Clustering is an ill-posed problem, and it has been proven that no algorithm
can satisfy all the assumptions about good clustering. This is why numerous
clustering algorithms exist, based on various theories and approaches, one of
them being the well-known Kohonen's self-organizing map (SOM). Unfortunately,
training the SOM yields no explicit information about clusters in the
underlying data, so another technique for grouping SOM units has to be applied
afterwards. In the thesis, a contribution towards a two-level clustering of the SOM
is presented, employing the principles of the gravitational law. The proposed algorithm for
gravitational clustering of the SOM (gSOM) is capable of discovering complex cluster
shapes, not limited to spherical ones, and is able to automatically determine
the number of clusters. Experimental comparison with other clustering techniques is
conducted on synthetic and real-world data. We show that gSOM achieves promising
results, especially on gene-expression data.
As no single clustering algorithm can solve all problems, it proves very
beneficial to analyse the data using multiple partitions of it – an ensemble of
partitions. Cluster-ensemble methods have recently emerged as an effective approach
to stabilizing and boosting the performance of single-clustering algorithms. Data
clustering with an ensemble basically involves two steps: generating the ensemble with
single-clustering methods, and combining the obtained solutions to produce a
final consensus partition of the data. To alleviate the consensus step, the weighted cluster
ensemble was proposed, which assesses the relevance of the ensemble members. One
way to achieve this is to employ internal cluster validity indices to perform partition
relevance analysis (PRA). Our contribution here is two-fold: first, we propose a novel
cluster validity index, DNs, which extends Dunn's index and is based on the shortest
paths between data points along the Gabriel graph of the data; second, we propose an
enhancement to the weighted cluster ensemble approach by introducing a
reduction step after the assessment of the ensemble partitions. The developed
partition relevance analysis with the reduction step (PRAr) yields promising results
when plugged into three consensus functions based on the evidence-accumulation
principle.
In the thesis we address all the major stages of data clustering: data generation, data
analysis using single-clustering algorithms, cluster validation using internal and external
indices, and finally the cluster-ensemble approach with a focus on the weighted variants.
All the contributions are compared to state-of-the-art methods using datasets
from various problem domains. The results are positive and encourage the inclusion of
the proposed algorithms in the machine-learning practitioner's toolbox
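The evidence-accumulation principle mentioned above can be sketched as follows (a minimal illustration, not the thesis implementation): each pair of points records how often it is co-clustered across the ensemble, and the resulting co-association matrix is then cut at a threshold to obtain the consensus partition (a stand-in for the hierarchical cut used in the literature):

```python
import numpy as np

# Hedged sketch of evidence accumulation: build a co-association matrix
# from an ensemble of partitions, then form consensus clusters by
# transitively grouping points whose co-association exceeds a threshold.

def co_association(partitions):
    n = len(partitions[0])
    C = np.zeros((n, n))
    for labels in partitions:
        labels = np.asarray(labels)
        # pairwise "same cluster" votes for this partition
        C += (labels[:, None] == labels[None, :]).astype(float)
    return C / len(partitions)

def consensus(partitions, threshold=0.5):
    C = co_association(partitions)
    n = C.shape[0]
    label = [-1] * n
    next_id = 0
    for i in range(n):
        if label[i] == -1:
            label[i] = next_id
            stack = [i]
            while stack:                      # flood-fill over the thresholded matrix
                u = stack.pop()
                for v in range(n):
                    if label[v] == -1 and C[u, v] > threshold:
                        label[v] = next_id
                        stack.append(v)
            next_id += 1
    return label

# Two of three ensemble members agree on the split of point 1
ensemble = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
print(consensus(ensemble))  # -> [0, 0, 1, 1]
```

A weighted variant, as in the thesis, would scale each partition's vote by a relevance score from a validity index before accumulating.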
Exploration into The Effect of The Real Life Production Factors in The Assessment of Cellular Manufacturing System
Cellular Manufacturing (CM) is a production philosophy that operates on the principles of Group Technology (GT). CM has a positive impact in terms of enhancing quality and increasing productivity. One of the earlier and essential stages in CM is known as the Feasibility Assessment (FA). The FA is an evaluation stage whose results serve as predictions for the subsequent design stage, called Cell Formation (CF). The output of the FA includes the predicted number of machine cells, the decision on whether or not to apply CM, and the quality of the expected solution.
Most previous studies focused on the influence of real-life production features on the second stage (CF) and recorded significant results. The current paper instead studies the influence of real-life production features on the first stage, the FA. For this purpose, 19 data sets and two Similarity Coefficients (SCs) based on the real-life production features of production volume and batch size were selected. The results for these two features were compared with those of one well-known General Purpose Similarity Coefficient (GPSC), Jaccard, which works on a (0,1) matrix as input data. The results indicate that the real-life production features have no significant influence on the FA: 84% of the data sets produced the same number of machine cells using all three different SCs, while only 16% of the data sets produced different solutions. Thus, data sets based on a (0,1) matrix and the GPSC (Jaccard) are sufficient for use in the FA to predict the number of machine cells
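As a hedged illustration of the general-purpose similarity coefficient named above, the Jaccard coefficient between two machines counts the parts both machines process, divided by the parts at least one of them processes:

```python
# Hedged sketch: Jaccard similarity coefficient between two machines,
# computed from their rows of a binary (0,1) machine-part matrix.
#   S_ij = a / (a + b + c)
# a = parts processed by both machines; b, c = parts processed by only one.

def jaccard(row_i, row_j):
    a = sum(1 for x, y in zip(row_i, row_j) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(row_i, row_j) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(row_i, row_j) if x == 0 and y == 1)
    return a / (a + b + c) if a + b + c else 0.0

m1 = [1, 1, 0, 1]
m2 = [1, 0, 0, 1]
print(jaccard(m1, m2))  # a=2, b=1, c=0 -> 2/3
```

Production-feature-based SCs replace these binary counts with weighted sums over production volume or batch size; the abstract's finding is that, for the FA stage, this refinement rarely changes the predicted number of machine cells.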
A Review on Data Clustering Algorithms for Mixed Data
Clustering is the unsupervised classification of patterns into groups (clusters). The clustering problem has been addressed in many contexts
and by researchers in many disciplines; this reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. In general, clustering is a
method of dividing data into groups of similar objects. One significant research area in data mining is developing methods to update knowledge by using
existing knowledge, since this can generally increase mining efficiency, especially for very large databases. Data mining uncovers hidden, previously unknown,
and potentially useful information from large amounts of data. This paper presents a general survey of various clustering algorithms. In addition, the paper
describes the efficiency of the Self-Organizing Map (SOM) algorithm in enhancing mixed-data clustering
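Clustering mixed data requires a distance that treats numeric and categorical features differently. A common choice in this setting (a general technique, not one specific to this survey) is a Gower-style dissimilarity, sketched here under the assumption that numeric features are range-normalized and categorical ones are compared by simple matching:

```python
# Hedged sketch of a Gower-style dissimilarity for mixed data:
# numeric features contribute |x - y| / range, categorical features
# contribute 0 on a match and 1 otherwise; the result is averaged.

def gower(x, y, numeric, ranges):
    """x, y    -- records as equal-length tuples
    numeric -- set of indices holding numeric features
    ranges  -- range (max - min) of each numeric feature, by index
    """
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if k in numeric:
            total += abs(a - b) / ranges[k] if ranges[k] else 0.0
        else:
            total += 0.0 if a == b else 1.0
    return total / len(x)

# age (numeric, range 50) and colour (categorical)
print(gower((30, "red"), (40, "blue"), numeric={0}, ranges={0: 50}))
# -> (10/50 + 1) / 2 = 0.6
```

Such a dissimilarity can feed any distance-based clusterer, including a SOM whose distance function is replaced accordingly.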
Characterisation of extreme winter precipitation in Mediterranean coastal sites and associated anomalous atmospheric circulation patterns
We present an analysis of daily extreme precipitation events for the extended winter season (October–March) at 20 Mediterranean coastal sites covering the period 1950–2006. The heavy-tailed behaviour of precipitation extremes and estimated return levels, including associated uncertainties, are derived by applying a procedure based on the Generalized Pareto Distribution, in combination with recently developed methods. Precipitation extremes make an important contribution to seasonal totals (approximately 60% for all series). Three stations (one in the western Mediterranean and the others in the eastern basin) have a 5-year return level above 100 mm, while the lowest value (estimated for two Italian series) is 58 mm. As for the 50-year return level, an Italian station (Genoa) has the highest value of 264 mm, while the other values range from 82 to 200 mm. Furthermore, six series (from stations located in France, Italy, Greece, and Cyprus) show a significant negative tendency in the probability of observing an extreme event. The relationship between extreme precipitation events and the large-scale atmospheric circulation at the upper, mid and low troposphere is investigated using NCEP/NCAR reanalysis data. A two-step classification procedure identifies three significant anomaly patterns for both the western-central and the eastern part of the Mediterranean basin. In the western Mediterranean, the anomalous southwesterly surface-to-mid-tropospheric flow is connected with enhanced moisture transport from the Atlantic. During ≥5-year return level events, the subtropical jet stream axis is aligned with the African coastline and interacts with the eddy-driven jet stream. This is connected with enhanced large-scale ascending motions and instability, and leads to the development of severe precipitation events.
For extreme precipitation events in the eastern Mediterranean, the identified anomaly patterns suggest warm air advection connected with anomalous ascending motions and an increase of low- to mid-tropospheric moisture. Furthermore, the jet stream position (during ≥5-year return level events) places the eastern basin in a divergence area, where ascending motions are favoured. Our results contribute to an improved understanding of daily precipitation extremes in the cold season and the associated large-scale atmospheric features
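As a hedged numeric sketch of the return-level computation (with illustrative parameter values, not those fitted in the paper), the m-observation return level of a Generalized Pareto fit above a threshold u is x_m = u + (sigma/xi)((m * zeta_u)^xi - 1), where zeta_u is the probability of exceeding u:

```python
import math

# Hedged sketch: GPD return level  x_m = u + (sigma/xi) * ((m*zeta_u)**xi - 1)
#   u       -- threshold (mm); sigma, xi -- GPD scale and shape
#   zeta_u  -- probability that a daily value exceeds u
# All parameter values below are illustrative, not fitted to the paper's data.

def gpd_return_level(u, sigma, xi, zeta_u, years, obs_per_year):
    m = years * obs_per_year
    if abs(xi) < 1e-12:                       # xi -> 0 limit (exponential tail)
        return u + sigma * math.log(m * zeta_u)
    return u + (sigma / xi) * ((m * zeta_u) ** xi - 1)

# Illustrative 5-year level for an extended winter season (~182 days/year)
level = gpd_return_level(u=40, sigma=15, xi=0.1, zeta_u=0.05,
                         years=5, obs_per_year=182)
print(round(level, 1))
```

Uncertainty bands like those in the paper would come from the sampling distribution of the fitted (sigma, xi), e.g. via the delta method or profile likelihood.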