High throughput powder diffraction: II Applications of clustering methods and multivariate data analysis
In high-throughput crystallography it is possible to accumulate over 1000 powder diffraction patterns on a series of related compounds, often polymorphs. We present a method that can analyse such data, automatically sort the patterns into related clusters or classes, characterise each cluster and identify any unusual samples containing, for example, unknown or unexpected polymorphs. Mixtures may be analysed quantitatively if a database of pure phases is available. A key component of the method is a set of visualisation tools based on dendrograms, cluster analysis, pie charts, principal-component-based score plots and metric multidimensional scaling. Applications are presented to pharmaceutical data and to inorganic compounds. The procedures have been incorporated into the PolySNAP commercial computer software.
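The automatic sorting into clusters can be illustrated with a small sketch. This is not PolySNAP's implementation. It assumes each pattern is a list of intensities sampled on a common 2θ grid, it uses a Pearson-correlation dissimilarity with average-linkage agglomeration, and it cuts the dendrogram at a hypothetical `threshold`. All function names are illustrative.

```python
import math


def pearson(a, b):
    """Pearson correlation between two intensity profiles on the same 2-theta grid."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)  # undefined for a flat (zero-variance) pattern


def distance_matrix(patterns):
    """Dissimilarity (1 - r) / 2: 0 for identical shapes, 1 for perfectly anti-correlated."""
    n = len(patterns)
    return [[(1.0 - pearson(patterns[i], patterns[j])) / 2.0 for j in range(n)]
            for i in range(n)]


def average_linkage(d, threshold):
    """Merge the two clusters with the smallest mean pairwise dissimilarity until that
    value exceeds `threshold`, i.e. cut the dendrogram at height `threshold`."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [d[i][j] for i in clusters[a] for j in clusters[b]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        avg, a, b = best
        if avg > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy patterns: two samples peak near bin 1, two near bin 4.
patterns = [
    [0, 10, 1, 0, 0, 0],
    [0, 9, 2, 0, 0, 0],
    [0, 0, 0, 1, 10, 0],
    [0, 0, 0, 2, 9, 1],
]
print(average_linkage(distance_matrix(patterns), threshold=0.2))  # [[0, 1], [2, 3]]
```

Lowering or raising `threshold` corresponds to cutting the dendrogram lower or higher, so the number of clusters falls out of the data instead of being fixed beforehand.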
SmallSteps: an adaptive distance-based clustering algorithm
In this article we propose a new distance-based clustering algorithm. Distance-based clustering methods operate on data sets in similarity space, where the similarities or dissimilarities between the objects are given by a matrix. These algorithms have at least O(n²) time complexity, where n is the number of objects. One of the latest distance-based methods is Chameleon, which in practice works well only on larger data sets and fails on relatively small ones. This is paradoxical, since their O(n²) time complexity already makes distance-based algorithms unsuitable for huge data sets. We therefore developed a new distance-based method (SmallSteps), which can also handle relatively small numbers of objects. Our solution looks for connected graphs whose edges have a maximum weight computed from the environments of the objects. The method can detect clusters of different shapes, sizes or densities, determines the number of clusters automatically, and has a particular ability to divide clusters into subclusters.
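SmallSteps' exact edge-weight rule is not given here, so the sketch below only illustrates the general idea of distance-based clustering as connected components of a graph built from a dissimilarity matrix, with a per-object threshold derived from each object's neighbourhood. The k-th-nearest-neighbour radius and the parameters `k` and `factor` are hypothetical choices, not the paper's.

```python
from collections import deque


def knn_radius(d, i, k):
    """Distance from object i to its k-th nearest neighbour (excluding itself)."""
    others = sorted(d[i][j] for j in range(len(d)) if j != i)
    return others[min(k, len(others)) - 1]


def threshold_graph_clusters(d, k=2, factor=1.5):
    """Label objects by connected component of an adaptive-threshold graph."""
    n = len(d)
    # Hypothetical local scale: factor * distance to the k-th nearest neighbour.
    radius = [factor * knn_radius(d, i, k) for i in range(n)]
    # Edge i--j only if j lies within the local scale of BOTH endpoints, so a dense
    # group is not bridged to a sparse one by a single long edge.
    adj = [[j for j in range(n) if j != i and d[i][j] <= min(radius[i], radius[j])]
           for i in range(n)]
    label = [-1] * n
    current = 0
    for start in range(n):
        if label[start] != -1:
            continue
        label[start] = current
        queue = deque([start])
        while queue:  # breadth-first search over one component
            u = queue.popleft()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = current
                    queue.append(v)
        current += 1
    return label


# Hypothetical 1-D objects; the dissimilarity matrix is |x - y|.
points = [0.0, 1.0, 2.0, 10.0, 10.5, 11.0]
d = [[abs(x - y) for y in points] for x in points]
print(threshold_graph_clusters(d))  # [0, 0, 0, 1, 1, 1]
```

Because the radius adapts to each object's surroundings, groups of different densities can each form their own component, and the number of clusters is simply the number of components.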
An Algorithm for Detecting the Principal Allotment among Fuzzy Clusters and Its Application as a Technique of Reduction of Analyzed Features Space Dimensionality
This paper describes a modification of a possibilistic clustering method based on the concept of allotment among fuzzy clusters. The basic ideas of the method are considered and the concept of a principal allotment among fuzzy clusters is introduced. The paper outlines the algorithm for detecting the principal allotment. Experimental results of applying the proposed algorithm to Tamura's portrait data are analysed and compared with the basic version of the algorithm and with the NERFCM algorithm. A methodology for applying the algorithm to the dimensionality reduction problem is outlined and illustrated on Anderson's Iris data, in comparison with the result of principal component analysis. Preliminary conclusions are also formulated.
A "fuzzy" multicriteria model for the evaluation of urban regeneration interventions
Multicriteria techniques (Hwang C.L. and Yoon K., 1981; Nijkamp P. and Voogd H., 1989; Rizzo F., 1990) are well suited to the multidimensional nature of evaluating urban regeneration plans and projects: a plurality of objectives arising from demands of different kinds (economic, social, ethical, ecological) must be taken into account, and these techniques allow a broad representation of the socio-economic, institutional and environmental framework within which the public body must take its decision on the intervention. In urban regeneration operations, multicriteria analysis comes into play in a process in which the definition of objectives and actions by the Public Administration is followed by the preparation of the projects that form the subject of the evaluations needed to compare them and choose the alternative to be implemented.
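As an illustration of a crisp multicriteria technique of the kind cited above, the sketch below implements TOPSIS, the ranking method introduced by Hwang and Yoon. It is not the fuzzy model proposed in this work: the criterion scores, the weights and the benefit/cost orientation of each criterion are assumed inputs, and a fuzzy variant would replace the crisp scores with fuzzy numbers.

```python
import math


def topsis(matrix, weights, benefit):
    """Closeness scores in [0, 1] for alternatives (rows) on criteria (columns).
    benefit[j] is True if larger values of criterion j are better, False for costs."""
    n_crit = len(weights)
    # 1. Vector-normalise each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    # 2. Ideal and anti-ideal points, criterion by criterion.
    ideal = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
             for j in range(n_crit)]
    anti = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
            for j in range(n_crit)]
    # 3. Relative closeness: 1 at the ideal point, 0 at the anti-ideal point.
    #    (Undefined if every alternative is identical, since both distances are 0.)
    scores = []
    for r in v:
        d_plus, d_minus = math.dist(r, ideal), math.dist(r, anti)
        scores.append(d_minus / (d_plus + d_minus))
    return scores


# Hypothetical projects scored on social benefit, ecological benefit and cost.
projects = [[7, 6, 120], [9, 4, 150], [5, 8, 90]]
scores = topsis(projects, weights=[0.4, 0.3, 0.3], benefit=[True, True, False])
ranking = sorted(range(len(projects)), key=lambda i: -scores[i])
print(scores, ranking)
```

The weights encode the relative importance the public body assigns to each objective, which is where the plurality of economic, social, ethical and ecological demands enters the ranking.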
Clustering uncertain data using Voronoi diagrams and R-tree index
We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods. © 2006 IEEE.
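The pruning idea can be sketched in two dimensions. This is a simplified illustration under stated assumptions, not the paper's algorithm: each uncertain object is represented by samples drawn from its pdf together with the axis-aligned bounding box of its support, expected distances are estimated by Monte Carlo averaging instead of numerical integration, and only one Voronoi-style rule is shown (if the whole box lies in one representative's Voronoi cell, all other representatives are pruned). The R-tree layer is omitted, and all names are hypothetical.

```python
import math
import random


def expected_distance(samples, rep):
    """Monte Carlo estimate of ED(o, p): mean distance from pdf samples to rep."""
    return sum(math.dist(s, rep) for s in samples) / len(samples)


def box_in_cell(box, p, others):
    """True if the box lies in p's Voronoi cell w.r.t. `others`. The cell is an
    intersection of half-planes, which are convex, so checking the 4 corners suffices."""
    (xmin, ymin), (xmax, ymax) = box
    corners = [(xmin, ymin), (xmin, ymax), (xmax, ymin), (xmax, ymax)]
    return all(math.dist(c, p) <= math.dist(c, q) for q in others for c in corners)


def assign(samples, box, reps):
    """Return (index of chosen representative, number of ED evaluations performed)."""
    for i, p in enumerate(reps):
        others = [q for j, q in enumerate(reps) if j != i]
        if box_in_cell(box, p, others):
            # Every point of the support is nearest to p, so ED to p is minimal.
            return i, 0
    eds = [expected_distance(samples, p) for p in reps]
    return min(range(len(reps)), key=eds.__getitem__), len(reps)


def uniform_samples(box, n, rng):
    """Hypothetical uncertain object: uniform pdf over its bounding box."""
    (xmin, ymin), (xmax, ymax) = box
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n)]


rng = random.Random(0)
reps = [(0.0, 0.0), (10.0, 0.0)]
near = ((1.0, -1.0), (2.0, 1.0))      # entirely on the (0, 0) side of the bisector x = 5
straddle = ((4.0, -1.0), (7.0, 1.0))  # crosses the bisector, so EDs must be computed
print(assign(uniform_samples(near, 1000, rng), near, reps))          # (0, 0)
print(assign(uniform_samples(straddle, 1000, rng), straddle, reps))  # (1, 2)
```

The rule is safe because every point of the box is at least as close to p as to any other representative, so the expected distance to p cannot exceed the expected distance to any other representative, whatever the pdf inside the box.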
Fuzzy clustering with spatial-temporal information
Clustering geographical units based on a set of quantitative features observed at several time occasions requires dealing with the complexity of both spatial and temporal information. In particular, one should consider (1) the spatial nature of the units to be clustered, (2) the characteristics of the space of multivariate time trajectories, and (3) the uncertainty related to assigning a geographical unit to a given cluster on the basis of the above complex features. This paper discusses a novel spatially constrained multivariate time series clustering method for units characterised by different levels of spatial proximity. In particular, the Fuzzy Partitioning Around Medoids algorithm with a Dynamic Time Warping dissimilarity measure and spatial penalization terms is applied to classify multivariate spatial-temporal series. The clustering method is presented theoretically and discussed using both simulated and real data, highlighting its main features: in particular, its capability of embedding different levels of proximity among units and its ability to handle time series of different lengths.
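The two ingredients named above can be sketched as follows: a Dynamic Time Warping dissimilarity that accepts multivariate series of different lengths, and fuzzy memberships of a unit with respect to a set of medoids. The spatial penalization terms and the medoid-update step of the actual algorithm are omitted. The fuzzifier `m = 1.5` is an arbitrary assumption, and the membership formula is the standard fuzzy c-medoids update rather than the paper's exact specification.

```python
import math


def dtw(a, b):
    """Dynamic Time Warping distance between two multivariate series (sequences of
    equal-width tuples); the series may have different lengths."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])  # Euclidean cost between time steps
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def fuzzy_memberships(dists, m=1.5):
    """Fuzzy c-medoids membership of one unit from its dissimilarities to the medoids:
    u_c = 1 / sum_k (d_c / d_k) ** (1 / (m - 1)). Memberships sum to 1."""
    zeros = [c for c, d in enumerate(dists) if d == 0]
    if zeros:  # unit coincides with one or more medoids: share full membership among them
        return [1.0 / len(zeros) if c in zeros else 0.0 for c in range(len(dists))]
    p = 1.0 / (m - 1.0)
    return [1.0 / sum((dc / dk) ** p for dk in dists) for dc in dists]


# Hypothetical bivariate series; `unit` is a time-stretched copy of medoid_a.
medoid_a = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
medoid_b = [(5.0, 0.0)] * 4
unit = [(0.0, 1.0), (0.5, 1.0), (1.0, 1.0), (2.0, 1.0)]
d = [dtw(unit, medoid_a), dtw(unit, medoid_b)]
print(d, fuzzy_memberships(d))
```

DTW lets the four-step `unit` align with the three-step `medoid_a` at small cost, which is what allows series of different lengths to be compared; the fuzzy memberships then express the uncertainty of assigning the unit to either medoid.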
The Double Galaxy Cluster Abell 2465 I. Basic Properties: Optical Imaging and Spectroscopy
Optical imaging and spectroscopic observations of the z = 0.245 double galaxy
cluster Abell 2465 are described. This object appears to be undergoing a major
merger. It is a double X-ray source and is detected in the radio at 1.4 GHz.
This paper investigates signatures of the interaction of the two components.
Redshifts were measured to determine velocity dispersions and virial radii of
each component. The technique of fuzzy clustering was used to assign membership
weights to the galaxies in each clump. Using redshifts of 93 cluster members
within 1.4 Mpc of the subcluster centres, the virial masses and anisotropy
parameters are derived. 37% of the spectroscopically observed galaxies show
emission lines and are predominantly star forming in the diagnostic diagram. No
strong AGN sources were found. The emission line galaxies tend to lie between
the two cluster centres with more near the SW clump. The luminosity functions
of the two subclusters differ. The NE component is similar to many rich
clusters, while the SW component has more faint galaxies. The NE clump's light
profile follows a single NFW profile with c = 10 while the SW is better fit
with an extended outer region and a compact inner core, consistent with
available X-ray data indicating that the SW clump has a cooling core. The
observed differences and properties of the two components of Abell 2465 are
interpreted to have been caused by a collision 2-4 Gyr ago, after which they
have moved apart and are now near their apocentres, although the start of a
merger remains a possibility. The number of emission line galaxies gives weight
to the idea that galaxy cluster collisions trigger star formation.
Comment: 21 pages, 18 figures. Replaced typos, mostly in references. To appear
in MNRAS. Accepted 2010 December 16. Received 2010 December 15; in original
form 2010 November 0
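A sketch of how fuzzy membership weights can enter a velocity dispersion estimate, assuming a simple weighted estimator. Peculiar velocities are computed in the clump rest frame as v = c(z - z̄)/(1 + z̄), with z̄ the membership-weighted mean redshift. The weights are assumed to come from the fuzzy clustering step, and the redshifts below are hypothetical. The paper's actual estimator (for example, any robust or error-corrected variant) is not stated in the abstract.

```python
import math

C_KM_S = 299792.458  # speed of light, km/s


def weighted_velocity_dispersion(redshifts, weights):
    """Membership-weighted line-of-sight velocity dispersion (km/s) of one clump.

    v_i = c (z_i - z_mean) / (1 + z_mean) is the rest-frame peculiar velocity,
    with z_mean the membership-weighted mean redshift."""
    wsum = sum(weights)
    z_mean = sum(w * z for w, z in zip(weights, redshifts)) / wsum
    v = [C_KM_S * (z - z_mean) / (1.0 + z_mean) for z in redshifts]
    # The weighted mean of v is zero by construction, so the variance is the
    # weighted mean of v**2.
    var = sum(w * vi ** 2 for w, vi in zip(weights, v)) / wsum
    return math.sqrt(var)


# Hypothetical redshifts and fuzzy membership weights for one clump.
z = [0.2420, 0.2445, 0.2450, 0.2475, 0.2600]
w = [0.9, 0.8, 1.0, 0.7, 0.1]  # the last galaxy is barely a member, so it is down-weighted
print(round(weighted_velocity_dispersion(z, w)))
```

Weighting by membership lets galaxies that sit between the two clumps contribute partially to each subcluster's dispersion instead of being forced into one of them.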