An Efficient Visual Analysis Method for Cluster Tendency Evaluation, Data Partitioning and Internal Cluster Validation
Visual methods have been extensively studied and applied in cluster data analysis. Given a pairwise dissimilarity matrix $D$ of a set of $n$ objects, visual methods such as the Enhanced Visual Assessment Tendency (E-VAT) algorithm generally represent $D$ as an $n \times n$ image $I(\overline{D})$, in which the objects are reordered to expose the hidden cluster structure as dark blocks along the diagonal of the image. A major constraint of such methods is their inability to highlight cluster structure when $D$ contains composite-shaped datasets. This paper addresses this limitation by proposing an enhanced visual analysis method for cluster tendency assessment, in which $D$ is mapped to $D'$ by graph-based analysis and then reordered to $\overline{D}'$ using E-VAT, resulting in the graph-based Enhanced Visual Assessment Tendency (GE-VAT) method. An Enhanced Dark Block Extraction (E-DBE) method for automatically determining the number of clusters in $I(\overline{D}')$ is then proposed, together with a visual data partitioning method that forms clusters from $I(\overline{D}')$ based on the disparity between diagonal and off-diagonal blocks, using the permuted indices of GE-VAT. Cluster validation measures are also computed to evaluate the cluster formation. Extensive experimental results on several complex synthetic, UCI and large real-world data sets are analyzed to validate the algorithm.
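The reordering step can be illustrated with the classic VAT ordering, used here only as a stand-in because the abstract does not spell out the E-VAT or GE-VAT steps. This is a minimal sketch: the function names `vat_order` and `reorder` and the toy data are illustrative assumptions, not the authors' implementation.

```python
def vat_order(D):
    """Return a VAT-style ordering of object indices for a square dissimilarity matrix D."""
    n = len(D)
    # Start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(D[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    # min_dist[q]: smallest dissimilarity between q and any already-ordered object.
    min_dist = {q: D[start][q] for q in remaining}
    while remaining:
        # Append the unordered object closest to the ordered set (ties broken by index).
        q = min(remaining, key=lambda k: (min_dist[k], k))
        order.append(q)
        remaining.remove(q)
        for k in remaining:
            if D[q][k] < min_dist[k]:
                min_dist[k] = D[q][k]
    return order


def reorder(D, order):
    """Apply the same permutation to the rows and columns of D."""
    return [[D[i][j] for j in order] for i in order]


# Toy data (assumed): two groups of points on a line, interleaved in the input.
values = [0, 10, 1, 11, 2, 12]
D = [[abs(a - b) for b in values] for a in values]
order = vat_order(D)      # [0, 2, 4, 1, 3, 5]
R = reorder(D, order)     # small values now sit in two blocks on the diagonal
```

Displayed as a grayscale image, `R` would show two dark 3 x 3 blocks along the diagonal, one per group, which is the structure the visual methods above rely on.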
Neuroengineering of Clustering Algorithms
Cluster analysis can be broadly divided into multivariate data visualization, clustering algorithms, and cluster validation. This dissertation contributes neural network-based techniques to perform all three unsupervised learning tasks. In particular, the first paper provides a comprehensive review of adaptive resonance theory (ART) models for engineering applications and provides context for the four subsequent papers. These papers are devoted to enhancements of ART-based clustering algorithms from (a) a practical perspective, by exploiting the visual assessment of cluster tendency (VAT) sorting algorithm as a preprocessor for ART offline training, thus mitigating ordering effects; and (b) an engineering perspective, by designing a family of multi-criteria ART models: dual vigilance fuzzy ART and distributed dual vigilance fuzzy ART (both of which are capable of detecting complex cluster structures), merge ART (which aggregates partitions and lessens ordering effects in online learning), and cluster validity index vigilance in fuzzy ART (which features robust vigilance parameter selection and alleviates ordering effects in offline learning). The sixth paper consists of enhancements to data visualization using self-organizing maps (SOMs), depicting information-theoretic similarity measures between neighboring neurons on the reduced-dimension, topology-preserving SOM grid. This visualization's parameters are estimated using samples selected via a single-linkage procedure, thereby generating heatmaps that portray more homogeneous within-cluster similarities and crisper between-cluster boundaries. The seventh paper presents incremental cluster validity indices (iCVIs) realized by (a) incorporating existing formulations of online computations for clusters' descriptors, or (b) modifying an existing ART-based model and incrementally updating local density counts between prototypes.
Moreover, this last paper provides the first comprehensive comparison of iCVIs in the computational intelligence literature. (Abstract, page iv)
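As an illustration of what an online computation of a cluster descriptor can look like, the sketch below updates a cluster's mean and within-cluster scatter one sample at a time using the standard Welford-style recurrence. The abstract does not say which formulations the dissertation adopts, so the class name `IncrementalCluster` and the choice of descriptors are assumptions made for this example.

```python
class IncrementalCluster:
    """Running mean and scatter (sum of squared distances to the mean) of one cluster."""

    def __init__(self, dim):
        self.n = 0
        self.mean = [0.0] * dim
        self.scatter = 0.0

    def add(self, x):
        """Fold one new sample x into the descriptors without revisiting old samples."""
        self.n += 1
        # Offset of x from the mean before the update.
        before = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + bi / self.n for mi, bi in zip(self.mean, before)]
        # Offset of x from the mean after the update.
        after = [xi - mi for xi, mi in zip(x, self.mean)]
        self.scatter += sum(b * a for b, a in zip(before, after))


cluster = IncrementalCluster(dim=2)
for point in [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]:
    cluster.add(point)
# cluster.mean is now [2.0, 0.0] and cluster.scatter is 8.0
```

Descriptors of this kind (cluster sizes, means, compactness) are the usual ingredients of validity indices, so keeping them current per sample is what lets an index be re-evaluated as data streams in.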
Family names as indicators of Britain’s changing regional geography
In recent years the geography of surnames has become increasingly researched in genetics, epidemiology, linguistics and geography. Surnames provide a useful data source for the analysis of population structure, migrations, genetic relationships and levels of cultural diffusion and interaction between communities. The Worldnames database (www.publicprofiler.org/worldnames) of 300 million people from 26 countries, georeferenced in many cases to the equivalent of UK Postcode level, provides a rich source of surname data. This work has focused on the UK component of this dataset, that is, the 2001 Enhanced Electoral Roll, georeferenced to Output Area level. Exploratory analysis of the distribution of surnames across the UK shows that clear regions exist, such as Cornwall, Central Wales and Scotland, in agreement with anecdotal evidence. This study is concerned with applying a wide range of methods to the UK dataset to test their sensitivity to, and consistency in identifying, surname regions. Methods used thus far are hierarchical and non-hierarchical clustering, barrier algorithms such as the Monmonier Algorithm, and Multidimensional Scaling. These, to varying degrees, have highlighted the regionality of UK surnames and provide strong foundations for future work and refinement in the UK context. Establishing a firm methodology has enabled comparisons to be made with data from the 1881 census of Great Britain, developing insights into population movements from within and outside Great Britain.
Digging into Lipid Membrane Permeation for Cardiac Ion Channel Blocker d-Sotalol with All-Atom Simulations.
Interactions of drug molecules with lipid membranes play a crucial role in their access to cellular targets and can be an important predictor of their therapeutic and safety profiles. Very little is known about the spatial localization of various drugs in lipid bilayers, their active form (ionization state) or their translocation rates, and therefore about their potency to bind to different sites in membrane proteins. All-atom molecular simulations may help to map drug partitioning kinetics and thermodynamics, thus providing an in-depth assessment of drug lipophilicity. As a proof of principle, we extensively evaluated the lipid membrane partitioning of d-sotalol, a well-known blocker of the cardiac potassium channel Kv11.1 encoded by the hERG gene, with reported substantial proclivity for arrhythmogenesis. We developed positively charged (cationic) and neutral d-sotalol models, compatible with the biomolecular CHARMM force field, and subjected them to all-atom molecular dynamics (MD) simulations of drug partitioning through hydrated lipid membranes, aiming to elucidate the thermodynamics and kinetics of their translocation and thus their putative propensities for hydrophobic and aqueous hERG access. We found that only the neutral form of d-sotalol accumulates in the membrane interior and can move across the bilayer on a millisecond time scale, which can be relevant to lipophilic channel access. The computed water-membrane partitioning coefficient for this form is in good agreement with experiment. There is a large energetic barrier for the cationic form of the drug, which is dominant in water, to cross the membrane, resulting in slow membrane translocation kinetics. However, this form of the drug can be important for an aqueous access pathway through the intracellular gate of hERG. This route will likely occur after the neutral form of the drug crosses the membrane and subsequently re-protonates.
Our study demonstrates a first step toward a framework for multi-scale in silico safety pharmacology and identifies some of the challenges that lie therein.
Enhanced Dark Block Extraction Method Performed Automatically to Determine the Number of Clusters in Unlabeled Data Sets
One of the major issues in cluster analysis is deciding the number of clusters or groups in a set of unlabeled data. In addition, the presentation of the clusters should be analyzed to assess the accuracy of the clustering. This paper proposes a new method called Enhanced Dark Block Extraction (E-DBE), which automatically identifies the number of object groups in unlabeled datasets. The proposed algorithm builds on an existing algorithm for visual assessment of the cluster tendency of a dataset and uses several common signal and image processing techniques. The method includes the following steps: 1. Generating an Enhanced Visual Assessment Tendency (E-VAT) image from a dissimilarity matrix, which is the input to the E-DBE algorithm. 2. Segmenting the E-VAT image to obtain a binary image and then applying filtering techniques. 3. Applying a distance transform to the filtered binary image and projecting the pixel values onto the main diagonal axis of the image to form a projection signal. 4. Smoothing the projection signal, computing its first-order derivative and then detecting major peaks and valleys in the resulting signal to obtain the number of clusters. E-DBE is a parameter-free algorithm for cluster analysis. Experiments with the method are presented on several UCI, synthetic and real-world datasets.
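The final step of the procedure above (smooth the projection signal, differentiate it, count major peaks) can be sketched as follows. This is a simplified illustration, not the E-DBE algorithm itself: the moving-average smoother, the `window` and `min_height` arguments, the function names and the toy signal are all assumptions, and the sketch presumes that each major peak of the projection signal corresponds to one dark block.

```python
def moving_average(signal, window=3):
    """Smooth a 1-D signal with a centered moving average (shorter window at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


def first_derivative(signal):
    """First-order difference between consecutive samples."""
    return [b - a for a, b in zip(signal, signal[1:])]


def count_major_peaks(signal, window=3, min_height=0.0):
    """Count local maxima of the smoothed signal that reach at least min_height."""
    smoothed = moving_average(signal, window)
    slope = first_derivative(smoothed)
    peaks = 0
    for i in range(1, len(slope)):
        # A peak is where the slope turns from positive to non-positive.
        if slope[i - 1] > 0 and slope[i] <= 0 and smoothed[i] >= min_height:
            peaks += 1
    return peaks


# Hypothetical projection signal with three bumps, one per dark block.
projection = [0, 1, 3, 5, 3, 1, 0, 1, 4, 6, 4, 1, 0, 2, 4, 2, 0]
n_clusters = count_major_peaks(projection)   # 3
```

Raising `min_height` discards low bumps; for the toy signal, `count_major_peaks(projection, min_height=3.0)` returns 2. The abstract describes E-DBE as parameter-free, so the explicit threshold here is purely a device of this sketch.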