    Comparison of chemical clustering methods using graph- and fingerprint-based similarity measures

    This paper compares several published methods for clustering chemical structures, using both graph- and fingerprint-based similarity measures. The clusterings from each method were compared to determine the degree of cluster overlap. Each method was also evaluated on how well it grouped structures into clusters possessing a non-trivial substructural commonality. The methods which employ adjustable parameters were tested to determine the stability of each parameter for datasets of varying size and composition. Our experiments suggest that both graph- and fingerprint-based similarity measures can be used effectively for generating chemical clusterings; the CAST and Yin–Chen methods, recently proposed for clustering gene expression patterns, may also prove effective for clustering 2D chemical structures.

    Two-stage clustering in genotype-by-environment analyses with missing data

    Cluster analysis has been commonly used in genotype-by-environment (G x E) analyses, but current methods are inadequate when the data matrix is incomplete. This paper proposes a new method, referred to as two-stage clustering, which relies on partitioning the squared Euclidean distance into two independent components: the G x E interaction and the genotype main effect. These components are used in the first and second stages of clustering, respectively. Two-stage clustering forms the basis for imputing missing values in the G x E matrix so that a more complete data array is available for other G x E analyses. Imputation for a given genotype uses information from genotypes with similar interaction profiles. This imputation method is shown to improve on an existing nearest-cluster method that confounds the G x E interaction and the genotype main effect.
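
    The distance partition mentioned in this abstract can be illustrated with a small sketch. It assumes a complete genotype-by-environment table and uses the standard algebraic identity that splits a squared Euclidean distance into a term from the difference in row means and a term from the row-centred profiles. The function name and the data are hypothetical, and the paper's exact formulation (especially its handling of missing cells) may differ.

```python
def split_squared_distance(a, b):
    """Split the squared Euclidean distance between two genotype rows.

    a, b: equal-length lists of trait values, one per environment.
    Returns (main_effect_part, interaction_part), which sum to the
    full squared distance. Assumes no missing values.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Part due to the difference in genotype means (main effect).
    main = n * (mean_a - mean_b) ** 2
    # Part due to differing response profiles after removing each row's mean.
    inter = sum(((x - mean_a) - (y - mean_b)) ** 2 for x, y in zip(a, b))
    return main, inter


# Hypothetical yields of two genotypes in three environments.
g1 = [4.0, 6.0, 8.0]
g2 = [1.0, 5.0, 3.0]

main, inter = split_squared_distance(g1, g2)
total = sum((x - y) ** 2 for x, y in zip(g1, g2))
print(main, inter, total)
```

    The two parts add up to the full squared distance because the cross term vanishes once each row is centred on its own mean. This is what would allow the interaction part and the main-effect part to drive separate clustering stages.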

    Formation of machine groups and part families in cellular manufacturing systems using a correlation analysis approach

    An important step in the design of a cellular manufacturing (CM) system is to identify the part families and machine groups and, from them, to form manufacturing cells. This article formulates a multivariate approach based on correlation analysis for solving the cell formation problem. The proposed approach is carried out in three phases. In the first phase, the correlation matrix is used as the similarity coefficient matrix. In the second phase, Principal Component Analysis (PCA) is applied to find the eigenvalues and eigenvectors of the correlation similarity matrix, and a scatter plot analysis is used as a cluster analysis to form machine groups and part families simultaneously while maximizing the correlation between elements. In the third phase, an improved algorithm is used to assign exceptional machines and exceptional parts using an angle measure and Euclidean distance, respectively. The proposed approach is also applied to the general Group Technology (GT) problem, in which exceptional machines and parts are considered. Furthermore, the approach has the flexibility to treat the number of cells as either a dependent or an independent variable. Two numerical examples for the design of cell structures are provided to illustrate the three phases of the proposed approach. The results of a comparative study based on multiple performance criteria show that the approach is effective, efficient and practical.
    Keywords: cellular manufacturing; cell formation; correlation matrix; Principal Component Analysis; exceptional machines and parts
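
    The first two phases described in this abstract can be sketched roughly as follows. This is a simplified illustration and not the authors' algorithm: it assumes a binary machine-part incidence matrix (hypothetical data), uses Pearson correlation between machine rows as the similarity matrix, finds the leading eigenvector by power iteration as a stand-in for a full PCA, and splits machines into two groups by the sign of their loading instead of reading a scatter plot. Exceptional machines and parts are not handled.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length rows (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def leading_eigenvector(matrix, iterations=200):
    """Approximate the dominant eigenvector of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0] + [0.0] * (n - 1)  # simple start vector, adequate for this example
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


# Hypothetical incidence matrix: rows are machines, columns are parts,
# 1 means the part visits the machine.
incidence = [
    [1, 1, 1, 0, 0, 0],  # machine 1
    [1, 1, 0, 0, 0, 0],  # machine 2
    [0, 0, 0, 1, 1, 1],  # machine 3
    [0, 0, 0, 0, 1, 1],  # machine 4
]

# Phase 1: correlation matrix used as the similarity matrix.
corr = [[pearson(r, s) for s in incidence] for r in incidence]

# Phase 2 (simplified): loadings on the first principal component.
loadings = leading_eigenvector(corr)

# Machines whose loading shares the sign of machine 1 go in group 0.
first_positive = loadings[0] > 0
groups = [0 if (c > 0) == first_positive else 1 for c in loadings]
print(groups)
```

    Splitting on the sign of a single component only separates two cells. With more cells, several components would need to be inspected together, which is presumably the role of the scatter plot analysis in the abstract.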

    Structure fusion based on graph convolutional networks for semi-supervised classification

    Because of the diversity and complexity of multi-view data in semi-supervised classification, most existing graph convolutional networks focus on network architecture construction or on preserving the salient graph structure, and ignore the contribution of the complete graph structure to semi-supervised classification. To mine a more complete distribution structure from multi-view data while accounting for both specificity and commonality, we propose structure fusion based on graph convolutional networks (SF-GCN) to improve the performance of semi-supervised classification. SF-GCN not only retains the specific characteristics of each view through spectral embedding, but also captures the common style of multi-view data through a distance metric between multi-graph structures. Assuming a linear relationship between the multi-graph structures, we construct the optimization function of the structure fusion model by balancing the specificity loss and the commonality loss. By solving this function, we simultaneously obtain the fused spectral embedding of the multi-view data and the fused structure, which serves as the adjacency matrix input to the graph convolutional network for semi-supervised classification. Experiments demonstrate that SF-GCN outperforms state-of-the-art methods on three challenging citation network datasets: Cora, Citeseer, and Pubmed.