2,315 research outputs found

    Unsupervised Distance Metric Learning Using Predictability

    Distance-based learning methods, such as clustering and SVMs, depend on good distance metrics. This paper performs unsupervised metric learning in the context of clustering. We seek transformations of the data that give clean, well-separated clusters, where clean clusters are those for which membership can be accurately predicted. The transformation (and hence the distance metric) is obtained by minimizing the blur ratio, defined as the ratio of the within-cluster variance to the total data variance in the transformed space. For minimization we propose an iterative procedure, Clustering Predictions of Cluster Membership (CPCM). CPCM alternately (a) predicts cluster memberships (e.g., using linear regression) and (b) clusters these predictions (e.g., using k-means). With linear regression and k-means, this algorithm is guaranteed to converge to a fixed point. The resulting clusters are invariant to linear transformations of the original features, and the procedure tends to eliminate noise features by driving their weights to zero.
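
The blur ratio described in this abstract can be illustrated with a minimal sketch. It assumes the points are already in the transformed space and that cluster labels are given; the function name `blur_ratio` and the toy data are hypothetical and not taken from the paper. Sums of squares are used in place of variances, which leaves the ratio unchanged because both terms share the same divisor.

```python
def blur_ratio(points, labels):
    """Within-cluster sum of squares divided by total sum of squares.

    Hypothetical helper: `points` is a list of equal-length coordinate
    lists and `labels` assigns each point to a cluster.
    """
    dim = len(points[0])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Total scatter around the overall mean.
    overall = mean(points)
    total = sum(sq_dist(p, overall) for p in points)

    # Group points by cluster label.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    # Scatter of each cluster around its own centroid.
    within = 0.0
    for members in clusters.values():
        c = mean(members)
        within += sum(sq_dist(p, c) for p in members)

    return within / total


# Toy data (assumed): two tight groups far apart along the first axis.
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
good = blur_ratio(toy_points, [0, 0, 1, 1])  # clusters match the groups
bad = blur_ratio(toy_points, [0, 1, 0, 1])   # clusters cut across the groups
```

A labeling that matches the well-separated groups gives a ratio near zero, while a labeling that cuts across them gives a ratio near one, which is why a small blur ratio corresponds to clean clusters.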

    Digital strategy for improving resilience of micro, small, and medium enterprises

    The COVID-19 pandemic impacted several small and medium-sized enterprises (SMEs) in Indonesia. The viability of some SMEs' business operations was disrupted, especially for those that produce non-essential goods such as carvings. Sales decreased by up to sixty per cent, reducing income and leading to the termination of numerous workers. SMEs receive considerable attention in Indonesia because they are among the country's most significant economic contributors. This research proposes a strategy for improving the resilience of SMEs' business operations. The K-means method was used to investigate three groups of SMEs: micro, small, and medium. Changes in SME class before and after the pandemic are investigated through changes in the values of the variables in the SME profile. The SWOT method is then used to identify the internal and external factors with the highest weight, which serve as a basis for developing strategies to increase the resilience of SMEs. Finally, the TOPSIS method determines the best plan for dealing with the new digital era. The results show that the W-T strategy of utilizing social media can be prioritized, based on the criteria that significantly impact SMEs' product sales and business resilience.
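
The TOPSIS ranking step mentioned in this abstract can be sketched as follows. This is the standard textbook form of TOPSIS (vector normalization, weighted ideal and anti-ideal points, closeness score), not necessarily the exact variant used in the study; the function name `topsis`, the decision matrix, the weights, and the benefit/cost flags are all made-up illustrations.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] for each alternative (row).

    matrix  : rows are alternatives, columns are criteria
    weights : one weight per criterion
    benefit : True if larger is better for that criterion, False if smaller
    """
    n_crit = len(weights)

    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]

    # Ideal (best) and anti-ideal (worst) value per criterion.
    best = [max(r[j] for r in v) if benefit[j] else min(r[j] for r in v)
            for j in range(n_crit)]
    worst = [min(r[j] for r in v) if benefit[j] else max(r[j] for r in v)
             for j in range(n_crit)]

    scores = []
    for r in v:
        d_best = math.sqrt(sum((r[j] - best[j]) ** 2 for j in range(n_crit)))
        d_worst = math.sqrt(sum((r[j] - worst[j]) ** 2 for j in range(n_crit)))
        # Closeness to the ideal: 1 means ideal, 0 means anti-ideal.
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical example: three candidate strategies scored on
# "sales impact" (benefit) and "cost" (not a benefit, equal for all three).
scores = topsis([[8.0, 3.0], [4.0, 3.0], [2.0, 3.0]],
                weights=[0.5, 0.5],
                benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The alternative with the highest closeness score is ranked first. Note that this sketch divides by zero if every alternative is identical on every criterion, so real input would need that case handled.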

    Search for Evergreens in Science: A Functional Data Analysis

    Evergreens in science are papers that display a continual rise in annual citations without decline, at least within a sufficiently long time period. Aiming to better understand evergreens in particular and patterns of citation trajectory in general, this paper develops a functional data analysis method to cluster the citation trajectories of a sample of 1699 research papers published in 1980 in the American Physical Society (APS) journals. We propose a functional Poisson regression model for individual papers' citation trajectories, and fit the model to the observed 30-year citations of individual papers by functional principal component analysis and maximum likelihood estimation. Based on the estimated paper-specific coefficients, we apply the K-means clustering algorithm to cluster papers into different groups, uncovering general types of citation trajectories. The result demonstrates the existence of an evergreen cluster of papers that do not exhibit any decline in annual citations over 30 years. Comment: 40 pages, 9 figures

    Hierarchical indexing for region based image retrieval

    Region-based image retrieval has been an active research area. In this study we developed an improved region-based image retrieval system. The system applies image segmentation to divide an image into discrete regions which, if the segmentation is ideal, correspond to objects. The focus of this research is to improve the capture of regions so as to enhance indexing and retrieval performance, and also to provide a better similarity distance computation. For image segmentation, we developed a modified k-means clustering algorithm for image retrieval in which a hierarchical clustering algorithm is used to generate the initial number of clusters and the cluster centers. In addition, during similarity distance computation we introduced an object weight based on the object's uniqueness. Objects that are not unique, such as trees and skies, therefore have less weight. The experimental evaluation uses the same 1000-image COREL color image database as FuzzyClub, IRM, and Geometric Histogram, and performance is compared among them. Compared with existing techniques and systems such as IRM, FuzzyClub, and Geometric Histogram, our study demonstrates the following advantages: (i) an improvement in image segmentation accuracy using the modified k-means algorithm; (ii) an improvement in retrieval accuracy as a result of a better similarity distance computation that considers the importance and uniqueness of objects in an image.
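
The idea of seeding k-means from a hierarchical clustering pass, as described in this abstract, can be sketched roughly as below. The abstract does not specify the linkage or the stopping rule, so centroid linkage and a distance threshold (`merge_threshold`) are assumptions made for illustration; the function names and toy feature vectors are hypothetical as well.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def hierarchical_seeds(points, merge_threshold):
    """Merge the two closest clusters (centroid linkage) until the closest
    pair is farther apart than merge_threshold (an assumed stopping rule).
    The surviving centroids give both the cluster count and the seeds."""
    clusters = [[p] for p in points]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > merge_threshold ** 2:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]


def kmeans(points, seeds, max_iter=100):
    """Plain Lloyd iterations starting from the supplied seed centers."""
    centers = [list(c) for c in seeds]  # copy so the seeds are not mutated
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return centers, labels


# Toy feature vectors (assumed), e.g. two color/texture features per pixel.
pixels = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
seeds = hierarchical_seeds(pixels, merge_threshold=3.0)
centers, labels = kmeans(pixels, seeds)
```

Seeding this way removes the need to choose the number of clusters by hand and avoids random initialization, at the cost of a pairwise search that is quadratic in the number of clusters per merge in this naive form.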