52 research outputs found

    Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling

    Full text link
    Fuzzy C-Means (FCM) is a widely used clustering method. However, FCM and its many accelerated variants have low efficiency in the mid-to-late stage of the clustering process. In this stage, all samples are involved in the update of their non-affinity centers, and the fuzzy membership grades of the most of samples, whose assignment is unchanged, are still updated by calculating the samples-centers distances. All those lead to the algorithms converging slowly. In this paper, a new affinity filtering technique is developed to recognize a complete set of the non-affinity centers for each sample with low computations. Then, a new membership scaling technique is suggested to set the membership grades between each sample and its non-affinity centers to 0 and maintain the fuzzy membership grades for others. By integrating those two techniques, FCM based on new affinity filtering and membership scaling (AMFCM) is proposed to accelerate the whole convergence process of FCM. Many experimental results performed on synthetic and real-world data sets have shown the feasibility and efficiency of the proposed algorithm. Compared with the state-of-the-art algorithms, AMFCM is significantly faster and more effective. For example, AMFCM reduces the number of the iteration of FCM by 80% on average

    Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm

    Full text link
    K-Means algorithm is a popular clustering method. However, it has two limitations: 1) it gets stuck easily in spurious local minima, and 2) the number of clusters k has to be given a priori. To solve these two issues, a multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is presented. First, based on the structure of the spurious local minima of the K-Means problem, a multi-prototypes sampling (MPS) is designed to select the appropriate number of multi-prototypes for data with arbitrary shapes. A theoretical proof is given to guarantee that the multi-prototypes selected by MPS can achieve a constant factor approximation to the optimal cost of the K-Means problem. Then, a merging technique, called convex merging (CM), merges the multi-prototypes to get a better local minima without k being given a priori. Specifically, CM can obtain the optimal merging and estimate the correct k. By integrating these two techniques with K-Means algorithm, the proposed MCKM is an efficient and explainable clustering algorithm for escaping the undesirable local minima of K-Means problem without given k first. Experimental results performed on synthetic and real-world data sets have verified the effectiveness of the proposed algorithm

    Characterization of Duck (Anas platyrhynchos) Short Tandem Repeat Variation by Population-Scale Genome Resequencing

    Get PDF
    Short tandem repeats (STRs) are usually associated with genetic diseases and gene regulatory functions, and are also important genetic markers for analysis of evolutionary, genetic diversity and forensic. However, for the majority of STRs in the duck genome, their population genetic properties and functional impacts remain poorly defined. Recent advent of next generation sequencing (NGS) has offered an opportunity for profiling large numbers of polymorphic STRs. Here, we reported a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provided the first genome-wide duck STR reference including 198,022 STR loci with motif size of 2–6 base pairs. We observed a relatively uneven distribution of STRs in different genomic regions, which indicates that the occurrence of STRs in duck genome is not random, but undergoes a directional selection pressure. Using genome resequencing data of 23 mallard and 26 Pekin ducks, we successfully identified 89,891 polymorphic STR loci. Intensive analysis of this dataset suggested that shorter repeat motif, longer reference tract length, higher purity, and residing outside of a coding region are all associated with an increase in STR variability. STR genotypes were utilized for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequency, which overlapped with a set of genes related to nervous system, energy metabolism and behavior. The evolutionary analysis revealed that the genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. The variation analysis of STRs in population scale provides valuable resource for future study of genetic diversity and genome evolution in duck

    Selection response and estimation of the genetic parameters for multidimensional measured breast meat yield related traits in a long-term breeding Pekin duck line

    Get PDF
    Objective This study was conducted to estimate the genetic parameters and breeding values of breast meat related traits of Pekin ducks. Selection response was also determined by using ultrasound breast muscle thickness (BMT) measurements in combination with bosom breadth (BB) and keel length (KL) values. Methods The traits analyzed were breast meat weight (BMW), body weight (BW), breast meat percentage (BMP) and the three parameters of breast meat (BB, KL, and BMT). These measurements were derived from studying 15,781 Pekin ducks selected from 10 generations based on breast meat weight. Genetic parameters and breeding value were estimated for the analysis of the breeding process. Results Estimated heritability of BMW and BMP were moderate (0.23 and 0.16, respectively), and heritability of BW was high (0.48). Other traits such as BB, KL, and BMT indicated moderate heritability ranging between 0.11 and 0.28. Significant phenotypic correlations of BMW with BW and BMP were discovered (p<0.05), and genetic correlations of BMW with BW and BMP were positive and high (0.83 and 0.66, respectively). It was noted that BMW had positive correlations with all the other traits. Generational average estimated breeding values of all traits increased substantially over the course of selection, which demonstrated that the ducks responded efficiently to increased breast meat yield after 10 generations of breeding. Conclusion The results indicated that duck BMW had the potential to be increased through genetic selection with positive effects on BW and BMP. The ultrasound BMT, in combination with the measurement of BB and KL, is shown to be essential and effective in the process of high breast meat yield duck breeding

    Kernel Partial Least Squares Feature Selection Based on Maximum Weight Minimum Redundancy

    No full text
    Feature selection refers to a vital function in machine learning and data mining. The maximum weight minimum redundancy feature selection method not only considers the importance of features but also reduces the redundancy among features. However, the characteristics of various datasets are not identical, and thus the feature selection method should have different feature evaluation criteria for all datasets. Additionally, high-dimensional data analysis poses a challenge to enhancing the classification performance of the different feature selection methods. This study presents a kernel partial least squares feature selection method on the basis of the enhanced maximum weight minimum redundancy algorithm to simplify the calculation and improve the classification accuracy of high-dimensional datasets. By introducing a weight factor, the correlation between the maximum weight and the minimum redundancy in the evaluation criterion can be adjusted to develop an improved maximum weight minimum redundancy method. In this study, the proposed KPLS feature selection method considers the redundancy between the features and the feature weighting between any feature and a class label in different datasets. Moreover, the feature selection method proposed in this study has been tested regarding its classification accuracy on data containing noise and several datasets. The experimental findings achieved using different datasets explore the feasibility and effectiveness of the proposed method which can select an optimal feature subset and obtain great classification performance based on three different metrics when compared with other feature selection methods
    • …
    corecore