Accelerated Fuzzy C-Means Clustering Based on New Affinity Filtering and Membership Scaling
Fuzzy C-Means (FCM) is a widely used clustering method. However, FCM and its
many accelerated variants are inefficient in the mid-to-late stage of the
clustering process. In this stage, all samples still contribute to the updates
of their non-affinity centers, and the fuzzy membership grades of most samples,
whose assignments no longer change, are still updated by computing
sample-center distances. Together, these factors make the algorithms converge
slowly.
In this paper, a new affinity filtering technique is developed to identify the
complete set of non-affinity centers for each sample at low computational cost.
Then, a new membership scaling technique is proposed that sets the membership
grades between each sample and its non-affinity centers to 0 while preserving
the fuzzy membership grades for the remaining centers. By integrating these two
techniques, FCM based on new affinity filtering and membership scaling (AMFCM)
is proposed to accelerate the entire convergence process of FCM. Extensive
experiments on synthetic and real-world data sets demonstrate the feasibility
and efficiency of the proposed algorithm. Compared with state-of-the-art
algorithms, AMFCM is significantly faster and more effective; for example,
AMFCM reduces the number of FCM iterations by 80% on average.
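The membership scaling idea can be illustrated with a minimal NumPy sketch. It assumes the standard FCM updates with fuzzifier m = 2. The abstract does not say how non-affinity centers are recognized, so the hypothetical helper `nearest_r_mask` simply keeps each sample's r nearest centers. Only the scaling step follows the description: grades for non-affinity centers are set to 0 and the remaining grades are renormalized. Unlike AMFCM, this sketch still computes every sample-center distance. It therefore shows the effect of membership scaling, not the speed-up.

```python
import numpy as np

def memberships(X, C, m=2.0, keep=None):
    """FCM update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).

    keep: optional boolean (n, c) array; False marks a non-affinity center
    for that sample (hypothetical input). Its grade is set to 0 and the
    remaining grades are renormalized so each row still sums to 1.
    """
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    d2 = np.maximum(d2, 1e-12)               # guard against zero distances
    w = d2 ** (-1.0 / (m - 1.0))             # proportional to d_ij^(-2/(m-1))
    if keep is not None:
        w = np.where(keep, w, 0.0)           # membership scaling: zero non-affinity
    return w / w.sum(axis=1, keepdims=True)

def update_centers(X, U, C_old, m=2.0):
    """Standard FCM center update; a center with no membership mass stays put."""
    Um = U ** m
    mass = Um.sum(axis=0)
    C = (Um.T @ X) / np.maximum(mass, 1e-12)[:, None]
    return np.where(mass[:, None] > 0, C, C_old)

def nearest_r_mask(X, C, r):
    """Hypothetical affinity filter: keep only each sample's r nearest centers."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    keep = np.zeros(d2.shape, dtype=bool)
    np.put_along_axis(keep, np.argsort(d2, axis=1)[:, :r], True, axis=1)
    return keep

def fcm(X, C0, m=2.0, r=None, max_iter=100, tol=1e-6):
    """Run FCM from initial centers C0; r=None gives plain FCM."""
    C = np.asarray(C0, dtype=float)
    for it in range(1, max_iter + 1):
        keep = None if r is None else nearest_r_mask(X, C, r)
        U = memberships(X, C, m, keep)
        C_new = update_centers(X, U, C, m)
        if np.abs(C_new - C).max() < tol:
            return C_new, U, it
        C = C_new
    return C, U, max_iter
```

FCM grades are proportional to d_ij^(-2/(m-1)). Renormalizing the surviving grades is therefore identical to evaluating the FCM formula over the kept centers only. This is one plausible reading of "maintain the fuzzy membership grades for others."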
Multi-Prototypes Convex Merging Based K-Means Clustering Algorithm
The K-Means algorithm is a popular clustering method. However, it has two
limitations: 1) it easily gets stuck in spurious local minima, and 2) the
number of clusters k has to be given a priori. To solve these two issues, a
multi-prototypes convex merging based K-Means clustering algorithm (MCKM) is
presented. First, based on the structure of the spurious local minima of the
K-Means problem, a multi-prototypes sampling (MPS) is designed to select the
appropriate number of multi-prototypes for data with arbitrary shapes. A
theoretical proof is given to guarantee that the multi-prototypes selected by
MPS can achieve a constant factor approximation to the optimal cost of the
K-Means problem. Then, a merging technique, called convex merging (CM), merges
the multi-prototypes to reach a better local minimum without k being given a
priori. Specifically, CM can obtain the optimal merging and estimate the
correct k. By integrating these two techniques with the K-Means algorithm, the
proposed MCKM becomes an efficient and explainable clustering algorithm for
escaping the undesirable local minima of the K-Means problem without requiring
k in advance. Experimental results on synthetic and real-world data sets
verify the effectiveness of the proposed algorithm.
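The two-stage idea can be sketched in NumPy under clearly stated assumptions. The first stage over-seeds K-Means with many prototypes, and the second merges them so that the number of merged groups estimates k. Here, k-means++ seeding stands in for MPS. The hypothetical `merge_prototypes` joins prototypes that lie closer than a user-chosen threshold `eps` (single-linkage merging). This is not the paper's convex merging, and unlike CM it requires a parameter. It only illustrates how merging prototypes can produce an estimate of k.

```python
import numpy as np

def assign(X, C):
    """Index of the nearest prototype for each sample."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

def kmeans_pp_init(X, n, rng):
    """k-means++ seeding (a stand-in for MPS, not the paper's sampler)."""
    C = [X[rng.integers(len(X))]]
    for _ in range(1, n):
        d2 = ((X[:, None, :] - np.array(C)[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        C.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(C)

def lloyd(X, C, max_iter=100):
    """Plain K-Means (Lloyd) iterations; empty prototypes keep their position."""
    for _ in range(max_iter):
        labels = assign(X, C)
        C_new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(len(C))])
        if np.allclose(C_new, C):
            break
        C = C_new
    return C, assign(X, C)

def merge_prototypes(C, eps):
    """Hypothetical merge: union prototypes closer than eps (single linkage)."""
    parent = list(range(len(C)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(C)):
        for j in range(i + 1, len(C)):
            if np.linalg.norm(C[i] - C[j]) < eps:
                parent[find(i)] = find(j)
    roots = [find(i) for i in range(len(C))]
    relabel = {root: g for g, root in enumerate(dict.fromkeys(roots))}
    return np.array([relabel[root] for root in roots])

def multi_prototype_kmeans(X, n_protos, eps, seed=0):
    """Over-seed, run K-Means, merge prototypes; returns (labels, estimated k)."""
    rng = np.random.default_rng(seed)
    C, proto_labels = lloyd(X, kmeans_pp_init(X, n_protos, rng))
    group = merge_prototypes(C, eps)
    return group[proto_labels], int(group.max()) + 1
```

Each sample inherits the group of its nearest prototype. A cluster of arbitrary shape can therefore be covered by several prototypes and still end up as a single group.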
Characterization of Duck (Anas platyrhynchos) Short Tandem Repeat Variation by Population-Scale Genome Resequencing
Short tandem repeats (STRs) are usually associated with genetic diseases and gene regulatory functions, and are also important genetic markers for evolutionary, genetic diversity, and forensic analyses. However, for most STRs in the duck genome, population genetic properties and functional impacts remain poorly defined. The recent advent of next-generation sequencing (NGS) has made it possible to profile large numbers of polymorphic STRs. Here, we report a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provides the first genome-wide duck STR reference, comprising 198,022 STR loci with motif sizes of 2–6 base pairs. We observed a relatively uneven distribution of STRs across genomic regions, indicating that STR occurrence in the duck genome is not random but is subject to directional selection pressure. Using genome resequencing data from 23 mallard and 26 Pekin ducks, we identified 89,891 polymorphic STR loci. Further analysis of this dataset suggested that shorter repeat motifs, longer reference tract lengths, higher purity, and location outside coding regions are all associated with increased STR variability. STR genotypes were used for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, a comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequencies, which overlapped with a set of genes related to the nervous system, energy metabolism, and behavior. Evolutionary analysis suggested that the genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. This population-scale analysis of STR variation provides a valuable resource for future studies of genetic diversity and genome evolution in duck.
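A small sketch can clarify what "motif size of 2–6 base pairs" and "tract length" refer to. It scans a sequence for perfect (100% pure) tandem repeats. The abstract does not name the STR caller used in the study, so this is not that tool. The function names and the minimum of three copies are hypothetical choices. Real callers also handle imperfect repeats (lower purity) and genotype loci from sequencing reads.

```python
def is_primitive(motif):
    """True if motif is not itself a repeat of a shorter unit (e.g. 'CACA')."""
    k = len(motif)
    return all(motif != motif[:p] * (k // p) for p in range(1, k) if k % p == 0)

def find_perfect_strs(seq, min_motif=2, max_motif=6, min_copies=3):
    """Greedy left-to-right scan for perfect STRs.

    Returns (start, motif, copies) tuples. At each position the motif giving
    the longest tract wins, and scanning resumes after the reported tract.
    """
    seq = seq.upper()
    hits, i = [], 0
    while i < len(seq):
        best = None
        for k in range(min_motif, max_motif + 1):
            motif = seq[i:i + k]
            if len(motif) < k:
                break
            if not is_primitive(motif):   # skip e.g. 'AA' or 'CACA'
                continue
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            tract = copies * k
            if copies >= min_copies and (best is None or tract > best[2] * len(best[1])):
                best = (i, motif, copies)
        if best is not None:
            hits.append(best)
            i += best[2] * len(best[1])   # jump past the reported tract
        else:
            i += 1
    return hits
```

For example, `find_perfect_strs("GG" + "CA" * 5 + "G" + "AGAT" * 4 + "CC")` reports a dinucleotide tract of five copies starting at position 2 and a tetranucleotide tract of four copies starting at position 13.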
Selection response and estimation of the genetic parameters for multidimensional measured breast meat yield related traits in a long-term breeding Pekin duck line
Objective: This study was conducted to estimate the genetic parameters and breeding values of breast meat related traits in Pekin ducks. Selection response was also determined using ultrasound breast muscle thickness (BMT) measurements in combination with bosom breadth (BB) and keel length (KL) values. Methods: The traits analyzed were breast meat weight (BMW), body weight (BW), breast meat percentage (BMP), and the three breast meat parameters (BB, KL, and BMT). These measurements were obtained from 15,781 Pekin ducks across 10 generations selected for breast meat weight. Genetic parameters and breeding values were estimated to analyze the breeding process. Results: The estimated heritabilities of BMW and BMP were moderate (0.23 and 0.16, respectively), and that of BW was high (0.48). Other traits, such as BB, KL, and BMT, showed moderate heritabilities ranging from 0.11 to 0.28. Significant phenotypic correlations of BMW with BW and BMP were found (p<0.05), and the genetic correlations of BMW with BW and BMP were positive and high (0.83 and 0.66, respectively). BMW was positively correlated with all the other traits. The generational averages of estimated breeding values increased substantially for all traits over the course of selection, demonstrating that the ducks responded efficiently to selection for increased breast meat yield over 10 generations of breeding. Conclusion: The results indicate that duck BMW has the potential to be increased through genetic selection, with positive effects on BW and BMP. Ultrasound BMT, in combination with BB and KL measurements, is shown to be essential and effective for breeding ducks with high breast meat yield.
Kernel Partial Least Squares Feature Selection Based on Maximum Weight Minimum Redundancy
Feature selection is a vital task in machine learning and data mining. The maximum weight minimum redundancy feature selection method not only considers the importance of features but also reduces the redundancy among them. However, datasets differ in their characteristics, so a feature selection method should adapt its feature evaluation criteria to each dataset. Additionally, high-dimensional data make it challenging to improve the classification performance of feature selection methods. This study presents a kernel partial least squares (KPLS) feature selection method based on an enhanced maximum weight minimum redundancy algorithm, aimed at simplifying computation and improving classification accuracy on high-dimensional datasets. An improved maximum weight minimum redundancy method is developed by introducing a weight factor that adjusts the balance between the maximum weight and minimum redundancy terms of the evaluation criterion. The proposed KPLS feature selection method considers both the redundancy among features and the weighting between each feature and the class label in different datasets. Moreover, the classification accuracy of the proposed method was tested on noisy data and on several datasets. Experimental results on different datasets demonstrate the feasibility and effectiveness of the proposed method, which selects an optimal feature subset and achieves strong classification performance on three different metrics compared with other feature selection methods.
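The role of the weight factor can be seen in a greedy sketch that selects features by score(f) = weight(f) - alpha * redundancy(f, S). Here S is the set of features already selected. The KPLS-derived weights are not reproduced. As assumptions, weight(f) is the absolute Pearson correlation between feature f and the label, and redundancy is the mean absolute correlation with the selected features. `alpha` stands in for the weight factor, and the function names are hypothetical.

```python
import numpy as np

def abs_corr(a, b):
    """Absolute Pearson correlation between two 1-D arrays."""
    return abs(np.corrcoef(a, b)[0, 1])

def weighted_mwmr_select(X, y, k, alpha=1.0):
    """Greedy selection maximizing weight(f) - alpha * redundancy(f, S).

    Assumptions: weight(f) = |corr(f, y)| stands in for KPLS-based feature
    weights, and redundancy(f, S) is the mean |corr(f, s)| over selected s.
    """
    n_features = X.shape[1]
    weight = np.array([abs_corr(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(weight))]          # start with the heaviest feature
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs_corr(X[:, j], X[:, s]) for s in selected])
            score = weight[j] - alpha * redundancy  # alpha trades weight vs redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

With alpha = 0 the criterion ranks features by weight alone, so two near-duplicate features can both be selected. A larger alpha penalizes redundancy more heavily and favors a weaker but complementary feature.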
- …