
    Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Due to background noise in single-particle images and uncertainty in molecular orientations, the traditional K-means clustering algorithm may assign images to the wrong classes and produce classes with a large variation in membership. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method that builds upon the traditional K-means algorithm. By introducing an adaptive constraint term in the objective function, our algorithm not only avoids a large variation in class sizes but also produces more accurate data clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. Comment: 35 pages, 14 figures
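    The abstract does not give the form of the adaptive constraint term, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a simple penalty `lam * class_size` added to the squared distance during assignment; the function name `size_penalized_kmeans` and the parameter `lam` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def size_penalized_kmeans(points, k, lam=0.1, iters=20, seed=0):
    """K-means whose assignment cost is distance plus a class-size penalty.

    ASSUMPTION: the penalty lam * (running class size) stands in for the
    paper's adaptive constraint term, whose exact form is not given.
    """
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        sizes = [0] * k
        # Assignment step: points are assigned one at a time, and a class
        # becomes more expensive as it fills up.
        for i, p in enumerate(points):
            costs = [sq_dist(p, centers[j]) + lam * sizes[j] for j in range(k)]
            labels[i] = min(range(k), key=costs.__getitem__)
            sizes[labels[i]] += 1
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centers


points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = size_penalized_kmeans(points, k=2, lam=0.1)
```

    With `lam=0` this reduces to ordinary K-means; a larger `lam` trades distance fidelity for more even class sizes.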

    Time-Domain Data Fusion Using Weighted Evidence and Dempster–Shafer Combination Rule: Application in Object Classification

    To apply data fusion in the time domain based on the Dempster–Shafer (DS) combination rule, an 8-step algorithm with a novel entropy function is proposed. The 8-step algorithm is applied in the time domain to achieve sequential combination of time-domain data. Simulation results showed that this method is successful in capturing the changes (dynamic behavior) in time-domain object classification. The method also showed better anti-disturbance ability and transition properties than other methods available in the literature. As an example, a convolutional neural network (CNN) is trained to classify three different types of weeds. Precision and recall from the confusion matrix of the CNN are used to update the basic probability assignment (BPA), which captures the classification uncertainty. Real data of classified weeds from a single sensor are used to test time-domain data fusion. The proposed method is successful in filtering noise (reducing sudden changes and yielding smoother curves) and in fusing conflicting information from the video feed. The performance of the algorithm can be adjusted between robustness and fast response using a tuning parameter, the number of time-steps (ts).
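    For orientation, the classical Dempster combination rule that the method builds on can be sketched as below. This is not the 8-step algorithm: the weighting of evidence and the entropy function are not specified in the abstract and are omitted. The class names `A`, `B`, `C`, the example masses, and the helper `fuse_sequence` are illustrative assumptions.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Combine two BPAs (dicts: frozenset of classes -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Agreeing evidence: the product mass goes to the intersection.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Disjoint focal elements contribute to the conflict mass K.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    return {focal: mass / norm for focal, mass in combined.items()}


def fuse_sequence(bpas):
    """Sequentially combine the BPAs of consecutive time-steps (hypothetical helper)."""
    fused = bpas[0]
    for bpa in bpas[1:]:
        fused = combine_dempster(fused, bpa)
    return fused


A, B = frozenset({"A"}), frozenset({"B"})
THETA = frozenset({"A", "B", "C"})  # full frame: mass here expresses ignorance

frame_1 = {A: 0.6, B: 0.3, THETA: 0.1}
frame_2 = {A: 0.5, B: 0.4, THETA: 0.1}
fused = fuse_sequence([frame_1, frame_2])
# Conflict K = 0.6*0.4 + 0.3*0.5 = 0.39, so the fused mass of A is 0.41 / 0.61.
```

    Passing only the most recent `ts` BPAs to `fuse_sequence` is one plausible way to realize a time-step window, but that reading is an assumption, not something the abstract states.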

    Direct Nonparametric Predictive Inference Classification Trees

    Classification is the task of assigning a new instance to one of a set of predefined categories based on the attributes of the instance. A classification tree is one of the most commonly used techniques in the area of classification. In recent years, many statistical methodologies have been developed to make inferences using imprecise probability theory, one of which is nonparametric predictive inference (NPI). NPI has been developed for different types of data and has been successfully applied to several fields, including classification. Due to the predictive nature of NPI, it is well suited for classification, as the nature of classification is explicitly predictive as well. In this thesis, we introduce a novel classification tree algorithm which we call the Direct Nonparametric Predictive Inference (D-NPI) classification algorithm. The D-NPI algorithm is completely based on the NPI approach, and it does not use any other assumptions. As a first step for developing the D-NPI classification method, we restrict our focus to binary and multinomial data types. The D-NPI algorithm uses a new split criterion called Correct Indication (CI), which is completely based on NPI and does not use any additional concepts such as entropy. The CI reflects how informative attribute variables are, hence if the attribute variable is very informative, it gives high NPI lower and upper probabilities for CI. In addition, the CI reports the strength of the evidence that the attribute variables will indicate regarding the possible class state for future instances, based on the data. The performance of the D-NPI classification algorithm is compared against several classification algorithms from the literature, including some imprecise probability algorithms, using different evaluation measures. The experimental results indicate that the D-NPI classification algorithm performs well and tends to slightly outperform other classification algorithms. 
Finally, a study of the D-NPI classification tree algorithm with noisy data is presented. Noisy data are data that contain incorrect values for the attribute variables or class variable. The performance of the D-NPI classification tree algorithm with noisy data is studied and compared to other classification tree algorithms when different levels of random noise are added to the class variable or to attribute variables. The results indicate that the D-NPI classification algorithm performs well with class noise and slightly outperforms other classification algorithms, while no single classification algorithm performs best with attribute noise.

    Independent Component Analysis for Improved Defect Detection in Guided Wave Monitoring

    Guided wave sensors are widely used in a number of industries and have found particular application in the oil and gas industry for the inspection of pipework. Traditionally this type of sensor was used for one-off inspections, but in recent years there has been a move towards permanent installation of the sensor. This has enabled highly repeatable readings of the same section of pipe, potentially allowing improvements in defect detection and classification. This paper proposes a novel approach using independent component analysis to decompose repeat guided wave signals into constituent independent components. This separates the defect from coherent noise caused by changing environmental conditions, improving detectability. This paper demonstrates independent component analysis applied to guided wave signals from a range of industrial inspection scenarios. The analysis is performed on test data from pipe loops that have been subjected to multiple temperature cycles in both undamaged and damaged states. In addition to processing data from experimental damaged conditions, simulated damage signals have been added to “undamaged” experimental data, thereby enabling multiple different damage scenarios to be investigated. The algorithm has also been used to process guided wave signals from finite element simulations of a pipe with distributed shallow general corrosion, within which there is a patch of severe corrosion. In all these scenarios, the independent component analysis algorithm was able to extract the defect signal while rejecting coherent noise.

    Data Imputation through the Identification of Local Anomalies

    We introduce a comprehensive statistical framework in a model-free setting for a complete treatment of localized data corruptions due to severe noise sources, e.g., an occluder in the case of a visual recording. Within this framework, we propose i) a novel algorithm to efficiently separate, i.e., detect and localize, possible corruptions from a given suspicious data instance and ii) a Maximum A Posteriori (MAP) estimator to impute the corrupted data. As a generalization of Euclidean distance, we also propose a novel distance measure, which is based on the ranked deviations among the data attributes and is empirically shown to be superior in separating the corruptions. Our algorithm first splits the suspicious instance into parts through a binary partitioning tree in the space of data attributes and iteratively tests those parts to detect local anomalies using the nominal statistics extracted from an uncorrupted (clean) reference data set. Once each part is labeled as anomalous or normal, the corresponding binary patterns over this tree that characterize corruptions are identified and the affected attributes are imputed. Under a certain conditional independence structure assumed for the binary patterns, we analytically show that the false alarm rate of the introduced algorithm in detecting the corruptions is independent of the data and can be directly set without any parameter tuning. The proposed framework is tested on several well-known machine learning data sets with synthetically generated corruptions and is experimentally shown to produce remarkable improvements in classification performance, with strong corruption separation capabilities. Our experiments also indicate that the proposed algorithms outperform the typical approaches and are robust to varying training phase conditions.

    A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering

    Imbalanced data classification is a major challenge in the field of data mining and machine learning, and oversampling algorithms are a widespread technique for re-sampling imbalanced data. To address the problem that existing oversampling methods tend to introduce noise points and generate overlapping instances, we propose a novel oversampling method based on density peaks clustering. First, the density peaks clustering algorithm is used to cluster minority instances while screening out outlier points. Second, sampling weights are assigned according to the size of the resulting sub-clusters, and new instances are synthesized by interpolating between cluster cores and other instances of the same sub-cluster. Finally, comparative experiments are conducted on both artificial data and the KEEL datasets. The experiments validate the feasibility and effectiveness of the algorithm and show that it improves the classification accuracy of the imbalanced data.
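    The synthesis step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The density peaks clustering itself is not shown, so sub-clusters and their cores are taken as given inputs. Sampling weights are assumed to be proportional to sub-cluster size, since the abstract does not say how size maps to weight. The function name `oversample` is hypothetical.

```python
import random


def oversample(subclusters, n_new, seed=0):
    """Synthesize n_new minority instances by core-to-member interpolation.

    subclusters: list of (core, members) pairs, where core is a point and
    members is a list of the other points in the same sub-cluster.
    ASSUMPTION: a sub-cluster is picked with probability proportional to
    its number of members.
    """
    rng = random.Random(seed)
    usable = [(core, members) for core, members in subclusters if members]
    weights = [len(members) for _, members in usable]
    synthetic = []
    for _ in range(n_new):
        core, members = rng.choices(usable, weights=weights, k=1)[0]
        other = rng.choice(members)
        t = rng.random()  # position along the segment from core to other
        synthetic.append(tuple(c + t * (o - c) for c, o in zip(core, other)))
    return synthetic


subclusters = [
    ((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)]),
    ((10.0, 10.0), [(11.0, 10.0)]),
]
new_points = oversample(subclusters, n_new=5)
```

    Because each synthetic point lies on a segment inside a single sub-cluster, it cannot fall between two separate sub-clusters, which is the overlap problem the abstract aims to avoid.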