25 research outputs found

    A Bonferroni Mean Based Fuzzy K Nearest Centroid Neighbor Classifier

    K-nearest neighbor (KNN) is an effective nonparametric classifier that determines the neighbors of a point based only on distance proximity. The classification performance of KNN suffers from the presence of outliers in small-sample datasets, and it deteriorates further on datasets with class imbalance. We propose a local Bonferroni Mean based Fuzzy K-Nearest Centroid Neighbor (BM-FKNCN) classifier that assigns the class label of a query sample based on the nearest local centroid mean vector, to better represent the underlying statistics of the dataset. The proposed classifier is robust to outliers because the Nearest Centroid Neighborhood (NCN) concept also considers the spatial distribution and symmetrical placement of the neighbors. The proposed classifier can also overcome class domination among its neighbors in datasets with class imbalance, because it averages all the centroid vectors from each class to adequately capture the distribution of the classes. The BM-FKNCN classifier is tested on datasets from the Knowledge Extraction based on Evolutionary Learning (KEEL) repository and benchmarked against classification results from the KNN, Fuzzy-KNN (FKNN), BM-FKNN and FKNCN classifiers. The experimental results show that BM-FKNCN achieves the highest overall average classification accuracy, 89.86%, among the five classifiers.
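
    The NCN idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual greedy definition of nearest centroid neighbors (the first neighbor is the nearest point; each further neighbor is the one whose addition brings the centroid of the selected set closest to the query). It covers only the neighbor selection step, not the Bonferroni mean or fuzzy membership parts of BM-FKNCN, and the function name `ncn_neighbors` is hypothetical.

```python
import math

def ncn_neighbors(query, points, k):
    """Greedily pick indices of k nearest centroid neighbors of `query`.

    Illustrative sketch only: assumes the standard greedy NCN rule,
    not necessarily the exact procedure used in the paper.
    """
    chosen = []
    sums = [0.0] * len(query)          # running coordinate sum of chosen points
    remaining = set(range(len(points)))
    for _ in range(min(k, len(points))):
        best, best_d = None, math.inf
        m = len(chosen) + 1            # size of the set if a candidate is added
        for i in sorted(remaining):
            centroid = [(s + x) / m for s, x in zip(sums, points[i])]
            d = math.dist(query, centroid)
            if d < best_d:
                best, best_d = i, d
        chosen.append(best)
        remaining.remove(best)
        sums = [s + x for s, x in zip(sums, points[best])]
    return chosen

# Two points sit close together on one side of the query, one on the other side.
pts = [(1.0, 0.0), (1.1, 0.0), (-1.5, 0.0), (0.0, 3.0)]
print(ncn_neighbors((0.0, 0.0), pts, 2))
```

    In this toy example, plain KNN with k = 2 would pick the two points on the same side of the query, whereas the NCN rule picks one point on each side, which is the "symmetrical placement" property the abstract refers to.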

    Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces

    In many fields, e.g., data mining and machine learning, distance-based outlier detection (DOD) is widely employed to remove noise and find abnormal phenomena, because DOD is unsupervised, can be employed in any metric space, and makes no assumptions about data distributions. Nowadays, data mining and machine learning applications face the challenge of dealing with large datasets, which requires efficient DOD algorithms. We address the DOD problem under two different definitions. Our new idea for solving these problems is to exploit an in-memory proximity graph. For each problem, we propose a new algorithm that exploits a proximity graph, and we analyze which type of proximity graph is appropriate for the algorithm. Our empirical study using real datasets confirms that our DOD algorithms are significantly faster than state-of-the-art ones.
    Amagata D., Onizuka M., Hara T. Fast, exact, and parallel-friendly outlier detection algorithms with proximity graph in metric spaces. VLDB Journal 31, 797 (2022); https://doi.org/10.1007/s00778-022-00729-1
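
    The abstract does not spell out its two outlier definitions. As a rough illustration of what distance-based outlier detection means, the sketch below assumes one definition that is common in the DOD literature: a point is an outlier if fewer than k other points lie within distance r of it. This is a brute-force baseline, not the proximity-graph algorithm the paper proposes, and the name `distance_outliers` and its parameters are hypothetical.

```python
import math

def distance_outliers(points, r, k, dist=math.dist):
    """Return indices of points with fewer than k neighbors within radius r.

    Brute-force O(n^2) sketch under an assumed (r, k) outlier definition.
    `dist` may be any metric, since DOD only needs a distance function.
    """
    outliers = []
    for i, p in enumerate(points):
        count = 0
        for j, q in enumerate(points):
            if i != j and dist(p, q) <= r:
                count += 1
                if count >= k:
                    break  # enough neighbors found, p is an inlier
        if count < k:
            outliers.append(i)
    return outliers

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(distance_outliers(data, r=1.5, k=2))
```

    The quadratic number of distance computations in this baseline is the cost that faster DOD algorithms aim to avoid on large datasets.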

    A Dense Network Model for Outlier Prediction Using Learning Approaches

    Outlier prediction has various sub-categories, and investigators pay less attention to related domains such as outliers in audio recognition, video recognition, music recognition, etc. This research, however, is specific to medical data analysis and concentrates on predicting outliers in a medical database. Here, feature mapping and representation are achieved by adopting a stacked LSTM-based CNN. The extracted features are fed as input to a linear Support Vector Machine, which is used for classification. Based on the analysis, there is a strong correlation between the features related to an individual's emotions, which can be analyzed in both a static and a dynamic manner. Both learning approaches are adopted so that each compensates for the drawbacks of the other. The statistical analysis is done in the MATLAB 2016a environment, where metrics such as ROC, MCC, AUC, correlation coefficient, and prediction accuracy are evaluated and compared with existing approaches such as standard CNN, standard SVM, logistic regression, multi-layer perceptrons, and so on. The anticipated learning model shows superior outcomes, and more attention is given to selecting an emotion recognition dataset connected with all the sub-domains.