18,168 research outputs found

    Efficient Parallel Processing of k-Nearest Neighbor Queries by Using a Centroid-based and Hierarchical Clustering Algorithm

    The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because of how it operates, its application may be limited to problems with a modest number of instances, particularly when run time is a concern. However, classifying large amounts of data has become a fundamental task in many real-world applications, so it is natural to scale the k-Nearest Neighbor method to large datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) that uses a parallel centroid-based and hierarchical clustering algorithm to partition the training dataset into multiple parts. The introduced clustering algorithm uses four stages of successive refinement and generates high-quality clusters. The k-Nearest Neighbor approach subsequently makes use of them to classify the test datasets. Finally, sets of experiments are conducted on UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
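The cluster-then-search idea described in this abstract can be sketched as follows. Plain k-means stands in for the paper's four-stage clustering, and the cluster count, k, and iteration budget are illustrative assumptions, not values from the paper:

```python
import numpy as np

def cluster_then_knn(X_train, y_train, X_test, n_clusters=2, k=3, seed=0):
    """Partition the training data with plain k-means (a stand-in for the
    paper's four-stage clustering), then answer each query with KNN run
    only inside the query's nearest cluster."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    rng = np.random.default_rng(seed)
    centroids = X_train[rng.choice(len(X_train), n_clusters, replace=False)]
    for _ in range(20):                                     # Lloyd iterations
        assign = np.linalg.norm(
            X_train[:, None] - centroids[None], axis=2).argmin(axis=1)
        for c in range(n_clusters):
            if (assign == c).any():
                centroids[c] = X_train[assign == c].mean(axis=0)
    preds = []
    for x in np.asarray(X_test, dtype=float):
        c = np.linalg.norm(centroids - x, axis=1).argmin()  # nearest cluster
        Xc, yc = X_train[assign == c], y_train[assign == c]
        nn = np.linalg.norm(Xc - x, axis=1).argsort()[:min(k, len(Xc))]
        labels, counts = np.unique(yc[nn], return_counts=True)
        preds.append(labels[counts.argmax()])               # majority vote
    return np.array(preds)
```

Restricting the neighbor search to one cluster is what makes the approach scale: each query compares against a fraction of the training set rather than all of it.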

    COMBINATION OF K-MEANS CLUSTERING AND K-NEAREST NEIGHBOR ON ECOMMERCE CUSTOMER SPENDING RATE PREDICTION

    K-Nearest Neighbor is a classification method that assigns new data to a class based on the proximity of its characteristics to the k nearest members of existing classes, and it relies heavily on training data. In practice, as with an e-commerce customer spending-rate dataset, there is no class label for each record, so an additional method is needed to obtain training data before prediction can be done. This research uses K-Means Clustering to group the dataset into multiple clusters; each cluster is then given a class label according to the characteristics of its centroid. The combination of the KNN and K-Means Clustering methods for customer spending-rate prediction gives fairly good results, with a prediction accuracy of 89.6%.
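The pseudo-labeling pipeline the abstract describes can be sketched with off-the-shelf components. The "tier" naming by centroid magnitude is an illustrative stand-in for the paper's "centroid characteristics", and the cluster count and k are assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def pseudo_label_then_knn(X_unlabeled, X_new, n_clusters=2, k=3):
    """Cluster unlabeled customer records, name each cluster from its
    centroid (here: ranked by mean centroid value, a stand-in for the
    paper's centroid characteristics), then train KNN on the pseudo-labels
    and classify new records."""
    X_unlabeled = np.asarray(X_unlabeled, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_unlabeled)
    # rank clusters by centroid magnitude -> tier-0 (lowest), tier-1, ...
    order = np.argsort(km.cluster_centers_.mean(axis=1))
    names = {int(c): f"tier-{rank}" for rank, c in enumerate(order)}
    y_pseudo = [names[int(c)] for c in km.labels_]
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_unlabeled, y_pseudo)
    return knn.predict(np.asarray(X_new, dtype=float))
```

The key design point is that clustering supplies the labels KNN needs; any labeling rule tied to the centroids (spending tiers, segments) slots into the `names` mapping.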

    A Local Weighted Nearest Neighbor Algorithm and a Weighted and Constrained Least-Squared Method for Mixed Odor Analysis by Electronic Nose Systems

    A great deal of work has been done to develop techniques for odor analysis by electronic nose systems. These analyses mostly focus on identifying a particular odor by comparison with a known odor dataset. However, in many situations it would be more practical if each individual odorant could be determined directly. This paper proposes two methods for such odor-component analysis in electronic nose systems. First, a k-nearest neighbor (KNN)-based local weighted nearest neighbor (LWNN) algorithm is proposed to determine the components of an odor. Following a component analysis, the odor training data is first categorized into several groups, each of which is represented by its centroid. The examined odor is then classified as the class of the nearest centroid, with the distance between the examined odor and a centroid calculated using a weighting scheme that captures the local structure of each predefined group. To further determine the concentration of each component, odor models are built by regression, and a weighted and constrained least-squares (WCLS) method is proposed to estimate the component concentrations. Experiments were carried out to assess the effectiveness of the proposed methods. The LWNN algorithm is able to classify mixed odors with different mixing ratios, while the WCLS method provides good estimates of component concentrations.
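A weighted nearest-centroid rule of the kind the abstract outlines can be sketched as below. Using inverse per-feature variance as the local weighting is an illustrative choice; the paper's exact weighting scheme may differ:

```python
import numpy as np

def lwnn_classify(groups, x):
    """Locally weighted nearest-centroid sketch: each odor class is
    summarized by its centroid, and the distance to that centroid is
    weighted by the inverse per-feature spread of the group, so tightly
    clustered features count more toward the decision."""
    x = np.asarray(x, dtype=float)
    best, best_d = None, np.inf
    for label, pts in groups.items():
        pts = np.asarray(pts, dtype=float)
        centroid = pts.mean(axis=0)
        w = 1.0 / (pts.var(axis=0) + 1e-9)   # local structure as weights
        d = np.sqrt((w * (x - centroid) ** 2).sum())
        if d < best_d:
            best, best_d = label, d
    return best
```

Because the weights are computed per group, a feature that is noisy in one odor class but stable in another contributes differently to each distance, which is the "local structure" the abstract refers to.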

    A "CHICKEN CLAW" TRANSFORMATION METHOD FOR REDUCING THE SEARCH SPACE OF K-NEAREST NEIGHBOR-BASED INTRUSION DETECTION SYSTEMS

    An Intrusion Detection System (IDS) is an essential component of computer network security. Some K-nearest neighbor (KNN) based IDSs achieve relatively good detection accuracy, but when the training data is large, the time needed to detect an attack grows accordingly. Detection time can be reduced by shrinking the search space over the training data; however, reducing the search space while maintaining detection quality remains an open problem. This paper proposes a "chicken claw" transformation method, based on the sum of a data point's distance to its cluster centroid and its distances to two of the cluster's sub-centroids, to reduce the search space of a KNN-based IDS: the search operates on the resulting one-dimensional transformed data, and a localized KNN is applied to it. Experiments using agglomerative hierarchical clustering with the Unweighted Pair-Group Method of Centroid on the NSL-KDD 20% dataset show a maximum search-space reduction of 38% at 77.5% accuracy. The maximum accuracy and specificity achieved are 88% and 88.3%, respectively, at a 12% reduction, and the maximum sensitivity is 80.2% at an 11% reduction. The experiments indicate that the search space can be reduced while maintaining detection accuracy; the trade-off between accuracy and search space might be improved further by replacing the clustering algorithm with divisive hierarchical clustering. Keywords: clustering, intrusion detection, network security
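The one-dimensional key described above (distance to the centroid plus distances to two sub-centroids) can be sketched directly; the `band` threshold used to keep candidates is an illustrative parameter, not from the paper:

```python
import numpy as np

def claw_keys(X, centroid, sub1, sub2):
    """One-dimensional "chicken claw" key per the abstract: the sum of each
    point's distance to its cluster centroid and to two sub-centroids."""
    X = np.asarray(X, dtype=float)
    return (np.linalg.norm(X - centroid, axis=1)
            + np.linalg.norm(X - sub1, axis=1)
            + np.linalg.norm(X - sub2, axis=1))

def candidate_indices(keys, q, centroid, sub1, sub2, band=1.0):
    """Reduced search space: keep only training points whose key lies
    within `band` of the query's key; localized KNN then runs on those
    candidates instead of the full training set."""
    qk = claw_keys(np.asarray(q, dtype=float)[None], centroid, sub1, sub2)[0]
    return np.where(np.abs(keys - qk) <= band)[0]
```

Because the keys are scalars, they can be precomputed and sorted once, turning each query's search-space reduction into a cheap one-dimensional range lookup.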

    Speaker Normalization Methods for Vowel Cognition: Comparative Analysis Using Neural Network and Nearest Neighbor Classifiers

    Intrinsic and extrinsic speaker normalization methods are systematically compared using a neural network (fuzzy ARTMAP) and L1 and L2 K-Nearest Neighbor (K-NN) categorizers trained and tested on disjoint sets of speakers of the Peterson-Barney vowel database. Intrinsic methods include one nonscaled, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four combinations of F0, F1, F2, F3. Extrinsic methods include four speaker adaptation schemes, each combined with the 32 intrinsic methods: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). ARTMAP and K-NN show similar trends, with K-NN performing better but requiring about ten times as much memory. The optimal intrinsic normalization method is bark scale, or bark with end-correction, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods is LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All, and K-NN choosing psychophysical measures for all except CSi. British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (AFOSR-90-0083, ONR-N00014-92-J-4015); National Science Foundation (IRI-90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (F49620-92-J-0225)
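The two centroid-subtraction schemes named in the abstract (CS and CSi) can be sketched in a few lines; the token matrix layout (one row per vowel token, one column per frequency channel) is an assumption for illustration:

```python
import numpy as np

def centroid_subtract(tokens, per_frequency=True):
    """Extrinsic speaker normalization by centroid subtraction, following
    the abstract's CS / CSi schemes: subtract a speaker's mean from each of
    that speaker's tokens. With per_frequency=True each frequency channel
    gets its own centroid (CSi); otherwise a single grand mean over all
    channels is removed (CS)."""
    tokens = np.asarray(tokens, dtype=float)   # shape: (n_tokens, n_freqs)
    if per_frequency:
        return tokens - tokens.mean(axis=0)    # CSi: one centroid per channel
    return tokens - tokens.mean()              # CS: one scalar centroid
```

Applied per speaker before training, this removes speaker-specific offsets so the classifier sees vowel structure rather than vocal-tract differences.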

    Evaluation of Speaker Normalization Methods for Vowel Recognition Using Fuzzy ARTMAP and K-NN

    A procedure that uses fuzzy ARTMAP and K-Nearest Neighbor (K-NN) categorizers to evaluate intrinsic and extrinsic speaker normalization methods is described. Each classifier is trained on preprocessed, or normalized, vowel tokens from about 30% of the speakers of the Peterson-Barney database, then tested on data from the remaining speakers. Intrinsic normalization methods included one nonscaled, four psychophysical scales (bark, bark with end-correction, mel, ERB), and three log scales, each tested on four different combinations of the fundamental (F0) and the formants (F1, F2, F3). For each scale and frequency combination, four extrinsic speaker adaptation schemes were tested: centroid subtraction across all frequencies (CS), centroid subtraction for each frequency (CSi), linear scale (LS), and linear transformation (LT). A total of 32 intrinsic and 128 extrinsic methods were thus compared. Fuzzy ARTMAP and K-NN showed similar trends, with K-NN performing somewhat better and fuzzy ARTMAP requiring about 1/10 as much memory. The optimal intrinsic normalization method was bark scale, or bark with end-correction, using the differences between all frequencies (Diff All). The order of performance for the extrinsic methods was LT, CSi, LS, and CS, with fuzzy ARTMAP performing best using bark scale with Diff All, and K-NN choosing psychophysical measures for all except CSi. British Petroleum (89-A-1204); Defense Advanced Research Projects Agency (AFOSR-90-0083, ONR-N00014-92-J-4015); National Science Foundation (IRI-90-00530); Office of Naval Research (N00014-91-J-4100); Air Force Office of Scientific Research (F49620-92-J-0225)

    Feature Selection Using Genetic Algorithms for the Generation of a Recognition and Classification of Children Activities Model Using Environmental Sound

    In the area of recognition and classification of children's activities, numerous works have been proposed that make use of different data sources, most of them relying on sensors embedded in children's garments. In this work, the use of environmental sound data is proposed to generate a recognition and classification model of children's activities through machine learning techniques, optimized for use on mobile devices. First, a genetic algorithm is applied for feature selection, reducing the original size of the dataset, an important consideration given the limited resources of a mobile device. To evaluate this process, five classification methods are applied: k-nearest neighbor (k-NN), nearest centroid (NC), artificial neural networks (ANNs), random forest (RF), and recursive partitioning trees (Rpart). Finally, the resulting models are compared on accuracy in order to identify the classification method that performs best at identifying children's activity from audio signals. According to the results, the best performance is obtained by the five-feature model developed with RF, with an accuracy of 0.92, which supports the conclusion that children's activity can be classified automatically from a reduced set of features with significant accuracy.
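A genetic search over binary feature masks, as used above for feature selection, can be sketched in miniature. This simplified version uses mutation with elitism only (no crossover), and the separation-based fitness and per-feature penalty are illustrative assumptions, not the paper's objective:

```python
import numpy as np

def ga_feature_select(X, y, pop_size=8, generations=10, penalty=0.1, seed=0):
    """Toy genetic algorithm over binary feature masks: fitness rewards
    between-class mean separation on the selected features and charges a
    small cost per feature kept, so small informative subsets win."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        sep = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0))
        return float((sep * mask).sum() - penalty * mask.sum())

    rng = np.random.default_rng(seed)
    # seed the population with single-feature masks so every feature is tried
    pop = [np.eye(n, dtype=int)[i % n].copy() for i in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        for p in parents:
            child = p.copy()
            child[rng.integers(0, n)] ^= 1      # one-bit mutation
            children.append(child)
        pop = parents + children
        best = max(pop + [best], key=fitness)   # elitist global best
    return best
```

Shrinking the mask this way is what makes the resulting model cheap enough for a mobile device: fewer features mean less audio preprocessing per prediction.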

    FAULT DETECTION AND PROGNOSTICS OF INSULATED GATE BIPOLAR TRANSISTOR (IGBT) USING A K-NEAREST NEIGHBOR CLASSIFICATION ALGORITHM

    The Insulated Gate Bipolar Transistor (IGBT) is a power semiconductor device commonly used in medium- to high-power applications ranging from household appliances and automotive systems to renewable energy. Health assessment of IGBTs in field use is of interest because of the costly system downtime that can accompany IGBT failures. Experimental data have shown that conventional reliability approaches suffer from large uncertainties when predicting IGBT lifetimes, partly because they cannot adapt to varying loading conditions and part-to-part differences. This study developed a data-driven prognostic method to individually assess IGBT health based on operating data obtained from run-to-failure experiments. IGBT health was classified into healthy and faulty states using a K-Nearest Neighbor Centroid Distance classification algorithm, and a feature-weight optimization method was developed to determine the influence of each feature on classifying IGBT health states.
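The feature-weighting idea in this abstract can be illustrated with a greedy sketch: sweep each feature over a small candidate grid and keep the weight that maximizes training accuracy of a centroid-distance classifier. The grid and the greedy one-pass sweep are illustrative assumptions, not the paper's optimizer:

```python
import numpy as np

def optimize_feature_weights(X, y, candidates=(0.0, 0.5, 1.0)):
    """Greedy feature-weight search for a nearest-centroid health
    classifier: for each feature in turn, keep the candidate weight that
    maximizes training accuracy. Uninformative features end up down-
    weighted or dropped (weight 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    w = np.ones(X.shape[1])

    def accuracy(w):
        cents = {c: X[y == c].mean(axis=0) for c in np.unique(y)}
        preds = [min(cents, key=lambda c: ((w * (x - cents[c])) ** 2).sum())
                 for x in X]
        return float(np.mean(np.array(preds) == y))

    for j in range(X.shape[1]):
        scores = []
        for cand in candidates:
            trial = w.copy()
            trial[j] = cand
            scores.append((accuracy(trial), cand))
        w[j] = max(scores)[1]   # best accuracy; ties favor the larger weight
    return w
```

The returned weights quantify each feature's influence on the health decision, which mirrors the purpose the abstract states for its optimization step.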