25 research outputs found

    Efficient k-NN search on vertically decomposed data

    Applications like multimedia retrieval require efficient support for similarity search on large data collections. Yet nearest neighbor search is a difficult problem in high-dimensional spaces, rendering efficient applications hard to realize: index structures degrade rapidly with increasing dimensionality, while sequential search is not an attractive solution for repositories with millions of objects. This paper approaches the problem from a different angle. A solution is sought in an unconventional storage scheme that opens up a new range of techniques for processing k-NN queries, especially suited to high-dimensional spaces. The suggested (physical) database design is well suited to a novel variant of branch-and-bound search …
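
    As a rough illustration of the general idea only, not the paper's actual algorithm, the sketch below runs k-NN over column-wise NumPy arrays: partial squared distances are accumulated one dimension at a time, and candidates whose lower bound already exceeds an upper bound on the k-th nearest distance are pruned before the remaining columns are scanned. All names and the exact pruning rule are illustrative assumptions.

        import numpy as np

        def knn_vertical(columns, query, k):
            """Branch-and-bound style k-NN over column-wise (vertical) storage.

            `columns` is a list of 1-D arrays, one per dimension, all of length n.
            Partial squared distances are accumulated dimension by dimension, and
            candidates whose lower bound exceeds an upper bound on the k-th
            nearest distance are dropped before the remaining columns are read.
            """
            query = np.asarray(query, dtype=float)
            dims, n = len(columns), len(columns[0])
            lows = np.array([c.min() for c in columns])
            highs = np.array([c.max() for c in columns])
            # worst-case contribution of each dimension to any squared distance
            worst = np.maximum((query - lows) ** 2, (query - highs) ** 2)

            partial = np.zeros(n)        # lower bound: squared distance over scanned dims
            alive = np.arange(n)         # surviving candidate ids
            for d in range(dims):
                partial[alive] += (columns[d][alive] - query[d]) ** 2
                remaining = worst[d + 1:].sum()
                if len(alive) > k:
                    # k-th smallest lower bound plus the remaining worst case bounds
                    # the k-th nearest full distance from above.
                    upper = np.partition(partial[alive], k - 1)[k - 1] + remaining
                    alive = alive[partial[alive] <= upper]

            order = np.argsort(partial[alive])[:k]
            return alive[order], np.sqrt(partial[alive][order])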

    Interactive retrieval of video using pre-computed shot-shot similarities

    A probabilistic framework for content-based interactive video retrieval is described. The indexing of video fragments is derived from the probability of the user's positive judgment about key-frames of video shots. Initial estimates of these probabilities are obtained from low-level feature representations. Only statistically significant estimates are retained; the rest are replaced by an appropriate constant, which allows efficient access at search time without loss of search quality and leads to improvements in most experiments. Over time, the probability estimates are updated from the relevance judgments of users performing searches, resulting in further substantial increases in mean average precision.
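
    A minimal sketch of the general idea, not the paper's framework: per-shot probabilities of a positive judgment start from a feature-based prior and are refined by accumulated relevance feedback, with shots lacking enough evidence falling back to a constant. All names and constants (PRIOR_STRENGTH, DEFAULT_PROB, MIN_JUDGMENTS) are hypothetical.

        from collections import defaultdict

        # Hypothetical constants: pseudo-count weight of the feature-based prior,
        # the fallback constant, and a crude significance threshold on feedback volume.
        PRIOR_STRENGTH = 5.0
        DEFAULT_PROB = 0.1
        MIN_JUDGMENTS = 10

        positive = defaultdict(float)   # positive judgments per shot id
        total = defaultdict(float)      # all judgments per shot id

        def record_feedback(shot_id, judged_positive):
            """Accumulate one relevance judgment for a shot's key-frame."""
            total[shot_id] += 1.0
            if judged_positive:
                positive[shot_id] += 1.0

        def positive_probability(shot_id, feature_prior):
            """Blend a feature-based prior with observed judgments.

            Shots with too little feedback fall back to a constant, loosely
            mirroring the idea of keeping only statistically significant estimates.
            """
            n = total[shot_id]
            if n < MIN_JUDGMENTS:
                return DEFAULT_PROB
            # Beta-Bernoulli style smoothing of the observed positive rate
            return (positive[shot_id] + PRIOR_STRENGTH * feature_prior) / (n + PRIOR_STRENGTH)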

    Application of the Gini Index and K-Nearest Neighbor for Classifying the Cognitive Level of Questions in Bloom's Taxonomy

    Bloom's Taxonomy has been widely applied as a guideline for designing proper examinations, consisting of questions at various cognitive levels. Currently, educators still identify the cognitive level of questions in Bloom's Taxonomy manually. Only a few educators can identify cognitive levels correctly; most make mistakes when classifying questions. K-Nearest Neighbor (KNN) is an effective method for classifying the cognitive level of questions in Bloom's Taxonomy, but KNN has a weakness: the computational cost of its similarity calculations becomes large when the feature dimensionality of the data is high. To address this weakness, the Gini Index method is used to reduce the high feature dimensionality. Several experiments were conducted to obtain the best architecture and produce accurate classifications. From 10 experiments on the Question Bank dataset, KNN achieved a highest accuracy of 59.97% and a highest kappa of 0.496, while KNN combined with the Gini Index achieved a highest accuracy of 66.18% and a highest kappa of 0.574. Based on these results, it can be concluded that the Gini Index is able to reduce high feature dimensionality, improving KNN's performance and increasing the accuracy of classifying the cognitive level of questions in Bloom's Taxonomy.
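
    A minimal sketch of Gini-Index-style feature scoring for categorical features (for question text, these might be term-presence indicators); the ranking it produces would be used to drop low-scoring features before running KNN. Function names and the selection step are illustrative assumptions, not the paper's implementation.

        from collections import Counter

        def gini_impurity(labels):
            """Gini impurity of a label sequence: 1 minus the sum of squared class shares."""
            n = len(labels)
            return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

        def gini_index_score(feature_values, labels):
            """Weighted Gini impurity after partitioning the data by one categorical feature.

            Lower scores indicate a feature that separates the cognitive-level classes
            better; keeping only the best-ranked features shrinks the vectors KNN compares.
            """
            n = len(labels)
            score = 0.0
            for value in set(feature_values):
                subset = [label for label, v in zip(labels, feature_values) if v == value]
                score += len(subset) / n * gini_impurity(subset)
            return score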

    IMPLEMENTATION OF GAIN RATIO AND K-NEAREST NEIGHBOR FOR CLASSIFICATION OF STUDENT PERFORMANCE

    Predicting student performance is very useful for identifying weak students and providing support to students who face difficulties. However, the work done by educators has not been effective enough in identifying the factors that affect student performance. The main predictive factor is the student's academic score, but that alone is not sufficient for predicting student performance. Educators therefore use Educational Data Mining (EDM) to predict student performance. K-Nearest Neighbor is often used to classify student performance because of its simplicity, but it has a weakness when the feature dimensionality is high. To overcome this weakness, the Gain Ratio is used to reduce the high dimensionality of the features. The experiment was carried out 10 times, with k ranging from 1 to 10, on the student performance dataset. These experiments yielded an average accuracy of 74.068 with K-Nearest Neighbor alone and 75.105 with the Gain Ratio combined with K-Nearest Neighbor. The results show that the Gain Ratio is able to reduce the high feature dimensionality that weakens K-Nearest Neighbor, so applying the Gain Ratio with K-Nearest Neighbor increases the accuracy of student performance classification compared to using K-Nearest Neighbor alone.
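
    A minimal sketch of the Gain Ratio computation for a categorical feature (information gain normalised by split information); the helper names are illustrative, and the exact preprocessing of the student performance dataset is not taken from the paper.

        import math
        from collections import Counter

        def entropy(labels):
            """Shannon entropy (in bits) of a label sequence."""
            n = len(labels)
            return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

        def gain_ratio(feature_values, labels):
            """Gain Ratio of a categorical feature: information gain over split information.

            The split information penalises features with many distinct values,
            which plain information gain tends to favour.
            """
            n = len(labels)
            info_gain = entropy(labels)
            split_info = 0.0
            for value, count in Counter(feature_values).items():
                subset = [label for label, v in zip(labels, feature_values) if v == value]
                weight = count / n
                info_gain -= weight * entropy(subset)
                split_info -= weight * math.log2(weight)
            return info_gain / split_info if split_info > 0 else 0.0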

    Distributed top-k aggregation queries at large

    Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings where the data is spread across different nodes of a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments. The optimizations can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address three degrees of freedom: 1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, 2) computing data-adaptive scan depths for different input sources, and 3) data-adaptive sampling of a small subset of input sources in scenarios with hundreds or thousands of query-relevant network nodes. All optimizations are based on a statistical cost model that utilizes local synopses, e.g., in the form of histograms, efficiently computed convolutions, and estimators based on order statistics. The paper presents comprehensive experiments with three different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
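
    As a rough, centralized simulation of the baseline TPUT idea referenced above (not the paper's optimized variants), the sketch below runs the three phases over in-memory dictionaries: local top-k reports, a threshold-driven second round, and exact lookups for the surviving candidates. It assumes summation as the aggregate and at least k distinct items reported in phase 1.

        def tput_topk(nodes, k):
            """Centralized simulation of the three TPUT phases.

            `nodes` is a list of dicts mapping item -> local score; the aggregate
            score of an item is the sum of its local scores.
            """
            m = len(nodes)

            # Phase 1: every node reports its local top-k; the k-th highest partial
            # sum is a lower bound tau on the final k-th best aggregate score.
            partial = {}
            for node in nodes:
                for item, score in sorted(node.items(), key=lambda kv: -kv[1])[:k]:
                    partial[item] = partial.get(item, 0.0) + score
            tau = sorted(partial.values(), reverse=True)[k - 1]

            # Phase 2: nodes report every item whose local score reaches tau / m;
            # an item missed by all nodes cannot reach an aggregate of tau.
            threshold = tau / m
            candidates = set()
            for node in nodes:
                candidates.update(item for item, score in node.items() if score >= threshold)

            # Phase 3: fetch exact aggregate scores for the surviving candidates only.
            exact = {item: sum(node.get(item, 0.0) for node in nodes) for item in candidates}
            return sorted(exact.items(), key=lambda kv: -kv[1])[:k]

        # Toy example with three "nodes":
        # tput_topk([{"a": 9, "b": 7, "c": 1}, {"a": 2, "c": 8, "d": 6}, {"b": 5, "d": 4}], k=2)
        # returns [('b', 12.0), ('a', 11.0)]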

    IMPLEMENTATION OF K-NEAREST NEIGHBOR AND GINI INDEX METHOD IN CLASSIFICATION OF STUDENT PERFORMANCE

    Predicting student academic performance is one of the important applications of data mining in education. However, existing work is not sufficient to identify which factors affect student performance. Information on academic grades or student learning progress is not enough on its own to predict student performance and to help students and educators improve learning and teaching. K-Nearest Neighbor is a simple method for classifying student performance, but it has problems with high feature dimensionality. To solve this problem, the Gini Index feature selection method is needed to reduce the high feature dimensionality. Several experiments were conducted to obtain an optimal architecture and produce accurate classifications. The results of 10 experiments with k from 1 to 10 on the student performance dataset showed the highest average accuracy of 74.068 with the K-Nearest Neighbor method, while the combination of K-Nearest Neighbor and the Gini Index showed the highest average accuracy of 76.516. From these results it can be concluded that the Gini Index is able to overcome the problem of high feature dimensionality in K-Nearest Neighbor, so applying K-Nearest Neighbor with the Gini Index improves the accuracy of student performance classification compared to using K-Nearest Neighbor alone.
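
    A minimal sketch of the evaluation protocol described above, assuming a numeric feature matrix X and class labels y; a Gini-criterion decision tree's feature importances stand in for the paper's Gini Index selection, and the number of retained features is an arbitrary placeholder.

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.model_selection import cross_val_score

        def knn_with_gini_selection(X, y, n_features=10):
            """Compare KNN accuracy with and without Gini-based feature selection.

            X is assumed to be a numeric feature matrix and y the class labels;
            n_features is an arbitrary placeholder for how many features to keep.
            """
            # Gini importance from a decision tree stands in for the Gini Index ranking.
            tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
            keep = np.argsort(tree.feature_importances_)[::-1][:n_features]
            X_reduced = X[:, keep]

            results = {}
            for k in range(1, 11):                      # k = 1 .. 10, as in the experiments
                knn = KNeighborsClassifier(n_neighbors=k)
                plain = cross_val_score(knn, X, y, cv=10).mean()
                reduced = cross_val_score(knn, X_reduced, y, cv=10).mean()
                results[k] = (plain, reduced)
            return results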

    PENERAPAN METODE K-NEAREST NEIGHBOR DAN INFORMATION GAIN PADA KLASIFIKASI KINERJA SISWA

    Education is a very important issue in the development of a country. One way to raise the quality of education is to predict student academic performance. The methods currently used are ineffective because evaluation is based solely on the educator's assessment of information about the student's learning progress. Information on learning progress is not enough to form indicators for evaluating student performance and for helping students and educators improve learning and teaching. K-Nearest Neighbor is an effective method for classifying student performance, but it has problems with large vector dimensions. This study aims to predict students' academic performance using the K-Nearest Neighbor algorithm with the Information Gain feature selection method to reduce the vector dimensionality. Several experiments were conducted to obtain an optimal architecture and produce accurate classifications. The results of 10 experiments with k values from 1 to 10 on the student performance dataset showed the highest average accuracy of 74.068 with the K-Nearest Neighbor method, while the combination of K-Nearest Neighbor and Information Gain obtained the highest average accuracy of 76.553. From these results it can be concluded that Information Gain can reduce the vector dimensionality, so applying K-Nearest Neighbor with Information Gain improves the accuracy of student performance classification compared to using K-Nearest Neighbor alone.
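
    A minimal sketch using scikit-learn, where mutual information (equivalent to information gain for discrete features) ranks features ahead of KNN; the number of retained features, the cross-validation folds, and the pipeline itself are illustrative assumptions rather than the paper's setup.

        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.model_selection import cross_val_score

        def build_ig_knn_pipeline(n_features=10, k=5):
            """Rank features by mutual information (information gain) before KNN."""
            return make_pipeline(
                SelectKBest(mutual_info_classif, k=n_features),
                KNeighborsClassifier(n_neighbors=k),
            )

        # Example evaluation over k = 1 .. 10, given a feature matrix X and labels y:
        # scores = {k: cross_val_score(build_ig_knn_pipeline(k=k), X, y, cv=10).mean()
        #           for k in range(1, 11)}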
