4 research outputs found

    Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering

    In this paper, we apply Latent Dirichlet Allocation (LDA) to automatic text summarization to improve the accuracy of document clustering. The experiments use a data set of 398 public blog articles collected with the Python Scrapy crawler and scraper. The clustering steps in this research are preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The results show that automatic document summarization with LDA reaches 72% accuracy at an LDA summarization level of 40%, compared with the traditional k-means method, which reaches only 66%.
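
The compression and word-weighting steps named in this abstract might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `tokenize`, `compress`, and `tfidf` are hypothetical helper names, the sentence scorer is a plain word-frequency stand-in rather than LDA, and the 40% level is read here as the share of sentences kept.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def compress(document, ratio=0.4):
    """Keep roughly `ratio` of the sentences, in their original order.

    Stand-in scorer (an assumption, not LDA): a sentence scores the
    average document-level frequency of its words.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(tokenize(document))

    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    keep = max(1, round(len(sentences) * ratio))
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:keep])  # restore document order
    return " ".join(sentences[i] for i in chosen)

def tfidf(documents):
    """Word weighting: term frequency times log inverse document frequency."""
    tokenized = [tokenize(d) for d in documents]
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(documents)
    return [
        {w: c / len(toks) * math.log(n / df[w]) for w, c in Counter(toks).items()}
        for toks in tokenized
    ]

# Hypothetical toy corpus: compress each document, then weight the summaries.
docs = [
    "Cats chase mice. Cats sleep. Dogs bark loudly at night. Cats purr. The weather was mild.",
    "Stocks fell today. Markets were calm. Stocks may recover.",
]
weights = tfidf([compress(d) for d in docs])
```

The resulting weight dictionaries would then be passed to a clustering algorithm such as k-means, which is the final step the abstract lists.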


    Document clustering technology plays a significant role in the advancement of information technology. Among other things, it is important in web development for the accuracy of automatic keyword categorization in search engines, news categorization for electronic newspapers, and improving site ranking with Search Engine Optimization (SEO), and it can very likely be applied in various other information technologies. Research is therefore needed to improve the accuracy of document clustering. In this study, the Latent Semantic Analysis (LSA) algorithm performs sentence reduction better than the Feature Based algorithm and therefore yields more accurate document clustering results. The clustering stages in this study are preprocessing, automatic document summarization with the feature method, automatic document summarization with LSA, word weighting, and the clustering algorithm. The results show that document clustering accuracy with LSA-based automatic summarization reaches 71.04%, obtained at an LSA summarization level of 40%, compared with clustering without automatic summarization, which reaches only 65.97%. Keywords: Text Mining, Clustering, Automatic Document Summarization, LSA

    Doctor of Philosophy in Computer Science

    The organization of learning materials is often limited by the systems available for delivery of such material. Currently, the learning management system (LMS) is widely used to distribute course materials. These systems deliver the material in a text-based, linear way. As online education continues to expand and educators seek to increase their effectiveness by adding more effective active learning strategies, these delivery methods become a limitation. This work demonstrates the possibility of presenting course materials in a graphical way that expresses important relations and provides support for manipulating the order of those materials. The ENABLE system gathers data from an existing course, uses text analysis techniques, graph theory, graph transformation, and a user interface to create and present graphical course maps. These course maps are able to express information not currently available in the LMS. Student agents have been developed to traverse these course maps to identify the variety of possible paths through the material. The temporal relations imposed by the current course delivery methods have been replaced by prerequisite relations that express ordering that provides educational value. Reducing the connections to these more meaningful relations allows more possibilities for change. Technical methods are used to explore and calibrate linear and nonlinear models of learning. These methods are used to track mastery of learning material and identify relative difficulty values. Several probability models are developed and used to demonstrate that data from existing, temporally based courses can be used to make predictions about student success in courses using the same material but organized without the temporal limitations.
    Combined, these demonstrate the possibility of tools and techniques that can support the implementation of a graphical course map that allows varied paths and provides an enriched, more informative interface between the educator, the student, and the learning material. This fundamental change in how course materials are presented and interfaced with has the potential to make educational opportunities available to a broader spectrum of people with diverse abilities and circumstances. The graphical course map can be pivotal in attaining this transition.

    Evaluating Clusterings by Estimating Clarity

    In this thesis I examine clustering evaluation, with a subfocus on text clusterings specifically. The principal work of this thesis is the development, analysis, and testing of a new internal clustering quality measure called informativeness. I begin by reviewing clustering in general. I then review current clustering quality measures, accompanying this with an in-depth discussion of many of the important properties one needs to understand about such measures. This is followed by extensive document clustering experiments that show problems with standard clustering evaluation practices. I then develop informativeness, my new internal clustering quality measure for estimating the clarity of clusterings. I show that informativeness, which uses classification accuracy as a proxy for human assessment of clusterings, is both theoretically sensible and works empirically. I present a generalization of informativeness that leverages external clustering quality measures. I also show its use in a realistic application: email spam filtering. I show that informativeness can be used to select clusterings which lead to superior spam filters when few true labels are available. I conclude this thesis with a discussion of clustering evaluation in general, informativeness, and the directions I believe clustering evaluation research should take in the future.
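
The idea of using classification accuracy as a proxy for clustering clarity could be illustrated along the following lines. This is only a sketch of the general idea and not the thesis's definition of informativeness: the nearest-centroid classifier, the interleaved fold split, and the name `classification_proxy` are all assumptions made for the example.

```python
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classification_proxy(points, labels, folds=2):
    """Cross-validated accuracy of predicting cluster labels from the points.

    Hypothetical helper: a nearest-centroid classifier is fit on all folds
    but one and asked to recover the cluster label of each held-out point.
    """
    correct = 0
    for fold in range(folds):
        by_label = defaultdict(list)
        for i, (p, lab) in enumerate(zip(points, labels)):
            if i % folds != fold:  # training portion
                by_label[lab].append(p)
        centroids = {lab: centroid(ps) for lab, ps in by_label.items()}
        for i, (p, lab) in enumerate(zip(points, labels)):
            if i % folds == fold:  # held-out portion
                guess = min(centroids, key=lambda c: sq_dist(p, centroids[c]))
                correct += guess == lab
    return correct / len(points)

# Toy data (assumed): two well-separated groups of 2-D points.
points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [0.0, 0.3],
          [5.0, 5.1], [5.2, 5.0], [5.1, 5.2], [5.0, 5.3]]
clear_labels = [0, 0, 0, 0, 1, 1, 1, 1]    # follows the visible groups
muddled_labels = [0, 0, 1, 1, 0, 0, 1, 1]  # cuts across the groups

clear_score = classification_proxy(points, clear_labels)
muddled_score = classification_proxy(points, muddled_labels)
```

Under these assumptions, a clustering whose labels a classifier can learn and reproduce on held-out points scores higher than one whose labels cut across the structure of the data, so `clear_score` exceeds `muddled_score`.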