
    Concept Based Labeling of Text Documents Using Support Vector Machine

    Classification plays a vital role in many information management and retrieval tasks. Text classification uses labeled training data to learn a classification system and then automatically classifies the remaining text using the learned system. Classification involves several stages, such as text processing, feature extraction, feature vector construction, and final classification. The proposed mining model consists of sentence-based concept analysis, document-based concept analysis, corpus-based concept analysis, and a concept-based similarity measure. The proposed model can efficiently find significant matching concepts between documents according to the semantics of their sentences. The similarity between documents is calculated based on this similarity measure. The terms that contribute to sentence semantics are then analyzed at the sentence, document, and corpus levels, rather than with the traditional analysis of the document only. With the feature vector extracted for each new document, the Support Vector Machine (SVM) algorithm is applied for document classification. This approach enhances text classification accuracy.
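
    As a rough illustration of the final classification step described above, the sketch below trains a linear SVM on simple concept-count feature vectors. The extract_concept_features helper is a hypothetical stand-in for the paper's sentence-, document-, and corpus-level concept analysis, not the authors' implementation.

```python
# Minimal sketch: linear SVM over concept-based feature vectors.
# extract_concept_features() is a hypothetical placeholder for the paper's
# sentence/document/corpus-level concept analysis.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score


def extract_concept_features(documents, concepts):
    """Toy placeholder: count how often each concept term occurs per document."""
    features = np.zeros((len(documents), len(concepts)))
    for i, doc in enumerate(documents):
        tokens = doc.lower().split()
        for j, concept in enumerate(concepts):
            features[i, j] = tokens.count(concept)
    return features


docs = ["stock market shares rise", "team wins the football match",
        "market prices fall sharply", "player scores in the final match"]
labels = [0, 1, 0, 1]                      # 0 = finance, 1 = sports
concepts = ["market", "shares", "prices", "team", "match", "player"]

X = extract_concept_features(docs, concepts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, stratify=labels, random_state=0)

clf = LinearSVC().fit(X_train, y_train)    # final SVM classification step
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```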

    Automatic Categorization of Scientific Articles in a Digital Library Database Using the Graph Kernel Method

    Scientific articles in a digital library database are grouped into specific categories. Grouping large numbers of scientific articles manually requires substantial human resources and considerable time. This study aims to help library-material processing teams automatically assign scientific articles to their respective categories. In this study, automatic categorization of scientific articles is performed using graph kernels applied to a bipartite graph between scientific article documents and their keywords. Five kernel functions are used to compute the kernel matrix values: KEGauss, KELinear, KVGauss, KVLinear, and KRW. The kernel matrix is computed from the one-mode projection of the bipartite graph and is then used as input to an SVM (support vector machine) classifier to determine the appropriate category. The performance of the automatic categorization is measured by accuracy, defined as the ratio of correctly categorized articles to the total number of articles in the dataset. Applying this method to the ISJD (Indonesian Scientific Journal Database) yields a notable average accuracy of 87.43% for the KVGauss kernel function, while the other kernels yield 86.14% (KELinear), 85.86% (KEGauss), 42.23% (KVLinear), and 25.15% (KRW), respectively. These results show that the graph kernel method is effective for grouping scientific articles into the designated categories in a digital library database.
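
    A minimal sketch of this pipeline is given below, assuming a small document-keyword incidence matrix: it forms the one-mode projection onto documents, normalizes it into a simple linear kernel (the KEGauss, KELinear, KVGauss, KVLinear, and KRW kernels from the paper are not reproduced), and passes the kernel matrix to an SVM with a precomputed kernel.

```python
# Sketch: one-mode projection of a document-keyword bipartite graph used as a
# precomputed kernel for SVM classification. The normalized linear kernel on
# the projection is a simplification of the paper's graph kernels.
import numpy as np
from sklearn.svm import SVC

# Incidence matrix B: rows = documents, columns = keywords (1 = keyword occurs).
B = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
labels = np.array([0, 0, 1, 1])            # article categories

P = B @ B.T                                 # one-mode projection onto documents
K = P / np.sqrt(np.outer(np.diag(P), np.diag(P)))  # normalized kernel matrix

clf = SVC(kernel="precomputed").fit(K, labels)
print("training accuracy:", clf.score(K, labels))
```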

    Study Lung Tool: A Way to Understand HRCT Lung Parenchyma

    The purpose of the described system is to aid radiologists in their daily routine task of analyzing HRCT lung images and to contribute to a more accurate and faster diagnosis. We developed a framework, the Study Lung Tool, with the objective of gathering information from radiologists in a systematic way. Using the Study Lung Tool framework, the radiologist analyzes HRCT scans, outlines regions of typical patterns, and characterizes those patterns. A database of typical patterns associated with common pulmonary diseases was created. The information gathered can be a valuable teaching tool for everyone who intends to understand HRCT lung parenchyma. The proposed system discriminates between normal and abnormal patterns of lung parenchyma based on statistical texture features extracted from HRCT lung scans. An overall accuracy of 89.2%, a sensitivity of 92.7%, and a specificity of 83.6% were achieved.
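
    To make the statistical texture analysis step concrete, the sketch below computes a few grey-level co-occurrence matrix (GLCM) features on toy image patches and trains an SVM to separate "normal" from "abnormal" texture. The abstract does not specify the features or classifier used by the Study Lung Tool, so this is an illustrative assumption rather than the authors' pipeline.

```python
# Illustrative sketch: GLCM texture features on image patches + SVM classifier.
# The Study Lung Tool's actual features and classifier may differ.
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC


def glcm_features(patch):
    """Contrast, homogeneity, and energy from a grey-level co-occurrence matrix."""
    glcm = graycomatrix(patch, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return [graycoprops(glcm, prop)[0, 0]
            for prop in ("contrast", "homogeneity", "energy")]


rng = np.random.default_rng(0)
# Toy stand-ins for normal (smooth) vs. abnormal (noisy) parenchyma patches.
normal = [np.full((32, 32), 120, dtype=np.uint8)
          + rng.integers(0, 5, (32, 32), dtype=np.uint8) for _ in range(10)]
abnormal = [rng.integers(0, 256, (32, 32), dtype=np.uint8) for _ in range(10)]

X = np.array([glcm_features(p) for p in normal + abnormal])
y = np.array([0] * 10 + [1] * 10)           # 0 = normal, 1 = abnormal
clf = SVC(kernel="rbf").fit(X, y)
print("training accuracy:", clf.score(X, y))
```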

    A REVIEW ON MULTIPLE-FEATURE-BASED ADAPTIVE SPARSE REPRESENTATION (MFASR) AND OTHER CLASSIFICATION TYPES

    A new technique, multiple-feature-based adaptive sparse representation (MFASR), has been demonstrated for hyperspectral image (HSI) classification. The method involves four main steps. First, spectral and spatial information is extracted from the original hyperspectral image as four different features. Second, a shape-adaptive (SA) spatial region is obtained for each pixel. Third, a sparse representation algorithm is applied to obtain the sparse coefficient matrices of the multiple features within each shape-adaptive region. Finally, the class label of each test pixel is determined from the obtained coefficients. MFASR achieves much better classification results than other classifiers in both quantitative and qualitative terms. It benefits from the strong correlations among the different extracted features and makes effective use of them through adaptive sparse representation, and thus achieves very high classification performance.
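
    To make the sparse representation step concrete, the sketch below shows plain sparse-representation classification for a single feature using orthogonal matching pursuit: each test pixel is coded over a dictionary of training spectra and assigned to the class with the smallest reconstruction residual. The shape-adaptive regions and multi-feature adaptive weighting of MFASR are not reproduced.

```python
# Sketch of basic sparse-representation classification (SRC) for one feature.
# MFASR's shape-adaptive regions and multi-feature adaptation are omitted.
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)
n_bands, n_train_per_class, n_classes = 50, 20, 3

# Toy dictionary: columns are normalized training spectra, grouped by class.
class_means = rng.normal(size=(n_classes, n_bands))
atoms, dict_labels = [], []
for c in range(n_classes):
    for _ in range(n_train_per_class):
        atom = class_means[c] + 0.1 * rng.normal(size=n_bands)
        atoms.append(atom / np.linalg.norm(atom))
        dict_labels.append(c)
D, dict_labels = np.array(atoms).T, np.array(dict_labels)   # D: bands x atoms


def classify(pixel, n_nonzero=10):
    """Code the pixel over D, then pick the class with the smallest residual."""
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, pixel)
    alpha = omp.coef_
    residuals = [np.linalg.norm(pixel - D[:, dict_labels == c]
                                @ alpha[dict_labels == c])
                 for c in range(n_classes)]
    return int(np.argmin(residuals))


test_pixel = class_means[2] + 0.1 * rng.normal(size=n_bands)
print("predicted class:", classify(test_pixel / np.linalg.norm(test_pixel)))
```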

    Latent Variable Models with Applications to Spectral Data Analysis

    Recent technological advances in automatic data acquisition have created an ever-increasing need to extract meaningful information from huge amounts of data. Multivariate predictive models have become important statistical tools for solving modern engineering problems. The purpose of this thesis is to develop novel predictive methods based on latent variable models and to validate these methods by applying them to spectral data analysis. In this thesis, hybrid models of principal components regression (PCR) and partial least squares regression (PLS) are proposed. The basic idea of the hybrid models is to develop more accurate prediction techniques by combining the merits of PCR and PLS: both the principal components of PCR and the latent variables of PLS are involved in a common regression process. Another major contribution of this work is the robust probabilistic multivariate calibration model (RPMC), proposed to overcome the drawback of the Gaussian assumption in most latent variable models. The RPMC was designed to be robust to outliers by adopting a Student-t distribution instead of the Gaussian distribution. An efficient Expectation-Maximization algorithm was derived for parameter estimation in the RPMC. It can also be shown that some popular latent variable models, such as probabilistic PCA (PPCA) and supervised probabilistic PCA (SPPCA), are special cases of the RPMC. Both predictive models developed in this thesis were assessed on real-life spectral datasets: the hybrid models were applied to the shaft misalignment prediction problem, and the RPMC was tested on a near-infrared (NIR) dataset. For the classification problem on the NIR data, a fusion of regularized discriminant analysis (RDA) and principal components analysis (PCA) was also proposed. The experimental results show the effectiveness and efficiency of the proposed methods.
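
    As a lightweight illustration of the two building blocks combined in the hybrid models, the sketch below fits plain PCR (PCA followed by linear regression) and PLS on a synthetic spectral-style dataset. The hybrid combination scheme and the Student-t-based RPMC from the thesis are not reproduced here.

```python
# Sketch of the two ingredients of the hybrid models: PCR and PLS.
# The thesis's hybrid combination and robust (Student-t) RPMC are not shown.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.cross_decomposition import PLSRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 200, 100
X = rng.normal(size=(n_samples, n_wavelengths))          # synthetic "spectra"
y = X[:, :5].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

pcr = make_pipeline(PCA(n_components=10), LinearRegression()).fit(X_tr, y_tr)
pls = PLSRegression(n_components=10).fit(X_tr, y_tr)

print("PCR R^2:", round(pcr.score(X_te, y_te), 3))
print("PLS R^2:", round(pls.score(X_te, y_te), 3))
```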