22 research outputs found

    Psoriasis prediction from genome-wide SNP profiles

    Abstract

    Background: With the availability of large-scale genome-wide association study (GWAS) data, choosing an optimal set of single nucleotide polymorphisms (SNPs) for disease susceptibility prediction is a challenging task. This study aimed to use SNPs selected from GWAS data to predict psoriasis.

    Methods: In total, the data comprised 2,798 samples and 451,724 SNPs. The search for a set of SNPs predictive of psoriasis susceptibility consisted of two steps: first, selecting the top 1,000 SNPs with the highest prediction accuracy from the GWAS dataset; second, searching this pool for an optimal SNP subset for predicting psoriasis. The sequential Information Bottleneck (sIB) method was compared with classical linear discriminant analysis (LDA) for classification performance.

    Results: The best test harmonic mean of sensitivity and specificity for predicting psoriasis with sIB was 0.674 (95% CI: 0.650-0.698), versus 0.520 (95% CI: 0.472-0.524) with LDA. These results indicate that the sIB classifier outperforms LDA in this study.

    Conclusions: The fact that a small set of SNPs can predict disease status with an average accuracy of 68% makes it possible to use SNP data for psoriasis prediction.
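    The evaluation metric reported above, the harmonic mean of sensitivity and specificity, can be sketched as follows; the confusion-matrix counts are illustrative, not taken from the study.

```python
def harmonic_mean_sens_spec(tp, fn, tn, fp):
    """Harmonic mean of sensitivity and specificity from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    return 2 * sensitivity * specificity / (sensitivity + specificity)

# Hypothetical counts: 80% sensitivity, 60% specificity.
score = harmonic_mean_sens_spec(tp=80, fn=20, tn=60, fp=40)
print(round(score, 3))  # 0.686
```

    Like the F1 score for precision and recall, this metric penalizes a classifier that trades one rate sharply against the other.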

    Document Classification Using a Combination of Principal Component Analysis and SVM

    ABSTRACT Text document classification is a simple but important problem, and its benefits are considerable given that the number of documents grows every day. However, most existing document classification techniques require a large number of labeled documents for the training and testing stages. In this final project, document classification is performed using the Principal Component Analysis (PCA) algorithm combined with Support Vector Machines (SVM) for supervised documents. PCA is a technique for extracting the structure of high-dimensional data without discarding significant information about the data as a whole. An algorithm is then needed to produce predictions and accuracy for these documents, namely SVM. SVM is a machine learning method based on the principle of Structural Risk Minimization (SRM), whose goal is to find the best hyperplane separating two classes in the input space. The best separating hyperplane between the two classes is found by measuring the margins of candidate hyperplanes and maximizing the margin. In system tests, data reduced by PCA yielded slightly lower accuracy on certain datasets than data used without PCA. The data used is the R8 subset of the Reuters-21578 Text Categorization Collection Data Set. The best accuracy in this study was obtained with the SVM method, with an average accuracy of 98.95%, while the SVM + PCA method achieved an average accuracy of 96.7866%. Keywords: document classification, Principal Component Analysis, Support Vector Machines
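    The PCA reduction step described above can be sketched with plain NumPy; the toy matrix stands in for document feature vectors and is not the R8/Reuters-21578 data, and the classifier trained on the reduced features (SVM in the paper) is omitted.

```python
# Sketch: reduce high-dimensional document vectors with PCA before
# training a classifier on the projected features.
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top n_components principal components."""
    X_centered = X - X.mean(axis=0)          # PCA requires centered data
    # SVD of the centered matrix; rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))               # 100 "documents", 50 features
Z = pca_reduce(X, n_components=10)
print(Z.shape)  # (100, 10)
```

    An SVM would then be fit on `Z` instead of `X`; the abstract's result (96.79% vs. 98.95%) shows the compression can cost a little accuracy on some datasets.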

    Integration of TDOA Features in Information Bottleneck Framework for Fast Speaker Diarization

    In this paper we address the combination of multiple feature streams in a fast speaker diarization system for meeting recordings. Whenever Multiple Distant Microphones (MDM) are used, it is possible to estimate the Time Delay of Arrival (TDOA) between channels. Previous work has shown that TDOA values can be used as additional features alongside conventional spectral features to improve speaker diarization. We investigate here the combination of TDOA and spectral features in a fast diarization system based on the Information Bottleneck principle. The algorithm is evaluated on the NIST RT06 diarization task. Adding TDOA features to spectral features reduces the speaker error by 3% absolute. Results are comparable to those of conventional HMM/GMM-based systems, with a consistent reduction in computational complexity.
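    A minimal sketch of TDOA estimation between two microphone channels via cross-correlation; the signals are synthetic, and real diarization front-ends typically use GCC-PHAT on windowed audio frames rather than raw correlation.

```python
# Sketch: estimate the Time Delay of Arrival (TDOA) of one channel
# relative to another from the peak of their cross-correlation.
import numpy as np

def tdoa_samples(ref, other):
    """Delay (in samples) of `other` relative to `ref`."""
    corr = np.correlate(other, ref, mode="full")
    return np.argmax(corr) - (len(ref) - 1)

rng = np.random.default_rng(1)
sig = rng.normal(size=1000)
delayed = np.roll(sig, 5)          # simulate a 5-sample propagation delay
print(tdoa_samples(sig, delayed))  # 5
```

    Per-channel delays like this, computed for each frame, form the TDOA feature stream that is combined with spectral features in the diarization system.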

    COMBINATION OF AGGLOMERATIVE AND SEQUENTIAL CLUSTERING FOR SPEAKER DIARIZATION

    This paper investigates the use of sequential clustering for speaker diarization. Conventional diarization systems are based on parametric models and agglomerative clustering. In our previous work we proposed a non-parametric method based on the agglomerative Information Bottleneck for very fast diarization. Here we consider the combination of sequential and agglomerative clustering for avoiding local maxima of the objective function and for cluster purification. Experiments are run on the RT06 evaluation data. Sequential clustering with oracle model selection can reduce the speaker error by 10% w.r.t. agglomerative clustering. When model selection is based on the Normalized Mutual Information criterion, a relative improvement of 5% is obtained using a combination of agglomerative and sequential clustering.
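    The Normalized Mutual Information criterion mentioned above can be sketched generically as follows; this uses one common normalization (geometric mean of the entropies) and is not the paper's model-selection code.

```python
# Sketch: NMI between two label sequences over the same segments.
import numpy as np
from collections import Counter

def nmi(labels_a, labels_b):
    """NMI = I(A;B) / sqrt(H(A) * H(B))."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * np.log(c / n) for c in pa.values())
    h_b = -sum(c / n * np.log(c / n) for c in pb.values())
    mi = sum(c / n * np.log((c / n) / (pa[a] / n * pb[b] / n))
             for (a, b), c in pab.items())
    return mi / np.sqrt(h_a * h_b)

# Identical partitions up to relabeling score 1.0.
print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # 1.0
```

    Because NMI is invariant to label permutation, it can compare cluster counts and assignments across candidate models without an oracle.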

    AGGLOMERATIVE INFORMATION BOTTLENECK FOR SPEAKER DIARIZATION OF MEETINGS DATA

    In this paper, we investigate the use of agglomerative Information Bottleneck (aIB) clustering for the speaker diarization task on meeting data. In contrast to state-of-the-art diarization systems, which model individual speakers with Gaussian Mixture Models, the proposed algorithm is completely non-parametric. Both the clustering and model selection issues of non-parametric models are addressed in this work. The proposed algorithm is evaluated on the RT06 meeting evaluation data set. The system achieves Diarization Error Rates comparable to those of state-of-the-art systems at a much lower computational complexity.
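    One greedy step of agglomerative Information Bottleneck clustering can be sketched as below: merge the pair of clusters whose fusion loses the least information about the relevance variable. The priors and distributions are toy values, not diarization features, and this is a generic aIB step rather than the paper's implementation.

```python
# Sketch: a single aIB merge. Each cluster is (prior p, relevance
# distribution q); the merge cost is a weighted Jensen-Shannon divergence.
import numpy as np

def merge_cost(p_i, q_i, p_j, q_j):
    """Information lost by merging two clusters."""
    p = p_i + p_j
    q = (p_i * q_i + p_j * q_j) / p          # merged relevance distribution
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return p_i * kl(q_i, q) + p_j * kl(q_j, q)

def aib_merge_once(priors, dists):
    """Merge the cheapest pair; return the new priors and distributions."""
    n = len(priors)
    pairs = [(merge_cost(priors[i], dists[i], priors[j], dists[j]), i, j)
             for i in range(n) for j in range(i + 1, n)]
    _, i, j = min(pairs)                      # least informative merge
    p = priors[i] + priors[j]
    q = (priors[i] * dists[i] + priors[j] * dists[j]) / p
    keep = [k for k in range(n) if k not in (i, j)]
    return [priors[k] for k in keep] + [p], [dists[k] for k in keep] + [q]

priors = [0.4, 0.3, 0.3]
dists = [np.array([0.8, 0.2]), np.array([0.7, 0.3]), np.array([0.1, 0.9])]
priors, dists = aib_merge_once(priors, dists)
print(len(priors))  # 2 clusters remain
```

    Iterating this step from one-segment-per-cluster down to the selected model size is what makes the method non-parametric: no per-speaker GMM is ever trained.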
