4 research outputs found

    A Bayesian hierarchical model for comparing average F1 scores

    In multi-class text classification, the performance (effectiveness) of a classifier is usually measured by micro-averaged and macro-averaged F1 scores. However, the scores themselves do not tell us how reliably they forecast the classifier's future performance on unseen data. In this paper, we propose a novel approach to explicitly modelling the uncertainty of average F1 scores through Bayesian reasoning, and demonstrate that it can provide a much more comprehensive performance comparison between text classifiers than traditional frequentist null hypothesis significance testing (NHST).
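For readers unfamiliar with the two averages named in this abstract, the following is a minimal sketch of how micro-averaged and macro-averaged F1 are commonly computed from per-class counts. It does not reproduce the paper's Bayesian model; the function names and the class counts are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

# Per-class counts as (true positives, false positives, false negatives).
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(counts: Dict[str, Counts]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) over all classes."""
    # Macro: compute F1 per class, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts over classes, then compute a single F1.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = f1(tp, fp, fn)
    return micro, macro


# Hypothetical counts for a frequent class "A" and a rare class "B".
example = {"A": (90, 15, 10), "B": (5, 10, 15)}
micro, macro = micro_macro_f1(example)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

In this toy example the rare class pulls the macro average well below the micro average, which is why the two scores are usually reported together.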

    Building Synonym Sets for English WordNet Using the Commutative Method (Membangun Synonym Set untuk WordNet Bahasa Inggris Menggunakan Metode Komutatif)

    The synonym set is the smallest unit in WordNet and must be built first so that word relations and glosses can be created in WordNet. A synonym set is a set of one or more words that share the same meaning and can therefore substitute for one another. In this study, English synonym sets are built using the commutative method. The commutative method is used because it shares the defining property of synonym sets, whose members can replace one another in use. The dataset consists of 50 English words. The study shows that a system implementing the commutative method can produce appropriate synonym sets, and that the F1 score between the synonym sets produced by the program and those of Princeton WordNet is 30%.
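The abstract does not say how the F1 score between generated and reference synonym sets was computed. One plausible reading, sketched here purely as an assumption, treats each synonym set as a set of words and scores the overlap with precision and recall. The function name and the example words are hypothetical.

```python
from typing import Set


def set_f1(predicted: Set[str], gold: Set[str]) -> float:
    """F1 overlap between a predicted and a reference set of words.

    Assumed scoring scheme, not taken from the study:
    precision = |P ∩ G| / |P|, recall = |P ∩ G| / |G|.
    """
    overlap = len(predicted & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(predicted)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical synonym sets for illustration only.
program_synset = {"big", "large", "huge"}
reference_synset = {"big", "large", "great", "grand"}
print(f"F1 = {set_f1(program_synset, reference_synset):.3f}")
```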

    Investigation into the Predictive Power of Artificial Neural Networks and Logistic Regression for Predicting Default in Chit Funds

    This study evaluated the performance of an artificial neural network (ANN) multi-layer perceptron model and a logistic regression LogitBoost (LR) model in predicting default in chit funds. The two types of default investigated were late payment of 30 days and late payment of 90 days. The dataset was split into training and validation datasets using random sampling, and k-fold cross-validation was used on the training dataset to assess the performance of the tuning parameters. The validation dataset was used to compare the performance of both algorithms. Principal component analysis (PCA) was used to reduce the feature set while still explaining 95% of the variance in the data. The classes were highly imbalanced, so the Synthetic Minority Oversampling Technique (SMOTE) and down-sampling were used to overcome the class imbalance. Sixteen experiments were run, eight for each of the two default types. The three key metrics measured for these experiments were balanced accuracy, area under the ROC curve (AUC) and F1 score. After applying the Bonferroni adjustment to the original p-value, the threshold for statistical significance was set to 0.003 when comparing multiple experiments. In these experiments the ANN model had the best results for balanced accuracy, AUC and F1 score. Statistical analysis using a paired t-test showed a statistically significant difference between the results of ANN and LR. The results also showed very little difference in the contribution of the top 20 features to the first 30 principal components, which were used to predict default. These features included family id, income and address. Features that made little or no contribution to the principal components included Commission, Auction Amount, and the type of relation of the nominee to the chit fund member. These findings are context specific and in this case the context is chit funds from a digital chit fund operator in Indi
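The Bonferroni adjustment mentioned in this abstract divides the original significance level by the number of comparisons. The abstract states neither value, so the sketch below assumes an original level of 0.05 and 16 comparisons (one per experiment); under those assumptions the result is consistent with the reported threshold of 0.003. The function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold under the Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


# Assumed values: alpha = 0.05 and 16 comparisons are not stated in the abstract.
threshold = bonferroni_threshold(alpha=0.05, n_comparisons=16)
print(f"adjusted threshold = {threshold:.6f}")
```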