
    An Improved Algorithm for Neural Network Classification of Imbalanced Training Sets

    In this paper, we analyze why the net output error converges slowly when the backpropagation algorithm is used to train neural networks for two-class problems in which the numbers of exemplars in the two classes differ greatly. This occurs because, for an imbalanced training set, the negative gradient vector computed by backpropagation initially does not point in a downhill direction for the class with fewer exemplars. Consequently, in the initial iteration, the net error for the exemplars in this class increases significantly, and the subsequent rate of convergence of the net error is very low. We suggest a modified technique for calculating a direction in weight space that is downhill for both classes. Using this algorithm, we have been able to accelerate the rate of learning for two-class classification problems by an order of magnitude.
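    The idea of a weight-space direction that is downhill for both classes can be sketched as follows. This is a minimal sketch, assuming a single sigmoid output unit trained on squared error, and assuming the direction is formed by adding the unit-length negative gradients of the two classes (the bisector of the angle between them); the paper's exact construction may differ. The helper names `per_class_gradients` and `balanced_descent_direction` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def per_class_gradients(w, X, y):
    """Mean gradient of 0.5 * (out - target)^2 for a single sigmoid unit,
    computed separately for class 0 and class 1 (hypothetical helper)."""
    grads = {}
    for label in (0, 1):
        g = [0.0] * len(w)
        n = 0
        for xi, yi in zip(X, y):
            if yi != label:
                continue
            out = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            delta = (out - yi) * out * (1.0 - out)  # dE/dz for this exemplar
            for j, xj in enumerate(xi):
                g[j] += delta * xj
            n += 1
        grads[label] = [gj / n for gj in g]
    return grads


def unit(v):
    """Scale v to length 1 (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v] if norm > 0 else list(v)


def balanced_descent_direction(w, X, y):
    """Sum of the two classes' normalized negative gradients.

    Each term has length 1, so the majority class cannot swamp the minority
    class. The result has a positive dot product with both negative gradients
    unless they point in exactly opposite directions (or one of them is zero)."""
    g = per_class_gradients(w, X, y)
    d0 = unit([-c for c in g[0]])
    d1 = unit([-c for c in g[1]])
    return [a + b for a, b in zip(d0, d1)]
```

    A training loop would then step along this direction, e.g. `w = [wj + lr * dj for wj, dj in zip(w, d)]`, instead of following the pooled gradient, which an imbalanced training set lets the majority class dominate.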

    Classification of School Dropouts in East Java Province in 2012 Using Binary Logistic Regression and Kohonen Learning Vector Quantization (LVQ)

    Nine-year compulsory education is one of the government's programs for raising the school enrollment ratio in Indonesia. The program targeted a minimum enrollment ratio of 95% by the end of 2008. One of the main obstacles to achieving nine-year compulsory education is students dropping out of school. According to 2008 data from the Ministry of National Education, 1.5 million teenagers in Indonesia are unable to continue their education each year. One way to address this problem is to identify dropouts, help them return to school, and support them until they complete their compulsory education. This study classifies school dropouts to determine their distribution and characteristics. Classification is performed with a binary logistic regression model and Learning Vector Quantization (LVQ), using as predictors the student's gender, marital status and work status; the education level and gender of the head of household; and the family's expenditure, number of members and area of residence. The data come from the 2012 SUSENAS survey in East Java Province. The binary logistic regression model identified dropouts with a classification accuracy of 89.6%, with the student's work and marital status, the education level of the head of household, and the family's expenditure, number of members and area of residence as significant variables. The LVQ network with 4 nodes achieved a classification accuracy of 88.9%.
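    LVQ assigns a record to the class of its nearest prototype (codebook) vector. The sketch below shows the basic LVQ1 update rule, assuming that the "4 nodes" are four prototype vectors (for example two per class) and that the predictors are already numerically encoded; the learning rate, its linear decay and the function names are hypothetical choices, not the study's settings.

```python
import random


def nearest_prototype(prototypes, x):
    """Index of the prototype with the smallest squared Euclidean distance to x."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((p - q) ** 2 for p, q in zip(prototypes[k][0], x)),
    )


def train_lvq1(X, y, prototypes, lr=0.1, epochs=50, seed=0):
    """LVQ1: pull the winning prototype toward x if its class matches y,
    push it away otherwise. `prototypes` is a list of (vector, class) pairs."""
    rng = random.Random(seed)
    protos = [(list(vec), cls) for vec, cls in prototypes]
    order = list(range(len(X)))
    for epoch in range(epochs):
        rng.shuffle(order)
        alpha = lr * (1.0 - epoch / epochs)  # linearly decaying learning rate
        for i in order:
            k = nearest_prototype(protos, X[i])
            vec, cls = protos[k]
            sign = 1.0 if cls == y[i] else -1.0
            protos[k] = ([v + sign * alpha * (xj - v) for v, xj in zip(vec, X[i])], cls)
    return protos


def predict_lvq(protos, x):
    """Class label of the nearest prototype."""
    return protos[nearest_prototype(protos, x)][1]
```

    Unlike the logistic regression model, LVQ yields no coefficients to test for significance; its prototypes can instead be read as typical profiles of each class.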

    An ensemble approach of dual base learners for multi-class classification problems

    In this work, we formalise and evaluate an ensemble of classifiers designed to solve multi-class problems. To achieve a good accuracy rate, the base learners are built with pairwise coupled binary and multi-class classifiers. Moreover, to reduce the computational cost of the ensemble and to improve its performance, these classifiers are trained using a specific attribute subset. This proposal offers the opportunity to capture the advantages provided by binary decomposition methods, by attribute partitioning methods, and by the cooperative characteristics associated with a combination of redundant base learners. To analyse the quality of this architecture, its performance has been tested on different domains, and the results have been compared to other well-known classification methods. This experimental evaluation indicates that our model is, in most cases, as accurate as these methods, but much more efficient. © 2014 Elsevier B.V. All rights reserved. This research was supported by the Spanish MICINN under Projects TRA2010-20225-C03-01, TRA 2011-29454-C03-02, and TRA 2011-29454-C03-03.
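    The combination of binary decomposition with attribute partitioning can be illustrated in miniature. This is a minimal sketch, not the authors' architecture: it uses hypothetical nearest-centroid binary learners, a hand-chosen attribute subset per class pair, and simple majority voting in place of probability-based pairwise coupling, and it omits the multi-class learners the paper combines with the binary ones.

```python
from itertools import combinations


def centroid(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def train_pairwise(X, y, feature_subsets):
    """One nearest-centroid learner per class pair (one-vs-one decomposition),
    each restricted to its own attribute subset feature_subsets[(a, b)]."""
    classes = sorted(set(y))
    learners = []
    for a, b in combinations(classes, 2):
        feats = feature_subsets[(a, b)]
        ca = centroid([[x[f] for f in feats] for x, c in zip(X, y) if c == a])
        cb = centroid([[x[f] for f in feats] for x, c in zip(X, y) if c == b])
        learners.append((a, b, feats, ca, cb))
    return classes, learners


def predict_vote(model, x):
    """Each pairwise learner votes for one of its two classes; majority wins."""
    classes, learners = model
    votes = {c: 0 for c in classes}
    for a, b, feats, ca, cb in learners:
        z = [x[f] for f in feats]
        da = sum((p - q) ** 2 for p, q in zip(z, ca))
        db = sum((p - q) ** 2 for p, q in zip(z, cb))
        votes[a if da <= db else b] += 1
    return max(classes, key=lambda c: votes[c])
```

    Because each binary learner sees only its own attribute subset, irrelevant attributes (here, any attribute left out of a pair's subset) cost nothing at training or prediction time, which is the kind of efficiency gain the abstract reports.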

    Feature quantization for parsimonious and interpretable predictive models

    For regulatory and interpretability reasons, logistic regression is still widely used by financial institutions to learn the repayment probability of a loan from applicants' historical data. To improve prediction accuracy and interpretability, a preprocessing step that quantizes both continuous and categorical data is usually performed: continuous features are discretized by assigning factor levels to intervals and, if numerous, the levels of categorical features are grouped. However, better predictive accuracy can be reached by embedding this quantization estimation step directly into the predictive estimation step itself. In doing so, the predictive loss has to be optimized over a huge, intractable and discontinuous set of quantizations. To overcome this difficulty, we introduce a specific two-step optimization strategy: first, the optimization problem is relaxed by approximating the discontinuous quantization functions with smooth functions; second, the resulting relaxed optimization problem is solved via a particular neural network and stochastic gradient descent. A straightforward maximum a posteriori procedure to obtain cutpoints then yields good candidates for the original optimization problem. The good performance of this approach, which we call glmdisc, is illustrated on simulated and real data from the UCI library and Crédit Agricole Consumer Finance (a major historic European player in the consumer credit market). The results show that practitioners finally have an automatic all-in-one tool that answers their recurring need for quantization in predictive tasks.
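    The relaxation and the maximum a posteriori step can be illustrated for a single continuous feature. In this sketch, which is assumed for illustration rather than taken from glmdisc itself, each quantization level j receives a linear score a0_j + a1_j * x, a softmax turns the scores into smooth level probabilities, and the most probable level gives the hard quantization whose change points are the cutpoints. The parameters are fixed by hand here; in the described strategy they would be learned jointly with the logistic regression by stochastic gradient descent. All function names are hypothetical.

```python
import math


def soft_quantize(x, alphas):
    """Smooth relaxation (assumed form): softmax over levels of the
    linear scores a0 + a1 * x, one (a0, a1) pair per level."""
    scores = [a0 + a1 * x for a0, a1 in alphas]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hard_quantize(x, alphas):
    """MAP step: assign x to its most probable level (first one on ties)."""
    probs = soft_quantize(x, alphas)
    return max(range(len(probs)), key=probs.__getitem__)


def cutpoints(alphas, lo, hi, steps=10000):
    """Scan a grid over [lo, hi] and record where the MAP level changes."""
    cuts = []
    prev = hard_quantize(lo, alphas)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        level = hard_quantize(x, alphas)
        if level != prev:
            cuts.append(x)
            prev = level
    return cuts
```

    With `alphas = [(0.0, -1.0), (0.0, 0.0), (-10.0, 1.0)]`, the MAP levels split the real line at approximately 0 and 10, so the feature would be discretized into three intervals for the final logistic regression.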

    AxoNet: A Deep Learning-based Tool to Count Retinal Ganglion Cell Axons

    In this work, we develop a robust, extensible tool to automatically and accurately count retinal ganglion cell axons in optic nerve (ON) tissue images from various animal models of glaucoma. We adapted deep learning to regress pixelwise axon count density estimates, which were then integrated over the image area to determine axon counts. The tool, termed AxoNet, was trained and evaluated using a dataset of images of ON regions randomly selected from whole cross sections of both control and damaged rat ONs and manually annotated for axon count and location. This rat-trained network was then applied to a separate dataset of non-human primate (NHP) ON images. AxoNet was compared to two existing automated axon counting tools, AxonMaster and AxonJ, using both datasets. AxoNet outperformed the existing tools on both the rat and NHP ON datasets as judged by mean absolute error, R² values when regressing automated vs. manual counts, and Bland-Altman analysis. AxoNet does not rely on hand-crafted image features for axon recognition and is robust to variations in the extent of ON tissue damage, image quality, and mammalian species. Therefore, AxoNet is not species-specific and can be extended to quantify additional ON characteristics in glaucoma and potentially other neurodegenerative diseases.
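    The counting and evaluation steps named in the abstract reduce to simple arithmetic once a density map exists. This sketch assumes the network's output is a 2-D map in axons per pixel (the network itself is not shown) and uses the common Bland-Altman convention of limits of agreement at the mean difference ± 1.96 standard deviations; the function names are hypothetical.

```python
import statistics


def count_from_density(density):
    """Integrate a pixelwise density map (axons per pixel) over the image."""
    return sum(sum(row) for row in density)


def mean_absolute_error(auto, manual):
    """Average absolute difference between automated and manual counts."""
    return sum(abs(a - m) for a, m in zip(auto, manual)) / len(auto)


def bland_altman(auto, manual):
    """Bias (mean difference) and 95% limits of agreement, auto minus manual."""
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

    Because the count is an integral of a density rather than a tally of detected objects, the network never has to segment individual axons, which is consistent with the abstract's point that AxoNet does not rely on hand-crafted recognition features.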

    A Neural Network Model for Predicting the Arrival Time of Fixed-Route Transport

    The paper discusses forecasting the operation of road transport using artificial neural networks. It sets out theoretical approaches to building neural-network-based predictive models of transport volumes and of the flow of service requests. The procedure for generating data for the network's "training" process is examined. Finally, the practical implementation of the proposed model for forecasting transport processes is discussed.
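    The abstract does not describe how the training data are generated. One common way to turn a recorded series of observations (for example, successive arrival or travel times on a route) into supervised input/target pairs for a neural network is a sliding window, sketched here purely as an assumption; `make_windows`, `window` and `horizon` are hypothetical names.

```python
def make_windows(series, window, horizon=1):
    """Turn a time series into (input window, target) pairs.

    Each input is `window` consecutive values; the target is the value
    `horizon` steps after the end of that window."""
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    pairs = []
    for start in range(len(series) - window - horizon + 1):
        x = series[start:start + window]
        y = series[start + window + horizon - 1]
        pairs.append((x, y))
    return pairs
```

    For example, a window of 2 over `[1, 2, 3, 4, 5]` yields the pairs `([1, 2], 3)`, `([2, 3], 4)` and `([3, 4], 5)`, which can be fed directly to any regression network.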

    SkinCAN AI: A deep learning-based skin cancer classification and segmentation pipeline designed along with a generative model

    Because melanoma skin cancer is rare, the datasets collected for it are limited and highly skewed, and benign moles can easily mimic the appearance of melanoma-affected areas. Such an imbalanced dataset makes training any deep learning classifier network harder by undermining training stability. We hypothesize that synthesizing such skin lesion images could help address overfitting in training networks and help enforce the anonymization of actual patients. Despite multiple previous attempts, none of the resulting models was practical for the fast-paced clinical environment. In this thesis, we propose a novel pipeline named SkinCAN AI, inspired by StyleGAN but designed explicitly around the limitations of skin lesion datasets and the need for a faster, optimized diagnostic tool that can easily be used for inference and integrated into the clinical environment. Our SkinCAN AI model is equipped with an adaptive discriminator augmentation module that enables a limited target data distribution to be learned and artificial data points to be sampled, which further helps the classifier network learn semantic features. A further novelty of the SkinCAN AI pipeline is the integration of a soft attention module into the classifier network. This module yields an attention mask that DenseNet201 uses to focus on learning relevant semantic features from skin lesion images, without the heavy computational burden of artifact-removal software. The SkinGAN model achieves an FID score of 0.622, and its synthetic samples allow the DenseNet201 model to be trained to an accuracy of 0.9494, AUC of 0.938, specificity of 0.969, and sensitivity of 0.695. We provide evidence in our thesis that our proposed pipelines outperform other state-of-the-art networks developed for this task of early diagnosis.
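    Adaptive discriminator augmentation, as introduced for StyleGAN2-ADA, tunes the probability p of augmenting the discriminator's inputs using an overfitting heuristic r_t: the mean sign of the discriminator's outputs on real images, compared against a target (0.6 in the StyleGAN2-ADA paper). Whether SkinCAN AI uses this exact rule is an assumption; the fixed step size and the function names below are hypothetical simplifications (the original scales each adjustment by the number of images processed).

```python
def overfitting_heuristic(d_real_outputs):
    """r_t = E[sign(D(x_real))]; values near 1 suggest the discriminator
    has started to memorize the (small) real training set."""
    signs = [1.0 if v > 0 else (-1.0 if v < 0 else 0.0) for v in d_real_outputs]
    return sum(signs) / len(signs)


def update_augment_p(p, r_t, target=0.6, step=0.01):
    """Raise p when r_t exceeds the target, lower it when below; clamp to [0, 1]."""
    if r_t > target:
        p += step
    elif r_t < target:
        p -= step
    return min(1.0, max(0.0, p))
```

    Adjusting p from a measured signal, rather than fixing it in advance, lets the amount of augmentation track how strongly the discriminator overfits, which matters most when the real dataset is as small and skewed as the melanoma data described above.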