11 research outputs found

    Does Non-linearity Matter in Retail Credit Risk Modeling?

    Get PDF
    In this research we propose a new method for retail credit risk modeling. To capture possible non-linear relationships between credit risk and the explanatory variables, we use a learning vector quantization (LVQ) neural network. The model was estimated on a dataset from the Slovenian banking sector. The proposed model outperformed the benchmark logistic regression (LOGIT) models, which represent the standard approach in banks. The results also demonstrate that the LVQ model is better able to handle the properties of categorical variables.
    Keywords: retail banking, credit risk, logistic regression, learning vector quantization
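    The LVQ learning rule the abstract relies on can be sketched in plain NumPy. This is a minimal LVQ1 variant on toy two-class data; the prototype count, learning-rate schedule, and synthetic "borrower" features are illustrative assumptions, not the paper's actual model:

```python
import numpy as np

def train_lvq1(X, y, n_protos_per_class=1, lr=0.1, epochs=30, seed=0):
    """Minimal LVQ1: pull the nearest prototype toward same-class
    samples and push it away from different-class samples."""
    rng = np.random.default_rng(seed)
    protos, proto_labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        idx = rng.choice(len(Xc), n_protos_per_class, replace=False)
        protos.append(Xc[idx])
        proto_labels.extend([c] * n_protos_per_class)
    W = np.vstack(protos).astype(float)
    wl = np.array(proto_labels)
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)          # decaying learning rate
        for i in rng.permutation(len(X)):
            j = np.argmin(np.linalg.norm(W - X[i], axis=1))
            sign = 1.0 if wl[j] == y[i] else -1.0  # attract or repel
            W[j] += sign * alpha * (X[i] - W[j])
    return W, wl

def predict_lvq(W, wl, X):
    """Assign each sample the label of its nearest prototype."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return wl[np.argmin(d, axis=1)]

# Toy "good vs. bad borrower" data: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
W, wl = train_lvq1(X, y)
acc = (predict_lvq(W, wl, X) == y).mean()
```

    In practice the prototypes play the role the abstract describes: because each prototype is itself a point in feature space, categorical variables encoded as indicator features remain directly interpretable.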

    THE EFFECT OF POSE AND ILLUMINATION ON FACE IDENTIFICATION

    Get PDF
    Research on face identification has been widely conducted to meet the needs of security systems. However, such research has emphasized methods that assume faces in a normal pose under uniform illumination. In this study, the input to be identified is an unknown face image, and the system then outputs the identity that best matches the available database. The study focuses on improving face-identification accuracy for input images with varying pose and illumination. It proposes a four-stage identification process comprising pre-processing (normalization and edge detection), transformation of the training data with a Pulse Coupled Neural Network (PCNN), and classification using Learning Vector Quantization (LVQ). In experiments identifying 540 training face images against 180 reference face images, the accuracy reached 90.7%.
    Keywords: pose and illumination, pre-processing, PCNN, LVQ
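    The pre-processing stage (normalization followed by edge detection) can be illustrated with a small NumPy sketch. The min-max normalization, 3×3 Sobel kernels, and toy image below are assumptions for illustration; the paper does not specify its exact operators:

```python
import numpy as np

def normalize(img):
    """Min-max intensity normalization to the [0, 1] range."""
    img = img.astype(float)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

def sobel_edges(img):
    """Gradient-magnitude edge map using 3x3 Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = img[i:i + 3, j:j + 3]
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

# Toy "image": dark left half, bright right half -> one vertical edge.
img = np.zeros((8, 8))
img[:, 4:] = 255.0
edges = sobel_edges(normalize(img))
```

    The resulting edge map, rather than raw intensities, is what feeds the PCNN transformation, which reduces sensitivity to the illumination differences the study targets.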

    Explaining Aggregates for Exploratory Analytics

    Get PDF
    Analysts wishing to explore multivariate data spaces typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest, and then use aggregation functions, whose results determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and, as such, convey limited information about the queried subspaces for exploratory analysis. We address this shortcoming, aiding analysts to explore and understand data subspaces, by contributing a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA’s novel AQ explanations are represented using functions obtained by a three-fold joint optimization problem. Explanations assume the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed by only monitoring AQs and their answers on-line. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
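    The core idea — learning a piecewise-linear model of AQ answers from a monitored workload, then answering future AQs without touching the database — can be sketched as follows. The fixed breakpoint, 1-D COUNT queries, and per-segment least-squares fit are drastic simplifications of XAXA's three-fold joint optimization, used here only to show the mechanism:

```python
import numpy as np

# Simulated workload: COUNT range queries of fixed radius over 1-D data.
rng = np.random.default_rng(0)
data = rng.normal(5.0, 1.0, 5000)        # hidden dataset (never shown to the model)
radius = 0.5
centers = np.linspace(2.0, 8.0, 60)      # query centers observed on-line
answers = np.array([((data > c - radius) & (data < c + radius)).sum()
                    for c in centers])

def fit_piecewise_linear(x, y, breakpoint):
    """Fit one least-squares line per side of a fixed breakpoint,
    giving a crude piecewise-linear 'explanation' of the AQ answers."""
    model = []
    for mask in (x <= breakpoint, x > breakpoint):
        A = np.vstack([x[mask], np.ones(mask.sum())]).T
        slope, intercept = np.linalg.lstsq(A, y[mask], rcond=None)[0]
        model.append((slope, intercept))
    return model

def explain(model, x, breakpoint):
    """Answer a future AQ from the fitted model, with no DB access."""
    slope, intercept = model[0] if x <= breakpoint else model[1]
    return slope * x + intercept

model = fit_piecewise_linear(centers, answers, breakpoint=5.0)
approx = explain(model, 4.0, 5.0)               # model's answer for a new query
exact = ((data > 3.5) & (data < 4.5)).sum()     # ground truth, for comparison
```

    The fitted segments do double duty, as in the paper: they approximate the answer to a new AQ and, through their slopes, describe how the aggregate changes as the queried subspace moves.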

    Robust classification of advanced power quality disturbances in smart grids

    Get PDF
    The insertion of new devices, increased data flow, intermittent generation, and massive computerization have considerably increased the complexity of current electrical systems. This increase resulted in necessary changes, such as the need for more intelligent electrical networks that can adapt to this new reality. Artificial Intelligence (AI) plays an important role in society, especially techniques based on the learning process, and this role extends to power systems. In the context of Smart Grids (SG), where information and innovative monitoring solutions are a primary concern, AI-based techniques have several applications. This dissertation investigates the use of advanced signal processing and ML algorithms to create a robust classifier of advanced Power Quality (PQ) disturbances in SG. For this purpose, known models of PQ disturbances were generated with random elements to approximate real applications. From these models, thousands of signals containing these disturbances were generated. Signal processing techniques using the Discrete Wavelet Transform (DWT) were used to extract the signals’ main characteristics. This research aims to use ML algorithms to classify these data according to their respective features. The ML algorithms were trained, validated, and tested, and the accuracy and confusion matrix of each were analyzed, relating the logic behind the results. The stages of data generation, feature extraction, and optimization were performed in the MATLAB software. The Classification Learner toolbox was used to train, validate, and test the 27 different ML algorithms and assess each one’s performance. All stages of the work were planned in advance, enabling their correct development and execution. The results show that the Cubic Support Vector Machine (SVM) classifier achieved the highest accuracy of all the algorithms, indicating the effectiveness of the proposed method for classification.
Considerations about the results are discussed, explaining the performance of each technique, the relations among the techniques, and their respective justifications.
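    The DWT feature-extraction step can be illustrated with a single-level Haar decomposition in NumPy. The sag model, sampling rate, and band-energy features below are simplified assumptions standing in for the dissertation's full DWT + ML pipeline (which uses MATLAB and 27 classifiers):

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: approximation and detail coefficients."""
    x = np.asarray(x, float)
    a = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass (approximation) band
    d = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass (detail) band
    return a, d

def features(signal):
    """Energy of the approximation and detail bands as a 2-D feature vector."""
    a, d = haar_dwt(signal)
    return np.array([np.sum(a**2), np.sum(d**2)])

# Toy 60 Hz waveform and a voltage-sag disturbance in the same window.
t = np.linspace(0, 0.2, 1024, endpoint=False)
normal = np.sin(2 * np.pi * 60 * t)
sag = normal.copy()
sag[300:600] *= 0.4          # amplitude drops to 40% during the sag

f_normal, f_sag = features(normal), features(sag)
```

    Feature vectors of this kind, one per simulated signal, are what a downstream classifier (the Cubic SVM in the dissertation) separates into disturbance classes; the sag shows up as reduced energy in the approximation band.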

    Final Report: Autonomous and Intelligent Neurofuzzy Decision Maker for Smart Drilling Systems, September 2, 1998 - March 17, 1999

    Full text link

    Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data

    Get PDF
    The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time.
    The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sets. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
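    The experimental design — holding the test set fixed while shrinking the training set and re-measuring accuracy — can be sketched with a deliberately simple classifier. The nearest-centroid model, blob-shaped synthetic "land-cover" classes, and sample sizes below are stand-ins for the paper's six algorithms and GEOBIA features:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n_per_class):
    """Two synthetic 'land-cover classes' as overlapping Gaussian blobs."""
    X = np.vstack([rng.normal(0.0, 1.0, (n_per_class, 4)),
                   rng.normal(1.5, 1.0, (n_per_class, 4))])
    y = np.repeat([0, 1], n_per_class)
    return X, y

def nearest_centroid_acc(n_train_per_class, X_test, y_test):
    """Train a nearest-centroid classifier on n samples per class
    and return its accuracy on the fixed test set."""
    X, y = make_data(n_train_per_class)
    centroids = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
    d = np.linalg.norm(X_test[:, None, :] - centroids[None], axis=2)
    return (np.argmin(d, axis=1) == y_test).mean()

# Fixed test set; training sets of increasing size, as in the study design.
X_test, y_test = make_data(2000)
accs = {n: nearest_centroid_acc(n, X_test, y_test) for n in (5, 50, 5000)}
```

    Even this toy setup reproduces the qualitative pattern the abstract reports: accuracy rises with training size but the gain flattens quickly, which is why a sample-size sweep rather than a single training set is needed to compare classifiers fairly.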

    Object-Based Supervised Machine Learning Regional-Scale Land-Cover Classification Using High Resolution Remotely Sensed Data

    Get PDF
    High spatial resolution (HR) (1 m – 5 m) remotely sensed data in conjunction with supervised machine-learning classification are commonly used to construct land-cover classifications. Despite the increasing availability of HR data, most studies investigating HR remotely sensed data and associated classification methods employ relatively small study areas. This work therefore drew on a 2,609 km², regional-scale study in northeastern West Virginia, USA, to investigate a number of core aspects of HR land-cover supervised classification using machine learning. Issues explored include training sample selection, cross-validation parameter tuning, the choice of machine-learning algorithm, training sample set size, and feature selection. A geographic object-based image analysis (GEOBIA) approach was used. The data comprised National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters. Stratified-statistical-based training sampling methods were found to generate higher classification accuracies than deliberative-based sampling. Subset-based sampling, in which training data are collected from a small geographic subset area within the study site, did not notably decrease the classification accuracy. For the five machine-learning algorithms investigated (support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), and learning vector quantization (LVQ)), increasing the size of the training set typically improved the overall accuracy of the classification. However, RF was consistently more accurate than the other four machine-learning algorithms, even when trained from a relatively small training sample set. Recursive feature elimination (RFE), which can be used to reduce the dimensionality of a training set, was found to increase the overall accuracy of both SVM and NEU classification; however, the improvement in overall accuracy diminished as sample size increased.
    RFE resulted in only a small improvement in the overall accuracy of RF classification, indicating that RF is generally insensitive to the Hughes phenomenon. Nevertheless, as feature selection is an optional step in the classification process, and can be discarded if it has a negative effect on classification accuracy, it should be investigated as part of best practice for supervised machine-learning land-cover classification using remotely sensed data.
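    The RFE procedure the abstract evaluates can be sketched with a simple ranking model: refit, drop the weakest feature, repeat. The least-squares linear model and toy data below are illustrative assumptions; the study ranks features with its actual classifiers, not this surrogate:

```python
import numpy as np

def rfe_linear(X, y, n_keep):
    """Recursive feature elimination with a least-squares linear model:
    repeatedly refit and drop the feature with the smallest |weight|."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        A = np.column_stack([X[:, keep], np.ones(len(X))])
        w = np.linalg.lstsq(A, y, rcond=None)[0][:-1]  # drop the bias term
        keep.pop(int(np.argmin(np.abs(w))))            # eliminate weakest feature
    return keep

# Toy data: only features 0 and 1 carry class signal; 2 and 3 are noise.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
selected = rfe_linear(X, y, n_keep=2)
```

    Dropping uninformative features this way is exactly how RFE counters the Hughes phenomenon for dimensionality-sensitive classifiers such as SVM and NEU, while ensemble methods like RF are largely unaffected.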