11 research outputs found
Does Non-linearity Matter in Retail Credit Risk Modeling?
In this research we propose a new method for retail credit risk modeling. To capture possible non-linear relationships between credit risk and the explanatory variables, we use a learning vector quantization (LVQ) neural network. The model was estimated on a dataset from the Slovenian banking sector. The proposed model outperformed the benchmark logistic regression (LOGIT) models, which represent the standard approach in banks. The results also demonstrate that the LVQ model is better able to handle the properties of categorical variables.
Keywords: retail banking, credit risk, logistic regression, learning vector quantization
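The LVQ1 update rule underlying such a classifier can be sketched in a few lines. Everything below is an illustrative assumption: two synthetic Gaussian blobs stand in for "good" and "bad" borrowers, one prototype per class is used, and the learning rate and epoch count are arbitrary; the paper's Slovenian dataset and model configuration are not reproduced here.

```python
import numpy as np

# Toy LVQ1 sketch on synthetic two-class data (illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),   # stand-in "good" borrowers
               rng.normal(2.0, 0.5, (50, 2))])  # stand-in "bad" borrowers
y = np.array([0] * 50 + [1] * 50)

# One codebook vector (prototype) per class, initialised at the class means.
protos = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
labels = np.array([0, 1])

lr = 0.05
for epoch in range(20):
    for xi, yi in zip(X, y):
        w = int(np.argmin(np.linalg.norm(protos - xi, axis=1)))
        # LVQ1 rule: attract the winning prototype if its label matches
        # the sample's label, repel it otherwise.
        sign = 1.0 if labels[w] == yi else -1.0
        protos[w] += sign * lr * (xi - protos[w])

def predict(x):
    """Assign the label of the nearest prototype."""
    return int(labels[np.argmin(np.linalg.norm(protos - x, axis=1))])

train_acc = float(np.mean([predict(xi) == yi for xi, yi in zip(X, y)]))
```

The appeal of LVQ noted in the abstract is that the learned prototypes live in the input space, so they remain directly interpretable, unlike the weights of most neural networks.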
THE INFLUENCE OF POSE AND ILLUMINATION ON FACE IDENTIFICATION
Research on face identification has been widely conducted, as it is one of the requirements of security systems. However, such studies have emphasized methods that assume a face in a normal pose under uniform illumination. In this study, the input to be identified is an unknown face image; the system then outputs the identity that best matches the available database. This research focuses on improving face-identification accuracy given input images with differing pose and illumination. It proposes an identification process whose stages comprise pre-processing (normalization and edge detection), transformation of the training data with a Pulse Coupled Neural Network (PCNN), and classification using Learning Vector Quantization (LVQ). In identification experiments with 540 training face images against 180 reference face images, the accuracy reached 90.7%.
Keywords: pose and illumination, pre-processing, PCNN, LVQ
Explaining Aggregates for Exploratory Analytics
Analysts wishing to explore multivariate data spaces typically pose queries involving selection operators, i.e., range or radius queries, which define data subspaces of possible interest, and then use aggregation functions, the results of which determine their exploratory analytics interests. However, such aggregate query (AQ) results are simple scalars and, as such, convey limited information about the queried subspaces for exploratory analysis. We address this shortcoming, aiding analysts to explore and understand data subspaces, by contributing a novel explanation mechanism coined XAXA: eXplaining Aggregates for eXploratory Analytics. XAXA’s novel AQ explanations are represented using functions obtained by a three-fold joint optimization problem. Explanations assume the form of a set of parametric piecewise-linear functions acquired through a statistical learning model. A key feature of the proposed solution is that model training is performed by only monitoring AQs and their answers on-line. In XAXA, explanations for future AQs can be computed without any database (DB) access and can be used to further explore the queried data subspaces, without issuing any more queries to the DB. We evaluate the explanation accuracy and efficiency of XAXA through theoretically grounded metrics over real-world and synthetic datasets and query workloads.
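XAXA's three-fold joint optimization is not spelled out in the abstract, so the sketch below only illustrates the general idea it describes: learning a piecewise-linear "explanation" from a stream of (query centre, aggregate answer) pairs, so that future AQs can be answered approximately without touching the database. The 1-D query centres, the fixed breakpoints, and the underlying aggregate function are all invented for the example.

```python
import numpy as np

# Synthetic stream of range-query centres and their aggregate answers.
rng = np.random.default_rng(1)
centers = rng.uniform(0, 10, 200)
# Assumed ground truth: the aggregate behaves piecewise-linearly in the centre.
answers = np.where(centers < 5, 2 * centers, 10 + 0.5 * (centers - 5))
answers += rng.normal(0, 0.05, 200)          # observation noise

# Fit one least-squares line per segment (breakpoints assumed known here;
# XAXA learns its explanation functions jointly and on-line instead).
breaks = [0, 5, 10]
segments = []
for lo, hi in zip(breaks[:-1], breaks[1:]):
    m = (centers >= lo) & (centers < hi)
    slope, intercept = np.polyfit(centers[m], answers[m], 1)
    segments.append((lo, hi, slope, intercept))

def explain(c):
    """Predict an aggregate answer from the learned pieces, with no DB access."""
    for lo, hi, s, b in segments:
        if lo <= c < hi or (hi == breaks[-1] and c == hi):
            return s * c + b
    return None  # centre outside the modelled range
```

Once fitted, `explain` acts exactly as the abstract promises for AQ explanations: further exploration of a subspace costs a function evaluation rather than a query to the DB.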
Robust classification of advanced power quality disturbances in smart grids
The insertion of new devices, increased data flow, intermittent generation and massive computerization have considerably increased current electrical systems’ complexity. This increase resulted in necessary changes, such as the need for more intelligent electrical networks able to adapt to this different reality. Artificial Intelligence (AI) plays an important role in society, especially through techniques based on the learning process, and this role extends to power systems. In the context of Smart Grids (SG), where information and innovative monitoring solutions are a primary concern, AI-based techniques have several applications. This dissertation investigates the use of advanced signal processing and ML algorithms to create a robust classifier of advanced Power Quality (PQ) disturbances in SG. For this purpose, known models of PQ disturbances were generated with random elements to approximate real applications. From these models, thousands of signals exhibiting these disturbances were generated. Signal processing techniques using the Discrete Wavelet Transform (DWT) were used to extract the signals’ main characteristics, and ML algorithms were then trained, validated, and tested to classify these data according to their respective features. The accuracy and confusion matrix of each algorithm were analyzed, relating the logic behind the results. The stages of data generation, feature extraction and optimization were performed in the MATLAB software. The Classification Learner toolbox was used to train, validate and test the 27 different ML algorithms and assess each one’s performance. All stages of the work were planned in advance, enabling their correct development and execution. The results show that the Cubic Support Vector Machine (SVM) classifier achieved the highest accuracy of all algorithms, indicating the effectiveness of the proposed method for classification. Considerations about the results are interpreted, explaining the performance of each technique, the relations among techniques, and their respective justifications.
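The processing chain the dissertation describes (synthetic disturbance models, DWT feature extraction, supervised classification) can be mimicked end-to-end in miniature. Every piece below is a stand-in assumption: a single-level Haar DWT instead of the dissertation's unspecified wavelet and decomposition depth, crude sag and harmonic disturbance models, and a nearest-centroid rule in place of the Cubic SVM trained in MATLAB's Classification Learner.

```python
import numpy as np

rng = np.random.default_rng(2)
fs, f0, n = 3200, 60, 3200            # sample rate [Hz], mains freq [Hz], samples
t = np.arange(n) / fs

def haar_dwt(x):
    """One level of the Haar DWT: approximation and detail coefficients."""
    x = x[: len(x) // 2 * 2]
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # low-pass
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # high-pass
    return approx, detail

def features(x):
    """Two coarse energy descriptors drawn from the DWT coefficients."""
    a, d = haar_dwt(x)
    return np.array([np.std(a), np.std(d)])

def make_signal(kind):
    """Toy PQ disturbance models with random measurement noise."""
    v = np.sin(2 * np.pi * f0 * t)
    if kind == "sag":                            # 50% voltage sag mid-signal
        v[n // 3: 2 * n // 3] *= 0.5
    elif kind == "harmonics":                    # strong 5th-harmonic content
        v = v + 0.5 * np.sin(2 * np.pi * 5 * f0 * t)
    return v + rng.normal(0, 0.01, n)

# "Training": average the features of 20 signals per class.
classes = ["normal", "sag", "harmonics"]
centroids = {c: np.mean([features(make_signal(c)) for _ in range(20)], axis=0)
             for c in classes}

def classify(x):
    """Nearest-centroid decision in DWT-feature space."""
    f = features(x)
    return min(classes, key=lambda c: np.linalg.norm(centroids[c] - f))
```

Even these two features separate the three toy classes cleanly: a sag lowers the approximation-band energy, while harmonics inflate the detail-band energy.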
Effects of Training Set Size on Supervised Machine-Learning Land-Cover Classification of Large-Area High-Resolution Remotely Sensed Data
The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time.
The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variations in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
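The experimental design behind such a study, training at several sample sizes and recording the accuracy on a fixed test set, can be sketched compactly. The sketch below is not the study's GEOBIA workflow: synthetic 2-D features replace the image-object variables, a plain 1-NN rule replaces the six classifiers, and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

def make_data(n):
    """n samples per class from two synthetic 2-D Gaussian land-cover classes."""
    X = np.vstack([rng.normal(0.0, 1.0, (n, 2)),
                   rng.normal(2.5, 1.0, (n, 2))])
    y = np.array([0] * n + [1] * n)
    return X, y

# A fixed, reasonably large test set shared by every training-size condition.
X_test, y_test = make_data(500)

def knn1_accuracy(X_tr, y_tr):
    """Overall accuracy of a 1-nearest-neighbour classifier on the test set."""
    d = np.linalg.norm(X_test[:, None, :] - X_tr[None, :, :], axis=2)
    pred = y_tr[np.argmin(d, axis=1)]
    return float(np.mean(pred == y_test))

# The learning curve: accuracy as a function of training-set size.
curve = {}
for n_train in (10, 50, 250):                 # samples per class
    X_tr, y_tr = make_data(n_train)
    curve[n_train] = knn1_accuracy(X_tr, y_tr)
```

Holding the test set fixed across conditions, as above, is what makes the accuracies at different training sizes directly comparable.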
Object-Based Supervised Machine Learning Regional-Scale Land-Cover Classification Using High Resolution Remotely Sensed Data
High spatial resolution (HR) (1 m – 5 m) remotely sensed data in conjunction with supervised machine learning classification are commonly used to construct land-cover classifications. Despite the increasing availability of HR data, most studies investigating HR remotely sensed data and associated classification methods employ relatively small study areas. This work therefore drew on a 2,609 km², regional-scale study in northeastern West Virginia, USA, to investigate a number of core aspects of HR land-cover supervised classification using machine learning. Issues explored include training sample selection, cross-validation parameter tuning, the choice of machine learning algorithm, training sample set size, and feature selection. A geographic object-based image analysis (GEOBIA) approach was used. The data comprised National Agricultural Imagery Program (NAIP) orthoimagery and LIDAR-derived rasters. Stratified-statistical-based training sampling methods were found to generate higher classification accuracies than deliberative-based sampling. Subset-based sampling, in which training data are collected from a small geographic subset area within the study site, did not notably decrease the classification accuracy. For the five machine learning algorithms investigated, support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), and learning vector quantization (LVQ), increasing the size of the training set typically improved the overall accuracy of the classification. However, RF was consistently more accurate than the other four machine learning algorithms, even when trained from a relatively small training sample set. Recursive feature elimination (RFE), which can be used to reduce the dimensionality of a training set, was found to increase the overall accuracy of both SVM and NEU classification; however, the improvement in overall accuracy diminished as sample size increased.
RFE resulted in only a small improvement in the overall accuracy of RF classification, indicating that RF is generally insensitive to the Hughes Phenomenon. Nevertheless, as feature selection is an optional step in the classification process, and can be discarded if it has a negative effect on classification accuracy, it should be investigated as part of best practice for supervised machine-learning land-cover classification using remotely sensed data.
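The core loop of recursive feature elimination, fit a model, rank features by weight, drop the weakest, repeat, can be sketched without any specific library. The sketch below is not the study's implementation: it uses synthetic data where only two of six features drive the label, and ranks features by the absolute coefficients of a least-squares linear fit on standardized inputs.

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic classification data: only features 0 and 3 carry signal.
n, p = 300, 6
X = rng.normal(size=(n, p))
y = (2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(0, 0.1, n) > 0).astype(float)

def rfe(X, y, n_keep):
    """RFE-style backward elimination using linear least-squares weights."""
    kept = list(range(X.shape[1]))
    while len(kept) > n_keep:
        # Standardize so coefficient magnitudes are comparable across features.
        Xs = (X[:, kept] - X[:, kept].mean(0)) / X[:, kept].std(0)
        w, *_ = np.linalg.lstsq(Xs, y - y.mean(), rcond=None)
        kept.pop(int(np.argmin(np.abs(w))))   # eliminate the weakest feature
    return kept

selected = rfe(X, y, 2)
```

The Hughes Phenomenon the abstract mentions is exactly what this guards against: with many weak or redundant features and limited training samples, accuracy can fall as dimensionality grows, so pruning uninformative features can help sample-sensitive classifiers such as SVM and NEU.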