9 research outputs found

    Como minimizar la tasa de error en la clasificación de los préstamos: el caso peer to peer lending [How to minimize the error rate in loan classification: the case of peer-to-peer lending]

    This paper analyzes whether the error rate in classifying social loans can be minimized. Social loans, also called peer-to-peer loans, are an online financing alternative that bypasses traditional financial intermediation and has gained considerable importance in recent years. The procedure applies several algorithms that select the variables considered significant for minimizing classification error on a given training sample. The results, however, are not very coherent: they show that the error is minimized with a single significant variable, which would make no sense in practice. The paper is structured as follows: section 1 reviews the literature on peer-to-peer loans, section 2 presents the methodology and the data used, section 3 presents the results, and the final section gives the conclusions.

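    Comparação de técnicas de machine learning para predição de default e aplicação da heurística VNS para seleção de variáveis [Comparison of machine learning techniques for default prediction and application of the VNS heuristic for feature selection]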

    Credit scoring plays a major role for financial institutions when making credit-granting decisions. In this context, machine learning techniques have been used to develop credit scoring models, as they seek to recognize existing patterns in databases containing the credit history of borrowers in order to infer which individuals are more likely to default. However, these databases often contain a large number of variables, some of which can be noisy, which harms the analysis. In the present work, a feature selection technique is proposed based on a variable neighborhood concept, called VNS. The applicability of the method is assessed in conjunction with seven of the main techniques used for default prediction in credit analysis problems. Its performance was compared with the feature selection obtained by the well-known statistical method PCA. The results indicate superior performance of VNS in most of the tests applied, suggesting the robustness of the method.
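
The variable neighborhood idea mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the neighborhood definition (flip `k` randomly chosen features), the first-improvement local search, the function `toy_score`, and all parameter names are assumptions made for the example. In practice `score` would be something like the validation performance of a classifier trained on the selected features.

```python
import random


def vns_feature_selection(n_features, score, k_max=3, max_iter=200, seed=0):
    """Basic VNS over feature subsets encoded as boolean masks.

    Assumed neighborhood k: masks that differ from the current one in k positions.
    `score` is any function mask -> float that should be maximized.
    """
    rng = random.Random(seed)
    best = [rng.random() < 0.5 for _ in range(n_features)]
    best_score = score(best)
    for _ in range(max_iter):
        k = 1
        while k <= k_max:
            # Shaking: jump to a random mask in the k-th neighborhood.
            cand = best[:]
            for i in rng.sample(range(n_features), k):
                cand[i] = not cand[i]
            cand_score = score(cand)
            # Local search: accept single-bit flips while they improve the score.
            improved = True
            while improved:
                improved = False
                for i in range(n_features):
                    cand[i] = not cand[i]
                    s = score(cand)
                    if s > cand_score:
                        cand_score = s
                        improved = True
                    else:
                        cand[i] = not cand[i]  # undo the flip
            # Move to the candidate and restart from k = 1, or widen the neighborhood.
            if cand_score > best_score:
                best, best_score = cand, cand_score
                k = 1
            else:
                k += 1
    return best, best_score


def toy_score(mask):
    # Hypothetical objective: features 0, 2 and 5 are informative,
    # every other selected feature is noise and is penalized.
    informative = {0, 2, 5}
    hits = sum(1 for i in informative if mask[i])
    noise = sum(1 for i, m in enumerate(mask) if m and i not in informative)
    return hits - 0.5 * noise


best_mask, best_score = vns_feature_selection(8, toy_score)
print([i for i, m in enumerate(best_mask) if m], best_score)
```

Because the toy objective is separable, the single-bit local search alone already finds the best subset here; the shaking step matters for real objectives, where features interact and local search can get stuck.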

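    Analisis Ensemble Support Vector Machine dan Survival Support Vector Machine pada Data Nasabah Gadai di Perusahaan Financial Technology-X [Analysis of Ensemble Support Vector Machine and Survival Support Vector Machine on Pawn Customer Data at Financial Technology Company X]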

    There are two categories of pawn customers at the Fintech X company: early payment and late payment customers. Each category has its own duration for redeeming the pledged goods. It is therefore important for the company to obtain early information on whether a customer's condition is good or bad. A good customer is one who redeems the pledge sooner, while a bad customer is one who takes longer. To address this problem, modeling is carried out in two stages. The first stage classifies customers as early payment or late payment. The second stage performs survival analysis for each customer category. The methods used in the first stage are binary logistic regression, SVM, and ensemble SVM; those used in the second stage are Cox proportional hazards and survival SVM. To support the conclusions of the classification stage, a simulation study was conducted by generating several scenarios of predictor variables. The simulation study found that ensemble SVM can match the performance of SVM and logistic regression. However, when applied to the Fintech X customer data, the proposed classification methods did not give good results, because no variable truly discriminates between early payment and late payment customers. In the second stage, survival SVM performed better than Cox proportional hazards in every customer category. One possible reason is that the assumptions of the Cox proportional hazards model are not met.

    Using machine learning technique to classify geographic areas with socioeconomic potential for broadband investment in Malaysia

    The telecommunication companies (TELCO) in Malaysia commonly use the return on investment (ROI) model for techno-economic analysis to strategize their network investment plan in their intended markets. The number of subscribers and average revenue per user (ARPU) are two dominant contributions to a good ROI. Rural areas are lacking in both dominant factors and thus very often fall outside the radar of TELCO’s investment plans. The government agencies, therefore, shoulder the responsibility to provide broadband services in rural areas through the implementation of national broadband initiatives, regulated policies and funding for universal service provision. This thesis outlines a framework of machine learning technique which the TELCOs and government agencies can use to plan for broadband investments in Malaysia, especially for rural areas. The framework is implemented in four stages: data collection, machine learning, machine testing, and machine application. In this framework, a curve-fitting technique will be applied to formulate an empirical model by using prototyping data from the World Bank databank. The empirical model serves as a fitness function for a genetic algorithm (GA) to generate large virtual samples to train, validate and test the support vector machines (SVM). Real-life field data for geographic areas in Malaysia are then provided to the tested SVM to predict which areas have the socioeconomic potential for broadband investment. By using this technique as a policy tool, TELCOs and government agencies will be able to prioritize areas where broadband infrastructure can be implemented using a government-industry partnership approach. Both public and private parties can share the initial cost and collect future revenues appropriately as the socioeconomic correlation coefficient improves

    Multiple classifier systems based on directed attribute selection in credit risk assessment

    Following the author's previous research, this doctoral dissertation is the next step in credit risk classification research. Based on observations of behavior found in nature and society, the idea of combining experts' decisions has gained significant importance in the research community, especially in the area of data classification. The increasing focus of researchers, as well as promising findings, directed the author's interest to this research area. The purpose of the research conducted and described in this dissertation is to investigate the application of multiple classifier systems based on attribute selection to credit risk assessment. In accordance with this purpose, several studies were conducted that jointly represent a complex approach to the selected problem. The main goal is to develop a fast and robust technique for combining classifiers, based on directed attribute selection, that can create efficient and accurate systems for credit risk assessment in retail. In addition, the technique must be sufficiently simple for easy implementation and wide application by the research community, including researchers who are not primarily focused on this field. The two key elements of the new technique are: (1) attribute selection, used as a strategy for training diverse classifiers, and (2) ensemble thinning, used to include only those classifiers that contribute to overall system quality. Attribute selection here refers to applying several different fast techniques that rank attributes by quality. To ensure that different attributes are selected, the techniques must rely on different evaluation criteria for attribute ranking. The subsets of attributes selected in this manner are used to train the classifiers, which ensures that the resulting models differ. In the next step, the technique selects only those models that, when combined, contribute positively to ensemble performance. The selection is made by a new greedy algorithm for ensemble thinning proposed in the dissertation. Including ensemble thinning in the new technique increases the efficiency and quality of decisions. The new technique was tested on credit data sets in accordance with the research hypotheses of the dissertation. The results obtained with the new technique are compared with those of the individual classifiers included in the final ensemble in order to justify combining them. Additionally, the decisions made by the ensemble thinning algorithm are analyzed, as is the relationship between the Q statistic and ensemble accuracy. In the following round of research, the results of the new technique are compared against the Bagging and Boosting techniques. Results are evaluated with four performance measures: accuracy, type I error, type II error, and AUC. The time needed to train and test the models is also measured and compared. The results show a significant improvement in classification performance compared with the individual classifiers included in the ensemble. Furthermore, the quality of the results is comparable to that of the most popular techniques; indeed, three of the four performance measures show the superiority of the new technique. In accordance with its design, the new technique performs best on ensembles with a small number of members and is not time-consuming compared with Bagging and Boosting. The results achieved are promising, and the proposed technique is a good alternative to existing techniques for constructing multiple classifier systems.
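
To make the notion of greedy ensemble thinning in the abstract above more concrete, here is a minimal sketch. The abstract does not specify the dissertation's algorithm, so this is a generic backward-elimination variant chosen for illustration: start from the full ensemble and repeatedly drop the classifier whose removal most improves majority-vote accuracy on a validation set. The classifier names, the predictions, and the tie-breaking rule are all made up for the example.

```python
def majority_vote(pred_lists):
    """Combine 0/1 predictions per sample; ties go to class 1 (an arbitrary choice)."""
    n = len(pred_lists)
    return [1 if 2 * sum(col) >= n else 0 for col in zip(*pred_lists)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def greedy_thinning(preds, y_true):
    """Drop one classifier at a time while doing so strictly improves accuracy."""
    selected = list(preds)
    best_acc = accuracy(y_true, majority_vote([preds[m] for m in selected]))
    while len(selected) > 1:
        # Accuracy of the ensemble with each member removed in turn.
        trials = {}
        for name in selected:
            rest = [preds[m] for m in selected if m != name]
            trials[name] = accuracy(y_true, majority_vote(rest))
        candidate = max(trials, key=trials.get)
        if trials[candidate] <= best_acc:
            break  # no removal helps any more
        selected.remove(candidate)
        best_acc = trials[candidate]
    return selected, best_acc


# Hypothetical validation labels and per-classifier predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
preds = {
    "A": [1, 1, 1, 0, 0, 0, 0, 1],
    "B": [1, 1, 0, 1, 0, 0, 1, 0],
    "C": [0, 1, 1, 1, 1, 0, 0, 0],
    "D": [0, 0, 0, 1, 1, 1, 0, 0],
}
selected, acc = greedy_thinning(preds, y_true)
print(selected, acc)
```

In this toy data the three classifiers A, B and C make errors on different samples, so their vote is correct everywhere once the weaker classifier D is thinned out. That is the kind of diversity the abstract attributes to training on different attribute subsets.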

    Desenvolvimento de frameworks para a modelagem do risco de crédito por meio de algoritmos de classificação [Development of frameworks for credit risk modeling using classification algorithms]

    Granting credit is a vital activity in the financial industry. For the success of financial institutions, as well as the equilibrium of the credit system as a whole, it is important that credit risk management systems efficiently evaluate the probability of default of potential debtors based on their historical data. Classification algorithms are an interesting approach to this problem in the form of Credit Scoring models. Since the emergence of quantitative analytical methods for this purpose, statistical models have persisted as the most commonly chosen method, given their easier implementation and inherent interpretability. However, advances in Machine Learning (ML) have produced new and more complex algorithms capable of handling larger volumes of data, often with an increase in predictive power. These new approaches, although not always readily transferable to practical applications in the financial industry, present an opportunity for the development of credit risk modeling and have piqued the interest of researchers in the field. Nonetheless, researchers seem to focus on model performance, rarely setting up guidelines to optimize the modeling process or considering current regulation for model implementation. This dissertation therefore establishes frameworks for consumer credit risk modeling based on classification algorithms, guided by a systematic literature review on the topic. The proposed frameworks incorporate ML techniques, data preprocessing and balancing, feature selection (FS), and hyperparameter optimization (HPO). In addition to the bibliographic research, which introduces the main classification algorithms and appropriate modeling steps, the development of the frameworks is also based on experiments with hundreds of models for credit risk classification, using Logistic Regression (LR), Decision Trees (DT), Support Vector Machines (SVM), Random Forest (RF), as well as boosting and stacking ensembles, to efficiently guide the construction of robust and parsimonious models for credit risk analysis in consumer lending.

    Hybrid techniques of combinatorial optimization based on genetic algorithms with application to feature selection in retail credit risk assessment

    Hybrid combinatorial optimization techniques are a growing research area aimed at solving complex combinatorial optimization problems. The first part of this dissertation focuses on the methodological background of hybrid combinatorial optimization techniques, paying particular attention to important concepts in combinatorial optimization and computational complexity theory, as well as to the hybridization strategies that matter when developing such techniques. In line with the relationships shown among combinatorial optimization techniques, the strategies for combining them, and the concepts for solving combinatorial optimization problems, this dissertation creates new hybrid techniques for attribute selection and classification in credit risk assessment. The dissertation emphasizes the importance of hybridization as a concept of cooperation among metaheuristics and other optimization techniques. The importance of such cooperation is confirmed by the results presented in the experimental part, obtained on Croatian and German credit data sets using the hybrid combinatorial optimization techniques created in this dissertation. Scientific contributions of the dissertation: hybrid attribute selection techniques (GA-NN and HGA-NN), specially adapted to the problem domain and based on genetic algorithms and artificial neural networks; a new hybrid genetic algorithm that incorporates the results of filter techniques and a priori knowledge into the initial population of the genetic algorithm; a new selection operator for the genetic algorithm, unique selection; and sophisticated credit models that enable more efficient capital allocation. The purpose of this dissertation is to thoroughly investigate the overall data set available to the bank and to determine the extent to which these data can be a good basis for predicting the creditworthiness of the loan applicant. Such a prediction should be made without seeking additional information from the client, assuming that the loan applicant is a long-time customer of the bank and that the bank has collected sufficient data on the client in its database. Banks worldwide have accumulated large amounts of data and information about their clients, their financial solvency, and their payment history. The issue usually lies in the multitude of irrelevant attributes contained in the accumulated data. Irrelevant attributes in the training data set will not lead to more accurate classification results, but will: (1) increase the cost of data collection, (2) increase the time required for learning and constructing models, and (3) decrease the user-friendliness of the model itself. Hence the need to preprocess classification data in order to improve the quality of the constructed model, reduce its complexity, and reduce the cost of using it. One of the most important preprocessing activities is feature selection. The objectives of the study were twofold: (1) to develop a highly efficient hybrid technique, in line with the latest scientific and technical knowledge, for selecting the optimal subset of features when assessing the creditworthiness of the loan applicant, and (2) to gather additional knowledge and experience about the specific advantages and disadvantages of individual techniques, and about combining these techniques in meaningful ways for other similar problems. Theoretically, selecting the optimal feature subset belongs to the class of combinatorial optimization problems. Such problems are usually solved by combining exact and heuristic algorithms, or several (meta)heuristic algorithms. The resulting algorithms, in this case hybrids, try in various ways to combine the advantages of two or more different types of algorithms. The dissertation discusses different forms of hybridization, from low-level hybridization, where the result is a single optimization technique that forms a functionally indivisible whole, to high-level hybridization, where the different algorithms are independent entities that cooperate. Various optimization techniques, from exact ones to heuristics, were combined under the hypothesis that the benefit comes from the synergy of different techniques. It is of paramount importance to establish a dynamic balance between diversification and intensification, so that areas of the search space with high-quality solutions are identified quickly without losing too much time in areas that have already been explored or that do not provide quality solutions. In addition to the potential benefits, hybrid techniques bring some unavoidable disadvantages, such as increased complexity of the technique, the need for more knowledge and effort in the design and implementation of the solution, and a narrow orientation toward specific problems only. Here the well-known "No free lunch" theorem (Wolpert and Macready, 1996) gains prominence: it says that no technique is better than all others under all conditions. Several hybrid algorithms are developed and presented. The first is a combination of genetic algorithms and artificial neural networks (GA-NN). Its distinguishing feature is that it simultaneously selects the optimal subset of attributes and, according to the selected attributes, adjusts the parameters of the artificial neural network. The second algorithm is a combination of the hybrid genetic algorithm and the artificial neural network (HGA-NN). It is a logical continuation and extension of the GA-NN algorithm in that it first restricts the attributes to those singled out by fast filtering algorithms or by domain experts. Some improvements have also been made to the genetic algorithm: (1) the creation of the initial population and (2) the introduction of the incremental stage. In the third experiment, special emphasis is given to the problems related to the classification of imbalanced data sets. An overview is given of the main characteristics of the paradigms traditionally applied to the classification of imbalanced data. Techniques for mitigating problems related to the cost-sensitive classification of class-imbalanced data are explored in combination with the techniques based on genetic algorithms, GA-NN and HGA-NN. Performance is assessed with a variety of measures, focusing on the relative cost of misclassification. The study was conducted on Croatian and German data sets. The results showed that this extension yields the HGA-NN ROS technique, which, from the cost point of view, performs better than the results presented in the literature. The results of the presented algorithms clearly indicate their potential for solving the attribute selection problem and for evaluating citizens' credit risk, thereby justifying the larger effort in design and implementation. This potential may be used to improve the way in which banks manage citizens' credit risk, which promotes stable and healthy banking. The need for better management of credit risks and for sophisticated credit models motivated the research presented in this dissertation.
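
The idea of seeding a genetic algorithm's initial population with attribute subsets suggested by filter techniques or domain experts, as the abstract above describes for HGA-NN, can be sketched as follows. This is only an illustration under stated assumptions, not the dissertation's algorithm: it uses ordinary binary tournament selection instead of the unique selection operator (which the abstract does not define), a made-up `toy_fitness` in place of a neural network's validation performance, and hypothetical parameter names throughout.

```python
import random


def ga_feature_selection(n_features, fitness, seed_masks=(), pop_size=20,
                         generations=30, p_mut=0.05, seed=0):
    """Simple elitist GA over boolean feature masks with a seeded initial population."""
    rng = random.Random(seed)
    # Initial population: seeded masks first, then random masks to fill up.
    pop = [list(m) for m in seed_masks][:pop_size]
    while len(pop) < pop_size:
        pop.append([rng.random() < 0.5 for _ in range(n_features)])
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:2]  # elitism: the two best masks survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            # Binary tournament selection (assumed; not the unique selection operator).
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best)


def toy_fitness(mask):
    # Hypothetical objective: features 0, 2 and 5 help, all others hurt.
    informative = {0, 2, 5}
    hits = sum(1 for i in informative if mask[i])
    noise = sum(1 for i, m in enumerate(mask) if m and i not in informative)
    return hits - 0.5 * noise


# Hypothetical seed: a filter technique flagged features 0 and 2.
filter_seed = [True, False, True, False, False, False, False, False]
best_mask, best_fit = ga_feature_selection(8, toy_fitness, seed_masks=[filter_seed])
print([i for i, m in enumerate(best_mask) if m], best_fit)
```

Because of elitism, the returned subset can never score worse than the seeded one, so prior knowledge gives the search a guaranteed starting level rather than just a hint.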