Multiple classifier systems based on directed attribute selection in credit risk assessment

Author: Oreški Goran
Publication venue: University of Zagreb. Faculty of Organization and Informatics Varaždin.
Publication date: 01/06/2016

Following the author's previous research, this doctoral dissertation is the next step in credit risk classification research. Based on observations of behavior found in nature and society, the idea of combining experts' decisions has gained significant importance in the research community, especially in the area of data classification. The increasing focus of researchers, together with promising findings on classifier combination, directed the author's interest to this research area.

The purpose of the research conducted and elaborated in this dissertation is to investigate the application of multiple classifier systems based on attribute selection to credit risk assessment. In accordance with this purpose, several studies were conducted that jointly represent a complex approach to the selected problem. The main goal of this work is to develop a fast and robust technique for combining classifiers, based on directed attribute selection, that can create efficient and accurate systems for retail credit risk assessment, that is, for assessing an applicant's ability to repay a loan on time and under the agreed terms. The technique must also be sufficiently simple for easy implementation and wide application by the research community, including researchers who are not primarily focused on this field.

The two key elements of the new technique are: (1) attribute selection, used as a strategy for training diverse classifiers, and (2) ensemble thinning, used to include only those classifiers that contribute to overall system quality. Attribute selection in this context refers to the use of several different fast techniques that rank attributes by their quality. To ensure that different attributes are selected, the chosen techniques must be based on different evaluation criteria for attribute ranking. The attribute subsets selected in this manner are used to train the classifiers, which ensures that the resulting models differ. In the next step, the technique selects only those models that, when combined, contribute positively to the performance of the ensemble. The selection is made by a new greedy algorithm for ensemble thinning proposed in this dissertation. Including ensemble thinning in the new technique increases the efficiency of the system and the quality of its decisions.

The new technique was tested on credit data sets in accordance with the research hypotheses of this doctoral dissertation. The results obtained with the new technique are compared with the results of the individual classifiers included in the final ensemble in order to justify combining them. Additionally, the decisions made by the ensemble-thinning algorithm are analyzed, as is the relationship between the Q statistic and ensemble accuracy. In a subsequent round of research, the results of the new technique are evaluated against the Bagging and Boosting techniques. Results are compared using four performance measures: accuracy, type I error, type II error, and AUC. The time needed to train and test the models is also measured and compared.

The results show a significant improvement in classification performance over the individual classifiers included in the ensemble as a direct result of the new technique. Furthermore, the quality of the results is comparable to that of the most popular techniques; moreover, three of the four performance measures show the superiority of the new technique. In accordance with its design, the new technique performs best on ensembles with a small number of members and is not time-consuming compared to Bagging and Boosting. The results are promising, and the proposed technique is a good alternative to existing techniques for constructing multiple classifier systems.
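The abstract does not spell out the greedy thinning algorithm, so the following is only a minimal sketch of one plausible reading: forward selection over classifiers that were already trained on different attribute subsets, combined by averaging their predicted scores. The function names (`hard`, `ensemble_scores`, `q_statistic`, `greedy_thin`), the averaging rule, the 0.5 threshold, and the toy data are all assumptions for illustration, not the dissertation's method. The Q statistic shown is the standard pairwise Yule's Q computed from which samples each classifier gets right.

```python
def hard(scores, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def ensemble_scores(members):
    """Combine members by averaging their scores sample by sample."""
    return [sum(col) / len(col) for col in zip(*members)]

def q_statistic(pred_a, pred_b, truth):
    """Yule's Q for two classifiers: near +1 means they err on the same samples."""
    n11 = n00 = n10 = n01 = 0
    for a, b, t in zip(pred_a, pred_b, truth):
        ok_a, ok_b = a == t, b == t
        if ok_a and ok_b:
            n11 += 1
        elif not ok_a and not ok_b:
            n00 += 1
        elif ok_a:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0

def greedy_thin(scores, truth):
    """Hypothetical forward selection: start from the best single classifier and
    keep adding the one that most improves ensemble accuracy; stop when none does."""
    names = sorted(scores)
    first = max(names, key=lambda k: accuracy(hard(scores[k]), truth))
    selected = [first]
    best_acc = accuracy(hard(scores[first]), truth)
    while len(selected) < len(names):
        candidates = [n for n in names if n not in selected]
        scored = [
            (accuracy(hard(ensemble_scores([scores[k] for k in selected + [c]])), truth), c)
            for c in candidates
        ]
        acc, cand = max(scored, key=lambda pair: pair[0])
        if acc <= best_acc:
            break  # no remaining classifier improves the ensemble
        selected.append(cand)
        best_acc = acc
    return selected, best_acc

# Toy validation data (made up): 1 = default, 0 = repaid.
truth = [1, 1, 1, 0, 0, 0]
scores = {
    "A": [0.9, 0.8, 0.4, 0.1, 0.2, 0.6],  # e.g. trained on attributes from ranker 1
    "B": [0.6, 0.3, 0.9, 0.4, 0.3, 0.2],  # e.g. trained on attributes from ranker 2
    "C": [0.9, 0.8, 0.4, 0.1, 0.2, 0.7],  # near-copy of A, adds little diversity
}
print(greedy_thin(scores, truth))
print(q_statistic(hard(scores["A"]), hard(scores["C"]), truth))
```

In this toy setting the near-duplicate classifier is left out because it adds no accuracy once a more diverse member is present, which is the kind of behavior the thinning step is meant to produce.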

Faculty of Organization and Informatics - Digital Repository

Croatian Digital Dissertations Repository

University of Zagreb Repository

PROBABILISTIC ENSEMBLES FOR IMPROVED INFERENCE IN PROTEIN-STRUCTURE DETERMINATION

Authors: Ameet Soni, Jude Shavlik
Publication date: 01/01/2012

Protein X-ray crystallography, the most popular method for determining protein structures, remains a laborious process that requires a great deal of manual effort from crystallographers to interpret low-quality protein images. Automating this process is critical to creating a high-throughput protein-structure determination pipeline. Previously, our group developed ACMI, a probabilistic framework for producing protein-structure models from electron-density maps obtained via X-ray crystallography. ACMI uses a Markov Random Field to model the three-dimensional (3D) location of each non-hydrogen atom in a protein. Calculating the best structure in this model is intractable, so ACMI uses approximate inference methods to estimate the optimal structure. While previous results have shown ACMI to be the state-of-the-art method for this task, its approximate inference algorithm remains computationally expensive and susceptible to errors. In this work, we develop Probabilistic Ensembles in ACMI (PEA), a framework for leveraging multiple, independent runs of approximate inference to produce estimates of protein structures. Our results show statistically significant improvements in the accuracy of inference, resulting in more complete and accurate protein structures. In addition, PEA provides a general framework for advanced approximate inference methods in complex problem domains.
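The abstract does not say how PEA combines its runs, so the sketch below only illustrates the general idea of pooling several independent approximate-inference runs. It assumes each run returns a marginal distribution per variable and that the runs are combined by simple averaging; the function names (`average_marginals`, `most_likely_states`), the averaging rule, and the toy numbers are hypothetical and are not taken from ACMI or PEA.

```python
def average_marginals(runs):
    """runs: one list per inference run; each list holds, per variable,
    a dict mapping a candidate state to its estimated probability.
    Returns the per-variable average over runs (states missing from a run count as 0)."""
    n_runs = len(runs)
    combined = []
    for per_variable in zip(*runs):
        states = set().union(*per_variable)  # all states seen for this variable
        combined.append(
            {s: sum(m.get(s, 0.0) for m in per_variable) / n_runs for s in states}
        )
    return combined

def most_likely_states(marginals):
    """Pick the highest-probability state for each variable."""
    return [max(sorted(m), key=m.get) for m in marginals]

# Toy example (made up): three runs, two variables, candidate locations "p" and "q".
runs = [
    [{"p": 0.6, "q": 0.4}, {"p": 0.9, "q": 0.1}],
    [{"p": 0.3, "q": 0.7}, {"p": 0.8, "q": 0.2}],
    [{"p": 0.2, "q": 0.8}, {"p": 0.7, "q": 0.3}],
]
print(most_likely_states(average_marginals(runs)))
```

Averaging is only one possible aggregator; the point of the sketch is that a single run that disagrees with the others on a variable can be outvoted once the runs are pooled.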