87 research outputs found

    Who performs better? AVMs vs hedonic models

    Purpose: The literature contains numerous tests that compare the accuracy of automated valuation models (AVMs). These models are first trained on price data and property characteristics, and are then tested by measuring their ability to predict prices. Most of the tests compare the effectiveness of traditional econometric models against machine learning algorithms. Although the latter seem to offer better performance, there is not yet a complete survey of the literature to confirm the hypothesis. Design/methodology/approach: All tests comparing regression analysis and machine learning AVMs on the same data set were identified. The accuracy scores obtained were then compared with each other. Findings: Machine learning models are more accurate than traditional regression analysis in their ability to predict value. Nevertheless, many authors point to their black-box nature and their poor inferential abilities as limitations. Practical implications: Machine learning AVMs offer a huge advantage for all real estate operators who know and can use them. Their use in public policy or litigation can be critical. Originality/value: According to the author, this is the first systematic review that collects all the articles produced on the subject and compares the results obtained.
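
    The comparison described in this abstract (scoring a regression model and a machine learning model on the same test set) might be sketched as follows. This is only an illustration: the choice of MAPE as the accuracy score, the function names `mape` and `rank_models`, and the toy prices are assumptions, not taken from the reviewed studies.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumed accuracy score)."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rank_models(actual, predictions):
    """Score every model on the same test set and sort from best to worst MAPE."""
    scores = {name: mape(actual, preds) for name, preds in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical sale prices and hypothetical predictions, for illustration only.
    prices = [100.0, 200.0]
    ranking = rank_models(
        prices,
        {"hedonic": [110.0, 180.0], "ml": [105.0, 190.0]},
    )
    for name, score in ranking:
        print(f"{name}: MAPE = {score:.1f}%")
```

    Scoring both models on identical held-out observations is what makes the resulting numbers comparable; scores taken from different data sets would not support the same conclusion.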

    Property valuation with interpretable machine learning

    Property valuation is an important task for various stakeholders, including banks, local authorities, property developers, and brokers. As a result of the characteristics of the real estate market, such as the infrequency of trades, limited supply, negotiated prices, and small submarkets with unique traits, there is no clear market value for properties. Traditionally, property valuations are done by expert appraisers. Property valuation can also be done accurately with machine learning methods, but the lack of interpretability of accurate machine learning methods can limit their adoption. Interpretable machine learning methods could be a solution to this issue, but there are concerns about their accuracy. This thesis aims to evaluate the feasibility of interpretable machine learning methods in property valuation by comparing a promising interpretable method to a more complex machine learning method that has previously had good results in property valuation. The promising interpretable method and the well-performing machine learning method are chosen based on previous literature. The two chosen methods, Extreme Gradient Boosting (XGB) and Explainable Boosting Machine (EBM), are compared in terms of prediction accuracy for properties in six big municipalities of Denmark. In addition to the accuracy comparison, the interpretability of the EBM is highlighted. The accuracy of the XGB method is better, even though there are no big differences between the two methods in individual municipalities. The interpretability of the EBM is good, as it is possible to understand how the model makes predictions in general and how individual predictions are made.

    Statistical Data Modeling and Machine Learning with Applications

    The modeling and processing of empirical data is one of the main subjects and goals of statistics. Nowadays, with the development of computer science, the extraction of useful and often hidden information and patterns from data sets of different volumes and complex data sets in warehouses has been added to these goals. New and powerful statistical techniques with machine learning (ML) and data mining paradigms have been developed. To one degree or another, all of these techniques and algorithms originate from a rigorous mathematical basis, including probability theory and mathematical statistics, operational research, mathematical analysis, numerical methods, etc. Popular ML methods, such as artificial neural networks (ANN), support vector machines (SVM), decision trees, random forest (RF), among others, have generated models that can be considered as straightforward applications of optimization theory and statistical estimation. The wide arsenal of classical statistical approaches combined with powerful ML techniques allows many challenging and practical problems to be solved. This Special Issue belongs to the section “Mathematics and Computer Science”. Its aim is to establish a brief collection of carefully selected papers presenting new and original methods, data analyses, case studies, comparative studies, and other research on the topic of statistical data modeling and ML as well as their applications. Particular attention is given, but is not limited, to theories and applications in diverse areas such as computer science, medicine, engineering, banking, education, sociology, economics, among others. The resulting palette of methods, algorithms, and applications for statistical modeling and ML presented in this Special Issue is expected to contribute to the further development of research in this area. 
    We also believe that the new knowledge acquired here as well as the applied results are attractive and useful for young scientists, doctoral students, and researchers from various scientific specialties.

    A BIM and machine learning integration framework for automated property valuation

    Property valuation contributes significantly to market economic activities, yet it has been continuously questioned for its low transparency, inaccuracy and inefficiency. With Big Data applications in the real estate domain growing fast, computer-aided valuation systems such as AI-enhanced automated valuation models (AVMs) have the potential to address these issues. On the one hand, while the advantages of Machine Learning for property valuation have been recognized by researchers and professionals, the predictive accuracy and model interpretability of current AVMs still need to be improved. On the other hand, the benefits and opportunities of BIM for property valuation have gradually attracted attention, but little effort has been made on standard data interpretation and information exchange in the property valuation process. This thesis presents a novel system that leverages a holistic data interpretation, facilitates information exchange between AEC projects and property valuation, and provides an improved AVM for property valuation. A BIM and Machine Learning (ML) integration framework for automated property valuation is proposed, which contains an IFC extension for property valuation, an IFC-based information extraction and an automated valuation model based on genetic algorithm optimized machine learning (GA-GBR). This research contributes to managing information exchange between AEC projects and property valuation and enhancing automated valuation models. The main findings indicated the proposed BIM-ML system: (1) in terms o

    Data-driven method for enhanced corrosion assessment of reinforced concrete structures

    Corrosion is a major problem affecting the durability of reinforced concrete structures. Corrosion-related maintenance and repair of reinforced concrete structures cost multiple billions of USD per annum globally. Corrosion is often triggered by the ingress of carbon dioxide and/or chloride into the pores of concrete. Estimating these corrosion-causing factors with conventional models results in suboptimal assessment, since the models are incapable of capturing the complex interaction of parameters. Hygrothermal interaction also plays a role in aggravating the corrosion of reinforcement bars, and this is usually counteracted by applying surface-protection systems. These systems offer different degrees of protection, and they may even unintentionally cause deterioration of the structure. The overall objective of this dissertation is to provide a framework that enhances the assessment reliability of the corrosion-controlling factors. The framework is realized through the development of data-driven carbonation depth, chloride profile and hygrothermal performance prediction models. The carbonation depth prediction model integrates a neural network, a decision tree, and boosted and bagged ensemble decision trees. The ensemble-tree-based chloride profile prediction models evaluate the significance of chloride-ingress-controlling variables from various perspectives. The hygrothermal interaction prediction models are developed using neural networks to evaluate the status of corrosion and other unexpected deteriorations in surface-treated concrete elements. Long-term data for all models were obtained from three different field experiments. The performance comparison of the developed carbonation depth prediction model with the conventional one confirmed the prediction superiority of the data-driven model. The variable importance measure revealed that plasticizers and air contents are among the top six carbonation-governing parameters out of 25. The topmost chloride-penetration-controlling parameters representing the composition of the concrete were found to be aggregate size distribution, amount and type of plasticizers, and supplementary cementitious materials. The performance analysis of the developed hygrothermal model revealed its prediction capability with low error. The exploratory data analysis technique integrated with the hygrothermal model identified the surface-protection systems that are able to protect from corrosion, chemical and frost attacks. All the developed corrosion assessment models are valid, reliable, robust and easily reproducible, which helps to define proactive maintenance plans. In addition, the determined influential parameters could help companies to produce an optimized concrete mix that is able to resist carbonation and chloride penetration. Hence, the outcomes of this dissertation enable a reduction of lifecycle costs.

    On the predictability of U.S. stock market using machine learning and deep learning techniques

    Conventional market theories are considered to be an inconsistent approach in modern financial analysis. This thesis focuses mainly on the application of sophisticated machine learning and deep learning techniques to stock market statistical predictability and economic significance, benchmarked against the conventional efficient market hypothesis and econometric models. The thesis comprises five chapters and three publishable papers, and each chapter is developed to solve specific identifiable problem(s). Chapter one gives the general introduction of the thesis. It presents the statement of the research problems identified in the relevant literature, the objective of the study and the significance of the study. Chapter two applies a range of machine learning techniques to forecast the direction of the U.S. stock market. Notable sophisticated techniques such as regularization, discriminant analysis, classification trees, Bayesian classifiers and neural networks were employed. The empirical findings revealed that the discriminant analysis classifiers, classification trees, Bayesian classifiers and penalized binary probit models significantly outperform the binary probit models both statistically and economically, providing significant alternatives for portfolio managers. Chapter three focuses mainly on the application of regression training (RT) techniques to forecast the U.S. equity premium. The RT models show significant evidence of equity premium predictability, both statistically and economically, relative to the benchmark historical average, delivering significant utility gains. Chapter four investigates the statistical predictive power and economic significance of financial stock market data using deep learning techniques. Chapter five gives the summary and conclusion and presents area(s) of further research. The techniques are shown to be robust both statistically and economically when forecasting the equity premium out-of-sample using the recursive window method. Overall, the deep learning techniques produced the best result in this thesis. They seek to provide meaningful economic information on mean-variance portfolio investment for investors who are timing the market to earn future gains at minimal risk.
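
    The recursive window evaluation against the historical-average benchmark mentioned in this abstract could look roughly like the sketch below. It is not the thesis's code: the univariate least-squares forecaster, the out-of-sample R-squared statistic, and the names `ols_fit` and `recursive_oos_r2` are assumptions chosen to keep the example small.

```python
def ols_fit(x, y):
    """Slope and intercept of a univariate least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def recursive_oos_r2(predictor, returns, start):
    """Out-of-sample R^2 of a predictive regression versus the historical average.

    predictor[t] is assumed to be known at the end of period t and is used to
    forecast returns[t + 1]. At each step the model is re-estimated on all data
    available so far (an expanding, i.e. recursive, window).
    """
    sse_model = 0.0
    sse_bench = 0.0
    for t in range(start, len(returns) - 1):
        # Estimation sample: pairs (predictor[s], returns[s + 1]) with s + 1 <= t.
        slope, intercept = ols_fit(predictor[:t], returns[1:t + 1])
        forecast = intercept + slope * predictor[t]
        benchmark = sum(returns[:t + 1]) / (t + 1)  # historical average
        actual = returns[t + 1]
        sse_model += (actual - forecast) ** 2
        sse_bench += (actual - benchmark) ** 2
    # Assumes the benchmark makes at least one nonzero error.
    return 1.0 - sse_model / sse_bench


if __name__ == "__main__":
    # Toy series in which next-period return is exactly twice the predictor.
    x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    r = [0.0, 0.0, 2.0, 4.0, 6.0, 8.0]
    print(recursive_oos_r2(x, r, start=3))
```

    A positive value means the model's forecasts had a lower squared error than the historical average over the evaluation period; a negative value means the benchmark did better.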

    Multiple classifier systems based on directed attribute selection in credit risk assessment

    Following the author's previous research, this doctoral dissertation is the next step in credit risk classification research. Based on observations of behavior that can be found in nature and society, the idea of combining experts' decisions has gained significant importance in the research community, especially in the area of data classification. The increasing focus of researchers, as well as promising findings, directed the author's interest to this research area. The purpose of the research conducted and elaborated in this dissertation is to investigate the application of multiple classifier systems based on attribute selection to credit risk assessment. In accordance with this purpose, several studies have been conducted that jointly represent a complex approach to the selected problem. The main goal of this work is to develop a fast and robust technique for combining classifiers, based on directed attribute selection, which will be able to create efficient and accurate systems for retail credit risk assessment. The technique must also be sufficiently simple for easy implementation and wide application by the research community, including researchers who are not primarily focused on this field. Two key elements of the new technique are: (1) attribute selection, used as a strategy for training diverse classifiers, and (2) ensemble thinning, used to include only those classifiers that contribute to overall system quality. Attribute selection in this context refers to the implementation of several different fast techniques which rank attributes by their quality. To ensure the selection of different attributes, it is necessary to consider techniques based on different evaluation criteria for attribute ranking. The subsets of attributes selected in this manner are used to train the classifiers, thus ensuring differences in the produced models. In the next step, the technique selects only those models which, when combined, contribute positively to the performance of the ensemble. The selection is conducted using a new greedy algorithm for ensemble thinning proposed in this work. Including ensemble thinning in the new technique increases the efficiency and quality of decisions. The new technique has been tested on credit data sets in accordance with the defined research hypotheses of this doctoral dissertation. The results obtained using the new technique are compared to the results of the individual classifiers included in the final ensemble, in order to justify combining them. Additionally, the decisions made by the ensemble thinning algorithm are analyzed, as well as the relationship between the Q statistic and ensemble accuracy. In the following round of research, the results of the new technique are evaluated against the Bagging and Boosting techniques. Results are evaluated with four different performance measures: accuracy, type I error, type II error and AUC. Moreover, the time necessary for training and testing the models is measured and compared. The results show a significant improvement in classification performance compared to the individual classifiers as a direct result of the new technique. Furthermore, the quality of the obtained results is comparable with that of the most popular techniques; moreover, three out of four performance measures show the superiority of the new technique. In accordance with the design, the new technique performs best on ensembles with a small number of members and is not time-consuming compared to Bagging and Boosting. The achieved results are promising, and the proposed technique is a good alternative to existing techniques for constructing multiple classifier systems.
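
    To make the idea of greedy ensemble thinning and the Q statistic more concrete, here is a minimal sketch. It is not the algorithm proposed in the dissertation, whose details the abstract does not give: the backward-elimination strategy, majority voting with ties resolved toward class 1, and the names `majority_vote`, `q_statistic` and `greedy_thinning` are all assumptions made for illustration.

```python
def majority_vote(predictions):
    """Combine 0/1 predictions from several classifiers (ties go to class 1)."""
    n_items = len(predictions[0])
    n_members = len(predictions)
    return [
        1 if 2 * sum(p[i] for p in predictions) >= n_members else 0
        for i in range(n_items)
    ]


def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def q_statistic(pred_a, pred_b, truth):
    """Yule's Q for a pair of classifiers; lower values indicate more diversity."""
    n11 = n00 = n10 = n01 = 0
    for a, b, t in zip(pred_a, pred_b, truth):
        a_ok, b_ok = a == t, b == t
        if a_ok and b_ok:
            n11 += 1
        elif not a_ok and not b_ok:
            n00 += 1
        elif a_ok:
            n10 += 1
        else:
            n01 += 1
    denom = n11 * n00 + n01 * n10
    return (n11 * n00 - n01 * n10) / denom if denom else 0.0


def greedy_thinning(candidates, truth, min_size=1):
    """Assumed strategy: repeatedly drop the member whose removal helps most.

    candidates maps a classifier name to its 0/1 predictions on a validation
    set. Stops as soon as no single removal improves ensemble accuracy.
    """
    selected = list(candidates)
    best = accuracy(majority_vote([candidates[m] for m in selected]), truth)
    while len(selected) > min_size:
        trials = {
            name: accuracy(
                majority_vote([candidates[m] for m in selected if m != name]),
                truth,
            )
            for name in selected
        }
        drop = max(trials, key=trials.get)
        if trials[drop] <= best:
            break
        selected.remove(drop)
        best = trials[drop]
    return selected, best


if __name__ == "__main__":
    # Hypothetical validation labels and classifier outputs, for illustration only.
    truth = [1, 0, 1, 0, 1, 0]
    candidates = {
        "a": [1, 0, 1, 0, 1, 1],
        "b": [1, 0, 1, 1, 1, 0],
        "c": [0, 1, 0, 1, 0, 1],
        "d": [1, 1, 1, 0, 1, 0],
    }
    print(greedy_thinning(candidates, truth))
    print(q_statistic(candidates["a"], candidates["b"], truth))
```

    In this toy data, classifier "c" is wrong on every item and drags the four-member vote down, so it is the only member removed. In practice the selection would be scored on a validation set separate from the data used to report final results.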

    Forecasting: theory and practice

    Forecasting has always been in the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The lack of a free-lunch theorem implies the need for a diverse set of forecasting methods to tackle an array of applications. This unique article provides a non-systematic review of the theory and the practice of forecasting. We offer a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts, including operations, economics, finance, energy, environment, and social good. We do not claim that this review is an exhaustive list of methods and applications. The list was compiled based on the expertise and interests of the authors. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of the forecasting theory and practice

    Forecasting: theory and practice

    Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we wish that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow the readers to navigate through the various topics. We complement the theoretical concepts and applications covered by large lists of free or open-source software implementations and publicly-available databases.