4,618 research outputs found

    A Review of Machine Learning Approaches for Real Estate Valuation

    Real estate managers must identify the value of properties in their current market. Traditionally, this involved simple data analysis with adjustments based on the manager's experience. Given the amount of money involved in these decisions, and the complexity and speed at which valuation decisions must be made, machine learning technologies provide a newer alternative for property valuation that could improve upon traditional methods. This study uses a systematic literature review methodology to identify published studies from the past two decades in which specific machine learning technologies have been applied to the property valuation task. We develop a data, reasoning, usefulness (DRU) framework that provides a set of theoretical and practice-based criteria for a multi-faceted performance assessment of each system. This assessment provides the basis for identifying the current state of research in this domain, as well as theoretical and practical implications and directions for future research.

    Combination of linear classifiers using score function -- analysis of possible combination strategies

    In this work, we address the issue of combining linear classifiers using their score functions. The value of the scoring function depends on the distance from the decision boundary. Two score functions were tested and four different combination strategies were investigated. In the experimental study, the proposed approach was applied to a heterogeneous ensemble and compared to two reference methods: majority voting and model averaging. The comparison was made in terms of seven different quality criteria. The results show that combination strategies based on the simple average and the trimmed average are the best of the geometrical combination strategies.
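    The score-based combination described above can be sketched in a few lines. This is an illustrative stand-in, not the paper's exact method: the three linear classifiers, their weights, and the input are invented, and only the two winning strategies (simple average and trimmed average) are shown.

```python
# Three hypothetical linear classifiers, each returning a signed score:
# the sign gives the predicted class, the magnitude grows with the
# distance from the decision boundary.
def clf_a(x):
    w, b = [0.8, -0.5], 0.1
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def clf_b(x):
    w, b = [0.3, 0.9], -0.2
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def clf_c(x):
    w, b = [-0.1, 0.6], 0.05
    return sum(wi * xi for wi, xi in zip(w, x)) + b

CLASSIFIERS = [clf_a, clf_b, clf_c]

def combine_simple_average(x):
    """Simple average of the member scores."""
    scores = [clf(x) for clf in CLASSIFIERS]
    return sum(scores) / len(scores)

def combine_trimmed_average(x, trim=1):
    """Average after dropping the `trim` lowest and highest scores."""
    scores = sorted(clf(x) for clf in CLASSIFIERS)
    kept = scores[trim:len(scores) - trim] or scores  # guard tiny ensembles
    return sum(kept) / len(kept)

def predict(x, combiner):
    """Classify by the sign of the combined score."""
    return 1 if combiner(x) >= 0 else -1

x = [1.0, 2.0]
print(predict(x, combine_simple_average))   # -> 1
print(predict(x, combine_trimmed_average))  # -> 1
```

    Because the combination happens in score space rather than label space, close-to-boundary members contribute less than confident ones, which is what distinguishes this approach from plain majority voting.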

    Used Cars Price Prediction and Valuation using Data Mining Techniques

    Due to the unprecedented number of cars being purchased and sold, used car price prediction is a topic of high interest. Because used cars are affordable in developing countries, people tend to purchase them more often. The primary objective of this project is to estimate used car prices using attributes that are highly correlated with the label (price). To accomplish this, data mining techniques were employed. Null, redundant, and missing values were removed from the dataset during pre-processing. In this supervised learning study, three regressors (Random Forest Regressor, Linear Regression, and Bagging Regressor) were trained, tested, and compared against a benchmark dataset. Among all the experiments, the Random Forest Regressor achieved the highest score at 95%, with an MSE of 0.025, an MAE of 0.0008, and an RMSE of 0.0378. Bagging Regression also performed well with an 88% score, followed by Linear Regression at 85%. A train/test split of 80/20 with a random state of 40 was used in all experiments. The researchers anticipate that in the near future the most accurate model will be used for making predictions and then integrated into a mobile app or web page for the general public.
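    The evaluation setup the abstract describes (80/20 split, fixed random state, MSE/MAE/RMSE) can be sketched with standard-library Python. The dataset here is synthetic and the model is a one-feature linear regression standing in for the ensembles the study actually trained; the feature name and coefficients are invented for illustration.

```python
import math
import random

# Toy stand-in for the benchmark data: price modelled from a single
# highly correlated attribute (vehicle age). All values are synthetic.
random.seed(40)  # the abstract reports a random state of 40
data = [(age, max(0.05, 1.0 - 0.07 * age + random.gauss(0, 0.03)))
        for age in range(1, 13) for _ in range(10)]

random.shuffle(data)
split = int(0.8 * len(data))           # 80/20 train/test split
train, test = data[:split], data[split:]

# Closed-form simple linear regression: price = b0 + b1 * age.
n = len(train)
mean_x = sum(x for x, _ in train) / n
mean_y = sum(y for _, y in train) / n
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in train)
      / sum((x - mean_x) ** 2 for x, _ in train))
b0 = mean_y - b1 * mean_x

# Evaluate on the held-out 20% with the three reported error metrics.
preds = [(b0 + b1 * x, y) for x, y in test]
mse = sum((p - y) ** 2 for p, y in preds) / len(preds)
mae = sum(abs(p - y) for p, y in preds) / len(preds)
rmse = math.sqrt(mse)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  RMSE={rmse:.4f}")
```

    Swapping the closed-form fit for a random forest or bagging regressor changes only the model line; the split and the metric computations stay the same.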

    Improving Prediction Models for Mass Assessment: A Data Stream Approach

    Mass appraisal is the process of valuing a large collection of properties within a city or municipality, usually for tax purposes. The common methodology for mass appraisal is based on multiple regression, though this methodology has been found to be deficient. Data mining methods have been proposed and tested as an alternative, but the results have been mixed. This study introduces a new approach to building prediction models for assessing residential property values by treating past sales transactions as a data stream. The study used 110,525 sales transaction records from a municipality in the Midwest of the US. Our results show that a data-stream-based approach outperforms the traditional regression approach, demonstrating its potential to improve the performance of prediction models for mass assessment.
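    The core idea of treating sales as a stream, rather than a static batch, can be sketched with a minimal incremental model. This is an illustrative stand-in, not the authors' actual model: each transaction first serves as a test case for the current model ("test-then-train"), then updates a running per-area average, so the model always reflects the most recent market.

```python
from collections import defaultdict

class StreamingAreaMean:
    """Toy data-stream predictor: an incrementally updated mean price per area."""

    def __init__(self):
        self.count = defaultdict(int)
        self.mean = defaultdict(float)

    def predict(self, area, fallback=100_000.0):
        # Use a global fallback until the area has seen at least one sale.
        return self.mean[area] if self.count[area] else fallback

    def update(self, area, price):
        self.count[area] += 1
        # Incremental mean: m_k = m_{k-1} + (x_k - m_{k-1}) / k
        self.mean[area] += (price - self.mean[area]) / self.count[area]

# Transactions arrive ordered by sale date, as in a data stream.
stream = [("A", 120_000), ("A", 130_000), ("B", 90_000), ("A", 125_000)]
model = StreamingAreaMean()
errors = []
for area, price in stream:
    errors.append(abs(model.predict(area) - price))  # prequential evaluation
    model.update(area, price)
print(model.mean["A"])  # -> 125000.0
```

    A real system would replace the running mean with an incremental learner, but the stream loop, prequential evaluation, and per-record updates carry over unchanged.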

    Who performs better? AVMs vs hedonic models

    Purpose: The literature contains numerous tests that compare the accuracy of automated valuation models (AVMs). These models first train on price data and property characteristics, and are then tested on their ability to predict prices. Most tests compare the effectiveness of traditional econometric models with machine learning algorithms. Although the latter seem to offer better performance, no complete survey of the literature yet confirms this hypothesis. Design/methodology/approach: All tests comparing regression analysis and machine-learning AVMs on the same data set were identified. The accuracy scores obtained were then compared with each other. Findings: Machine learning models are more accurate than traditional regression analysis in their ability to predict value. Nevertheless, many authors point to their black-box nature and poor inferential abilities as limitations. Practical implications: Machine-learning AVMs offer a huge advantage to real estate operators who know how to use them. Their use in public policy or litigation can be critical. Originality/value: To the author's knowledge, this is the first systematic review that collects all the articles produced on the subject and compares their results.

    Land valuation using an innovative model combining machine learning and spatial context

    Valuation predictions are used by buyers, sellers, regulators, and authorities to assess the fairness of the value being asked. Urbanization demands a modern and efficient land valuation system, since the conventional approach is costly, slow, and relatively subjective with respect to locational factors. This necessitates the development of alternative methods that are faster, user-friendly, and digitally based. These approaches should use geographic information systems and strong analytical tools to produce reliable and accurate valuations. Location information in the form of spatial data is crucial, because the price can vary significantly based on the neighborhood and context in which a parcel is located. In this thesis, a model is proposed that combines machine learning and spatial context. It integrates raster information derived from remote sensing as well as vector information from geospatial analytics to predict land values in the City of Springfield. These are used to investigate whether a joint model can improve the value estimation. The study also identifies the factors that are most influential in driving these models. A geodatabase was created by calculating proximity and accessibility to key locations, integrating socio-economic variables, and adding statistics on green space density and vegetation index derived from Sentinel-2 satellite data. The model was trained on Greene County government data as ground-truth appraisal land values using supervised machine learning models, and the impact of each data type on price prediction was explored. Two types of modeling were conducted. Initially, only spatial context data were used to assess their predictive capability. Subsequently, socio-economic variables were added to the dataset to compare the performance of the models.
    The results showed only a slight difference in performance between the random forest and gradient boosting algorithms, and between using GIS-derived distance measures alone and adding socio-economic variables to them. Furthermore, spatial autocorrelation analysis was conducted to investigate how the distribution of similar attributes related to the location of the land affects its value. This analysis also aimed to identify disparities in socio-economic structure and to measure their magnitude.
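    The geodatabase construction described above, proximity features joined with socio-economic variables, can be sketched as follows. All key locations, coordinates, parcels, and attribute values here are invented for illustration; a real pipeline would read them from GIS layers and census data.

```python
import math

# Hypothetical key locations in projected (planar) coordinates.
KEY_LOCATIONS = {
    "downtown": (0.0, 0.0),
    "park": (2.0, 1.0),
    "school": (-1.0, 3.0),
}

def proximity_features(parcel_xy):
    """Euclidean distance from a parcel to each key location."""
    px, py = parcel_xy
    return {f"dist_{name}": math.hypot(px - x, py - y)
            for name, (x, y) in KEY_LOCATIONS.items()}

def build_row(parcel_xy, socio):
    """One feature row: spatial-context features first, socio-economic appended.

    This mirrors the study's two-stage setup: the model can be trained on
    the proximity features alone, then on the row with `socio` merged in.
    """
    row = proximity_features(parcel_xy)
    row.update(socio)  # e.g. median income, green-space density
    return row

row = build_row((1.0, 1.0), {"median_income": 52_000, "green_density": 0.31})
print(row["dist_downtown"])  # -> 1.4142135623730951
```

    Feeding the spatial-only rows and the merged rows to the same regressor, as the thesis does with random forests and gradient boosting, isolates how much the socio-economic variables actually add.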

    A BIM and machine learning integration framework for automated property valuation

    Property valuation contributes significantly to market economic activities, yet it has been continuously questioned for its low transparency, inaccuracy, and inefficiency. With Big Data applications in the real estate domain growing fast, computer-aided valuation systems such as AI-enhanced automated valuation models (AVMs) have the potential to address these issues. While a plethora of research has focused on improving the predictive performance of AVMs, little attention has been paid to the information requirements for valuation models. Although the amount of data in BIM is rising exponentially, value-relevant design information has not been widely utilized for property valuation. This paper presents a system that leverages a holistic data interpretation, improves information exchange between AEC projects and property valuation, and automates specific workflows for property valuation. A mixed research method was adopted, combining archival literature research with qualitative and quantitative data analysis. A BIM and machine learning (ML) integration framework for automated property valuation was proposed, which contains a fundamental database interpretation, an IFC-based information extraction, and an automated valuation model based on genetic-algorithm-optimized machine learning (GA-GBR). The main findings indicate that: (1) partial information requirements can be extracted from BIM models; and (2) property valuation can be performed in a more accurate and efficient way. This research contributes to managing information exchange between AEC projects and property valuation and to supporting automated property valuation. It is suggested that the infusion of BIM, ML, and other emerging digital technologies might add value to property valuation and the construction industry.
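    The GA-optimization idea behind GA-GBR can be sketched with a minimal genetic algorithm over hyperparameters. This is a hedged illustration, not the paper's implementation: the real fitness would be a cross-validated gradient-boosting score, while here a smooth synthetic stand-in function is used so the sketch runs self-contained, and all parameter names, bounds, and GA settings are invented.

```python
import random

random.seed(7)
# Hypothetical gradient-boosting hyperparameter search space.
BOUNDS = {"learning_rate": (0.01, 0.3), "n_estimators": (50, 500), "max_depth": (2, 8)}

def random_individual():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def fitness(ind):
    # Synthetic stand-in for a CV score: peaks near lr=0.1, 300 trees, depth 4.
    return -((ind["learning_rate"] - 0.1) ** 2
             + ((ind["n_estimators"] - 300) / 1000) ** 2
             + ((ind["max_depth"] - 4) / 10) ** 2)

def mutate(ind, rate=0.3):
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if random.random() < rate:  # perturb, then clamp to the bounds
            child[k] = min(hi, max(lo, child[k] + random.gauss(0, (hi - lo) * 0.1)))
    return child

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in BOUNDS}

pop = [random_individual() for _ in range(20)]
for _ in range(30):                # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]             # truncation selection keeps the elite half
    pop = parents + [mutate(crossover(random.choice(parents), random.choice(parents)))
                     for _ in range(10)]

best = max(pop, key=fitness)
print(best)
```

    In the actual framework, `fitness` would train and cross-validate a gradient boosting regressor with the candidate hyperparameters, which is exactly where the GA spends most of its runtime.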

    SecREP : A Framework for Automating the Extraction and Prioritization of Security Requirements Using Machine Learning and NLP Techniques

    Gathering and extracting security requirements adequately requires extensive effort, experience, and time, as large amounts of data need to be analyzed. While many manual and academic approaches have been developed to tackle the discipline of Security Requirements Engineering (SRE), a need still exists for automating the SRE process. This need stems mainly from the difficult, error-prone, and time-consuming nature of traditional and manual frameworks. Machine learning techniques have been widely used to facilitate and automate the extraction of useful information from software requirements documents and artifacts. Such approaches can be utilized to yield beneficial results in automating the process of extracting and eliciting security requirements. However, the extraction of security requirements alone leaves software engineers with yet another tedious task: prioritizing the most critical security requirements. The competitive and fast-paced nature of software development, in addition to resource constraints, makes the prioritization of security requirements crucial for software engineers to make educated decisions in risk analysis and trade-off analysis. To that end, this thesis presents an automated framework/pipeline for extracting and prioritizing security requirements. The proposed framework, called the Security Requirements Extraction and Prioritization Framework (SecREP), consists of two parts. SecREP Part 1 proposes a machine learning approach for identifying and extracting security requirements from natural language software requirements artifacts (e.g., the Software Requirements Specification, or SRS, document). SecREP Part 2 proposes a scheme for prioritizing the security requirements identified in the previous step. For the first part of the SecREP framework, three machine learning models (SVM, Naive Bayes, and Random Forest) were trained using an enhanced dataset, the "SecREP Dataset," that was created as a result of this work.
    Each model was validated using resampling (80% for training and 20% for validation) and 5-fold cross-validation techniques. For the second part of the SecREP framework, a prioritization scheme was established with the aid of NLP techniques. The proposed prioritization scheme analyzes each security requirement using part-of-speech (POS) tagging and named entity recognition to extract assets, security attributes, and threats from the requirement. Additionally, using a text similarity method, each security requirement is compared to a super-sentence defined based on the STRIDE threat model. This prioritization scheme was applied to the list of security requirements extracted in the case study of part one, and the priority score for each requirement was calculated and showcased.
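    The similarity-to-super-sentence step can be sketched with a simplified scoring function. This is not the SecREP implementation: the full pipeline uses POS tagging and named entity recognition, whereas this sketch substitutes plain token-overlap (Jaccard) similarity, and the super-sentence text and sample requirements are invented examples.

```python
# Invented STRIDE-flavoured "super-sentence" of security-relevant terms.
STRIDE_SUPER_SENTENCE = (
    "spoofing identity tampering data repudiation information disclosure "
    "denial of service elevation of privilege authentication integrity "
    "confidentiality availability authorization"
)

def tokens(text):
    """Lowercased word set; a real pipeline would tokenize and lemmatize."""
    return set(text.lower().split())

def priority_score(requirement, reference=STRIDE_SUPER_SENTENCE):
    """Jaccard similarity between a requirement and the super-sentence."""
    req, ref = tokens(requirement), tokens(reference)
    return len(req & ref) / len(req | ref)

requirements = [
    "The system shall require authentication before disclosure of data",
    "The interface shall use a blue colour scheme",
]
# Higher overlap with STRIDE vocabulary -> higher priority.
ranked = sorted(requirements, key=priority_score, reverse=True)
print(ranked[0])  # -> "The system shall require authentication before disclosure of data"
```

    Replacing the Jaccard measure with an embedding-based similarity, and restricting the compared tokens to extracted assets, attributes, and threats, moves this sketch toward what the thesis actually describes.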
