Mixed-input second-hand car price estimation model based on scraped data

Abstract

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science

The number of second-hand cars is growing year by year. More and more people prefer to buy a second-hand car rather than a new one because of the increasing cost of new cars and their rapid depreciation. Consequently, the number of online marketplaces for peer-to-peer (P2P) second-hand car trades has also increased. A robust price estimate is needed both by dealers, to get a good idea of how to price their cars, and by buyers, to understand whether a listing is overpriced. To my knowledge, price estimation for second-hand cars has so far been explored only with numerical and categorical features such as mileage, brand, or production year. An approach that also uses image data has yet to be developed. This work investigates a multi-input price estimation model for second-hand cars that combines a convolutional neural network (CNN), which extracts features from car images, with an artificial neural network (ANN), which handles the categorical and numerical features, and assesses whether this method improves price estimation accuracy over more traditional single-input methods. To train and evaluate the model, a dataset of second-hand car images and textual features is scraped from a marketplace and curated such that more than 700 images can be used for the training
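To make the multi-input idea concrete, the following is a minimal NumPy sketch of a forward pass through a two-branch network: a small convolutional branch turns an image into a feature vector, a dense branch does the same for the tabular features, and the two vectors are concatenated before a linear regression head outputs a price. It is an illustration under stated assumptions, not the architecture used in this work: the layer sizes, the fusion by concatenation, the global average pooling, the random (untrained) weights, and all function and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def relu(x):
    return np.maximum(x, 0.0)


def conv2d_valid(image, kernels):
    """Naive 'valid' cross-correlation.

    image: (H, W, C); kernels: (K, kh, kw, C) -> output: (H-kh+1, W-kw+1, K).
    """
    H, W, _ = image.shape
    K, kh, kw, _ = kernels.shape
    out = np.zeros((H - kh + 1, W - kw + 1, K))
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            patch = image[i:i + kh, j:j + kw, :]
            # Sum over the kernel window and channels for each of the K kernels.
            out[i, j, :] = np.tensordot(kernels, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def image_branch(image, kernels):
    """CNN branch (assumed design): one conv layer, ReLU, global average pooling."""
    fmap = relu(conv2d_valid(image, kernels))
    return fmap.mean(axis=(0, 1))  # shape (K,)


def tabular_branch(features, W, b):
    """ANN branch (assumed design): one dense layer with ReLU."""
    return relu(features @ W + b)


def predict_price(image, features, params):
    """Fuse both branches by concatenation and apply a linear regression head."""
    img_vec = image_branch(image, params["kernels"])
    tab_vec = tabular_branch(features, params["W_tab"], params["b_tab"])
    fused = np.concatenate([img_vec, tab_vec])
    return float(fused @ params["w_out"] + params["b_out"])


def init_params(n_tab, n_kernels=4, n_hidden=8, channels=3):
    """Random, untrained weights; sizes are arbitrary illustration values."""
    return {
        "kernels": rng.normal(scale=0.1, size=(n_kernels, 3, 3, channels)),
        "W_tab": rng.normal(scale=0.1, size=(n_tab, n_hidden)),
        "b_tab": np.zeros(n_hidden),
        "w_out": rng.normal(scale=0.1, size=n_kernels + n_hidden),
        "b_out": 0.0,
    }


params = init_params(n_tab=3)
image = rng.random((16, 16, 3))          # stand-in for a car photo
features = np.array([0.42, 0.70, 1.0])   # e.g. scaled mileage, scaled year, brand flag
price = predict_price(image, features, params)
```

In practice such a model would be built and trained in a deep learning framework; the point here is only the data flow, in which image and tabular inputs are encoded separately and merged before the final regression layer. A single-input baseline corresponds to dropping one of the two branches before the head.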
