3,950 research outputs found

    Estimating UK House Prices using Machine Learning

    House price estimation is an important subject for property owners, property developers, investors and buyers. It has featured in many academic research papers and some government and commercial reports. The price of a house may vary depending on several features including geographic location, tenure, age, type, size, market, etc. Existing studies have largely focused on applying single or multiple machine learning techniques to single or groups of datasets to identify the best performing algorithms, models and/or most important predictors, but this paper proposes a cumulative layering approach to what it describes as a Multi-feature House Price Estimation (MfHPE) framework. The MfHPE is a process-oriented, data-driven and machine learning based framework that does not just identify the best performing algorithms or the features that drive the accuracy of models, but also exploits a cumulative multi-feature layering approach to creating, optimising and evaluating machine learning models so as to produce tangible insights that support decision-making by stakeholders within the housing ecosystem and a more realistic estimation of house prices. Fundamentally, the MfHPE framework development leverages the Design Science Research Methodology (DSRM), and HM Land Registry’s Price Paid Data is ingested as the base transactions data. A total of 1.1 million London-based transaction records between January 2011 and December 2020 have been exploited for model design, optimisation and evaluation, while 84,051 transactions from 2021 have been used for model validation. 
With the capacity for updates to existing datasets and the introduction of new datasets and algorithms, the proposed framework has also leveraged a range of neighbourhood and macroeconomic features, including the location of rail stations, supermarkets and bus stops, the inflation rate, GDP, the employment rate, the Consumer Price Index (CPIH) and the unemployment rate, to explore their impact on the estimation of house prices and their influence on the behaviour of machine learning algorithms. Five machine learning algorithms have been exploited and three evaluation metrics have been used. Results show that the layered introduction of new varieties of features in multiple tiers led to improved performance in 50% of models and to a change in the best performing models as new varieties of features were introduced, and that the choice of evaluation metrics should be based not just on technical problem types but on three components: (i) critical business objectives or project goals; (ii) variety of features; and (iii) machine learning algorithms.

    A Data Mining-based Exploration of Antecedents of Voluntary Knowledge Contribution to Organizational Repositories

    Knowledge Management systems are often based on the assumption that employees will contribute their job-related knowledge to electronic knowledge repositories, though organizations cannot force their employees to do so. In previous work, Stewart & Osei-Bryson (2013) developed and tested a research model based on the theory of planned behavior. In this paper we use a data mining approach to explore the same data to see whether there are additional hypotheses worthy of future exploration.

    A probabilistic model for predicting software development effort

    2018 SDSU Data Science Symposium Program

    Table of Contents: Letter from SDSU President; Letter from SDSU Department of Mathematics and Statistics Dept. Head; Sponsors; General Information; Keynote Speakers; Invited Speakers; Sunday Schedule; Workshop Information; Monday Schedule; Abstracts | Invited Speakers; Abstracts | Oral Presentations; Poster Presentation; Committee and Volunteer

    Links between sustainability-related innovation and sustainability management

    This paper analyses the link between sustainability-related innovation and sustainability performance and the role that family firms play in this. This theme is particularly relevant from a European point of view given the large number of firms that are family-owned. Governments often support environmentally and socially beneficial innovation with various policy instruments, with the intention of increasing international competitiveness and simultaneously supporting sustainable development. In parallel, firms use corporate social responsibility (CSR) and environmental management systems partly in the hope that this will foster such innovation in their organisation. Hence the main research question of this paper concerns the association of CSR and environmental management with environmentally and socially beneficial innovation and its determinants. Based on panel data, the paper analyses the link between corporate sustainability performance and sustainability innovation, and the effect of being a family firm, using panel estimation techniques. The paper discusses the results of the analysis, which point to a moderating role of family firms on the link between sustainability innovation and performance, and assesses the policy implications of this insight. Keywords: sustainability, innovation, management, quantitative methods, family firms

    An Event-based Analysis Framework for Open Source Software Development Projects

    The increasing popularity and success of Open Source Software (OSS) development projects has drawn significant attention from academics and open source participants over the last two decades. As one of the key areas in OSS research, assessing and predicting OSS performance is of great value to both OSS communities and organizations that are interested in investing in OSS projects. Most existing research, however, has considered OSS project performance as the outcome of static cross-sectional factors such as number of developers, project activity level, and license choice. While variance studies can identify some predictors of project outcomes, they tend to neglect the actual process of development. Without a closer examination of how events occur, an understanding of OSS projects is incomplete. This dissertation aims to combine process and variance strategies to investigate how OSS projects change over time through their development processes, and to explore how these changes affect project performance. I design, instantiate, and evaluate a framework and an artifact, EventMiner, to analyze OSS projects’ evolution through development activities. This framework integrates concepts from various theories such as distributed cognition (DCog) and complexity theory, applying data mining techniques such as decision trees, motif analysis, and hidden Markov modeling to automatically analyze and interpret the trace data of 103 OSS projects from an open source repository. The results support the construction of process theories on OSS development. The study contributes to the literature on DCog, design routines, OSS development, and OSS performance. The resulting framework allows OSS researchers who are interested in OSS development processes to share and reuse data and data analysis processes in an open-source manner.

    Exploring fish purchasing behaviour using data analytics

    In recent decades there have been significant changes in the retail sector resulting from globalization, increased competitiveness and the transformation of consumers' purchasing behaviour. This paradigm shift also applies to the fresh fish sector, which has been capturing the interest of researchers internationally for political and economic reasons. Given this competitive environment, which values quality and the service provided to the customer at acceptable cost, it is necessary to adopt customer-focused strategies. This thesis is part of the ValorMar project, which was born from the commitment of a broad set of entities, from companies to research centers, positioned by the relevance of the sea economy in the fishery value chain. Thus, this dissertation will try to understand the relations that are critical to customers' decision making when buying fresh fish. For this, transactional data and data mining techniques appropriate to the problem will be used. The methodology proposed by this thesis aims not only to identify customers using clustering techniques, but also to analyze the market basket of a fresh fish customer. These data analyses will show that knowledge extraction from large databases makes it possible to improve companies' strategic decisions and their relationship with customers.

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also for ruling out bad credit applicants precisely in transactional payment streams. In a diagnostic test for analysing payment behaviour, I have used a hybrid approach comprising a combination of supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods only discover limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models and their optimisation is inversely related to the pre-processed dataset. Subsequently, the research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm to determine both the cross-sell candidates and creditworthiness of their customers. In addition, a deep-learning model using a neural network has been considered with a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional deep insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers to discover associations using market basket analysis based on previous payment transactions, finding the frequent transaction categories, and developing interesting rules when each transaction category is performed on the same payment stream. 
The current research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry for assessing the diagnostic ability of promotion candidates and classifying bad credit applicants from among the entire customer base. The developed predictive models can also be commonly used to estimate the credit risk of any credit applicant based on his/her transactional payment behaviour profile, combined with deep insights from the categorised payment transactions analysis. The research study provides a full review of the performance characteristic results from different developed data models. Thus, the demonstrated data science approach is a possible proof of how machine learning models can be turned into cost-sensitive data models.