Multimethod approach to learning from text-based construction failure data

Abstract

To be sustainable, the construction industry must learn from, and avoid, repetitive failures. At present, there is heavy reliance on learning from case studies of catastrophic events and a lack of attention to the more frequent, lower-consequence yet repetitive failures. These smaller failures can have a huge cumulative impact. This matters because the construction sector is worth £113 billion per year to the UK economy (6% of UK GDP) and provides over 2.4 million jobs (7% of UK jobs). This impressive contribution is undermined by the large number of construction projects that run over time and over cost. The damage is all the greater for high-profile, often publicly funded, infrastructure programmes, which attract severe negative publicity when they run over budget. While other factors contribute to this overspend (for example, inaccurate tender estimates and scope or design change), previous research found that correcting quality mistakes can account for over 20% of a contract's value. Another failure of the construction industry is its safety performance. In the 2017/18 fiscal year, the fatal injury rate for those working in UK construction was four times the national average at 1.64 per 100,000 workers. Additionally, the Health and Safety Executive in its 2018 Annual Report estimated that safety injuries on site cost £490M to the UK economy. It is therefore both a moral and an economic imperative that the industry learns to avoid repetitive failure. There is a wealth of information contained within accounts of more frequent, lower-consequence incidents and safety observation reports, which should be used. These reports are collected as part of the lifecycle of the project. However, to date, these data have been inaccessible to traditional analysis techniques because of physical accessibility issues and their unstructured text format, which requires time-consuming manual analysis.
This project harnessed the potential of modern data science methods, including natural language processing (NLP) and machine learning (ML), to produce automated methods and recommendations for analysing these data for the construction industry. A multi-method approach was applied. First, a qualitative investigation used semi-structured interviews and thematic analysis to explore failure in the construction industry, with particular attention to present 'learning from failure' practice, human factors and biases. Second, the text-based construction site failure data were analysed using recent data science methods. This analysis relied upon the insights from the first investigation to inform methodological decisions. It was decided to transform the unstructured text descriptions into structured attributes using machine learning classification methods, so that further analysis methods can be applied. Possible further analyses unlocked by this method include risk analysis, graphical analysis, learning, and finer trend analysis. Finally, qualitative information from the thematic analysis was used to assess the usefulness of the data analysis methods employed and to form recommendations for their industrial application, with the aim of developing techniques that allow the capture and analysis of data to measure and mitigate the cumulative impact of smaller failures.
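As an illustration of what transforming unstructured text into structured attributes can look like, the following is a minimal sketch only. The abstract does not say which classifier, features, or attribute labels the project used, so everything here is an assumption made for illustration: the multinomial Naive Bayes model, the two hazard categories, the example reports, and all function and field names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, hand-written reports and labels (NOT the project's data).
TRAINING_REPORTS = [
    ("operative slipped on wet scaffold boards", "slip_trip_fall"),
    ("worker tripped over trailing cable on walkway", "slip_trip_fall"),
    ("fell from ladder while fixing cladding", "slip_trip_fall"),
    ("excavator reversed close to pedestrian route", "plant_vehicle"),
    ("dumper truck struck barrier near site entrance", "plant_vehicle"),
    ("crane load swung near delivery vehicle", "plant_vehicle"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing (assumed model)."""

    def fit(self, examples):
        self.class_counts = Counter()           # documents per label
        self.word_counts = defaultdict(Counter)  # word frequencies per label
        self.vocabulary = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for token in tokenize(text):
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return max(scores, key=scores.get)


def to_structured_record(report_id, text, classifier):
    """Attach a predicted attribute to a free-text report."""
    return {
        "report_id": report_id,
        "description": text,
        "hazard_category": classifier.predict(text),
    }


classifier = NaiveBayesTextClassifier().fit(TRAINING_REPORTS)
record = to_structured_record(1, "labourer tripped on loose cable", classifier)
print(record)
```

Once each report carries attributes of this kind, the records can be counted, grouped, or plotted over time, which is the sort of downstream trend and risk analysis the abstract describes. A real application would need far more labelled data and a properly validated model.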
