2 research outputs found

    Accessing Big (Commercial) Data across a Global Research Infrastructure - Modelling Consumer Behaviour in China

    (1) Business School (2) EPCC
    The use of globally distributed computing systems and globally distributed data to understand and manage global organisations is a well-established vision. It can be found in patents awarded for electrical communications systems that are integrated with electro-mechanical computing devices as far back as 1927. Use of electrical communications to reproduce images goes back even further, to the first fax patent awarded to Scottish inventor Alexander Bain in 1843, preceding Alexander Graham Bell's patent for the telephone by over 30 years. Like many other company assets, data has value; however, it has two additional characteristics that establish tensions with a globally distributed vision: (i) its value cannot be assessed until after it has been analysed, and (ii) that analysis may prove to be of more value to a competitor than to the company itself. This type of concern is not typical of the global scientific collaborations that have driven the development of global network infrastructure, a distinction Jim Gray of Microsoft highlighted by describing data exchanged in radio-astronomy collaborations as “completely worthless”, by which he meant that it had all the dimensionality and scale of the most complex problems in business or medicine, but none of the sensitivities that impede how and with whom you share that data, or what analyses you attempt. Since the Economic and Social Research Council defines social science as “the study of society and the manner in which people behave and influence the world around us”, it is clear that the sensitivities of exposing commercial data on behaviour in global markets to globally distributed computational environments present a major challenge for (Social) Data Scientists.
    This paper describes some of the challenges of building the first Global Computing Grid to connect collaborating sites on three continents and of installing an embedded analytical facility within a Chinese commercial organisation, which has enabled collaborative analysis of millions of consumers. We report how this access has provided new insights into consumer behaviour within China, ranging from testing strategic models of economic development to exploring ‘digital exclusion’ and the impact of migration on technology adoption.

    Fitting the reproduction number from UK coronavirus case data and why it is close to 1.

    Peer reviewed: True
    We present a method for rapid calculation of coronavirus growth rates and R-numbers tailored to publicly available UK data. We assume that the case data comprise a smooth, differentiable underlying trend plus systematic errors and a non-differentiable noise term, and we use bespoke data processing to remove the systematic errors and noise. The approach is designed to prioritize up-to-date estimates. Our method is validated against published consensus R-numbers from the UK government and is shown to produce comparable results two weeks earlier. The case-driven approach is combined with weight-shift-scale methods to monitor trends in the epidemic and to make medium-term predictions. Using case-fatality ratios, we create a narrative for trends in the UK epidemic: increased infectiousness of the B.1.1.7 (Alpha) variant, and the effectiveness of vaccination in reducing severity of infection. For longer-term future scenarios, we base future R on insight from localized spread models, which show R going asymptotically to 1 after a transient, regardless of how large the R transient is. This accords with the short-lived peaks observed in case data, which cannot be explained by a well-mixed model and are suggestive of spread on a localized network. This article is part of the theme issue 'Technical challenges of modelling real-life epidemics and examples of overcoming these'.