    Analytics of human presence and movement behaviour within specific environments

    The vast amounts of detailed information generated by Wi-Fi and other mobile communication technologies provide an invaluable opportunity to study different aspects of the presence and movement behaviours of people within a given environment; for example, a university campus, an organisation's office complex, or a city centre. Utilising such data, this thesis studies three main aspects of human presence and movement behaviour: spatio-temporal movement (where and when people move), user identification (how to uniquely identify people from their historical presence and movement records), and social grouping (how people interact). Previous research has predominantly studied at most two of these three aspects. In contrast, we investigate all three in order to develop a coherent view of human presence and movement behaviour within selected environments. More specifically, we create stochastic models for movement prediction and user identification. We also devise a set of clustering models for the detection of social groups within a given environment. The thesis makes the following contributions:
    1. Proposes a family of predictive models that allows for inference of locations through a collaborative mechanism which does not require the profiling of individual users. These prediction models utilise suffix trees as their core underlying data structure, where predictions about a specific individual are computed over an aggregate model incorporating the collective record of observed behaviours of multiple users.
    2. Defines a mobility fingerprint as a profile constructed from the user's historical mobility traces. The proposed method for constructing such a profile is a principled and scalable implementation of a variable-length Markov model based on n-grams.
    3. Proposes density-based clustering methods that discover social groups by analysing activity traces of mobile users as they move around, from one location to another, within an observed environment.
    We utilise two large collections of mobility traces, a GPS data set from Nokia and an Eduroam network log from Birkbeck, University of London, for the evaluation of the proposed models reported herein.
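
The collaborative, variable-order prediction idea described above can be illustrated with a minimal sketch. It is not the thesis implementation: it uses a plain dictionary of n-gram contexts rather than a suffix tree, and the function names (`build_ngram_model`, `predict_next`), the `max_order` parameter, and the toy location labels are all assumptions made for illustration. The model is built from the pooled traces of several users, and prediction backs off from the longest matching context to shorter ones.

```python
from collections import Counter, defaultdict


def build_ngram_model(traces, max_order=3):
    """Count next-location frequencies for every context of length 0..max_order.

    `traces` is a list of location sequences pooled from many users, so the
    resulting model is an aggregate and keeps no per-user profile.
    """
    model = defaultdict(Counter)
    for trace in traces:
        for i, nxt in enumerate(trace):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context of length k
                context = tuple(trace[i - k:i])
                model[context][nxt] += 1
    return model


def predict_next(model, history, max_order=3):
    """Predict the next location, backing off to shorter contexts as needed."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        if context in model:
            # Most frequently observed successor of this context.
            return model[context].most_common(1)[0][0]
    return None  # only reached when the model is empty


# Hypothetical traces from three users (illustrative data only).
traces = [
    ["library", "cafe", "lab"],
    ["library", "cafe", "lab"],
    ["gym", "cafe", "office"],
]
model = build_ngram_model(traces)
print(predict_next(model, ["library", "cafe"]))
print(predict_next(model, ["pool", "cafe"]))  # unseen context, backs off to ("cafe",)
```

A suffix tree would store the same contexts more compactly; the dictionary is used here only to keep the sketch short.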

    Enhanced water demand analysis via symbolic approximation within an epidemiology-based forecasting framework

Epidemiology-based models have been successfully adapted to address challenges from various areas of engineering, such as those related to energy use or asset management. This paper deals with urban water demand, with data analysis based on an epidemiology-based tool-set developed herein. This combination represents a novel framework in urban hydraulics. Specifically, we present various reduction tools for time series analysis based on a symbolic aggregate approximation (SAX) coding technique, which works with simplified versions of the data sets. Then, a neural-network-based model that uses SAX-based knowledge generation from various time series is shown to improve forecasting ability. This knowledge is produced by identifying district metered areas of the water distribution system that are highly similar to a given target area and sharing their demand patterns with the latter. The proposal has been tested with databases from a Brazilian water utility, providing key knowledge for improving water management and hydraulic operation of the distribution system. This novel analysis framework shows several benefits in terms of accuracy and performance of neural network models for water demand.
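
As an illustration of the SAX coding step mentioned above, the following is a minimal sketch of the standard procedure (z-normalisation, piecewise aggregate approximation, then discretisation against equiprobable Gaussian breakpoints). It is not the paper's tool-set: the function name `sax_encode`, the four-letter alphabet, and the toy demand values are assumptions, and for simplicity any samples left over when the series length is not a multiple of `n_segments` are dropped.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_encode(series, n_segments, alphabet="abcd"):
    """Encode a numeric time series as a SAX word of length n_segments."""
    seg_len = len(series) // n_segments
    if seg_len == 0:
        raise ValueError("series is shorter than n_segments")

    # 1. z-normalise so that the Gaussian breakpoints are meaningful.
    mu, sigma = mean(series), pstdev(series)
    z = [0.0 if sigma == 0 else (x - mu) / sigma for x in series]

    # 2. Piecewise aggregate approximation: mean of each equal-length segment.
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]

    # 3. Breakpoints splitting the standard normal into equiprobable regions.
    a = len(alphabet)
    breakpoints = [NormalDist().inv_cdf(j / a) for j in range(1, a)]

    # 4. Map each segment mean to the symbol of the region it falls into.
    return "".join(alphabet[bisect_right(breakpoints, v)] for v in paa)


# Hypothetical demand readings for one metered area (illustrative only).
demand = [1, 1, 2, 2, 8, 8, 9, 9]
print(sax_encode(demand, n_segments=4))
```

Because each series is reduced to a short word, areas with similar demand shapes can then be compared symbol by symbol rather than sample by sample.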

    The Parallelism Motifs of Genomic Data Analysis

    Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share these data with the research community, but some genomic data analysis problems require large-scale computational platforms to meet both their memory and computational requirements. These applications differ from the scientific simulations that dominate the workload on high-end parallel systems today, and they place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high-performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns, or motifs, that help inform parallelization strategies, and we compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.
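
To make the hashing and sorting patterns concrete, here is a small serial sketch of k-mer counting, a common building block in genomic analysis. The example is not taken from the paper: the choice of k-mer counting, the function names `count_kmers` and `top_kmers`, and the toy reads are assumptions for illustration. In a parallel setting the hash-table updates would become the asynchronous updates to a shared data structure noted above.

```python
from collections import Counter


def count_kmers(reads, k):
    """Hashing: accumulate the count of each length-k substring in a hash table."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def top_kmers(counts, n):
    """Sorting: order k-mers by descending count, breaking ties alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Hypothetical short reads (illustrative only).
reads = ["ACGTAC", "CGTA"]
counts = count_kmers(reads, k=3)
print(top_kmers(counts, 2))
```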