
    Google Earth Engine cloud computing platform for remote sensing big data applications: a comprehensive review

    Remote sensing (RS) systems have been collecting massive volumes of data for decades, and managing and analyzing these datasets is not practical with common software packages and desktop computing resources. In this regard, Google has developed a cloud computing platform, called Google Earth Engine (GEE), to effectively address the challenges of big data analysis. In particular, this platform facilitates processing big geo data over large areas and monitoring the environment for long periods of time. Although the platform was launched in 2010 and has proved its high potential for different applications, it was not fully investigated and utilized for RS applications until recent years. Therefore, this study aims to comprehensively explore different aspects of the GEE platform, including its datasets, functions, advantages/limitations, and various applications. For this purpose, 450 journal articles published in 150 journals between January 2010 and May 2020 were studied. It was observed that Landsat and Sentinel datasets were extensively utilized by GEE users. Moreover, supervised machine learning algorithms, such as Random Forest, were widely applied to image classification tasks. GEE has also been employed in a broad range of applications, such as Land Cover/Land Use classification, hydrology, urban planning, natural disasters, climate analyses, and image processing. In general, the number of GEE publications has increased significantly during the past few years, and it is expected that GEE will be utilized by more users from different fields to resolve their big data processing challenges.
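    The supervised classification workflow the review highlights (Random Forest applied to satellite pixels) can be sketched outside GEE with scikit-learn; the spectral-band values, class labels, and parameters below are invented for illustration and do not come from the review.

```python
# Illustrative sketch: Random Forest land-cover classification on
# synthetic "spectral band" features (a stand-in for Landsat/Sentinel pixels).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Fake training pixels: 3 classes (water, vegetation, urban), 4 bands each.
n_per_class = 100
means = np.array([[0.1, 0.2, 0.1, 0.05],   # water: low reflectance overall
                  [0.1, 0.3, 0.2, 0.60],   # vegetation: high near-infrared
                  [0.4, 0.4, 0.4, 0.40]])  # urban: uniformly bright
X = np.vstack([rng.normal(m, 0.05, size=(n_per_class, 4)) for m in means])
y = np.repeat([0, 1, 2], n_per_class)

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Classify a new "pixel" with a vegetation-like spectral signature.
pred = clf.predict([[0.1, 0.3, 0.2, 0.6]])[0]
print(pred)  # → 1 (vegetation)
```

    In GEE itself the same pattern is expressed server-side over whole image collections rather than in-memory arrays, which is what makes continental-scale classification feasible.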

    Semantic location extraction from crowdsourced data

    Crowdsourced Data (CSD) has recently received increased attention in many application areas, including disaster management. Convenience of production and use, data currency, and abundance are some of the key reasons for this high interest. Conversely, quality issues such as incompleteness, credibility, and relevancy prevent the direct use of such data in important applications like disaster management. Moreover, the availability of location information in CSD is problematic, as it remains very low on many crowdsourced platforms such as Twitter. The recorded location is also mostly related to the mobile device or user location and often does not represent the event location. In CSD, the event location is discussed descriptively in the comments, in addition to the recorded location (which is generated by the mobile device's GPS or the mobile communication network). This study attempts to semantically extract CSD location information with the help of an ontological gazetteer and other available resources. Tweets from the 2011 Queensland flood and Ushahidi Crowd Map data were semantically analysed to extract location information, supported by the Queensland Gazetteer (converted to an ontological gazetteer) and a global gazetteer. Preliminary results show that the use of ontologies and semantics can improve the accuracy of place name identification in CSD and the process of location information extraction.
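    The core of gazetteer-based place-name identification can be sketched as matching message n-grams against a gazetteer, preferring the longest match. The place names, tweet text, and helper below are invented for the example, and this omits the ontological reasoning (part-of and nearby relations, disambiguation) that the study adds on top.

```python
# Minimal sketch of gazetteer-based place-name extraction from a message.
# A toy gazetteer of Queensland place names (invented subset).
GAZETTEER = {"brisbane", "ipswich", "toowoomba", "brisbane river"}

def extract_places(text, gazetteer, max_ngram=2):
    """Return gazetteer entries found in the text, preferring longer matches."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    used = [False] * len(tokens)
    found = set()
    for n in range(max_ngram, 0, -1):          # longest n-grams first
        for i in range(len(tokens) - n + 1):
            if any(used[i:i + n]):             # skip tokens already matched
                continue
            candidate = " ".join(tokens[i:i + n])
            if candidate in gazetteer:
                found.add(candidate)
                for j in range(i, i + n):
                    used[j] = True
    return found

tweet = "Flooding near the Brisbane River, roads closed in Ipswich."
print(sorted(extract_places(tweet, GAZETTEER)))
# → ['brisbane river', 'ipswich']
```

    Note that "brisbane" alone is suppressed once "brisbane river" is matched; in real CSD this ambiguity between a place and its containing region is exactly where ontological relations help.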

    Deriving trajectory embeddings from global positioning system movement data

    Mini Dissertation (MSc (Advanced Data Analytics)), University of Pretoria, 2022. Analysing unstructured data with minimal contextual information is a challenge in spatial applications such as movement data. Movement data are sequences of time-stamped locations of a moving entity, analogous to text data as sequences of words in a document. Text analytics is rich in methods for learning word embeddings and latent semantic clusters from unstructured data. In this work, the successes of probabilistic topic models used in natural language processing (NLP) were the inspiration for applying these methods to movement data. The motivation is based on the fact that topic models exhibit characteristics found both in clustering and in dimensionality reduction techniques. Furthermore, the inferred matrices can be used as interpretable topic distributions for movement behaviour, and the lower-dimensional embeddings generated by the latent Dirichlet allocation (LDA) model can be used to cluster movement behaviour. Various existing techniques for trajectory clustering in the literature are explored, and the advantages and disadvantages of each method are considered. The challenges of trajectory modelling with LDA are examined and solutions to these challenges are suggested. Lastly, the advantages of using LDA compared to traditional clustering techniques are discussed. The analysis applies LDA to two use cases. Firstly, the ability of LDA to infer interpretable topics is explored by analysing the movement of jaguars in South America. Secondly, the ability of LDA to cluster movement trajectories is investigated by clustering driver behaviour based on real-world driving data. The results of the two experiments show that it is possible to derive interpretable topics and to cluster the movement behaviour of trajectories using the LDA model.
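    The document/word analogy the dissertation builds on can be made concrete by discretizing GPS fixes into grid-cell "words", so that a trajectory becomes a token sequence a topic model can consume. The coordinates and cell size below are arbitrary illustrative values, not the dissertation's actual preprocessing.

```python
# Sketch: turn a GPS trajectory into a "document" of grid-cell tokens,
# the representation that lets topic models like LDA treat movement
# data the way NLP treats text.

def to_tokens(trajectory, cell_deg=0.01):
    """Map (lat, lon) fixes to grid-cell words like 'c_-2575_2818'."""
    tokens = []
    for lat, lon in trajectory:
        row = int(lat // cell_deg)   # grid row index (floor division)
        col = int(lon // cell_deg)   # grid column index
        tokens.append(f"c_{row}_{col}")
    return tokens

# A short synthetic trajectory: the first two fixes fall in the same cell.
traj = [(-25.7461, 28.1881), (-25.7463, 28.1885), (-25.7399, 28.2001)]
doc = to_tokens(traj)
print(doc)
```

    With trajectories encoded this way, a standard LDA implementation can infer topic distributions over cells, which is what makes the inferred topics spatially interpretable.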

    End-to-end anomaly detection in stream data

    Nowadays, huge volumes of data are generated with increasing velocity by various systems, applications, and activities. This increases the demand for stream and time series analysis to react to changing conditions in real time, for enhanced efficiency and quality of service delivery as well as upgraded safety and security in the private and public sectors. Despite its very rich history, time series anomaly detection is still one of the vital topics in machine learning research and is receiving increasing attention. Identifying hidden patterns and selecting a model that fits the observed data well and also carries over to unobserved data is not a trivial task. Due to the increasing diversity of data sources and associated stochastic processes, this pivotal data analysis topic is loaded with challenges like complex latent patterns, concept drift, and overfitting that may mislead the model and cause a high false alarm rate. Handling these challenges leads advanced anomaly detection methods to develop sophisticated decision logic, which turns them into mysterious and inexplicable black boxes. Contrary to this trend, end-users expect transparency and verifiability in order to trust a model and the outcomes it produces. Also, pointing users to the most anomalous/malicious areas of a time series and the causal features could save them time, energy, and money. For these reasons, this thesis addresses the crucial challenges in an end-to-end pipeline of stream-based anomaly detection through the three essential phases of behavior prediction, inference, and interpretation. The first step is devising a time series model that yields high average accuracy as well as small error deviation. On this basis, we propose higher-quality anomaly detection and scoring techniques that utilize the related contexts to reclassify observations and post-prune unjustified events. Last but not least, we make the predictive process transparent and verifiable by providing meaningful reasoning behind its generated results, based on concepts understandable to a human. The provided insight can pinpoint the anomalous regions of a time series and explain why the current status of a system has been flagged as anomalous. Stream-based anomaly detection research is a principal area of innovation to support our economy, security, and even the safety and health of societies worldwide. We believe our proposed analysis techniques can contribute to building a situational awareness platform and open new perspectives in a variety of domains like cybersecurity and health.
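    A minimal version of the first phase (predicting expected behaviour and flagging deviations) is a streaming z-score detector over a sliding window. The window size, threshold, and series below are arbitrary choices for illustration; the thesis's methods are far richer, adding context-aware rescoring, event pruning, and human-interpretable explanations.

```python
# Minimal streaming anomaly detector: flag a point when it deviates from
# the recent window mean by more than `k` standard deviations.
from collections import deque
from math import sqrt

def stream_anomalies(values, window=10, k=3.0):
    buf = deque(maxlen=window)
    flagged = []
    for i, x in enumerate(values):
        if len(buf) == buf.maxlen:           # wait until the window is full
            mean = sum(buf) / len(buf)
            var = sum((v - mean) ** 2 for v in buf) / len(buf)
            std = sqrt(var)
            if std > 0 and abs(x - mean) > k * std:
                flagged.append(i)
        buf.append(x)
    return flagged

# A steady signal around 1.0 with one spike at index 15.
series = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0,
          1.0, 1.05, 0.95, 1.0, 1.1, 9.0, 1.0, 0.95, 1.05, 1.0]
print(stream_anomalies(series))  # → [15]
```

    Note the weakness this simple baseline shares with many detectors: after the spike enters the window it inflates the estimated variance, temporarily masking further anomalies, which is one motivation for the reclassification and pruning steps described above.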

    Sequence Mining Analysis on Shopping Data

    With access to information becoming ever easier, people and companies naturally try to extract the maximum value from it. Large retail brands and shopping centres everywhere compete for the opportunity to access data about their customers and their habits, and this information is extracted through data mining techniques. This relentless demand drives the search for new means of obtaining it, with the aim of gaining a competitive advantage over rivals. This dissertation presents a set of analyses performed on a dataset of customers' visits to stores. Several kinds of tests are already applied to such datasets in order to understand customers better; however, sequence mining techniques are rarely used. The main goal of these techniques is to analyse large, temporally ordered (sequential) datasets and extract the set of sequences that share similarities between their elements. Applied correctly to a dataset in sequential format, they can extract high-quality information that differentiates them from other methods. The dataset consists of real spatio-temporal data on customers' locations within a shopping centre. Each visit record contains a customer identifier, the store the customer is in, the specific time of the detection, among other information. From these elements, different types of sequences can be constructed. This dissertation presents some of these possible sequences, along with an explanation of the sequence mining analysis performed on each one.
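    The kind of sequential pattern the dissertation mines can be illustrated with a toy frequent-pattern count over ordered store visits: count consecutive store pairs across customers and keep those meeting a support threshold. The store names, customer IDs, and threshold are invented for the example.

```python
# Toy sequential-pattern sketch: find ordered pairs of consecutive store
# visits that occur for at least `min_support` customers.
from collections import Counter

visits = {
    "c1": ["grocery", "pharmacy", "coffee"],
    "c2": ["grocery", "pharmacy", "bookshop"],
    "c3": ["coffee", "grocery", "pharmacy"],
}

def frequent_bigrams(sequences, min_support=3):
    counts = Counter()
    for seq in sequences.values():
        # Count each consecutive pair at most once per customer.
        counts.update(set(zip(seq, seq[1:])))
    return [pair for pair, c in counts.items() if c >= min_support]

print(frequent_bigrams(visits))  # → [('grocery', 'pharmacy')]
```

    Full sequence mining algorithms such as PrefixSpan generalize this idea to patterns of arbitrary length with gaps, but the support-counting core is the same.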

    Topic modeling in marketing: recent advances and research opportunities

    Using a probabilistic approach to explore latent patterns in high-dimensional co-occurrence data, topic models offer researchers a flexible and open framework for soft-clustering large data sets. In recent years, there has been growing interest among marketing scholars and practitioners in adopting topic models in various marketing application domains. However, to date there is no comprehensive overview of this rapidly evolving field. By analyzing a set of 61 published papers along with conceptual contributions, we systematically review this highly heterogeneous area of research. In doing so, we characterize extant contributions employing topic models in marketing along three dimensions: data structures and retrieval of input data; implementation and extensions of basic topic models; and model performance evaluation. Our findings confirm that considerable progress has been made in various marketing sub-areas. However, there is still scope for promising future research, in particular with respect to integrating multiple, dynamic data sources, including time-varying covariates, and combining exploratory topic models with powerful predictive marketing models.

    Tracking the Temporal-Evolution of Supernova Bubbles in Numerical Simulations

    The study of low-dimensional, noisy manifolds embedded in a higher-dimensional space has been extremely useful in many applications, from the chemical analysis of multi-phase flows to simulations of galactic mergers. Building a probabilistic model of a manifold helps describe its essential properties and how they vary in space. However, when the manifold evolves through time, joint spatio-temporal modelling is needed to fully comprehend its nature. We propose a first-order Markovian process that propagates the spatial probabilistic model of a manifold at a fixed time to its adjacent temporal stages. The proposed methodology is demonstrated using a particle simulation of an interacting dwarf galaxy to describe the evolution of a cavity generated by a supernova.
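    The first-order Markov assumption (the state distribution at time t+1 depends on time t only, through a fixed transition rule) can be sketched with a plain probability-vector update. The two-state chain below is a generic illustration of that propagation idea, not the paper's spatial model of supernova bubbles.

```python
# Generic first-order Markov propagation of a probability distribution:
# p_{t+1}[j] = sum_i p_t[i] * T[i][j].

def step(p, T):
    """One Markov step of distribution p under transition matrix T."""
    n = len(T[0])
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(n)]

T = [[0.9, 0.1],    # state 0 mostly persists
     [0.5, 0.5]]    # state 1 mixes evenly
p = [1.0, 0.0]      # start fully in state 0
for _ in range(3):  # propagate three temporal stages
    p = step(p, T)
print([round(x, 4) for x in p])  # → [0.844, 0.156]
```

    In the paper's setting, the analogue of T would carry a spatial probabilistic model of the manifold from one simulation snapshot to the adjacent ones rather than mixing two discrete states.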

    Urban Informatics

    This open access book is the first to systematically introduce the principles of urban informatics and its application to every aspect of the city that involves its functioning, control, management, and future planning. It introduces new models and tools being developed to understand and implement these technologies that enable cities to function more efficiently – to become ‘smart’ and ‘sustainable’. The smart city has quickly emerged as computers have become ever smaller to the point where they can be embedded into the very fabric of the city, as well as being central to new ways in which the population can communicate and act. When cities are wired in this way, they have the potential to become sentient and responsive, generating massive streams of ‘big’ data in real time as well as providing immense opportunities for extracting new forms of urban data through crowdsourcing. This book offers a comprehensive review of the methods that form the core of urban informatics from various kinds of urban remote sensing to new approaches to machine learning and statistical modelling. It provides a detailed technical introduction to the wide array of tools information scientists need to develop the key urban analytics that are fundamental to learning about the smart city, and it outlines ways in which these tools can be used to inform design and policy so that cities can become more efficient with a greater concern for environment and equity
