    A Two-Level Information Modelling Translation Methodology and Framework to Achieve Semantic Interoperability in Constrained GeoObservational Sensor Systems

    As geographical observational data capture, storage and sharing technologies such as in situ remote monitoring systems and spatial data infrastructures evolve, the vision of a Digital Earth, first articulated by Al Gore in 1998, is getting ever closer. However, there are still many challenges and open research questions. For example, data quality, provenance and heterogeneity remain an issue due to the complexity of geo-spatial data and information representation. Observational data are often inadequately semantically enriched by geo-observational information systems or spatial data infrastructures, and so the true meaning of the associated datasets is often not fully captured. Furthermore, data models underpinning these information systems are typically too rigid in their data representation to allow for the ever-changing and evolving nature of geo-spatial domain concepts. This impoverished approach to observational data representation reduces the ability of multi-disciplinary practitioners to share information in an interoperable and computable way. The health domain experiences similar challenges with representing complex and evolving domain information concepts. Within any complex domain (such as Earth system science or health) two categories or levels of domain concepts exist: those that remain stable over a long period of time, and those that are prone to change as the domain knowledge evolves and new discoveries are made. Health informaticians have developed a sophisticated two-level modelling systems design approach for electronic health documentation over many years and, with the use of archetypes, have shown how data, information, and knowledge interoperability among heterogeneous systems can be achieved.
This research investigates whether two-level modelling can be translated from the health domain to the geo-spatial domain and applied to observing scenarios to achieve semantic interoperability within and between spatial data infrastructures (SDIs), beyond what is possible with current state-of-the-art approaches. A detailed review of state-of-the-art SDIs, geo-spatial standards and the two-level modelling methodology was performed. A cross-domain translation methodology was developed, and a proof-of-concept geo-spatial two-level modelling framework was defined and implemented. The Open Geospatial Consortium’s (OGC) Observations & Measurements (O&M) standard was re-profiled to aid investigation of the two-level information modelling approach. An evaluation of the method was undertaken using two specific use-case scenarios. Information modelling was performed using the two-level modelling method to show how existing historical ocean observing datasets can be expressed semantically and harmonized. The flexibility of the approach was also investigated by applying the method to an air quality monitoring scenario using a technologically constrained monitoring sensor system. This work has demonstrated that two-level modelling can be translated to the geospatial domain and then further developed for use within a technologically constrained sensor system, using traditional wireless sensor networks, semantic web technologies and Internet of Things based technologies. Domain-specific evaluation results show that two-level modelling presents a viable approach to achieving semantic interoperability between constrained geo-observational sensor systems and spatial data infrastructures for ocean observing and city-based air quality observing scenarios. This has been demonstrated through the re-purposing of selected, existing geospatial data models and standards.
However, it was found that re-using existing standards requires careful ontological analysis per domain concept, and so caution is recommended in assuming the wider applicability of the approach. While the benefits of adopting a two-level information modelling approach to geospatial information modelling are potentially great, it was found that translation to a new domain is complex. The complexity of the approach was found to be a barrier to adoption, especially in commercial projects where standards implementation is low on implementation road maps and the perceived benefits of standards adherence are low. Arising from this work, a novel set of base software components, methods and fundamental geo-archetypes has been developed. However, during this work it was not possible to form the rich community of supporters required to fully validate the geo-archetypes. Therefore, the findings of this work are not exhaustive, and the archetype models produced are only indicative. The findings can be used as the basis to encourage further investigation and uptake of two-level modelling within the Earth system science and geo-spatial domain. Ultimately, this work recommends further development and evaluation of the approach, building on the positive results thus far and on the base software artefacts developed to support the approach.
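
The two-level idea described in this abstract can be illustrated with a minimal Python sketch. The class names `Observation` and `Archetype`, the archetype identifier and the value ranges are hypothetical and chosen only for illustration; they are not the software components, geo-archetypes or O&M profile produced by the thesis. Level one is a small reference model that stays stable, and level two is a set of domain constraints that can change without touching level one.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Level 1: a small, stable reference model (hypothetical)."""
    observed_property: str
    value: float
    unit: str


@dataclass
class Archetype:
    """Level 2: domain constraints that can evolve independently of level 1."""
    name: str
    observed_property: str
    allowed_units: set = field(default_factory=set)
    min_value: float = float("-inf")
    max_value: float = float("inf")

    def validate(self, obs: Observation) -> list:
        """Return a list of constraint violations (an empty list means valid)."""
        errors = []
        if obs.observed_property != self.observed_property:
            errors.append("unexpected property: " + obs.observed_property)
        if obs.unit not in self.allowed_units:
            errors.append("unit not allowed: " + obs.unit)
        if not (self.min_value <= obs.value <= self.max_value):
            errors.append("value out of range: " + str(obs.value))
        return errors


# Illustrative archetype for an ocean observing scenario (assumed limits).
sea_temp = Archetype(
    name="sea_water_temperature.v1",
    observed_property="sea_water_temperature",
    allowed_units={"degC"},
    min_value=-2.0,
    max_value=40.0,
)

good = Observation("sea_water_temperature", 12.4, "degC")
bad = Observation("sea_water_temperature", 285.5, "K")

print(sea_temp.validate(good))
print(sea_temp.validate(bad))
```

In this sketch, a new domain concept would be added by defining another `Archetype` instance rather than by changing the `Observation` class.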

    A New Map Symbol Design Method for Real-Time Visualization of Geo-Sensor Data (Short Paper)

    Maps are an excellent way to present data with spatial components. With the large-scale deployment of geo-sensors in recent years, map-based management and visualization of geo-sensor data have become ubiquitous, and they are likely to find many more applications in the future. However, current maps typically do not support real-time communication in the Internet of Things (IoT), and it is difficult to implement real-time visualization of sensor data on a map. Map symbols are the language of maps. In this paper, we describe a new map symbol design method for geo-sensor data acquisition and visualization on maps. We borrow the sensor data visualization method used in supervisory control and data acquisition (SCADA) systems and apply it to the design process of map symbols. Starting from the traditional vector map symbol, the mapping relationship between the sensor data and the graphic elements is defined during the map symbol design process. When the map symbol is rendered, it is integrated into the map layer. The communication module in the map, which communicates with the sensor device, receives real-time sensor data and triggers a refresh of the map layer according to the mapping profile. All the methods and processes shown herein have been verified in GeoTools.
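
The mapping between sensor data and graphic elements described in this abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `GraphicElement` class, the profile keys (`element`, `unit`, `alarm_at`, `normal_fill`, `alarm_fill`) and the threshold rule are hypothetical, and the sketch does not use or represent the GeoTools API or the paper's actual profile format.

```python
from dataclasses import dataclass


@dataclass
class GraphicElement:
    """One drawable part of a vector map symbol (hypothetical structure)."""
    element_id: str
    fill: str = "#cccccc"
    label: str = ""


def apply_mapping_profile(symbol, profile, reading):
    """Update graphic elements from one sensor reading.

    Returns True if any element changed, i.e. the map layer needs a refresh.
    """
    changed = False
    for field_name, rule in profile.items():
        if field_name not in reading:
            continue
        element = symbol[rule["element"]]
        value = reading[field_name]
        # Assumed rule: switch to the alarm colour at or above a threshold.
        new_fill = rule["alarm_fill"] if value >= rule["alarm_at"] else rule["normal_fill"]
        new_label = f"{value:.1f} {rule['unit']}"
        if (element.fill, element.label) != (new_fill, new_label):
            element.fill, element.label = new_fill, new_label
            changed = True
    return changed


symbol = {"body": GraphicElement("body")}
profile = {
    "temperature": {
        "element": "body",
        "unit": "degC",
        "alarm_at": 30.0,
        "normal_fill": "#2e8b57",
        "alarm_fill": "#d62728",
    }
}

first = apply_mapping_profile(symbol, profile, {"temperature": 31.5})
second = apply_mapping_profile(symbol, profile, {"temperature": 31.5})
```

Returning a changed flag lets a communication module skip the layer refresh when a new reading does not alter the rendered symbol.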

    Towards an Efficient, Scalable Stream Query Operator Framework for Representing and Analyzing Continuous Fields

    Advancements in sensor technology have made it less expensive to deploy massive numbers of sensors to observe continuous geographic phenomena at high sample rates and stream live sensor observations. This has raised new challenges, since sensor streams have pushed the limits of traditional geo-sensor data management technology. Data Stream Engines (DSEs) provide facilities for near real-time processing of streams; however, algorithms for representing and analyzing Spatio-Temporal (ST) phenomena are limited. This dissertation investigates near real-time representation and analysis of continuous ST phenomena, observed by large numbers of mobile, asynchronously sampling sensors, using a DSE and proposes two novel stream query operator frameworks. First, the ST Interpolation Stream Query Operator Framework (STI-SQO framework) continuously transforms sensor streams into rasters using a novel set of stream query operators that perform ST-IDW interpolation. A key component of the STI-SQO framework is the 3D, main memory-based, ST Grid Index that enables high performance ST insertion and deletion of massive numbers of sensor observations through Isotropic Time Cell and Time Block-based partitioning. The ST Grid Index facilitates fast ST search for samples using ST shell-based neighborhood search templates, namely the Cylindrical Shell Template and Nested Shell Template. Furthermore, the framework contains the stream-based ST-IDW algorithms ST Shell and ST ak-Shell for high performance, parallel grid cell interpolation. Second, the proposed ST Predicate Stream Query Operator Framework (STP-SQO framework) efficiently evaluates value predicates over ST streams of ST continuous phenomena.
The framework contains several stream-based predicate evaluation algorithms, including Region-Growing, Tile-based, and Phenomenon-Aware algorithms, that target predicate evaluation to regions with seed points and minimize the number of raster cells that are interpolated when evaluating value predicates. The performance of the proposed frameworks was assessed with regard to prediction accuracy of output results and runtime. The STI-SQO framework achieved a processing throughput of 250,000 observations in 2.5 s with a Normalized Root Mean Square Error under 0.19 using a 500×500 grid. The STP-SQO framework processed over 250,000 observations in under 0.25 s for predicate results covering less than 40% of the observation area, and the Scan Line Region Growing algorithm was consistently the fastest algorithm tested.
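
As a rough illustration of spatio-temporal inverse distance weighting (ST-IDW), the following Python sketch estimates a value at one space-time point from scattered samples. It is a plain brute-force version under stated assumptions: the function name `st_idw`, the `time_scale` factor that converts a time difference into a distance, and the default `power` are choices made here for illustration. It does not reproduce the dissertation's ST Grid Index, shell templates, or the ST Shell and ST ak-Shell algorithms.

```python
import math


def st_idw(samples, x, y, t, time_scale=1.0, power=2.0):
    """Estimate a value at (x, y, t) from (sx, sy, st, value) samples.

    The space-time distance combines the planar distance with the time
    difference scaled by `time_scale` (an assumed conversion factor).
    """
    numerator = 0.0
    denominator = 0.0
    for sx, sy, s_t, value in samples:
        d = math.sqrt(
            (sx - x) ** 2 + (sy - y) ** 2 + (time_scale * (s_t - t)) ** 2
        )
        if d == 0.0:
            # The query point coincides with a sample: return it directly.
            return value
        weight = 1.0 / d ** power
        numerator += weight * value
        denominator += weight
    if denominator == 0.0:
        raise ValueError("at least one sample is required")
    return numerator / denominator


samples = [(0.0, 0.0, 0.0, 10.0), (2.0, 0.0, 0.0, 20.0)]
midpoint = st_idw(samples, 1.0, 0.0, 0.0)
exact_hit = st_idw(samples, 0.0, 0.0, 0.0)
```

A raster would be produced by calling such a function once per grid cell; an index over the samples, as described in the abstract, would restrict each call to nearby observations instead of scanning all of them.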

    Spatial and Temporal Sentiment Analysis of Twitter data

    Twitter is used worldwide by the public for expressing opinions. This study focuses on the spatio-temporal variation of georeferenced Tweets’ sentiment polarity, with a view to understanding how opinions evolve on Twitter over space and time and across communities of users. More specifically, the question this study tested is whether sentiment polarity on Twitter exhibits specific time-location patterns. The aim of the study is to investigate the spatial and temporal distribution of georeferenced Twitter sentiment polarity within a 1 km buffer around the Curtin Bentley campus boundary in Perth, Western Australia. Tweets posted on campus were assigned to six spatial zones and four time zones. A sentiment analysis was then conducted for each zone using the sentiment analyser tool in the Starlight Visual Information System software. The Feature Manipulation Engine was employed to convert non-spatial files into spatial and temporal feature classes. The spatial and temporal distribution of Twitter sentiment polarity was mapped using Geographic Information Systems (GIS). Some interesting results were identified. For example, the highest percentage of positive Tweets occurred in the social science area, while the science and engineering and dormitory areas had the highest percentage of negative postings. The number of negative Tweets increases in the library and science and engineering areas as the end of the semester approaches, reaching a peak around the exam period, while the percentage of negative Tweets drops at the end of the semester in the entertainment and sport and dormitory areas. This study provides some insights into students’ and staff’s sentiment variation on Twitter, which could be useful for university teaching and learning management.
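
The per-zone, per-period comparison described in this abstract amounts to computing the share of each polarity within every (spatial zone, time period) group. The Python sketch below shows that aggregation step only. The records are made-up toy values, not data from the study, and the zone and period labels are placeholders; the study itself used Starlight, the Feature Manipulation Engine and GIS software rather than code like this.

```python
from collections import Counter, defaultdict


def polarity_share(tweets):
    """Return {(zone, period): {polarity: percentage}}.

    `tweets` is an iterable of (zone, period, polarity) tuples.
    """
    counts = defaultdict(Counter)
    for zone, period, polarity in tweets:
        counts[(zone, period)][polarity] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {p: 100.0 * n / total for p, n in counter.items()}
    return shares


# Toy records for illustration only (not results from the study).
tweets = [
    ("library", "exam", "negative"),
    ("library", "exam", "negative"),
    ("library", "exam", "positive"),
    ("dormitory", "exam", "positive"),
]

shares = polarity_share(tweets)
```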

    Open source SCADA systems for small renewable power generation

    Low-cost monitoring and control is essential for small renewable power systems. While large renewable power systems can use existing commercial technology for monitoring and control, that technology is not cost-effective for small renewable generation. Such small assets require cost-effective, flexible, secure, and reliable real-time coordinated data monitoring and control systems. Supervisory control and data acquisition (SCADA) is well suited to this task. However, the available commercial SCADA solutions are mostly expensive and economically unjustifiable for smaller applications. They also pose interoperability issues with existing components, which are often from multiple vendors. Therefore, an open source SCADA system represents the most flexible and the most cost-effective SCADA solution. This thesis was carried out in two phases. The first phase demonstrates the design and dynamic simulation of a small hybrid power system with a renewable power generation system as a case study. In the second phase, after an extensive study of the proven commercial SCADA solutions and some open source SCADA packages, three different secure, reliable, low-cost open source SCADA options are developed using the most recent SCADA architecture, the Internet of Things. The implemented prototypes of the three open source SCADA systems were tested extensively with a small renewable power system (a solar PV system). The results show that the developed open source SCADA systems perform optimally and accurately, and could serve as viable options for smaller applications, such as renewable generation, that cannot afford commercial SCADA solutions.

    European Handbook of Crowdsourced Geographic Information

    This book focuses on the study of the remarkable new source of geographic information that has become available in the form of user-generated content accessible over the Internet through mobile and Web applications. The exploitation, integration and application of these sources, termed volunteered geographic information (VGI) or crowdsourced geographic information (CGI), offer scientists an unprecedented opportunity to conduct research on a variety of topics at multiple scales and for diversified objectives. The Handbook is organized in five parts, addressing the fundamental questions: What motivates citizens to provide such information in the public domain, and what factors govern/predict its validity? What methods might be used to validate such information? Can VGI be framed within the larger domain of sensor networks, in which inert and static sensors are replaced by or combined with intelligent and mobile humans equipped with sensing devices? What limitations are imposed on VGI by differential access to broadband Internet, mobile phones, and other communication technologies, and by concerns over privacy? How do VGI and crowdsourcing enable innovative applications to benefit human society? Chapters examine how crowdsourcing techniques and methods, and the VGI phenomenon, have motivated a multidisciplinary research community to identify both fields of application and quality criteria depending on the use of VGI. Beyond harvesting tools and storage of these data, research has paid remarkable attention to these information resources, in an age when information and participation are among the most important drivers of development. The collection opens questions and points to new research directions in addition to the findings that each of the authors demonstrates. Despite rapid progress in VGI research, this Handbook also shows that there are technical, social, political and methodological challenges that require further studies and research.

    Statistical Models for Querying and Managing Time-Series Data

    In recent years we have experienced a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. The availability of large amounts of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Often the time-series data generated from sensors (environmental, RFID, GPS, etc.) are imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty for producing clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. In particular, this thesis is centered around the following three topics: Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series from other related time series, instead of computing them directly, thus reducing the overall computational cost significantly. Moreover, the Affinity framework proposes a unified approach for computing several statistical measures at once. Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real world has an inherent element of uncertainty, arising from the various sources of imprecision affecting its sources (such as sensor data, GPS trajectories, environmental monitoring data, etc.).
The primary sources of imprecision in such data are imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor in the inherent uncertainty and produce clean answers are required. An important assumption when using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice: the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116]. Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surroundings. Data generated by these devices are large in quantity, and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data are aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits spatial smoothness of environmental parameters (such as ambient pollution [5] or radiation [2]) to construct statistical models of the data.
Since the number of constructed models is significantly smaller than the original data, we show that using our approach leads to a dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage.
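
The idea of inferring statistical measures through affine relationships can be illustrated with the simplest case, a single series related to another by y = a·x + b. Then mean(y) = a·mean(x) + b and var(y) = a²·var(x), so both measures follow from those of x without scanning y. The Python sketch below checks this on toy values; the function name `infer_from_affine` and the data are illustrative assumptions, and the sketch is not the Affinity framework itself, whose way of finding and using affine relationships is not described in this abstract.

```python
from statistics import fmean, pvariance


def infer_from_affine(mean_x, var_x, a, b):
    """Infer the mean and variance of y = a * x + b from those of x."""
    return a * mean_x + b, a * a * var_x


# Toy series for illustration only.
x = [1.0, 2.0, 4.0, 7.0]
a, b = 3.0, -2.0
y = [a * v + b for v in x]

inferred_mean, inferred_var = infer_from_affine(fmean(x), pvariance(x), a, b)

# Direct computation on y, used here only to check the inferred values.
direct_mean, direct_var = fmean(y), pvariance(y)
```

The saving comes from reuse: once the measures of x are known, each related series costs a constant number of arithmetic operations rather than a pass over its data.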

    Libro de Actas JCC&BD 2018 : VI Jornadas de Cloud Computing & Big Data

    This volume collects the papers presented at the VI Jornadas de Cloud Computing & Big Data (JCC&BD), held from 25 to 29 June 2018 at the Facultad de Informática of the Universidad Nacional de La Plata (UNLP).

