8 research outputs found

    Graphing of E-Science Data with varying user requirements

    Get PDF
    Based on our experience in the Swiss Experiment, exploring experimental, scientific data is often done in a visual way. Starting from a global overview the users are zooming in on interesting events. In case of huge data volumes special data structures have to be introduced to provide fast and easy access to the data. Since it is hard to predict on how users will work with the data a generic approach requires self-adaptation of the required special data structures. In this paper we describe the underlying NP-hard problem and present several approaches to address the problem with varying properties. The approaches are illustrated with a small example and are evaluated with a synthetic data set and user queries

    Semantic Sensor Data Search in a Large-Scale Federated Sensor Network

    Get PDF
    Sensor network deployments are a primary source of massive amounts of data about the real world that surrounds us, measuring a wide range of physical properties in real time. However, in large-scale deployments it becomes hard to eectively exploit the data captured by the sensors, since there is no precise information about what devices are available and what properties they measure. Even when metadata is available, users need to know low-level details such as database schemas or names of properties that are specic to a device or platform. Therefore the task of coherently searching, correlating and combining sensor data becomes very challenging. We propose an ontology-based approach, that consists in exposing sensor observations in terms of ontologies enriched with semantic metadata, providing information such as: which sensor recorded what, where, when, and in which conditions. For this, we allow dening virtual semantic streams, whose ontological terms are related to the underlying sensor data schemas through declarative mappings, and can be queried in terms of a high level sensor network ontology

    Internet of things platform for smart farming: experiences and lessons learnt

    Full text link
    Improving farm productivity is essential for increasing farm profitability and meeting the rapidly growing demand for food that is fuelled by rapid population growth across the world. Farm productivity can be increased by understanding and forecasting crop performance in a variety of environmental conditions. Crop recommendation is currently based on data collected in field-based agricultural studies that capture crop performance under a variety of conditions (e.g., soil quality and environmental conditions). However, crop performance data collection is currently slow, as such crop studies are often undertaken in remote and distributed locations, and such data are typically collected manually. Furthermore, the quality of manually collected crop performance data is very low, because it does not take into account earlier conditions that have not been observed by the human operators but is essential to filter out collected data that will lead to invalid conclusions (e.g., solar radiation readings in the afternoon after even a short rain or overcast in the morning are invalid, and should not be used in assessing crop performance). Emerging Internet of Things (IoT) technologies, such as IoT devices (e.g., wireless sensor networks, network-connected weather stations, cameras, and smart phones) can be used to collate vast amount of environmental and crop performance data, ranging from time series data from sensors, to spatial data from cameras, to human observations collected and recorded via mobile smart phone applications. Such data can then be analysed to filter out invalid data and compute personalised crop recommendations for any specific farm. In this paper, we present the design of SmartFarmNet, an IoT-based platform that can automate the collection of environmental, soil, fertilisation, and irrigation data; automatically correlate such data and filter-out invalid data from the perspective of assessing crop performance; and compute crop forecasts and personalised crop recommendations for any particular farm. SmartFarmNet can integrate virtually any IoT device, including commercially available sensors, cameras, weather stations, etc., and store their data in the cloud for performance analysis and recommendations. An evaluation of the SmartFarmNet platform and our experiences and lessons learnt in developing this system concludes the paper. SmartFarmNet is the first and currently largest system in the world (in terms of the number of sensors attached, crops assessed, and users it supports) that provides crop performance analysis and recommendations

    The EnviDat Concept for an Institutional Environmental Data Portal

    Get PDF
    EnviDat is the environmental data portal developed by the Swiss Federal Institute for Forest, Snow and Landscape Research WSL. The strategic initiative EnviDat highlights the importance WSL lays on Research Data Management (RDM) at the institutional level and demonstrates the commitment to accessible research data in order to advance environmental science. EnviDat focuses on registering and publishing environmental data sets and provides unified and efficient access to the WSL’s comprehensive reservoir of environmental monitoring and research data. Research data management is organized in a decentralized manner where the responsibility to curate research data remains with the experts and the original data providers. EnviDat supports data producers and data users in registration, documentation, storage, publication, search and retrieval of a wide range of heterogeneous data sets from the environmental domain. Innovative features include (i) a flexible, three-layer metadata schema, (ii) an additive data discovery model that considers spatial data and (iii) a DataCRediT mechanism designed for specifying data authorship. In addition, the overall user-friendly appearance in EnviDat provides an important opportunity for showcasing WSL research activities and results. The EnviDat portal builds on a conceptual system consisting of a core system, a set of guiding principles and a number of key services. Its development closely follows the conceptual framework, being guided by principles towards the ultimate goal of providing useful services for researchers

    Hierarchical distributed fog-to-cloud data management in smart cities

    Get PDF
    There is a vast amount of data being generated every day in the world with different formats, quality levels, etc. This new data, together with the archived historical data, constitute the seed for future knowledge discovery and value generation in several fields of science and big data environments. Discovering value from data is a complex computing process where data is the key resource, not only during its processing, but also during its entire life cycle. However, there is still a huge concern about how to organize and manage this data in all fields for efficient usage and exploitation during all data life cycles. Although several specific Data LifeCycle (DLC) models have been recently defined for particular scenarios, we argue that there is no global and comprehensive DLC framework to be widely used in different fields. In particular scenario, smart cities are the current technological solutions to handle the challenges and complexity of the growing urban density. Traditionally, Smart City resources management rely on cloud based solutions where sensors data are collected to provide a centralized and rich set of open data. The advantages of cloud-based frameworks are their ubiquity, as well as an (almost) unlimited resources capacity. However, accessing data from the cloud implies large network traffic, high latencies usually not appropriate for real-time or critical solutions, as well as higher security risks. Alternatively, fog computing emerges as a promising technology to absorb these inconveniences. It proposes the use of devices at the edge to provide closer computing facilities and, therefore, reducing network traffic, reducing latencies drastically while improving security. We have defined a new framework for data management in the context of a Smart City through a global fog to cloud resources management architecture. This model has the advantages of both, fog and cloud technologies, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. In this thesis, we propose many novel ideas in the design of a novel F2C Data Management architecture for smart cities as following. First, we draw and describe a comprehensive scenario agnostic Data LifeCycle model successfully addressing all challenges included in the 6Vs not tailored to any specific environment, but easy to be adapted to fit the requirements of any particular field. Then, we introduce the Smart City Comprehensive Data LifeCycle model, a data management architecture generated from a comprehensive scenario agnostic model, tailored for the particular scenario of Smart Cities. We define the management of each data life phase, and explain its implementation on a Smart City with Fog-to-Cloud (F2C) resources management. And then, we illustrate a novel architecture for data management in the context of a Smart City through a global fog to cloud resources management architecture. We show this model has the advantages of both, fog and cloud, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. As a first experiment for the F2C data management architecture, a real Smart City is analyzed, corresponding to the city of Barcelona, with special emphasis on the layers responsible for collecting the data generated by the deployed sensors. The amount of daily sensors data transmitted through the network has been estimated and a rough projection has been made assuming an exhaustive deployment that fully covers all city. And, we provide some solutions to both reduce the data transmission and improve the data management. Then, we used some data filtering techniques (including data aggregation and data compression) to estimate the network traffic in this model during data collection and compare it with a traditional real system. Indeed, we estimate the total data storage sizes through F2C scenario for Barcelona smart citiesAl món es generen diàriament una gran quantitat de dades, amb diferents formats, nivells de qualitat, etc. Aquestes noves dades, juntament amb les dades històriques arxivades, constitueixen la llavor per al descobriment de coneixement i la generació de valor en diversos camps de la ciència i grans entorns de dades (big data). Descobrir el valor de les dades és un procés complex de càlcul on les dades són el recurs clau, no només durant el seu processament, sinó també durant tot el seu cicle de vida. Tanmateix, encara hi ha una gran preocupació per com organitzar i gestionar aquestes dades en tots els camps per a un ús i explotació eficients durant tots els cicles de vida de les dades. Encara que recentment s'han definit diversos models específics de Data LifeCycle (DLC) per a escenaris particulars, argumentem que no hi ha un marc global i complet de DLC que s'utilitzi àmpliament en diferents camps. En particular, les ciutats intel·ligents són les solucions tecnològiques actuals per fer front als reptes i la complexitat de la creixent densitat urbana. Tradicionalment, la gestió de recursos de Smart City es basa en solucions basades en núvol (cloud computing) on es recopilen dades de sensors per proporcionar un conjunt de dades obert i centralitzat. Les avantatges dels entorns basats en núvol són la seva ubiqüitat, així com una capacitat (gairebé) il·limitada de recursos. Tanmateix, l'accés a dades del núvol implica un gran trànsit de xarxa i, en general, les latències elevades no són apropiades per a solucions crítiques o en temps real, així com també per a riscos de seguretat més elevats. Alternativament, el processament de boira (fog computing) sorgeix com una tecnologia prometedora per absorbir aquests inconvenients. Proposa l'ús de dispositius a la vora per proporcionar recuirsos informàtics més propers i, per tant, reduir el trànsit de la xarxa, reduint les latències dràsticament mentre es millora la seguretat. Hem definit un nou marc per a la gestió de dades en el context d'una ciutat intel·ligent a través d'una arquitectura de gestió de recursos des de la boira fins al núvol (Fog-to-Cloud computing, o F2C). Aquest model té els avantatges combinats de les tecnologies de boira i de núvol, ja que permet reduir les latències per a aplicacions crítiques mentre es poden utilitzar les grans capacitats informàtiques de la tecnologia en núvol. En aquesta tesi, proposem algunes idees noves en el disseny d'una arquitectura F2C de gestió de dades per a ciutats intel·ligents. En primer lloc, dibuixem i descrivim un model de Data LifeCycle global agnòstic que aborda amb èxit tots els reptes inclosos en els 6V i no adaptats a un entorn específic, però fàcil d'adaptar-se als requisits de qualsevol camp en concret. A continuació, presentem el model de Data LifeCycle complet per a una ciutat intel·ligent, una arquitectura de gestió de dades generada a partir d'un model agnòstic d'escenari global, adaptat a l'escenari particular de ciutat intel·ligent. Definim la gestió de cada fase de la vida de les dades i expliquem la seva implementació en una ciutat intel·ligent amb gestió de recursos F2C. I, a continuació, il·lustrem la nova arquitectura per a la gestió de dades en el context d'una Smart City a través d'una arquitectura de gestió de recursos F2C. Mostrem que aquest model té els avantatges d'ambdues, la tecnologia de boira i de núvol, ja que permet reduir les latències per a aplicacions crítiques mentre es pot utilitzar la gran capacitat de processament de la tecnologia en núvol. Com a primer experiment per a l'arquitectura de gestió de dades F2C, s'analitza una ciutat intel·ligent real, corresponent a la ciutat de Barcelona, amb especial èmfasi en les capes responsables de recollir les dades generades pels sensors desplegats. S'ha estimat la quantitat de dades de sensors diàries que es transmet a través de la xarxa i s'ha realitzat una projecció aproximada assumint un desplegament exhaustiu que cobreix tota la ciutat.Postprint (published version

    Hierarchical distributed fog-to-cloud data management in smart cities

    Get PDF
    There is a vast amount of data being generated every day in the world with different formats, quality levels, etc. This new data, together with the archived historical data, constitute the seed for future knowledge discovery and value generation in several fields of science and big data environments. Discovering value from data is a complex computing process where data is the key resource, not only during its processing, but also during its entire life cycle. However, there is still a huge concern about how to organize and manage this data in all fields for efficient usage and exploitation during all data life cycles. Although several specific Data LifeCycle (DLC) models have been recently defined for particular scenarios, we argue that there is no global and comprehensive DLC framework to be widely used in different fields. In particular scenario, smart cities are the current technological solutions to handle the challenges and complexity of the growing urban density. Traditionally, Smart City resources management rely on cloud based solutions where sensors data are collected to provide a centralized and rich set of open data. The advantages of cloud-based frameworks are their ubiquity, as well as an (almost) unlimited resources capacity. However, accessing data from the cloud implies large network traffic, high latencies usually not appropriate for real-time or critical solutions, as well as higher security risks. Alternatively, fog computing emerges as a promising technology to absorb these inconveniences. It proposes the use of devices at the edge to provide closer computing facilities and, therefore, reducing network traffic, reducing latencies drastically while improving security. We have defined a new framework for data management in the context of a Smart City through a global fog to cloud resources management architecture. This model has the advantages of both, fog and cloud technologies, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. In this thesis, we propose many novel ideas in the design of a novel F2C Data Management architecture for smart cities as following. First, we draw and describe a comprehensive scenario agnostic Data LifeCycle model successfully addressing all challenges included in the 6Vs not tailored to any specific environment, but easy to be adapted to fit the requirements of any particular field. Then, we introduce the Smart City Comprehensive Data LifeCycle model, a data management architecture generated from a comprehensive scenario agnostic model, tailored for the particular scenario of Smart Cities. We define the management of each data life phase, and explain its implementation on a Smart City with Fog-to-Cloud (F2C) resources management. And then, we illustrate a novel architecture for data management in the context of a Smart City through a global fog to cloud resources management architecture. We show this model has the advantages of both, fog and cloud, as it allows reduced latencies for critical applications while being able to use the high computing capabilities of cloud technology. As a first experiment for the F2C data management architecture, a real Smart City is analyzed, corresponding to the city of Barcelona, with special emphasis on the layers responsible for collecting the data generated by the deployed sensors. The amount of daily sensors data transmitted through the network has been estimated and a rough projection has been made assuming an exhaustive deployment that fully covers all city. And, we provide some solutions to both reduce the data transmission and improve the data management. Then, we used some data filtering techniques (including data aggregation and data compression) to estimate the network traffic in this model during data collection and compare it with a traditional real system. Indeed, we estimate the total data storage sizes through F2C scenario for Barcelona smart citiesAl món es generen diàriament una gran quantitat de dades, amb diferents formats, nivells de qualitat, etc. Aquestes noves dades, juntament amb les dades històriques arxivades, constitueixen la llavor per al descobriment de coneixement i la generació de valor en diversos camps de la ciència i grans entorns de dades (big data). Descobrir el valor de les dades és un procés complex de càlcul on les dades són el recurs clau, no només durant el seu processament, sinó també durant tot el seu cicle de vida. Tanmateix, encara hi ha una gran preocupació per com organitzar i gestionar aquestes dades en tots els camps per a un ús i explotació eficients durant tots els cicles de vida de les dades. Encara que recentment s'han definit diversos models específics de Data LifeCycle (DLC) per a escenaris particulars, argumentem que no hi ha un marc global i complet de DLC que s'utilitzi àmpliament en diferents camps. En particular, les ciutats intel·ligents són les solucions tecnològiques actuals per fer front als reptes i la complexitat de la creixent densitat urbana. Tradicionalment, la gestió de recursos de Smart City es basa en solucions basades en núvol (cloud computing) on es recopilen dades de sensors per proporcionar un conjunt de dades obert i centralitzat. Les avantatges dels entorns basats en núvol són la seva ubiqüitat, així com una capacitat (gairebé) il·limitada de recursos. Tanmateix, l'accés a dades del núvol implica un gran trànsit de xarxa i, en general, les latències elevades no són apropiades per a solucions crítiques o en temps real, així com també per a riscos de seguretat més elevats. Alternativament, el processament de boira (fog computing) sorgeix com una tecnologia prometedora per absorbir aquests inconvenients. Proposa l'ús de dispositius a la vora per proporcionar recuirsos informàtics més propers i, per tant, reduir el trànsit de la xarxa, reduint les latències dràsticament mentre es millora la seguretat. Hem definit un nou marc per a la gestió de dades en el context d'una ciutat intel·ligent a través d'una arquitectura de gestió de recursos des de la boira fins al núvol (Fog-to-Cloud computing, o F2C). Aquest model té els avantatges combinats de les tecnologies de boira i de núvol, ja que permet reduir les latències per a aplicacions crítiques mentre es poden utilitzar les grans capacitats informàtiques de la tecnologia en núvol. En aquesta tesi, proposem algunes idees noves en el disseny d'una arquitectura F2C de gestió de dades per a ciutats intel·ligents. En primer lloc, dibuixem i descrivim un model de Data LifeCycle global agnòstic que aborda amb èxit tots els reptes inclosos en els 6V i no adaptats a un entorn específic, però fàcil d'adaptar-se als requisits de qualsevol camp en concret. A continuació, presentem el model de Data LifeCycle complet per a una ciutat intel·ligent, una arquitectura de gestió de dades generada a partir d'un model agnòstic d'escenari global, adaptat a l'escenari particular de ciutat intel·ligent. Definim la gestió de cada fase de la vida de les dades i expliquem la seva implementació en una ciutat intel·ligent amb gestió de recursos F2C. I, a continuació, il·lustrem la nova arquitectura per a la gestió de dades en el context d'una Smart City a través d'una arquitectura de gestió de recursos F2C. Mostrem que aquest model té els avantatges d'ambdues, la tecnologia de boira i de núvol, ja que permet reduir les latències per a aplicacions crítiques mentre es pot utilitzar la gran capacitat de processament de la tecnologia en núvol. Com a primer experiment per a l'arquitectura de gestió de dades F2C, s'analitza una ciutat intel·ligent real, corresponent a la ciutat de Barcelona, amb especial èmfasi en les capes responsables de recollir les dades generades pels sensors desplegats. S'ha estimat la quantitat de dades de sensors diàries que es transmet a través de la xarxa i s'ha realitzat una projecció aproximada assumint un desplegament exhaustiu que cobreix tota la ciutat

    Statistical Models for Querying and Managing Time-Series Data

    Get PDF
    In recent years we are experiencing a dramatic increase in the amount of available time-series data. Primary sources of time-series data are sensor networks, medical monitoring, financial applications, news feeds and social networking applications. Availability of large amount of time-series data calls for scalable data management techniques that enable efficient querying and analysis of such data in real-time and archival settings. Often the time-series data generated from sensors (environmental, RFID, GPS, etc.), are imprecise and uncertain in nature. Thus, it is necessary to characterize this uncertainty for producing clean answers. In this thesis we propose methods that address these important issues pertaining to time-series data. Particularly, this thesis is centered around the following three topics: Computing Statistical Measures on Large Time-Series Datasets. Computing statistical measures for large databases of time series is a fundamental primitive for querying and mining time-series data [31, 81, 97, 111, 132, 137]. This primitive is gaining importance with the increasing number and rapid growth of time-series databases. In Chapter 3, we introduce the Affinity framework for efficient computation of statistical measures by exploiting the concept of affine relationships [113, 114]. Affine relationships can be used to infer a large number of statistical measures for time series, from other related time series, instead of computing them directly; thus, reducing the overall computational cost significantly. Moreover, the Affinity framework proposes an unified approach for computing several statistical measures at once. Creating Probabilistic Databases from Imprecise Data. A large amount of time-series data produced in the real-world has an inherent element of uncertainty, arising due to the various sources of imprecision affecting its sources (like, sensor data, GPS trajectories, environmental monitoring data, etc.). The primary sources of imprecision in such data are: imprecise sensors, limited communication bandwidth, sensor failures, etc. Recently there has been an exponential rise in the number of such imprecise sensors, which has led to an explosion of imprecise data. Standard database techniques cannot be used to provide clean and consistent answers in such scenarios. Therefore, probabilistic databases that factor-in the inherent uncertainty and produce clean answers are required. An important assumption i while using probabilistic databases is that each data point has a probability distribution associated with it. This is not true in practice — the distributions are absent. As a solution to this fundamental limitation, in Chapter 4 we propose methods for inferring such probability distributions and using them for efficiently creating probabilistic databases [116]. Managing Participatory Sensing Data. Community-driven participatory sensing is a rapidly evolving paradigm in mobile geo-sensor networks. Here, sensors of various sorts (e.g., multi-sensor units monitoring air quality, cell phones, thermal watches, thermometers in vehicles, etc.) are carried by the community (public vehicles, private vehicles, or individuals) during their daily activities, collecting various types of data about their surrounding. Data generated by these devices is in large quantity, and geographically and temporally skewed. Therefore, it is important that systems designed for managing such data should be aware of these unique data characteristics. In Chapter 5, we propose the ConDense (Community-driven Sensing of the Environment) framework for managing and querying community-sensed data [5, 19, 115]. ConDense exploits spatial smoothness of environmental parameters (like, ambient pollution [5] or radiation [2]) to construct statistical models of the data. Since the number of constructed models is significantly smaller than the original data, we show that using our approach leads to dramatic increase in query processing efficiency [19, 115] and significantly reduces memory usage

    Effective Metadata Management in Federated Sensor Networks

    Get PDF
    Abstract—As sensor networks become increasingly popular, heterogeneous sensor networks are being interconnected into federated sensor networks and provide huge volumes of sensor data to large user communities for a variety of applications. Effective metadata management plays a crucial role in processing and properly interpreting raw sensor measurement data, and needs to be performed in a collaborative fashion. Previous data management work has concentrated on metadata and data as two separate entities and has not provided specific support for joint real-time processing of metadata and sensor data. In this paper we propose a framework that allows effective sensor data and metadata management based on real-time metadata creation and join processing over federated sensor networks. The framework is established on three key mechanisms: (i) distributed metadata joins to allow streaming sensor data to be efficiently processed with their associated metadata, regardless of their location in the network, (ii) automated metadata generation to permit users to define monitoring conditions or operations for extracting and storing metadata from streaming sensor data, (iii) advanced metadata search utilizing various techniques specifically designed for sensor metadata querying and visualization. This framework is currently deployed and used as the backbone of a concrete application in environmental science and engineering, the Swiss Experiment, which runs a wide variety of measurements and experiments for environmental hazard forecasting and warning. I
    corecore