404 research outputs found

    Generating public transport data based on population distributions for RDF benchmarking

    When benchmarking RDF data management systems such as public transport route planners, system evaluation needs to happen under various realistic circumstances, which requires a wide range of datasets with different properties. Real-world datasets are almost ideal, as they offer these realistic circumstances, but they are often hard to obtain and inflexible for testing. For these reasons, synthetic dataset generators, which are intrinsically flexible, are typically preferred over real-world datasets. Unfortunately, many synthetic datasets generated within benchmarks are insufficiently realistic, raising questions about the generalizability of benchmark results to real-world scenarios. In order to benchmark geospatial and temporal RDF data management systems such as route planners with sufficient external validity and depth, we designed PODiGG, a highly configurable generation algorithm for synthetic public transport datasets with realistic geospatial and temporal characteristics comparable to those of their real-world counterparts. The algorithm is inspired by real-world public transit network design and scheduling methodologies. This article discusses the design and implementation of PODiGG and validates the properties of its generated datasets. Our findings show that the generator achieves a sufficient level of realism, based on the existing coherence metric and new metrics we introduce specifically for the public transport domain. PODiGG thereby provides a flexible foundation for benchmarking RDF data management systems with geospatial and temporal data.
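    The abstract does not spell out how PODiGG uses population distributions, so the following is only a minimal sketch of one plausible first step, assuming that candidate stop locations are sampled from a population grid with probability proportional to each cell's population. The function name `place_stops` and the list-of-lists grid representation are hypothetical and are not PODiGG's actual interface.

```python
import random


def place_stops(population_grid, num_stops, seed=0):
    """Pick `num_stops` distinct grid cells, favouring densely populated cells.

    Hypothetical illustration: each remaining cell's chance of being picked is
    proportional to its population; cells with zero population are never picked.
    """
    rng = random.Random(seed)
    cells = [(r, c) for r, row in enumerate(population_grid)
             for c, pop in enumerate(row) if pop > 0]
    weights = [population_grid[r][c] for r, c in cells]
    if num_stops > len(cells):
        raise ValueError("more stops requested than populated cells")
    chosen = []
    for _ in range(num_stops):
        # Weighted draw without replacement: draw one cell, then remove it.
        idx = rng.choices(range(len(cells)), weights=weights, k=1)[0]
        chosen.append(cells.pop(idx))
        weights.pop(idx)
    return chosen
```

    For a grid such as `[[0, 5, 1], [20, 0, 3], [0, 0, 50]]`, requesting three stops returns three distinct populated cells, and the dense cells `(2, 2)` and `(1, 0)` are the most likely picks. A real generator would also need edge creation and scheduling, which this sketch does not attempt.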

    Sharing Human-Generated Observations by Integrating HMI and the Semantic Sensor Web

    Current “Internet of Things” concepts point to a future where connected objects gather meaningful information about their environment and share it with other objects and people. In particular, objects embedding Human Machine Interaction (HMI), such as mobile devices and, increasingly, connected vehicles, home appliances, urban interactive infrastructures, etc., may not only be conceived as sources of sensor information; through interaction with their users, they can also produce highly valuable context-aware human-generated observations. We believe that the great promise offered by combining and sharing all of the different sources of information available can be realized through the integration of HMI and Semantic Sensor Web technologies. This paper presents a technological framework that harmonizes two of the most influential HMI and Sensor Web initiatives: the W3C’s Multimodal Architecture and Interfaces (MMI) and the Open Geospatial Consortium (OGC) Sensor Web Enablement (SWE) with its semantic extension, respectively. Although the proposed framework is general enough to be applied in a variety of connected objects integrating HMI, a particular development is presented for a connected car scenario where drivers’ observations about the traffic or their environment are shared across the Semantic Sensor Web. For implementation and evaluation purposes, an on-board OSGi (Open Services Gateway Initiative) architecture was built, integrating several available HMI, Sensor Web and Semantic Web technologies. A technical performance test and a conceptual validation of the scenario with potential users are reported, with results suggesting the approach is sound.

    Managing big, linked, and open earth-observation data: Using the TELEIOS/LEO software stack

    Big Earth-observation (EO) data that are made freely available by space agencies come from various archives. Therefore, users trying to develop an application need to search within these archives, discover the needed data, and integrate them into their application. In this article, we argue that if EO data are published using the linked data paradigm, then data discovery, data integration, and application development become easier. We present the life cycle of big, linked, and open EO data and show how to support its various stages using the software stack developed by the European Union (EU) research projects TELEIOS and Linked Open EO Data for Precision Farming (LEO). We also show how this stack of tools can be used to implement an operational wildfire-monitoring service.

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    Human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will support decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework that manages the lifecycle of mobility-related spatiotemporal data, focusing mainly on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extracts value from it, and supports the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events.
Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.

    Storing and querying evolving knowledge graphs on the web


    WW1LOD: an application of CIDOC-CRM to World War 1 linked data

    The CIDOC-CRM standard indicates that common events, actors, places and timeframes are important in linking together cultural material, and provides a framework for describing them. However, merely describing entities in this way in two datasets does not yet interlink them. To do that, the identities of instances still need to be either reconciled or based on a shared vocabulary. The WW1LOD dataset presented in this paper was created to facilitate both of these approaches for collections dealing with the First World War. For this purpose, the dataset includes events, places, agents, times, keywords, and themes related to the war, based on over ten different authoritative data sources from providers such as the Imperial War Museum. The content is harmonized into RDF and published as a Linked Open Data service. While the dataset is generally based on CIDOC-CRM, some of its modeling choices deviate from it where our experience dictated. In the article, these deviations are discussed in the hope that they may serve as examples where CIDOC-CRM itself may warrant further examination. As a demonstration of use, the dataset and online service have been used to create a contextual reader application that is able to link together and pull in information related to WW1 from, e.g., 1914–1918 Online, Wikipedia, WW1 Discovery, Europeana and the Digital Public Library of America.
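    To illustrate the shared-vocabulary route to interlinking described above, here is a minimal sketch that maps labels from a local collection onto shared URIs after light normalization. The vocabulary entries and `example.org` URIs are placeholders, not actual WW1LOD identifiers, and real reconciliation would also need disambiguation beyond string matching (for example, two different places sharing a name).

```python
import unicodedata

# Hypothetical shared vocabulary: normalized label -> shared URI.
# The example.org URIs are placeholders, not real WW1LOD resources.
SHARED_VOCAB = {
    "battle of the somme": "http://example.org/ww1/event/battle_of_the_somme",
    "imperial war museum": "http://example.org/ww1/agent/imperial_war_museum",
}


def normalize(label: str) -> str:
    """Strip accents, case-fold and collapse whitespace so trivial variants match."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def reconcile(local_labels, vocab=SHARED_VOCAB):
    """Map each local label to a shared URI, or to None when nothing matches."""
    return {label: vocab.get(normalize(label)) for label in local_labels}
```

    With this sketch, `"Battle of the  Somme"` (note the double space) resolves to the same URI as the vocabulary entry, whereas an unknown label such as `"Gallipoli"` maps to `None` and would need manual reconciliation.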

    Geospatial queries on data collection using a common provenance model

    Other funding: Xavier Pons is the recipient of an ICREA Academia Excellence in Research Grant (2016-2020). Lineage information is the part of the metadata that describes "what", "when", "who", "how", and "where" geospatial data were generated. If it is well presented and queryable, lineage becomes very useful information for inferring data quality, tracing error sources and increasing trust in geospatial information. In addition, if the lineage of a collection of datasets can be related and presented together, datasets, process chains, and methodologies can be compared. This paper proposes extending process step lineage descriptions into four explicit levels of abstraction (process run, tool, algorithm and functionality). Including functionality and algorithm descriptions as part of lineage provides high-level information that is independent of the details of the software used. Therefore, it is possible to transform lineage metadata that initially documents specific processing steps into a reusable workflow that describes a set of operations as a processing chain. This paper presents a system that provides lineage information as a service in a distributed environment. The system is complemented by an integrated provenance web application that is capable of visualizing and querying a provenance graph composed of the lineage of a collection of datasets. The International Organization for Standardization (ISO) 19115 family of standards was combined with the World Wide Web Consortium (W3C) provenance initiative (W3C PROV) in order to integrate the provenance of a collection of datasets. To represent lineage elements, the ISO 19115-2 lineage class names were chosen, because they express the names of the geospatial objects involved more precisely. The relationship naming conventions of W3C PROV are used to represent relationships among these elements. The elements and relationships are presented in a queryable graph.
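    As a rough illustration of a queryable lineage graph, the sketch below stores entity and run edges with the W3C PROV relation names `prov:wasGeneratedBy` and `prov:used`, and records the tool, algorithm and functionality behind each process run. All dataset, run and tool names are hypothetical placeholders; the system described above combines ISO 19115-2 class names with PROV relations and is served over a distributed environment, which this in-memory version only approximates.

```python
# Hypothetical lineage records; every name is an illustrative placeholder.
# Relationship names follow W3C PROV: an entity prov:wasGeneratedBy a run,
# and a run prov:used its input entities.
LINEAGE = [
    ("ndvi_2019", "prov:wasGeneratedBy", "run_1"),
    ("run_1", "prov:used", "scene_2019"),
    ("ndvi_2020", "prov:wasGeneratedBy", "run_2"),
    ("run_2", "prov:used", "scene_2020"),
    ("scene_2020", "prov:wasGeneratedBy", "run_0"),
    ("run_0", "prov:used", "raw_2020"),
]

# Abstraction levels above each process run: (tool, algorithm, functionality).
RUN_LEVELS = {
    "run_0": ("tool_cal", "radiometric_calibration", "correction"),
    "run_1": ("tool_a", "ndvi", "vegetation_index"),
    "run_2": ("tool_b", "ndvi", "vegetation_index"),
}


def upstream(entity, lineage=LINEAGE):
    """Return every run and entity that `entity` transitively depends on."""
    seen, stack = [], [entity]
    while stack:
        node = stack.pop()
        for subj, _relation, obj in lineage:
            if subj == node and obj not in seen:
                seen.append(obj)
                stack.append(obj)
    return seen


def runs_sharing_algorithm(algorithm):
    """List runs that applied `algorithm`, regardless of the tool that ran it."""
    return sorted(run for run, (_tool, algo, _func) in RUN_LEVELS.items()
                  if algo == algorithm)
```

    Here `upstream("ndvi_2020")` walks back through `run_2`, `scene_2020`, `run_0` and `raw_2020`, while `runs_sharing_algorithm("ndvi")` finds both NDVI runs even though they used different tools, which is the kind of cross-chain comparison the algorithm and functionality levels are meant to enable.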

    Towards Mobility Data Science (Vision Paper)

    Mobility data captures the locations of moving objects such as humans, animals, and cars. With the availability of GPS-equipped mobile devices and other inexpensive location-tracking technologies, mobility data is collected ubiquitously. In recent years, the use of mobility data has demonstrated significant impact in various domains including traffic management, urban planning, and health sciences. In this paper, we present the emerging domain of mobility data science. Towards a unified approach to mobility data science, we envision a pipeline with the following components: mobility data collection, cleaning, analysis, management, and privacy. For each of these components, we explain how mobility data science differs from general data science, survey the current state of the art, and describe open challenges for the research community in the coming years.
    Comment: Updated arXiv metadata to include two authors who were missing from the metadata. PDF has not been changed.
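    The cleaning component of such a pipeline often starts by removing physically implausible GPS fixes. The sketch below shows one common heuristic, assumed here for illustration rather than taken from the paper: it drops any fix whose implied speed from the previously kept fix exceeds a threshold, using the haversine great-circle distance. The 50 m/s default and the `(timestamp_s, lat, lon)` tuple layout are arbitrary assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def drop_speed_outliers(points, max_speed_mps=50.0):
    """Remove fixes implying an implausible speed from the last kept fix.

    `points` is a time-ordered list of (timestamp_s, lat, lon) tuples.
    """
    if not points:
        return []
    kept = [points[0]]
    for t, lat, lon in points[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue  # duplicate or out-of-order timestamp
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_mps:
            kept.append((t, lat, lon))
    return kept
```

    Comparing against the last kept fix rather than the raw previous fix means a single spike does not also cause the next, valid fix to be rejected; the trade-off is that a long run of drifting fixes can slowly pull the reference point away.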

    Filling Gaps in Trawl Surveys at Sea through Spatiotemporal and Environmental Modelling

    Get PDF
    International scientific fishery survey programmes systematically collect samples of target stocks’ biomass and abundance and use them as the basis to estimate stock status in the framework of stock assessment models. The research surveys can also inform decision makers about Essential Fish Habitat conservation and help define harvest control rules based on direct observation of biomass at sea. However, missed survey locations over the survey years are common in long-term programme data. Currently, modelling approaches to filling gaps in spatiotemporal survey data range from quickly applicable solutions to complex modelling. Most models require prior statistical assumptions on spatial distributions, assume short-term temporal dependency in the data, and scarcely consider the environmental aspects that might have influenced stock presence in the missed locations. This paper proposes a statistical and machine learning based model to fill spatiotemporal gaps in survey data and produce robust estimates for stock assessment experts, decision makers, and regional fisheries management organizations. We apply our model to the SoleMon survey data in the North-Central Adriatic Sea (Mediterranean Sea) for 4 stocks: Sepia officinalis, Solea solea, Squilla mantis, and Pecten jacobaeus. We reconstruct the biomass-index (i.e., biomass over the swept area) of 10 locations missed in 2020 (out of the 67 planned) because of several factors, including COVID-19 pandemic related restrictions. We evaluate model performance on 2019 data with respect to an alternative index that assumes biomass proportion consistency over time. Our model’s novelty is that it combines three complementary components. A spatial component estimates stock biomass-index in the missed locations in one year, given the surveyed locations’ biomass-index distribution in the same year.
A temporal component forecasts, for each missed survey location, the biomass-index given the data history of that haul. An environmental component estimates a biomass-index weighting factor based on the environmental suitability of the haul area to species presence. Combining these components makes it possible to understand the interplay between environmental-change drivers, stock presence, and fisheries. Our model formulation is general enough to be applied to other survey data with lower spatial homogeneity and more temporal gaps than the SoleMon dataset.
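    The spatial, temporal and environmental components above are statistical and machine learning models whose details the abstract does not give. The toy sketch below only shows how three such components could be combined for one missed haul; inverse distance weighting for the spatial part, a moving-average forecast for the temporal part, the blending weight `alpha` and the multiplicative environmental weight are all stand-in assumptions, not the authors' formulation.

```python
import math


def idw_estimate(target, surveyed, power=2.0):
    """Spatial stand-in (inverse distance weighting): estimate the biomass-index
    at `target` (x, y) from same-year hauls given as {(x, y): biomass_index}."""
    num = den = 0.0
    for (x, y), value in surveyed.items():
        d = math.hypot(x - target[0], y - target[1])
        if d == 0:
            return value  # target coincides with a surveyed haul
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


def naive_forecast(history, window=3):
    """Temporal stand-in: mean of the last `window` values of the haul's history."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def fill_gap(target, surveyed, history, env_suitability, alpha=0.5):
    """Blend the spatial and temporal estimates, then scale the result by an
    environmental suitability weight in [0, 1] (illustrative assumption)."""
    spatial = idw_estimate(target, surveyed)
    temporal = naive_forecast(history)
    return env_suitability * (alpha * spatial + (1 - alpha) * temporal)
```

    For instance, with two surveyed hauls of index 10 and 30 equidistant from the target, a haul history ending in 10, 15 and 20, and a suitability of 0.8, the sketch yields 0.8 × (0.5 × 20 + 0.5 × 15) = 14.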

    Assessing spatiotemporal predictability of LBSN : a case study of three Foursquare datasets

    Get PDF
    Location-based social networks (LBSN) have provided new possibilities for researchers to gain knowledge about human spatiotemporal behavior, and to make predictions about how people might behave through space and time in the future. An important requirement for successfully utilizing LBSN in these regards is a thorough understanding of the respective datasets, including their inherent potential as well as their limitations. Specifically, when it comes to predictions, we must know what we can actually expect from the data, and how we could maximize their usefulness. Yet, this knowledge is still largely lacking from the literature. Hence, this work explores one particular aspect, namely the theoretical predictability of LBSN datasets. The uncovered predictability is represented as an interval. The lower bound of the interval corresponds to the amount of regular behavior that can easily be anticipated, and represents the correct prediction rate that any algorithm should be able to achieve. The upper bound corresponds to the amount of information contained in the dataset, and represents the maximum correct prediction rate that no algorithm can exceed. Three Foursquare datasets from three American cities are studied as an example. It is found that, within our investigated datasets, the lower bound of predictability of human spatiotemporal behavior is 27%, and the upper bound is 92%. Hence, the inherent potential of the datasets for predicting human spatiotemporal behavior is clarified, and the revealed interval allows a realistic assessment of the quality of predictions and thus of the associated algorithms. Additionally, in order to provide further insight into the practical use of the datasets, the relationship between predictability and check-in frequency is investigated from three different perspectives.
From the individual perspective, no significant correlation is found between predictability and check-in frequency. In contrast, the same two quantities are found to be negatively correlated from the temporal and spatial perspectives. Our study further indicates that heavily frequented contexts and some extraordinary geographic features such as airports could be good starting points for effective improvements of prediction algorithms. In general, this research provides novel knowledge regarding the nature of LBSN datasets and practical insights for their more informed utilization.
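    A widely used way to turn an entropy estimate of a user's visit sequence into an upper bound of this kind is Fano's inequality: for entropy S (in bits) over N distinct locations, the maximum predictability Pi satisfies S = H(Pi) + (1 - Pi) * log2(N - 1), where H is the binary entropy. The abstract does not state which estimator the authors used, so this sketch is an assumption: it takes an already estimated entropy as input and solves the equation for Pi by bisection.

```python
import math


def binary_entropy(p):
    """H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_rhs(pi, n):
    """Right-hand side of Fano's inequality: H(pi) + (1 - pi) * log2(n - 1)."""
    return binary_entropy(pi) + (1 - pi) * math.log2(n - 1)


def max_predictability(entropy_bits, n_locations, tol=1e-9):
    """Solve entropy_bits = fano_rhs(pi, n) for pi in [1/n, 1] by bisection."""
    if n_locations < 2:
        return 1.0
    if entropy_bits >= math.log2(n_locations):
        return 1.0 / n_locations  # behaviour indistinguishable from random guessing
    if entropy_bits <= 0:
        return 1.0  # fully regular behaviour
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # fano_rhs decreases on [1/n, 1]: a value above the target means pi is too small.
        if fano_rhs(mid, n_locations) > entropy_bits:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

    For example, `max_predictability(fano_rhs(0.9, 8), 8)` recovers approximately 0.9, and an entropy of log2(8) = 3 bits over 8 locations yields the random-guess bound of 1/8. Bisection is used because the equation has no closed-form solution but is monotone on the search interval.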