38 research outputs found

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven Mobility- and Traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework that manages the lifecycle of mobility-related spatiotemporal data, mainly focused on geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extracts value from it and supports the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with the inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.

    Quality of Service Aware Data Stream Processing for Highly Dynamic and Scalable Applications

    Huge amounts of georeferenced data streams arrive daily at data stream management systems that are deployed to serve highly scalable and dynamic applications. There are innumerable ways in which those loads can be exploited to gain deep insights in various domains. Decision makers require an interactive visualization of such data in the form of maps and dashboards for decision making and strategic planning. Data streams normally exhibit fluctuation and oscillation in arrival rates and skewness. Those are the two predominant factors that greatly impact the overall quality of service. This requires data stream management systems to be attuned to those factors, in addition to the spatial shape of the data, which may exaggerate their negative impact. Current systems do not natively support services with quality guarantees for dynamic scenarios, leaving the handling of those logistics to the user, which is challenging and cumbersome. Three workloads are predominant for any data stream: batch processing, scalable storage and stream processing. In this thesis, we have designed a quality of service aware system, SpatialDSMS, that comprises several subsystems covering those loads and any mixed load that results from intermixing them. Most importantly, we have natively incorporated quality of service optimizations for processing avalanches of geo-referenced data streams in highly dynamic application scenarios. This has been achieved transparently on top of the codebases of emerging de facto standard best-in-class representatives, thus relieving users in the presentation layer from having to reason about those services. Instead, users express their queries with quality goals, and our system optimizer compiles them down into query plans with an embedded quality guarantee and leaves logistic handling to the underlying layers. We have developed standard compliant prototypes for all the subsystems that constitute SpatialDSMS

    Mining Heterogeneous Urban Data at Multiple Granularity Layers

    The recent development of urban areas and of new advanced services supported by digital technologies has generated big challenges for people and city administrators, such as air pollution, high energy consumption, traffic congestion and the management of public events. Moreover, understanding how citizens perceive the provided services and other relevant topics can help in devising targeted management actions. With the large diffusion of sensing technologies and user devices, the capability to generate data of public interest within the urban area has rapidly grown. For instance, different sensor networks deployed in the urban area allow the collection of a variety of data useful to characterize several aspects of the urban environment. The huge amount of data produced by different types of devices and applications carries rich knowledge about the urban context. Mining big urban data can provide decision makers with knowledge useful to tackle the aforementioned challenges for a smart and sustainable administration of urban spaces. However, the high volume and heterogeneity of the data increase the complexity of the analysis. Moreover, different sources provide data with different spatial and temporal references. The extraction of significant information from such diverse kinds of data also depends on how they are integrated, hence alternative data representations and efficient processing technologies are required. The PhD research activity presented in this thesis was aimed at tackling these issues. The thesis deals with the analysis of big heterogeneous data in smart city scenarios, by means of new data mining techniques and algorithms, to study the nature of urban-related processes. The problem is addressed by focusing on both the infrastructural and the algorithmic layers. In the first layer, the thesis proposes the enhancement of the current leading techniques for the storage and processing of Big Data. The integration with novel computing platforms is also considered to support the parallelization of tasks, tackling the issue of automatic scaling of resources. At the algorithmic layer, the research activity aimed at innovating current data mining algorithms by adapting them to novel Big Data architectures and to Cloud computing environments. Such algorithms have been applied to various classes of urban data, in order to discover hidden but important information to support the optimization of the related processes. This research activity focused on the development of a distributed framework to automatically aggregate heterogeneous data at multiple temporal and spatial granularities and to apply different data mining techniques. Parallel computations are performed according to the MapReduce paradigm and exploit in-memory computing to reach near-linear computational scalability. By exploring manifold data resolutions in a relatively short time, several additional patterns can be discovered in the data, further enriching the description of urban processes. The framework is applied to different use cases, where many types of data are used to provide insightful descriptive and predictive analyses. In particular, the PhD activity addressed two main issues in the context of urban data mining: the evaluation of building energy efficiency from different energy-related data and the characterization of people's perception of and interest in different topics from user-generated content on social networks. For each use case within the considered applications, a specific architectural solution was designed to obtain meaningful and actionable results and to optimize the computational performance and scalability of algorithms, which were extensively validated through experimental tests

    Harnessing Innovative Data and Technology to Measure Development Effectiveness

    In this study, the authors discuss and show how new kinds of digital data and analytics methods and tools falling under the umbrella term of Big Data, including Artificial Intelligence (AI) systems, can help measure development effectiveness. Selected case studies provide examples of assessments of the effectiveness of ODA-funded policies and programmes. They use different data and techniques. Examples include the analysis of mobile phone data and satellite images to estimate poverty and inequality, traffic congestion or social cohesion; machine learning approaches to social media analysis to understand social interactions and networks; and natural language processing to study changes in public awareness. A toolkit contains resources and suggestions on key steps and considerations, including legal and ethical ones, when designing and implementing projects aimed at measuring development effectiveness through new digital data and tools. The chapter closes by describing the core principles and requirements of a vision of a ‘Human AI’, which would reflect and leverage the key features of current narrow AI systems that are able to identify and reinforce the neurons that help them reach their goals. A Human AI would be a data and machine-enabled human system (such as a society) that would seek to continuously learn and adjust to improve, rather than prove after the fact, the effectiveness of its collective actions, including development programming and public policies

    Distributed Partitioning and Processing of Large Spatial Datasets

    Data collection is one of the most common practices in today’s world. The data collection rate has rapidly increased over the past decade and is not showing any signs of decline. Data sources are many: Internet of Things devices, mobile gadgets, social media posts, connected cars, and web servers constantly report on their users’ interactions and habits. Much of the collected data is spatial data, which contains attributes that denote the physical origin of the data. As a result of the tremendous growth in data collection, demand has emerged for new techniques that efficiently process the data and extract valuable insights within an acceptable time frame. The current standard approach to large-scale data analysis uses distributed parallel processing systems like Apache Hadoop and Apache Spark. However, these systems are designed for general-purpose parallel processing and require an additional layer to recognize and efficiently process spatial datasets. Motivated by the many applications of spatial data, we examine several challenges facing spatial data partitioning and processing and propose solutions customized for each task. We detail our techniques for building spatial partitioners over large datasets for use with spatial queries like map-matching and kNN spatial join. Additionally, we present an accuracy benchmarking framework for comparing and classifying the results of two input files based on specific criteria. Our proposed work targets batch processing of large spatial datasets, including structured, unstructured, and semi-structured datasets

    Real time predictive monitoring system for urban transport

    In recent years, ubiquitous access to mobile and internet technology has driven a significant increase in the amount of data produced, communicated and stored by corporations as well as by individual users. The research presented in this thesis proposes an architectural framework to acquire, store, manipulate and integrate data and information within an urban transport environment, to optimise its operations in real-time. The deployed architecture is based on the integration of a number of technologies and tailor-made algorithms implemented to provide a management tool to aid traffic monitoring, using intelligent decision-making processes. A combination of Data Mining techniques and Machine Learning algorithms was used to implement predictive analytics, as a key component in addressing the challenges of monitoring and managing an urban transport network operation in real-time. The proposed solution was then applied to an actual urban transport management system, within a partner company, Mermaid Technology, Copenhagen, to test and evaluate the proposed algorithms and the architectural integration principles used. Various visualization methods were employed at numerous stages of the project to dynamically interpret the large volume and diversity of data and to effectively aid the monitoring and decision-making process. The deliverables of this project include the system architecture design, as well as software solutions that facilitate predictive analytics and effective visualisation strategies to aid real-time monitoring of a large system, in the context of urban transport. The proposed solutions have been implemented, tested and evaluated in a Case Study in collaboration with Mermaid Technology. Live data from their network operations was used to evaluate the efficiency of the proposed system

    Privacy Preserved Model Based Approaches for Generating Open Travel Behavioural Data

    Location-aware technologies and smart phones are fast growing in usage and adoption as a medium for requesting and delivering services in daily activities. However, the increasing usage of these technologies has given rise to new challenges that need to be addressed. Privacy protection and the need for disaggregate mobility data for transportation modelling need to be balanced for applications and academic research. This dissertation focuses on developing modern privacy mechanisms that seek to satisfy requirements on privacy and data utility for fine-grained travel behavioural modelling applications using large-scale mobility data. To accomplish this, we review the challenges and opportunities that must be addressed in order to harness the full potential of “Big Transportation Data”. We also perform a quantitative evaluation of the degree of privacy provided by popular location anonymization techniques when applied to the sensitive location data (i.e. homes, offices) of a travel survey. As a step towards resolving the trade-off between privacy and utility, we develop a differentially-private generative model for simultaneously synthesizing both socio-economic attributes and activity diary sequences. Adversarial attack models are proposed and tested to evaluate the effectiveness of the proposed system against privacy attacks. The results show that datasets from the developed privacy enhancing system can be used for travel behavioural modelling with satisfactory results while ensuring an acceptable level of privacy