276 research outputs found

    A Data-driven Methodology Towards Mobility- and Traffic-related Big Spatiotemporal Data Frameworks

    The human population is increasing at unprecedented rates, particularly in urban areas. This increase, along with the rise of a more economically empowered middle class, brings new and complex challenges to the mobility of people within urban areas. To tackle such challenges, transportation and mobility authorities and operators are trying to adopt innovative Big Data-driven mobility- and traffic-related solutions. Such solutions will help decision-making processes that aim to ease the load on an already overloaded transport infrastructure. The information collected from day-to-day mobility and traffic can help to mitigate some of these mobility challenges in urban areas. Road infrastructure and traffic management operators (RITMOs) face several limitations in effectively extracting value from the exponentially growing volumes of mobility- and traffic-related Big Spatiotemporal Data (MobiTrafficBD) that are being acquired and gathered. Research on Big Data, Spatiotemporal Data and especially MobiTrafficBD is scattered, and the existing literature does not offer a concrete, common methodological approach to set up, configure, deploy and use a complete Big Data-based framework that manages the lifecycle of mobility-related spatiotemporal data, mainly geo-referenced time series (GRTS) and spatiotemporal events (ST Events), extracts value from it and supports the decision-making processes of RITMOs. This doctoral thesis proposes a data-driven, prescriptive methodological approach towards the design, development and deployment of MobiTrafficBD Frameworks focused on GRTS and ST Events. 
Besides a thorough literature review on Spatiotemporal Data, Big Data and the merging of these two fields through MobiTrafficBD, the methodological approach comprises a set of general characteristics, technical requirements, logical components, data flows and technological infrastructure models, as well as guidelines and best practices that aim to guide researchers, practitioners and stakeholders, such as RITMOs, throughout the design, development and deployment phases of any MobiTrafficBD Framework. This work is intended to be a supporting methodological guide, based on widely used Reference Architectures and guidelines for Big Data, but enriched with inherent characteristics and concerns brought about by Big Spatiotemporal Data, such as in the case of GRTS and ST Events. The proposed methodology was evaluated and demonstrated in various real-world use cases that deployed MobiTrafficBD-based Data Management, Processing, Analytics and Visualisation methods, tools and technologies, under the umbrella of several research projects funded by the European Commission and the Portuguese Government.

    Some Contribution of Statistical Techniques in Big Data: A Review

    Big Data is a popular research topic. Everyone is talking about big data, and it is believed that science, business, industry, government, society, etc. will undergo a thorough change under its impact. The term refers to very large data sets that are complex, contain hidden patterns and mix structured and unstructured data, and that are therefore difficult to collect, store, analyse and process. Proper advanced techniques are needed to gain knowledge from big data. The main research challenges lie in storage, processing, search, sharing, transfer, analysis and visualization. This paper discusses the introduction of big data, its issues, its management and the techniques used with it. It also reviews various advanced statistical techniques for handling key big data applications with large data sets. These advanced techniques handle structured as well as unstructured big data in different areas.

    Infraestrutura para análise de tráfego e comportamento de condutores

    Master's in Computer and Telematics Engineering. The work in this dissertation can be seen as a traffic decision support system. It was motivated by smart cities projects, in which transportation is a major area. With the evolution of vehicle technology, it is possible to gather ever more information about vehicles in a real scenario, which allows a more detailed analysis of traffic and drivers' behavior. A review of related work in this area showed that many of the analyses performed did not take context into consideration, and some of these studies proposed integrating factors that influence driving as future work. In this dissertation, the concepts of the related work are integrated, as well as heterogeneous data sources with context information. A study of different database paradigms was also performed, covering the most relevant NoSQL paradigms, their use cases and their most used implementations. This dissertation proposes the design and implementation of a framework for analysing traffic and drivers' behavior based on trajectory data gathered from moving vehicles. For the proof of concept, two case studies were performed with data extracted from two distinct datasets of vehicle trajectories. A set of tools was developed to extract, transform and load data into the data marts developed. Visualization tools were used to perform a visual analysis through charts for aggregate measures and GIS software for the geospatial details. This framework was designed to be adaptable to different application scenarios involving moving vehicles, from public transportation management to behavior-based insurance. The achieved results allow the study of traffic and drivers' behavior in order to obtain knowledge in this area and possibly improve traffic management or the driving experience.

    The future of urban models in the Big Data and AI era: a bibliometric analysis (2000-2019)

    This article questions the effects on urban research dynamics of the Big Data and AI turn in urban management. To identify these effects, we use two complementary materials: bibliometric data and interviews. We consider two areas in urban research: one, covering the academic research dealing with transportation systems and the other, with water systems. First, we measure the evolution of AI and Big Data keywords in these two areas. Second, we measure the evolution of the share of publications published in computer science journals about urban traffic and water quality. To guide these bibliometric analyses, we rely on the content of interviews conducted with academics and higher education officials in Paris and Edinburgh at the beginning of 2018

    SeLINA: a Self-Learning Insightful Network Analyzer

    Understanding the behavior of a network from a large-scale traffic dataset is a challenging problem. Big data frameworks offer scalable algorithms to extract information from raw data, but often require sophisticated fine-tuning and a detailed knowledge of machine learning algorithms. To streamline this process, we propose SeLINA (Self-Learning Insightful Network Analyzer), a generic, self-tuning, simple tool to extract knowledge from network traffic measurements. SeLINA includes different data analytics techniques providing self-learning capabilities to state-of-the-art scalable approaches, jointly with parameter auto-selection to off-load the network expert from parameter tuning. We combine both unsupervised and supervised approaches to mine data with a scalable approach. SeLINA embeds mechanisms to check if the new data fits the model, to detect possible changes in the traffic, and to trigger model rebuilding, possibly automatically. The result is a system that offers human-readable models of the data with minimal user intervention, supporting domain experts in extracting actionable knowledge and highlighting possibly meaningful interpretations. SeLINA's current implementation runs on Apache Spark. We tested it on large collections of real-world passive network measurements from a nationwide ISP, investigating YouTube and P2P traffic. The experimental results confirmed the ability of SeLINA to provide insights and detect changes in the data that suggest further analyses.

    Urban Transport Evaluation Using Knowledge Extracted from Social Media

    Public opinion is nowadays a valuable data source for many sectors. In the transportation and mobility sector, online social networks make it possible to collect information in real time at reduced cost compared to other methods of information collection. In this dissertation, we defined a methodology to extract knowledge from messages collected from Twitter to analyse urban mobility. The methodology was structured into three main modules: system configuration, data analytics and visualization. The messages used to demonstrate the proposed methodology were collected over two months for three different cities: New York, London and Melbourne. 
Extracting text from social media and analysing it are very time-consuming tasks due to the volume of messages produced. Each message extracted from Twitter is normally short and informal, with a lot of slang or misspellings. To deal with this, NLP (Natural Language Processing) techniques were applied using the NLTK (Natural Language Toolkit) tool, so that the text could be cleaned and made understandable to the algorithm. For the classification of travel-related messages, a BERT (Bidirectional Encoder Representations from Transformers) embedding model was used. The model is pre-trained, unsupervised and was released in 2018. To understand whether a simple model could perform well, a unigram approach was used. Three lists of travel-related words were used: (i) a small list with 10 words, (ii) a medium list with 35 words and (iii) a big list with 344 words. The results show a high model performance, with precision and accuracy higher than 0.80 and 0.90, respectively. Popular words are train, walk, street, car, station, street and avenue. Consistent results were obtained for all three cities assessed. To evaluate public opinion, the messages related to transportation and mobility were classified according to their sentiment. To evaluate the polarity of the messages (positive, neutral or negative), the VADER (Valence Aware Dictionary and sEntiment Reasoner) sentiment tool was used. VADER is easy to use and works well with social media messages and informal texts. It is a lexicon- and rule-based tool that calculates the compound sentiment value of a text according to its words. The developed methodology attained good performance in the sentiment analysis, where the average precision was 0.77 while recall, accuracy and F1-score were around 0.78. A specific analysis was made of a car crash event in New York on May 18, 2017. 
This analysis demonstrates that the methodology is capable of recognizing spatial changes and changes in mobility flows, pointing to their potential causes. The developed work allows the conclusion that the proposed methodology can be very helpful to transport engineers, urban planners, researchers and policymakers in getting insight into public opinions regarding urban mobility.
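
The unigram approach mentioned in the abstract above can be illustrated with a minimal sketch. This is not the dissertation's implementation: the word list below is a hypothetical stand-in built from the popular words quoted in the abstract (the actual 10-, 35- and 344-word lists are not given), and the function names `tokenize`, `is_travel_related` and `precision_and_accuracy` are assumptions made for illustration only.

```python
import re

# Hypothetical vocabulary: the popular words quoted in the abstract.
# The dissertation's real word lists are not reproduced here.
TRAVEL_WORDS = {"train", "walk", "street", "car", "station", "avenue"}


def tokenize(message):
    """Lowercase a message and split it into alphabetic unigrams."""
    return re.findall(r"[a-z]+", message.lower())


def is_travel_related(message, vocabulary=TRAVEL_WORDS):
    """Flag a message when at least one of its unigrams is in the vocabulary."""
    return any(token in vocabulary for token in tokenize(message))


def precision_and_accuracy(predicted, actual):
    """Compute precision and accuracy for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    false_pos = sum(1 for p, a in pairs if p and not a)
    correct = sum(1 for p, a in pairs if p == a)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return precision, accuracy
```

Matching whole tokens rather than substrings keeps a word such as "scar" from being counted as "car"; a real pipeline would likely add the cleaning steps (slang, misspellings) that the abstract attributes to NLTK.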

    A Survey on Big Data for Network Traffic Monitoring and Analysis

    Network Traffic Monitoring and Analysis (NTMA) represents a key component for network management, especially to guarantee the correct operation of large-scale networks such as the Internet. As the complexity of Internet services and the volume of traffic continue to increase, it becomes difficult to design scalable NTMA applications. Applications such as traffic classification and policing require real-time and scalable approaches. Anomaly detection and security mechanisms must quickly identify and react to unpredictable events while processing millions of heterogeneous events. Lastly, the system has to collect, store, and process massive sets of historical data for post-mortem analysis. Those are precisely the challenges faced by general big data approaches: Volume, Velocity, Variety, and Veracity. This survey brings together NTMA and big data. We catalog previous work on NTMA that adopts big data approaches to understand to what extent the potential of big data is being explored in NTMA. This survey mainly focuses on approaches and technologies to manage big NTMA data, and additionally briefly discusses big data analytics (e.g., machine learning) for the sake of NTMA. Finally, we provide guidelines for future work, discussing lessons learned and research directions.
