23 research outputs found

    Modeling Taxi Drivers' Behaviour for the Next Destination Prediction

    In this paper, we study how to model taxi drivers' behaviour and geographical information for an interesting and challenging task: next destination prediction in a taxi journey. Predicting the next location is a well-studied problem in human mobility, with several real-world applications, from optimizing the efficiency of electronic dispatching systems to predicting and reducing traffic jams. This task is normally modeled as a multiclass classification problem, where the goal is to select the next taxi destination from a set of already known locations. We present a Recurrent Neural Network (RNN) approach that models the taxi drivers' behaviour and encodes the semantics of visited locations by using geographical information from Location-Based Social Networks (LBSNs). In particular, the RNNs are trained to predict the exact coordinates of the next destination, overcoming the limitation of producing as output only a limited set of locations seen during the training phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge 2015 dataset (based on the city of Porto), obtaining better results than the competition winner while using less information, and on Manhattan and San Francisco datasets. Comment: preprint version of a paper submitted to IEEE Transactions on Intelligent Transportation System

    Weak nodes detection in urban transport systems: Planning for resilience in Singapore

    The availability of massive datasets describing human mobility offers the possibility to design simulation tools to monitor and improve the resilience of transport systems in response to traumatic events such as natural and man-made disasters (e.g., floods, terrorist attacks). In this perspective, we propose ACHILLES, an application to model people's movements in a given transport system mode through a multiplex network representation based on mobility data. ACHILLES is a web-based application which provides an easy-to-use interface to explore the mobility fluxes and the connectivity of every urban zone in a city, as well as to visualize changes in the transport system resulting from the addition or removal of transport modes, urban zones, and single stops. Notably, our application allows the user to assess the overall resilience of the transport network by identifying its weakest node, i.e., its Urban Achilles Heel, a reference to ancient Greek mythology. To demonstrate the impact of ACHILLES for humanitarian aid, we consider its application to a real-world scenario by exploring human mobility in Singapore in response to flood prevention. Comment: 9 pages, 6 figures, IEEE Data Science and Advanced Analytic

    Strong and Efficient Baselines for Open Domain Conversational Question Answering

    Unlike the Open Domain Question Answering (ODQA) setting, the conversational (ODConvQA) domain has received limited attention when it comes to reevaluating baselines for both efficiency and effectiveness. In this paper, we study the State-of-the-Art (SotA) Dense Passage Retrieval (DPR) retriever and Fusion-in-Decoder (FiD) reader pipeline, and show that it significantly underperforms when applied to ODConvQA tasks due to various limitations. We then propose and evaluate strong yet simple and efficient baselines, by introducing a fast reranking component between the retriever and the reader, and by performing targeted finetuning steps. Experiments on two ODConvQA tasks, namely TopiOCQA and OR-QuAC, show that our method improves the SotA results, while reducing the reader's latency by 60%. Finally, we provide new and valuable insights into the development of challenging baselines that serve as a reference for future, more intricate approaches, including those that leverage Large Language Models (LLMs). Comment: Accepted to EMNLP 2023 Finding

    Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information

    The movements of individuals within and among cities influence key aspects of our society, such as objective and subjective well-being, the diffusion of innovations, the spreading of epidemics, and the quality of the environment. For this reason, there is increasing interest in the challenging problem of flow generation, which consists in generating the flows between a set of geographic locations, given the characteristics of the locations and without any information about the real flows. Existing solutions to flow generation are mainly based on mechanistic approaches, such as the gravity model and the radiation model, which suffer from underfitting and overdispersion, neglect important variables such as land use and the transportation network, and cannot describe non-linear relationships between these variables. In this paper, we propose the Multi-Feature Deep Gravity (MFDG) model as an effective solution to flow generation. On the one hand, the MFDG model exploits a large number of variables (e.g., characteristics of land use and the road network; transport, food, and health facilities) extracted from voluntary geographic information data (OpenStreetMap). On the other hand, our model exploits deep neural networks to describe complex non-linear relationships between those variables. Our experiments, conducted on commuting flows in England, show that the MFDG model achieves a significant increase in performance (up to 250% for highly populated areas) over mechanistic models that do not use deep neural networks or that do not exploit voluntary geographic data. Our work presents a precise definition of the flow generation problem, which is a novel task for the deep learning community working with spatio-temporal data, and proposes a deep neural network model that significantly outperforms current state-of-the-art statistical models.

    A multi-source dataset of urban life in the city of Milan and the Province of Trentino

    The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social network and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems, including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being and many others.

    LiMoSiNe pipeline: Multilingual UIMA-based NLP platform

    We present a robust, efficient and parallelizable multilingual UIMA-based platform for automatically annotating textual inputs with different layers of linguistic description, ranging from surface-level phenomena all the way down to deep discourse-level information. In particular, given an input text, the pipeline extracts: sentences and tokens; entity mentions; syntactic information; opinionated expressions; relations between entity mentions; co-reference chains and wikified entities. The system is available in two versions: a standalone distribution enables design and optimization of user-specific sub-modules, whereas a server-client distribution allows for straightforward high-performance NLP processing, reducing the engineering cost for higher-level tasks.

    Machine Learning Methods for Urban Computing

    The world population is increasingly moving from rural areas to urban centers, making large cities densely populated. In urban areas, there is greater access to work, a wide variety of options for education and training, ease of transport and an abundance of attractive places within a few kilometers. Across huge cities, people tend to move more and have to do it faster than in the past. On the other hand, heavy traffic (e.g., traffic jams), overbuilding and changes in the urban lifestyle can cause several new problems such as noise, atmospheric pollution (i.e., smog) and severe traffic congestion. However, the rise of novel data sources and machine learning techniques can help to tackle such problems and improve the quality of life of citizens. Indeed, in a smart city environment, the huge amount of data generated daily can be captured by sensors, actuators, and mobile devices. Using such data opens the door to several applications, including forecasting of urban displacements, land use classification and event detection in an urban environment. Motivated by these opportunities, Urban Computing (UC) leverages heterogeneous data sources and applies machine learning techniques to tackle the big challenges that modern cities are facing. In this perspective, one of the core questions when designing UC systems is how to enable models to learn from different urban data sources and thus how to represent urban spaces. The mainstream approach is to represent input objects as feature vectors that encode several aspects of the urban environment such as the presence of people, density of urban activities, and mobility flows. However, this tedious approach of manual feature engineering can be extremely complex, time-consuming and domain-dependent. Additionally, it can become even more complex when aggregating multiple geographical data sources such as points of interest, administrative boundaries, and mobility data. A valid alternative to feature-based methods is using kernels, which are non-linear functions that map input examples into some high-dimensional space, allowing for learning more powerful discriminative decision functions. Given a representation of the input object, kernels map it into some high-dimensional space where a large number of features are implicitly generated, allowing for learning robust discriminative functions. In this way the effort for the feature engineering process can be greatly reduced. Kernel methods have been widely applied in Natural Language Processing on tasks such as question answering, semantic role labeling and even solving linguistic games. Taking inspiration from these successful cases, in this thesis we adapt kernel learning for solving novel tasks in UC. First, we focus on the problem of aggregating multiple urban data sources to provide datasets that fuse knowledge from a wide variety of data sources. Next, we focus on the problem of designing an input structure that is representative of urban space. In particular, we propose to model urban areas with tree structures that are fed to tree kernel functions to automatically generate expressive features. We propose several urban space representations that proved to be very effective in solving novel urban computing tasks such as land use classification and next location prediction in human mobility. Then, by applying a mining algorithm, we enabled the interpretation of urban zones, providing help in the difficult problem of understanding the high-level urban characteristics of a city. In fact, our mined substructures provide help in identifying the different urban nature of cities. Finally, we explore the application of machine learning models to novel urban data sources by solving innovative tasks such as predicting the future presence of influenza-like symptoms by looking at people's mobility behaviors.