23 research outputs found
Modeling Taxi Drivers' Behaviour for the Next Destination Prediction
In this paper, we study how to model taxi drivers' behaviour and geographical
information for an interesting and challenging task: the next destination
prediction in a taxi journey. Predicting the next location is a well-studied
problem in human mobility, which finds several applications in real-world
scenarios, from optimizing the efficiency of electronic dispatching systems to
predicting and reducing traffic jams. This task is normally modeled as a
multiclass classification problem, where the goal is to select, among a set of
already known locations, the next taxi destination. We present a Recurrent
Neural Network (RNN) approach that models the taxi drivers' behaviour and
encodes the semantics of visited locations by using geographical information
from Location-Based Social Networks (LBSNs). In particular, RNNs are trained to
predict the exact coordinates of the next destination, overcoming the
limitation of outputting only locations seen during the training phase. The
proposed approach was tested on the ECML/PKDD Discovery Challenge 2015 dataset
(based on the city of Porto), obtaining better results than the competition
winner while using less information, and on Manhattan and San Francisco
datasets.
Comment: preprint version of a paper submitted to IEEE Transactions on
Intelligent Transportation System
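The contrast drawn in the abstract above, between choosing among already known locations and predicting exact coordinates, can be illustrated with a minimal sketch. It is not the paper's model: the helper names and the sample coordinates are hypothetical, and the great-circle (haversine) distance is used here only as a convenient error measure. The point is that a classifier restricted to known locations can never do better than the distance from the true destination to its nearest known location.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def best_classifier_error_km(true_destination, known_locations):
    """Lower bound on the error of any model that can only output known locations."""
    return min(haversine_km(true_destination, loc) for loc in known_locations)


# Hypothetical coordinates, for illustration only.
known = [(41.150, -8.610), (41.160, -8.630)]
unseen_destination = (41.180, -8.600)
print(best_classifier_error_km(unseen_destination, known))
```

A model that regresses coordinates directly has no such floor on its error, which is the motivation the abstract gives for predicting exact coordinates.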
Weak nodes detection in urban transport systems: Planning for resilience in Singapore
The availability of massive data-sets describing human mobility offers the
possibility to design simulation tools to monitor and improve the resilience of
transport systems in response to traumatic events such as natural and man-made
disasters (e.g., floods, terrorist attacks). In this perspective, we
propose ACHILLES, an application to model people's movements in a given
transport system mode through a multiplex network representation based on
mobility data. ACHILLES is a web-based application which provides an
easy-to-use interface to explore the mobility fluxes and the connectivity of
every urban zone in a city, as well as to visualize changes in the transport
system resulting from the addition or removal of transport modes, urban zones,
and single stops. Notably, our application allows the user to assess the
overall resilience of the transport network by identifying its weakest node,
i.e., the Urban Achilles Heel, a reference to ancient Greek mythology. To
demonstrate the impact of ACHILLES for humanitarian aid, we consider its
application to a real-world scenario by exploring human mobility in Singapore
in the context of flood prevention.
Comment: 9 pages, 6 figures, IEEE Data Science and Advanced Analytic
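The abstract above does not say how ACHILLES scores nodes, so the following is only a sketch of one common way to look for a weakest node: remove each node in turn and see how much the largest connected component shrinks. The function names, the toy network, and the choice of criterion are all assumptions, and the multiplex representation is collapsed to a single undirected layer for brevity.

```python
from collections import deque


def largest_component_size(adjacency, removed=None):
    """Size of the largest connected component, optionally ignoring one node."""
    seen = set() if removed is None else {removed}
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def weakest_node(adjacency):
    """Node whose removal leaves the smallest largest component (assumed criterion)."""
    return min(adjacency, key=lambda n: largest_component_size(adjacency, removed=n))


# Hypothetical toy network: five stops on a single line, A-B-C-D-E.
toy_network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
print(weakest_node(toy_network))
```

In this toy line, removing the middle stop splits the network into two pieces of two stops each, so it is the node this criterion flags.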
Strong and Efficient Baselines for Open Domain Conversational Question Answering
Unlike the Open Domain Question Answering (ODQA) setting, the conversational
(ODConvQA) domain has received limited attention when it comes to reevaluating
baselines for both efficiency and effectiveness. In this paper, we study the
State-of-the-Art (SotA) Dense Passage Retrieval (DPR) retriever and
Fusion-in-Decoder (FiD) reader pipeline, and show that it significantly
underperforms when applied to ODConvQA tasks due to various limitations. We
then propose and evaluate strong yet simple and efficient baselines, by
introducing a fast reranking component between the retriever and the reader,
and by performing targeted finetuning steps. Experiments on two ODConvQA tasks,
namely TopiOCQA and OR-QuAC, show that our method improves the SotA results,
while reducing the reader's latency by 60%. Finally, we provide new and valuable
insights into the development of challenging baselines that serve as a
reference for future, more intricate approaches, including those that leverage
Large Language Models (LLMs).
Comment: Accepted to EMNLP 2023 Finding
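The idea of placing a fast reranking step between the retriever and the reader can be sketched as follows. The abstract above does not describe the reranker itself, so the word-overlap scorer here is a deliberately simple stand-in, and all function names and passages are hypothetical. What the sketch shows is the pipeline shape: rescore the retrieved passages, keep only the best few, and hand the reader a shorter input.

```python
def overlap_score(question, passage):
    """Fraction of question words that also appear in the passage (stand-in scorer)."""
    q_words = set(question.lower().split())
    p_words = set(passage.lower().split())
    return len(q_words & p_words) / len(q_words) if q_words else 0.0


def rerank(question, retrieved, keep):
    """Rescore retrieved passages and keep only the top `keep` for the reader."""
    ranked = sorted(retrieved, key=lambda p: overlap_score(question, p), reverse=True)
    return ranked[:keep]


# Hypothetical retriever output, for illustration only.
retrieved = [
    "the weather in london is mild",
    "hamlet is a tragedy",
    "shakespeare wrote hamlet",
]
print(rerank("who wrote hamlet", retrieved, keep=2))
```

Because the reader processes fewer passages after truncation, a step of this shape is one plausible way a pipeline could trade a small reranking cost for lower reader latency.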
Deep Gravity: enhancing mobility flows generation with deep neural networks and geographic information
The movements of individuals within and among cities influence key aspects of
our society, such as the objective and subjective well-being, the diffusion of
innovations, the spreading of epidemics, and the quality of the environment.
For this reason, there is increasing interest around the challenging problem of
flow generation, which consists in generating the flows between a set of
geographic locations, given the characteristics of the locations and without
any information about the real flows. Existing solutions to flow generation are
mainly based on mechanistic approaches, such as the gravity model and the
radiation model, which suffer from underfitting and overdispersion, neglect
important variables such as land use and the transportation network, and cannot
describe non-linear relationships between these variables. In this paper, we
propose the Multi-Feature Deep Gravity (MFDG) model as an effective solution to
flow generation. On the one hand, the MFDG model exploits a large number of
variables (e.g., characteristics of land use and the road network; transport,
food, and health facilities) extracted from voluntary geographic information
data (OpenStreetMap). On the other hand, our model exploits deep neural
networks to describe complex non-linear relationships between those variables.
Our experiments, conducted on commuting flows in England, show that the MFDG
model achieves a significant increase in performance (up to 250% for
highly populated areas) over mechanistic models that do not use deep neural
networks or do not exploit voluntary geographic data. Our work presents a
precise definition of the flow generation problem, which is a novel task for
the deep learning community working with spatio-temporal data, and proposes a
deep neural network model that significantly outperforms current
state-of-the-art statistical models.
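For context on the mechanistic baseline mentioned in the abstract above, here is a minimal sketch of a singly constrained gravity model. It is not the MFDG model: it uses only population and distance, and the function name, the exponent `beta`, and the sample data are illustrative assumptions. Each origin's total outflow is split among destinations in proportion to destination population divided by distance raised to `beta`.

```python
import math


def gravity_flows(populations, coords, total_outflow, beta=2.0):
    """Distribute each origin's outflow with weights population_j / distance_ij**beta.

    Assumes at least two locations with distinct coordinates.
    """
    n = len(populations)
    flows = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # no self-flows in this sketch
            else:
                distance = math.dist(coords[i], coords[j])
                weights.append(populations[j] / distance ** beta)
        norm = sum(weights)
        for j in range(n):
            flows[i][j] = total_outflow[i] * weights[j] / norm
    return flows


# Hypothetical locations: populations, planar coordinates, outgoing commuters.
populations = [1000, 500, 200]
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
total_outflow = [100.0, 50.0, 20.0]
for row in gravity_flows(populations, coords, total_outflow):
    print([round(f, 2) for f in row])
```

The model has a single functional form and one tunable exponent, which illustrates why the abstract describes such approaches as unable to capture non-linear relationships among richer variables such as land use.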
A multi-source dataset of urban life in the city of Milan and the Province of Trentino
The study of socio-technical systems has been revolutionized by the unprecedented amount of digital records that are constantly being produced by human activities such as accessing Internet services, using mobile devices, and consuming energy and knowledge. In this paper, we describe the richest open multi-source dataset ever released on two geographical areas. The dataset is composed of telecommunications, weather, news, social network, and electricity data from the city of Milan and the Province of Trentino. The unique multi-source composition of the dataset makes it an ideal testbed for methodologies and approaches aimed at tackling a wide range of problems, including energy consumption, mobility planning, tourist and migrant flows, urban structures and interactions, event detection, urban well-being, and many others.
LiMoSiNe pipeline: Multilingual UIMA-based NLP platform
We present a robust, efficient, and parallelizable multilingual UIMA-based platform for automatically annotating textual inputs with different layers of linguistic description, ranging from surface-level phenomena all the way down to deep discourse-level information. In particular, given an input text, the pipeline extracts: sentences and tokens; entity mentions; syntactic information; opinionated expressions; relations between entity mentions; co-reference chains; and wikified entities. The system is available in two versions: a standalone distribution enables the design and optimization of user-specific sub-modules, whereas a server-client distribution allows for straightforward high-performance NLP processing, reducing the engineering cost for higher-level tasks.
Machine Learning Methods for Urban Computing
The world population is increasingly moving from rural areas to urban centers, making large cities densely populated. Urban areas offer greater access to work, a wide variety of options for education and training, ease of transport, and an abundance of attractive places within a few kilometers. Across huge cities, people tend to move more, and have to do so faster, than in the past. On the other hand, heavy traffic (e.g., traffic jams), overbuilding, and changes in the urban lifestyle can cause several new problems, such as noise, atmospheric pollution (such as smog), and severe traffic congestion. However, the rise of novel data sources and machine learning techniques can help to tackle such problems and improve citizens' quality of life. Indeed, in a smart city environment, the huge amount of data generated daily can be captured by sensors, actuators, and mobile devices. Using such data opens the door to several applications, including forecasting of urban displacements, land use classification, and event detection in an urban environment. Motivated by these opportunities, Urban Computing (UC) leverages heterogeneous data sources and applies machine learning techniques to tackle the big challenges that modern cities are facing. In this perspective, one of the core questions when designing UC systems is how to enable models to learn from different urban data sources, and thus how to represent urban spaces. The mainstream approach is to represent input objects as feature vectors that encode several aspects of the urban environment, such as the presence of people, the density of urban activities, and mobility flows. However, this approach of manual feature engineering can be tedious, extremely complex, time-consuming, and domain-dependent.
Additionally, it can become even more complex when aggregating multiple geographical data sources such as points of interest, administrative boundaries, and mobility data. A valid alternative to feature-based methods is using kernels, which are non-linear functions that map input examples into a high-dimensional space, allowing more powerful discriminative decision functions to be learned. Given a representation of the input object, kernels map it into a high-dimensional space where a large number of features are implicitly generated, allowing robust discriminative functions to be learned. In this way, the effort required for the feature engineering process can be greatly reduced.
Kernel methods have been widely applied in Natural Language Processing on tasks such as question answering, semantic role labeling, and even solving linguistic games. Taking inspiration from these successful cases, in this thesis we adapt kernel learning to solve novel tasks in UC. First, we focus on the problem of aggregating multiple urban data sources to provide datasets that fuse knowledge from a wide variety of data sources. Next, we focus on the problem of designing an input structure that is representative of urban space. In particular, we propose to model urban areas with tree structures that are fed to tree kernel functions to automatically generate expressive features. We propose several urban space representations that proved very effective in solving novel urban computing tasks such as land use classification and next location prediction in human mobility. Then, by applying a mining algorithm, we enable the interpretation of urban zones, helping with the difficult problem of understanding the high-level urban characteristics of a city. In fact, our mined substructures help identify the different urban nature of cities. Finally, we explore the application of machine learning models to novel urban data sources by solving innovative tasks such as predicting the future presence of influenza-like symptoms from people's mobility behaviors.
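The claim in the abstract above that a kernel implicitly generates a large number of features can be made concrete with a small sketch. The thesis works with tree kernels; a homogeneous degree-2 polynomial kernel is swapped in here only because it is short enough to verify by hand, and the function names are hypothetical. The kernel value computed directly on the inputs equals the dot product of explicit feature vectors made of all pairwise products, without ever building those vectors.

```python
from itertools import product


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: (x . y) ** 2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def explicit_features(x):
    """Explicit feature map for the same kernel: all ordered pairwise products."""
    return [a * b for a, b in product(x, repeat=2)]


x = [1, 2, 3]
y = [4, 5, 6]

implicit = poly2_kernel(x, y)
explicit = sum(a * b for a, b in zip(explicit_features(x), explicit_features(y)))

print(implicit, explicit, len(explicit_features(x)))
```

For inputs of dimension d the explicit map has d squared features, while the kernel needs only one d-dimensional dot product, which is the saving in feature engineering effort that the abstract refers to.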