Modeling Taxi Drivers' Behaviour for the Next Destination Prediction
In this paper, we study how to model taxi drivers' behaviour and geographical
information for an interesting and challenging task: the next destination
prediction in a taxi journey. Predicting the next location is a well studied
problem in human mobility, which finds several applications in real-world
scenarios, from optimizing the efficiency of electronic dispatching systems to
predicting and reducing traffic jams. This task is normally modeled as a
multiclass classification problem, where the goal is to select, among a set of
already known locations, the next taxi destination. We present a Recurrent
Neural Network (RNN) approach that models the taxi drivers' behaviour and
encodes the semantics of visited locations by using geographical information
from Location-Based Social Networks (LBSNs). In particular, RNNs are trained to
predict the exact coordinates of the next destination, overcoming the problem
of producing, in output, a limited set of locations, seen during the training
phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge
2015 dataset (based on the city of Porto), where it obtained better results
than the competition winner whilst using less information, and on Manhattan
and San Francisco datasets.
Comment: preprint version of a paper submitted to IEEE Transactions on
Intelligent Transportation System
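When a model outputs exact coordinates rather than a class label, its error is naturally measured as a distance between the predicted and the true destination. The following is a minimal sketch of one common choice, the mean haversine (great-circle) distance. The abstract does not say which metric the authors used, so the metric, the function names, and the Earth-radius constant are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not taken from the paper


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_haversine_km(predicted, actual):
    """Average distance error over paired lists of (lat, lon) tuples."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("predicted and actual must be non-empty and equally long")
    total = sum(haversine_km(p[0], p[1], a[0], a[1]) for p, a in zip(predicted, actual))
    return total / len(predicted)
```

A regression model scored this way can be rewarded for landing near a destination it never saw during training, which a fixed-class classifier cannot do.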
Mining large-scale human mobility data for long-term crime prediction
Traditional crime prediction models based on census data are limited, as they
fail to capture the complexity and dynamics of human activity. With the rise of
ubiquitous computing, there is the opportunity to improve such models with data
that make for better proxies of human presence in cities. In this paper, we
leverage large human mobility data to craft an extensive set of features for
crime prediction, as informed by theories in criminology and urban studies. We
employ averaging and boosting ensemble techniques from machine learning, to
investigate their power in predicting yearly counts for different types of
crimes occurring in New York City at census tract level. Our study shows that
spatial and spatio-temporal features derived from Foursquare venues and
checkins, subway rides, and taxi rides, improve the baseline models relying on
census and POI data. The proposed models achieve absolute R^2 metrics of up to
65% (on a geographical out-of-sample test set) and up to 89% (on a temporal
out-of-sample test set). This proves that, next to the residential population
of an area, the ambient population there is strongly predictive of the area's
crime levels. We deep-dive into the main crime categories, and find that the
predictive gain of the human dynamics features varies across crime types: such
features bring the biggest boost in case of grand larcenies, whereas assaults
are already well predicted by the census features. Furthermore, we identify and
discuss top predictive features for the main crime categories. These results
offer valuable insights for those responsible for urban policy or law
enforcement
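The two evaluation settings mentioned in this abstract (geographical and temporal out-of-sample test sets) can be illustrated with a small sketch. The record layout (dictionaries with hypothetical `tract` and `year` keys) and the helper names are assumptions for illustration, not the authors' pipeline; only the R^2 formula itself is standard.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


def geographic_split(rows, held_out_tracts):
    """Hold out whole census tracts, so test areas are never seen in training."""
    train = [r for r in rows if r["tract"] not in held_out_tracts]
    test = [r for r in rows if r["tract"] in held_out_tracts]
    return train, test


def temporal_split(rows, test_year):
    """Train on earlier years and test on one later year."""
    train = [r for r in rows if r["year"] < test_year]
    test = [r for r in rows if r["year"] == test_year]
    return train, test
```

The geographic split is usually the harder of the two, because the model cannot fall back on a tract's own history, which is consistent with the lower score reported for it above.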
Modeling Urban Venue Dynamics through Spatio-Temporal Metrics and Complex Networks
The ubiquity of GPS-enabled devices, mobile applications, and intelligent transportation systems has enabled opportunities to model the world at an unprecedented scale. Urban environments, in particular, have benefited from new data sources that provide granular representations of activities across space and time. As cities experienced a rise in urbanization, they also faced challenges in managing vehicle levels, congestion, and public transportation systems. Modeling these fast-paced changes through rich data from sources such as taxis, bikes, and trains has enabled prediction models capable of characterizing trends and forecasting future changes. Data-driven studies of urban mobility dynamics have been instrumental in helping deliver more contextual services to cities, support urban policy, and inform business decisions. This dissertation explores how novel algorithmic architectures and techniques reveal and predict business trends and urban development patterns.
The research informing this dissertation harnesses principles from network science, modeling cities as connected networks of venues. Building upon a foundation of research in complex network theory, urban computing, and machine learning, we propose algorithms tailored for three computing tasks focused on modeling venue dynamics, characteristics, and trends. First, we predict the demand for newly opened businesses using insights from movement patterns across different regions of the city. Through this analysis we demonstrate how temporally similar areas can be successfully used as inputs to predict the visitation patterns of new venues. Next, we forecast the likelihood of business failure through a supervised learning model. We analyze the value of varying features in predicting business failure and explore their impact across new and established venues and across different cities worldwide. Finally, we present a deep learning architecture which integrates both spatial and topological features to predict the future demand for a venue. These works highlight the power of complex network measures to quantify the structure of a city and inform prediction models.
This dissertation leverages vast amounts of data from spatio-temporal networks to model venue dynamics. The research puts forward evidence to support a data-driven study of geographic systems applied to fundamental questions in urban studies, retail development, and social science.
Gates Cambridge Trus
Advances in Public Transport Platform for the Development of Sustainability Cities
Modern societies demand high and varied mobility, which in turn requires a complex transport system adapted to social needs that guarantees the movement of people and goods in an economically efficient and safe way, but all are subject to a new environmental rationality and the new logic of the paradigm of sustainability. From this perspective, an efficient and flexible transport system that provides intelligent and sustainable mobility patterns is essential to our economy and our quality of life. The current transport system poses growing and significant challenges for the environment, human health, and sustainability, while current mobility schemes have focused much more on the private vehicle that has conditioned both the lifestyles of citizens and cities, as well as urban and territorial sustainability. Transport has a very considerable weight in the framework of sustainable development due to environmental pressures, associated social and economic effects, and interrelations with other sectors. The continuous growth that this sector has experienced over the last few years and its foreseeable increase, even considering the change in trends due to the current situation of generalized crisis, make the challenge of sustainable transport a strategic priority at local, national, European, and global levels. This Special Issue will pay attention to all those research approaches focused on the relationship between evolution in the area of transport with a high incidence in the environment from the perspective of efficiency
Distributed Partitioning and Processing of Large Spatial Datasets
Data collection is one of the most common practices in today’s world. The data collection rate has rapidly increased over the past decade and is not showing any signs of decline. Data sources are many; the Internet of Things devices, mobile gadgets, social media posts, connected cars, and web servers constantly report on their users’ interactions and habits. Much of the collected data is spatial data which contains attributes that denote the physical origin of the data. As a result of the tremendous growth in data collection, higher demand for new techniques emerged to efficiently process and extract valuable insights in a relatively acceptable time frame. The current standard approach to large-scale data analysis uses distributed parallel processing systems like Apache Hadoop and Apache Spark. However, these systems are designed for general-purpose parallel processing and require an additional layer to recognize and efficiently process spatial datasets. Motivated by its many applications, we examine the several challenges facing spatial data partitioning and processing and propose solutions customized for each task. We detail our techniques for building spatial partitioners over large datasets for use with spatial queries like map-matching and kNN spatial join. Additionally, we present an accuracy benchmarking framework for comparing and classifying the results of two input files based on specific criteria. Our proposed work targets batch processing of large spatial datasets, including structured, unstructured, and semi-structured datasets
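To make the idea of a spatial partitioner concrete, here is a minimal sketch of a uniform grid partitioner, one of the simplest schemes for grouping nearby points so that they can be processed together. The abstract does not describe the thesis's own partitioners, so the grid scheme, the function names, and the `cell_size` parameter are stand-in assumptions.

```python
from collections import defaultdict


def grid_cell(x, y, min_x, min_y, cell_size):
    """Map a point to the integer (column, row) index of its grid cell."""
    return (int((x - min_x) // cell_size), int((y - min_y) // cell_size))


def partition_points(points, cell_size):
    """Group (x, y) points by grid cell; each cell can then go to one worker."""
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    if not points:
        return {}
    min_x = min(p[0] for p in points)
    min_y = min(p[1] for p in points)
    partitions = defaultdict(list)
    for x, y in points:
        partitions[grid_cell(x, y, min_x, min_y, cell_size)].append((x, y))
    return dict(partitions)
```

A uniform grid is easy to compute but balances poorly on skewed data such as city-centre GPS traces. Data-aware schemes (for example quadtree- or R-tree-style splits) are the usual remedy, at the cost of a sampling pass over the input.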
Enhancing vehicle destination prediction using latent trajectory information
Intelligent transportation systems have the potential to provide road users with a range of useful applications, including vehicle preconditioning, traffic flow management and intelligent parking recommendations. The majority of these applications can benefit from knowledge of vehicle activities (common situations that a vehicle encounters e.g. traffic), along with the upcoming destinations that a vehicle will visit. We focus on the trajectories that vehicles provide, and the data contained within them, in order to ascertain information about the patterns in individuals' mobility data.
Machine learning has been used in many different vehicle applications, and we focus on using these techniques to predict the activity of a vehicle and its future destinations. Clustering methods can be applied at the level of trajectories or the individual instances within them, and we explore both of these alternatives in this thesis. Additionally, we explore several classification approaches to predict activities and destinations. In developing our methods, we make use of a combination of both geospatial and temporal data along with on-board vehicle sensor data.
This thesis presents novel methods for filtering stay points to identify points of interest and applying destination prediction to vehicle trajectories. Existing methods for stay point detection are not specific to vehicles, and therefore any region of low mobility is potentially considered to be of interest. We propose a novel method for filtering the extracted stay points to identify points of interest, using vehicle data to predict vehicle activities. The predicted activities are further used to represent trajectories as sequences of annotated locations, to inform the detection of similarities between journeys. Finally, this thesis presents a novel method for using additional properties of a trajectory to cluster trajectories into groupings of similar trajectories with the aim of improving the accuracy of destination prediction. We evaluate our proposed methods on a set of vehicle datasets, varying in purpose and the data available
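The generic stay point detection that this abstract takes as its starting point is commonly implemented with a distance threshold and a time threshold. The sketch below shows that generic baseline only, not the thesis's novel filtering step. It assumes planar coordinates in metres and timestamps in seconds, and the threshold parameters are illustrative.

```python
import math


def detect_stay_points(trajectory, dist_thresh_m, time_thresh_s):
    """Find stay points in a list of (t_seconds, x_m, y_m) samples ordered by time.

    A stay point is reported when consecutive samples remain within
    dist_thresh_m of an anchor sample for at least time_thresh_s.
    Returns (mean_x, mean_y, arrival_t, departure_t) tuples.
    """
    stay_points = []
    i, n = 0, len(trajectory)
    while i < n:
        t_i, x_i, y_i = trajectory[i]
        j = i + 1
        # Extend the window while samples stay close to the anchor sample i.
        while j < n and math.hypot(trajectory[j][1] - x_i, trajectory[j][2] - y_i) <= dist_thresh_m:
            j += 1
        window = trajectory[i:j]
        if window[-1][0] - t_i >= time_thresh_s:
            mean_x = sum(p[1] for p in window) / len(window)
            mean_y = sum(p[2] for p in window) / len(window)
            stay_points.append((mean_x, mean_y, t_i, window[-1][0]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stay_points
```

As the abstract notes, a detector like this flags any region of low mobility, so a vehicle stuck in traffic looks the same as one parked at a destination. That ambiguity is what motivates a subsequent filtering step.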
Searching and mining in enriched geo-spatial data
The emergence of new data collection mechanisms in geo-spatial applications paired with a heightened tendency of users to volunteer information provides an ever-increasing flow of data of high volume, complex nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content can be called multi-enriched. Searching and mining this abundance of information holds many challenges, if all of the data's potential is to be released.
This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues.
A common use of geo-spatial data is path queries in road networks where traditional methods optimise results based on absolute and ofttimes singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data by enriching the data model with knowledge derived from sources as mentioned above allows for queries that can be issued to fit a broader scope of needs or preferences.
This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data.
In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots.
Social media trends are an emerging research area giving insight in user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes to predict future dissemination of a trend.
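The definition of a trend given above (a burst of messages whose frequency deviates significantly from the topic's average) can be sketched with a simple z-score rule over per-window message counts. This is an illustrative baseline under assumed parameters, not the thesis's method. The `threshold` of two standard deviations is an arbitrary assumption.

```python
from statistics import mean, pstdev


def detect_bursts(counts, threshold=2.0):
    """Return indices of time windows whose message count is a burst.

    counts: messages about one topic per time window.
    A window is a burst when its count exceeds the topic's mean by more
    than `threshold` population standard deviations. Note that the
    baseline here includes the burst windows themselves.
    """
    if len(counts) < 2:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        return []  # perfectly flat series: nothing deviates
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]
```

Running the same detector separately per spatial region would give, for each topic, a sequence of (region, window) bursts, which is the kind of space-time footprint one would need before classifying how a trend spreads.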
Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner.
This thesis presents techniques for clustering uncertain data as well as for query processing that take the additional information from uncertainty models into account while preserving scalability through a sampling-based approach, whereas previous approaches could only provide one of the two. The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.
Crowdsensing-driven route optimisation algorithms for smart urban mobility
Urban mobility is often considered one of the main enablers of sustainable urban development. Today, however, it requires a significant shift towards cleaner and more efficient urban transport that can support an ever-growing concentration of social and economic resources in cities. A key priority for cities around the world is to support citizens' mobility within urban environments while reducing congestion, accidents, and pollution. Developing a more efficient and greener (or, in a word, smarter) urban mobility is one of the most difficult problems facing large metropolitan areas. In this thesis, we approach the problem from the perspective of the rapidly evolving ICT landscape in cities, which makes it possible to build mobility solutions without large investments or sophisticated sensor technology.
In particular, we propose to leverage the Mobile Crowdsensing (MCS) paradigm, in which citizens use their mobile communication and/or mobile sensing devices to voluntarily collect, distribute, locally process, and analyse geo-referenced information. Mobility data gathered from volunteers (e.g., events, traffic intensity, noise, and air pollution) can provide valuable information about current mobility conditions in the city, which, with adequate data-processing algorithms, can be used to route and manage the flow of people in urban environments.
This thesis therefore combines two very promising smart mobility enablers, MCS and journey/route planning, bringing together, to some extent, the research challenges of both. We split our research objectives into two stages: (1) architectural challenges in designing MCS systems and (2) algorithmic challenges for applications of MCS-driven route planning. We aim to show a logical research progression over time, starting from human-centred sensing systems such as MCS and ending with automated route optimisation algorithms tailored to specific MCS applications. Although we rely on heuristic solutions and algorithms for NP-hard routing problems, we use real-world applications to show the advantages of the proposed algorithms and infrastructure
Measuring & Mitigating Electric Vehicle Adoption Barriers
Transitioning our cars to run on renewable sources of energy is crucial to addressing concerns over energy security and climate change. Electric vehicles (EVs), vehicles that are fully or partially powered by batteries charged from the electrical grid, allow for such a transition. Specifically, if hydro, solar, and wind generation continues to be integrated into the global power system, we can power an EV-based transportation network cleanly and sustainably.
To this end, major car manufacturers are now producing and marketing EVs. Unfortunately,
at the time of this writing, drivers are slow to adopt EVs due to a number of concerns. The
two greatest concerns are range anxiety—the fear of being stranded without power and
the fear that necessary charging infrastructure does not exist—and the unknown return on
investment of EVs over their lifetime.
This thesis presents computational approaches for measuring and mitigating EV adoption
barriers. Towards measuring the barriers to adoption, we build a sentiment analysis system
for programmatically mining detailed perceptions towards EVs from ownership forums. In
addition, we design the most comprehensive electric bike trial to date, which allows us to
study several aspects of electric vehicles, including range anxiety, at a much lower cost.
Towards mitigation, we develop algorithms for managing a network of gasoline vehicles to
be used by EV owners when a planned trip exceeds the range of their EV. Further, we design
a model for taxi companies to compute whether it is profitable to transition a fraction of
their fleet to EVs.
To summarize our findings, we find that sentiments towards EVs are very positive, especially
regarding performance and maintenance, but there are concerns over range anxiety and the
higher initial price of EVs. There is a delicate balance between these two adoption barriers.
Larger batteries cost more, so alleviating range anxiety with larger batteries leads to pricier
vehicles. Conversely, EVs with low range capabilities can also induce costs, because drivers
and fleets that own EVs may have to often acquire (or own as an additional vehicle) a
gasoline vehicle to fully meet their mobility demands. As a result, EVs are best suited for
drivers and fleets that are able to make long-term return on investment calculations, and
whose mobility patterns do not include many very long trips. Fleets can greatly reduce their
operating costs by adopting EVs because they have the capital to make upfront investments
that are profitable long-term. We show that even under conservative assumptions about
revenue loss due to battery depletion, EVs are already profitable (the company saves more
than enough money to recoup all initial investments) for a large taxi company in San
Francisco. Similarly, EVs can be profitable for two-car families (those who already have a
gasoline car) and for those who can easily acquire a gasoline vehicle when needed, hence
our work on sizing networks of gasoline-vehicle pools for EV owners. Finally, we find that
not only are electric bikes and EVs operationally similar, the sentiments towards the two
technologies are as well. Advancements made in the battery sector, especially those that
reduce costs or weight, are likely to accelerate sales in both markets.
The results presented in this thesis, as well as in prior work, suggest that EVs are suitable
for many drivers and will hence serve a role in our eventual transition away from fossil fuels
- …