51 research outputs found

    Modeling Taxi Drivers' Behaviour for the Next Destination Prediction

    Full text link
    In this paper, we study how to model taxi drivers' behaviour and geographical information for an interesting and challenging task: the next destination prediction in a taxi journey. Predicting the next location is a well studied problem in human mobility, which finds several applications in real-world scenarios, from optimizing the efficiency of electronic dispatching systems to predicting and reducing the traffic jam. This task is normally modeled as a multiclass classification problem, where the goal is to select, among a set of already known locations, the next taxi destination. We present a Recurrent Neural Network (RNN) approach that models the taxi drivers' behaviour and encodes the semantics of visited locations by using geographical information from Location-Based Social Networks (LBSNs). In particular, RNNs are trained to predict the exact coordinates of the next destination, overcoming the problem of producing, in output, a limited set of locations, seen during the training phase. The proposed approach was tested on the ECML/PKDD Discovery Challenge 2015 dataset - based on the city of Porto -, obtaining better results with respect to the competition winner, whilst using less information, and on Manhattan and San Francisco datasets.Comment: preprint version of a paper submitted to IEEE Transactions on Intelligent Transportation System

    Mining large-scale human mobility data for long-term crime prediction

    Full text link
    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is the opportunity to improve such models with data that make for better proxies of human presence in cities. In this paper, we leverage large human mobility data to craft an extensive set of features for crime prediction, as informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning, to investigate their power in predicting yearly counts for different types of crimes occurring in New York City at census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and checkins, subway rides, and taxi rides, improve the baseline models relying on census and POI data. The proposed models achieve absolute R^2 metrics of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This proves that, next to the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories, and find that the predictive gain of the human dynamics features varies across crime types: such features bring the biggest boost in case of grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement

    Advances in Public Transport Platform for the Development of Sustainability Cities

    Get PDF
    Modern societies demand high and varied mobility, which in turn requires a complex transport system adapted to social needs that guarantees the movement of people and goods in an economically efficient and safe way, but all are subject to a new environmental rationality and the new logic of the paradigm of sustainability. From this perspective, an efficient and flexible transport system that provides intelligent and sustainable mobility patterns is essential to our economy and our quality of life. The current transport system poses growing and significant challenges for the environment, human health, and sustainability, while current mobility schemes have focused much more on the private vehicle that has conditioned both the lifestyles of citizens and cities, as well as urban and territorial sustainability. Transport has a very considerable weight in the framework of sustainable development due to environmental pressures, associated social and economic effects, and interrelations with other sectors. The continuous growth that this sector has experienced over the last few years and its foreseeable increase, even considering the change in trends due to the current situation of generalized crisis, make the challenge of sustainable transport a strategic priority at local, national, European, and global levels. This Special Issue will pay attention to all those research approaches focused on the relationship between evolution in the area of transport with a high incidence in the environment from the perspective of efficiency

    Distributed Partitioning and Processing of Large Spatial Datasets

    Full text link
    Data collection is one of the most common practices in today’s world. The data collection rate has rapidly increased over the past decade and is not showing any signs of decline. Data sources are many; the Internet of Things devices, mobile gadgets, social media posts, connected cars, and web servers constantly report on their users’ interactions and habits. Much of the collected data is spatial data which contains attributes that denote the physical origin of the data. As a result of the tremendous growth in data collection, higher demand for new techniques emerged to efficiently process and extract valuable insights in a relatively acceptable time frame. The current standard approach to large-scale data analysis uses distributed parallel processing systems like Apache Hadoop and Apache Spark. However, these systems are designed for general-purpose parallel processing and require an additional layer to recognize and efficiently process spatial datasets. Motivated by its many applications, we examine the several challenges facing spatial data partitioning and processing and propose solutions customized for each task. We detail our techniques for building spatial partitioners over large datasets for use with spatial queries like map-matching and kNN spatial join. Additionally, we present an accuracy benchmarking framework for comparing and classifying the results of two input files based on specific criteria. Our proposed work targets batch processing of large spatial datasets, including structured, unstructured, and semi-structured datasets

    Enhancing vehicle destination prediction using latent trajectory information

    Get PDF
    Intelligent transportation systems have the potential to provide road users with a range of useful applications, including vehicle preconditioning, traffic flow management and intelligent parking recommendations. The majority of these applications can benefit from knowledge of vehicle activities (common situations that a vehicle encounters e.g. traffic), along with the upcoming destinations that a vehicle will visit. We focus on the trajectories that vehicles provide, and the data contained within them, in order to ascertain information about the patterns in individuals' mobility data. Machine learning has been used in many different vehicle applications, and we focus on using these techniques to predict the activity of a vehicle and its future destinations. Clustering methods can be applied at the level of trajectories or the individual instances within them, and we explore both of these alternatives in this thesis. Additionally, we explore several classification approaches to predict activities and destinations. In developing our methods, we make use of a combination of both geospatial and temporal data along with on-board vehicle sensor data. This thesis presents novel methods for filtering stay points to identify points of interest and applying destination prediction to vehicle trajectories. Existing methods for stay point detection are not specific to vehicles, and therefore any region of low mobility is potentially considered to be of interest. We propose a novel method for filtering the extracted stay points to identify points of interest, using vehicle data to predict vehicle activities. The predicted activities are further used to represent trajectories as sequences of annotated locations, to inform the detection of similarities between journeys. Finally, this thesis presents a novel method for using additional properties of a trajectory to cluster trajectories into groupings of similar trajectories with the aim of improving the accuracy of destination prediction. We evaluate our proposed methods on a set of vehicle datasets, varying in purpose and the data available

    Searching and mining in enriched geo-spatial data

    Get PDF
    The emergence of new data collection mechanisms in geo-spatial applications paired with a heightened tendency of users to volunteer information provides an ever-increasing flow of data of high volume, complex nature, and often associated with inherent uncertainty. Such mechanisms include crowdsourcing, automated knowledge inference, tracking, and social media data repositories. Such data bearing additional information from multiple sources like probability distributions, text or numerical attributes, social context, or multimedia content can be called multi-enriched. Searching and mining this abundance of information holds many challenges, if all of the data's potential is to be released. This thesis addresses several major issues arising in that field, namely path queries using multi-enriched data, trend mining in social media data, and handling uncertainty in geo-spatial data. In all cases, the developed methods have made significant contributions and have appeared in or were accepted into various renowned international peer-reviewed venues. A common use of geo-spatial data is path queries in road networks where traditional methods optimise results based on absolute and ofttimes singular metrics, i.e., finding the shortest paths based on distance or the best trade-off between distance and travel time. Integrating additional aspects like qualitative or social data by enriching the data model with knowledge derived from sources as mentioned above allows for queries that can be issued to fit a broader scope of needs or preferences. This thesis presents two implementations of incorporating multi-enriched data into road networks. In one case, a range of qualitative data sources is evaluated to gain knowledge about user preferences which is subsequently matched with locations represented in a road network and integrated into its components. Several methods are presented for highly customisable path queries that incorporate a wide spectrum of data. In a second case, a framework is described for resource distribution with reappearance in road networks to serve one or more clients, resulting in paths that provide maximum gain based on a probabilistic evaluation of available resources. Applications for this include finding parking spots. Social media trends are an emerging research area giving insight in user sentiment and important topics. Such trends consist of bursts of messages concerning a certain topic within a time frame, significantly deviating from the average appearance frequency of the same topic. By investigating the dissemination of such trends in space and time, this thesis presents methods to classify trend archetypes to predict future dissemination of a trend. Processing and querying uncertain data is particularly demanding given the additional knowledge required to yield results with probabilistic guarantees. Since such knowledge is not always available and queries are not easily scaled to larger datasets due to the #P-complete nature of the problem, many existing approaches reduce the data to a deterministic representation of its underlying model to eliminate uncertainty. However, data uncertainty can also provide valuable insight into the nature of the data that cannot be represented in a deterministic manner. This thesis presents techniques for clustering uncertain data as well as query processing, that take the additional information from uncertainty models into account while preserving scalability using a sampling-based approach, while previous approaches could only provide one of the two. The given solutions enable the application of various existing clustering techniques or query types to a framework that manages the uncertainty.Das Erscheinen neuer Methoden zur Datenerhebung in räumlichen Applikationen gepaart mit einer erhöhten Bereitschaft der Nutzer, Daten über sich preiszugeben, generiert einen stetig steigenden Fluss von Daten in großer Menge, komplexer Natur, und oft gepaart mit inhärenter Unsicherheit. Beispiele für solche Mechanismen sind Crowdsourcing, automatisierte Wissensinferenz, Tracking, und Daten aus sozialen Medien. Derartige Daten, angereichert mit mit zusätzlichen Informationen aus verschiedenen Quellen wie Wahrscheinlichkeitsverteilungen, Text- oder numerische Attribute, sozialem Kontext, oder Multimediainhalten, werden als multi-enriched bezeichnet. Suche und Datamining in dieser weiten Datenmenge hält viele Herausforderungen bereit, wenn das gesamte Potenzial der Daten genutzt werden soll. Diese Arbeit geht auf mehrere große Fragestellungen in diesem Feld ein, insbesondere Pfadanfragen in multi-enriched Daten, Trend-mining in Daten aus sozialen Netzwerken, und die Beherrschung von Unsicherheit in räumlichen Daten. In all diesen Fällen haben die entwickelten Methoden signifikante Forschungsbeiträge geleistet und wurden veröffentlicht oder angenommen zu diversen renommierten internationalen, von Experten begutachteten Konferenzen und Journals. Ein gängiges Anwendungsgebiet räumlicher Daten sind Pfadanfragen in Straßennetzwerken, wo traditionelle Methoden die Resultate anhand absoluter und oft auch singulärer Maße optimieren, d.h., der kürzeste Pfad in Bezug auf die Distanz oder der beste Kompromiss zwischen Distanz und Reisezeit. Durch die Integration zusätzlicher Aspekte wie qualitativer Daten oder Daten aus sozialen Netzwerken als Anreicherung des Datenmodells mit aus diesen Quellen abgeleitetem Wissen werden Anfragen möglich, die ein breiteres Spektrum an Anforderungen oder Präferenzen erfüllen. Diese Arbeit präsentiert zwei Ansätze, solche multi-enriched Daten in Straßennetze einzufügen. Zum einen wird eine Reihe qualitativer Datenquellen ausgewertet, um Wissen über Nutzerpräferenzen zu generieren, welches darauf mit Örtlichkeiten im Straßennetz abgeglichen und in das Netz integriert wird. Diverse Methoden werden präsentiert, die stark personalisierbare Pfadanfragen ermöglichen, die ein weites Spektrum an Daten mit einbeziehen. Im zweiten Fall wird ein Framework präsentiert, das eine Ressourcenverteilung im Straßennetzwerk modelliert, bei der einmal verbrauchte Ressourcen erneut auftauchen können. Resultierende Pfade ergeben einen maximalen Ertrag basieren auf einer probabilistischen Evaluation der verfügbaren Ressourcen. Eine Anwendung ist die Suche nach Parkplätzen. Trends in sozialen Medien sind ein entstehendes Forscchungsgebiet, das Einblicke in Benutzerverhalten und wichtige Themen zulässt. Solche Trends bestehen aus großen Mengen an Nachrichten zu einem bestimmten Thema innerhalb eines Zeitfensters, so dass die Auftrittsfrequenz signifikant über den durchschnittlichen Level liegt. Durch die Untersuchung der Fortpflanzung solcher Trends in Raum und Zeit präsentiert diese Arbeit Methoden, um Trends nach Archetypen zu klassifizieren und ihren zukünftigen Weg vorherzusagen. Die Anfragebearbeitung und Datamining in unsicheren Daten ist besonders herausfordernd, insbesondere im Hinblick auf das notwendige Zusatzwissen, um Resultate mit probabilistischen Garantien zu erzielen. Solches Wissen ist nicht immer verfügbar und Anfragen lassen sich aufgrund der \P-Vollständigkeit des Problems nicht ohne Weiteres auf größere Datensätze skalieren. Dennoch kann Datenunsicherheit wertvollen Einblick in die Struktur der Daten liefern, der mit deterministischen Methoden nicht erreichbar wäre. Diese Arbeit präsentiert Techniken zum Clustering unsicherer Daten sowie zur Anfragebearbeitung, die die Zusatzinformation aus dem Unsicherheitsmodell in Betracht ziehen, jedoch gleichzeitig die Skalierbarkeit des Ansatzes auf große Datenmengen sicherstellen

    Crowdsensing-driven route optimisation algorithms for smart urban mobility

    Get PDF
    Urban rörlighet anses ofta vara en av de främsta möjliggörarna för en hållbar statsutveckling. Idag skulle det dock kräva ett betydande skifte mot renare och effektivare stadstransporter vilket skulle stödja ökad social och ekonomisk koncentration av resurser i städerna. En viktig prioritet för städer runt om i världen är att stödja medborgarnas rörlighet inom stadsmiljöer medan samtidigt minska trafikstockningar, olyckor och föroreningar. Att utveckla en effektivare och grönare (eller med ett ord; smartare) stadsrörlighet är en av de svåraste problemen att bemöta för stora metropoler. I denna avhandling närmar vi oss problemet från det snabba utvecklingsperspektivet av ITlandskapet i städer vilket möjliggör byggandet av rörlighetslösningar utan stora stora investeringar eller sofistikerad sensortenkik. I synnerhet föreslår vi utnyttjandet av den mobila rörlighetsavkännings, eng. Mobile Crowdsensing (MCS), paradigmen i vilken befolkningen exploaterar sin mobilkommunikation och/eller mobilasensorer med syftet att frivilligt samla, distribuera, lokalt processera och analysera geospecifik information. Rörlighetavkänningssdata (t.ex. händelser, trafikintensitet, buller och luftföroreningar etc.) inhämtad från frivilliga i befolkningen kan ge värdefull information om aktuella rörelsesförhållanden i stad vilka, med adekvata databehandlingsalgoriter, kan användas för att planera människors rörelseflöden inom stadsmiljön. Såtillvida kombineras i denna avhandling två mycket lovande smarta rörlighetsmöjliggörare, eng. Smart Mobility Enablers, nämligen MCS och rese/ruttplanering. Vi kan därmed till viss utsträckning sammanföra forskningsutmaningar från dessa två delar. Vi väljer att separera våra forskningsmål i två delar, dvs forskningssteg: (1) arkitektoniska utmaningar vid design av MCS-system och (2) algoritmiska utmaningar för tillämpningar av MCS-driven ruttplanering. Vi ämnar att visa en logisk forskningsprogression över tiden, med avstamp i mänskligt dirigerade rörelseavkänningssystem som MCS och ett avslut i automatiserade ruttoptimeringsalgoritmer skräddarsydda för specifika MCS-applikationer. Även om vi förlitar oss på heuristiska lösningar och algoritmer för NP-svåra ruttproblem förlitar vi oss på äkta applikationer med syftet att visa på fördelarna med algoritm- och infrastrukturförslagen.La movilidad urbana es considerada una de las principales desencadenantes de un desarrollo urbano sostenible. Sin embargo, hoy en día se requiere una transición hacia un transporte urbano más limpio y más eficiente que soporte una concentración de recursos sociales y económicos cada vez mayor en las ciudades. Una de las principales prioridades para las ciudades de todo el mundo es facilitar la movilidad de los ciudadanos dentro de los entornos urbanos, al mismo tiempo que se reduce la congestión, los accidentes y la contaminación. Sin embargo, desarrollar una movilidad urbana más eficiente y más verde (o en una palabra, más inteligente) es uno de los temas más difíciles de afrontar para las grandes áreas metropolitanas. En esta tesis, abordamos este problema desde la perspectiva de un panorama TIC en rápida evolución que nos permite construir movilidad sin la necesidad de grandes inversiones ni sofisticadas tecnologías de sensores. En particular, proponemos aprovechar el paradigma Mobile Crowdsensing (MCS) en el que los ciudadanos utilizan sus teléfonos móviles y dispositivos, para nosotros recopilar, procesar y analizar localmente información georreferenciada, distribuida voluntariamente. Los datos de movilidad recopilados de ciudadanos que voluntariamente quieren compartirlos (por ejemplo, eventos, intensidad del tráfico, ruido y contaminación del aire, etc.) pueden proporcionar información valiosa sobre las condiciones de movilidad actuales en la ciudad, que con el algoritmo de procesamiento de datos adecuado, pueden utilizarse para enrutar y gestionar el flujo de gente en entornos urbanos. Por lo tanto, en esta tesis combinamos dos prometedoras fuentes de movilidad inteligente: MCS y la planificación de viajes/rutas, uniendo en cierta medida los distintos desafíos de investigación. Hemos dividido nuestros objetivos de investigación en dos etapas: (1) Desafíos arquitectónicos en el diseño de sistemas MCS y (2) Desafíos algorítmicos en la planificación de rutas aprovechando la información del MCS. Nuestro objetivo es demostrar una progresión lógica de la investigación a lo largo del tiempo, comenzando desde los fundamentos de los sistemas de detección centrados en personas, como el MCS, hasta los algoritmos de optimización de rutas diseñados específicamente para la aplicación de estos. Si bien nos centramos en algoritmos y heurísticas para resolver problemas de enrutamiento de clase NP-hard, utilizamos ejemplos de aplicaciones en el mundo real para mostrar las ventajas de los algoritmos e infraestructuras propuestas

    Measuring & Mitigating Electric Vehicle Adoption Barriers

    Get PDF
    Transitioning our cars to run on renewable sources of energy is crucial to addressing concerns over energy security and climate change. Electric vehicles (EVs), vehicles that are fully or partially powered by batteries charged from the electrical grid, allow for such a transition. Specifically, if hydro, solar, and wind generation continues to be integrated into the global power system, we can power an EV-based transportation network cleanly and sustainably. To this end, major car manufacturers are now producing and marketing EVs. Unfortunately, at the time of this writing, drivers are slow to adopt EVs due to a number of concerns. The two greatest concerns are range anxiety—the fear of being stranded without power and the fear that necessary charging infrastructure does not exist—and the unknown return on investment of EVs over their lifetime. This thesis presents computational approaches for measuring and mitigating EV adoption barriers. Towards measuring the barriers to adoption, we build a sentiment analysis system for programmatically mining detailed perceptions towards EVs from ownership forums. In addition, we design the most comprehensive electric bike trial to date, which allows us to study several aspects of electric vehicles, including range anxiety, at a much lower cost. Towards mitigation, we develop algorithms for managing a network of gasoline vehicles to be used by EV owners when a planned trip exceeds the range of their EV. Further, we design a model for taxi companies to compute whether it is profitable to transition a fraction of their fleet to EVs. To summarize our findings, we find that sentiments towards EVs are very positive, especially regarding performance and maintenance, but there are concerns over range anxiety and the higher initial price of EVs. There is a delicate balance between these two adoption barriers. Larger batteries cost more, so alleviating range anxiety with larger batteries leads to pricier vehicles. Conversely, EVs with low range capabilities can also induce costs, because drivers and fleets that own EVs may have to often acquire (or own as an additional vehicle) a gasoline vehicle to fully meet their mobility demands. As a result, EVs are best suited for drivers and fleets that are able to make long-term return on investment calculations, and whose mobility patterns do not include many very long trips. Fleets can greatly reduce their operating costs by adopting EVs because they have the capital to make upfront investments that are profitable long-term. We show that even under conservative assumptions about revenue loss due to battery depletion, EVs are already profitable (the company saves more than enough money to recoup all initial investments) for a large taxi company in San Francisco. Similarly, EVs can be profitable for two-car families (those who already have a gasoline car) and for those who can easily acquire a gasoline vehicle when needed, hence our work on sizing networks of gasoline-vehicle pools for EV owners. Finally, we find that not only are electric bikes and EVs operationally similar, the sentiments towards the two technologies are as well. Advancements made in the battery sector, especially those that reduce costs or weight, are likely to accelerate sales in both markets. The results presented in this thesis, as well as in prior work, suggest that EVs are suitable for many drivers and will hence serve a role in our eventual transition away from fossil fuels
    corecore