832 research outputs found

    Mining open datasets for transparency in taxi transport in metropolitan environments.

    Get PDF
    Uber has recently been introducing novel practices in urban taxi transport. Journey prices can change dynamically in almost real time and also vary geographically from one area to another in a city, a strategy known as surge pricing. In this paper, we explore the power of the new generation of open datasets towards understanding the impact of the new disruption technologies that emerge in the area of public transport. With our primary goal being a more transparent economic landscape for urban commuters, we provide a direct price comparison between Uber and the Yellow Cab company in New York. We discover that Uber, despite its lower standard pricing rates, effectively charges higher fares on average, especially during short in length, but frequent in occurrence, taxi journeys. Building on this insight, we develop a smartphone application, OpenStreetCab, that offers a personalized consultation to mobile users on which taxi provider is cheaper for their journey. Almost five months after its launch, the app has attracted more than three thousand users in a single city. Their journey queries have provided additional insights on the potential savings similar technologies can have for urban commuters, with a highlight being that on average, a user in New York saves 6 U.S. Dollars per taxi journey if they pick the cheapest taxi provider. We run extensive experiments to show how Uber's surge pricing is the driving factor of higher journey prices and therefore higher potential savings for our application's users. Finally, motivated by the observation that Uber's surge pricing is occurring more frequently that intuitively expected, we formulate a prediction task where the aim becomes to predict a geographic area's tendency to surge. Using exogenous to Uber data, in particular Yellow Cab and Foursquare data, we show how it is possible to estimate customer demand within an area, and by extension surge pricing, with high accuracy.This is the final version of the article. It was first available from Springer via http://dx.doi.org/10.1140/epjds/s13688-015-0060-

    Mining large-scale human mobility data for long-term crime prediction

    Full text link
    Traditional crime prediction models based on census data are limited, as they fail to capture the complexity and dynamics of human activity. With the rise of ubiquitous computing, there is the opportunity to improve such models with data that make for better proxies of human presence in cities. In this paper, we leverage large human mobility data to craft an extensive set of features for crime prediction, as informed by theories in criminology and urban studies. We employ averaging and boosting ensemble techniques from machine learning, to investigate their power in predicting yearly counts for different types of crimes occurring in New York City at census tract level. Our study shows that spatial and spatio-temporal features derived from Foursquare venues and checkins, subway rides, and taxi rides, improve the baseline models relying on census and POI data. The proposed models achieve absolute R^2 metrics of up to 65% (on a geographical out-of-sample test set) and up to 89% (on a temporal out-of-sample test set). This proves that, next to the residential population of an area, the ambient population there is strongly predictive of the area's crime levels. We deep-dive into the main crime categories, and find that the predictive gain of the human dynamics features varies across crime types: such features bring the biggest boost in case of grand larcenies, whereas assaults are already well predicted by the census features. Furthermore, we identify and discuss top predictive features for the main crime categories. These results offer valuable insights for those responsible for urban policy or law enforcement

    Exploratory Data Analysis and Data Envelopment Analysis of Urban Rail Transit

    Get PDF
    [Abstract] This paper deals with the efficiency and sustainability of urban rail transit (URT) using exploratory data analytics (EDA) and data envelopment analysis (DEA). The first stage of the proposed methodology is EDA with already available indicators (e.g., the number of stations and passengers), and suggested indicators (e.g., weekly frequencies, link occupancy rates, and CO2 footprint per journey) to directly characterize the efficiency and sustainability of this transport mode. The second stage is to assess the efficiency of URT with two original models, based on a thorough selection of input and output variables, which is one of the key contributions of EDA to this methodology. The first model compares URT against other urban transport modes, applicable to route personalization, and the second scores the efficiency of URT lines. The main outcome of this paper is the proposed methodology, which has been experimentally validated using open data from the Transport for London (TfL) URT network and additional sources.Ministerio de Economía, Industria y Competitividad; TIN2016-75845-PAgencia Estatal de Investigación; SNEO-20161147Xunta de Galicia; ED431G2019/01Xunta de Galicia; ED431C 2017/04Xunta de Galicia; ED431G2019/0

    Graph Input Representations for Machine Learning Applications in Urban Network Analysis

    Get PDF
    Understanding and learning the characteristics of network paths has been of particular interest for decades and has led to several successful applications. Such analysis becomes challenging for urban networks as their size and complexity are significantly higher compared to other networks. The state-of-the-art machine learning (ML) techniques allow us to detect hidden patterns and, thus, infer the features associated with them. However, very little is known about the impact on the performance of such predictive models by the use of different input representations. In this paper, we design and evaluate six different graph input representations (i.e., representations of the network paths), by considering the network's topological and temporal characteristics, for being used as inputs for machine learning models to learn the behavior of urban networks paths. The representations are validated and then tested with a real-world taxi journeys dataset predicting the tips using a road network of New York. Our results demonstrate that the input representations that use temporal information help the model to achieve the highest accuracy (RMSE of 1.42$)

    Big data-driven multimodal traffic management : trends and challenges

    Get PDF

    A Computational Architecture Based on RFID Sensors for Traceability in Smart Cities

    Get PDF
    Information Technology and Communications (ICT) is presented as the main element in order to achieve more efficient and sustainable city resource management, while making sure that the needs of the citizens to improve their quality of life are satisfied. A key element will be the creation of new systems that allow the acquisition of context information, automatically and transparently, in order to provide it to decision support systems. In this paper, we present a novel distributed system for obtaining, representing and providing the flow and movement of people in densely populated geographical areas. In order to accomplish these tasks, we propose the design of a smart sensor network based on RFID communication technologies, reliability patterns and integration techniques. Contrary to other proposals, this system represents a comprehensive solution that permits the acquisition of user information in a transparent and reliable way in a non-controlled and heterogeneous environment. This knowledge will be useful in moving towards the design of smart cities in which decision support on transport strategies, business evaluation or initiatives in the tourism sector will be supported by real relevant information. As a final result, a case study will be presented which will allow the validation of the proposal

    Towards a human-centric data economy

    Get PDF
    Spurred by widespread adoption of artificial intelligence and machine learning, “data” is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. In spite of an ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value), which moves digital companies to hoard and protect their “valuable” data assets, and to integrate across the whole value chain seeking to monopolise the provision of innovative services built upon them. As a result, most of those valuable assets still remain unexploited in corporate silos nowadays. This situation is shaping the so-called data economy around a number of champions, and it is hampering the benefits of a global data exchange on a large scale. Some analysts have estimated the potential value of the data economy in US$2.5 trillion globally by 2025. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which also estimated the size of the data economy in 827C billion for the EU27 in the same period. Within the scope of the European Data Strategy, the European Commission is also steering relevant initiatives aimed to identify relevant cross-industry use cases involving different verticals, and to enable sovereign data exchanges to realise them. Among individuals, the massive collection and exploitation of personal data by digital firms in exchange of services, often with little or no consent, has raised a general concern about privacy and data protection. Apart from spurring recent legislative developments in this direction, this concern has raised some voices warning against the unsustainability of the existing digital economics (few digital champions, potential negative impact on employment, growing inequality), some of which propose that people are paid for their data in a sort of worldwide data labour market as a potential solution to this dilemma [114, 115, 155]. From a technical perspective, we are far from having the required technology and algorithms that will enable such a human-centric data economy. Even its scope is still blurry, and the question about the value of data, at least, controversial. Research works from different disciplines have studied the data value chain, different approaches to the value of data, how to price data assets, and novel data marketplace designs. At the same time, complex legal and ethical issues with respect to the data economy have risen around privacy, data protection, and ethical AI practices. In this dissertation, we start by exploring the data value chain and how entities trade data assets over the Internet. We carry out what is, to the best of our understanding, the most thorough survey of commercial data marketplaces. In this work, we have catalogued and characterised ten different business models, including those of personal information management systems, companies born in the wake of recent data protection regulations and aiming at empowering end users to take control of their data. We have also identified the challenges faced by different types of entities, and what kind of solutions and technology they are using to provide their services. Then we present a first of its kind measurement study that sheds light on the prices of data in the market using a novel methodology. We study how ten commercial data marketplaces categorise and classify data assets, and which categories of data command higher prices. We also develop classifiers for comparing data products across different marketplaces, and we study the characteristics of the most valuable data assets and the features that specific vendors use to set the price of their data products. Based on this information and adding data products offered by other 33 data providers, we develop a regression analysis for revealing features that correlate with prices of data products. As a result, we also implement the basic building blocks of a novel data pricing tool capable of providing a hint of the market price of a new data product using as inputs just its metadata. This tool would provide more transparency on the prices of data products in the market, which will help in pricing data assets and in avoiding the inherent price fluctuation of nascent markets. Next we turn to topics related to data marketplace design. Particularly, we study how buyers can select and purchase suitable data for their tasks without requiring a priori access to such data in order to make a purchase decision, and how marketplaces can distribute payoffs for a data transaction combining data of different sources among the corresponding providers, be they individuals or firms. The difficulty of both problems is further exacerbated in a human-centric data economy where buyers have to choose among data of thousands of individuals, and where marketplaces have to distribute payoffs to thousands of people contributing personal data to a specific transaction. Regarding the selection process, we compare different purchase strategies depending on the level of information available to data buyers at the time of making decisions. A first methodological contribution of our work is proposing a data evaluation stage prior to datasets being selected and purchased by buyers in a marketplace. We show that buyers can significantly improve the performance of the purchasing process just by being provided with a measurement of the performance of their models when trained by the marketplace with individual eligible datasets. We design purchase strategies that exploit such functionality and we call the resulting algorithm Try Before You Buy, and our work demonstrates over synthetic and real datasets that it can lead to near-optimal data purchasing with only O(N) instead of the exponential execution time - O(2N) - needed to calculate the optimal purchase. With regards to the payoff distribution problem, we focus on computing the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York we show that the value of data is different for each individual, and cannot be approximated by its volume. Our results reveal that even more complex approaches based on the “leave-one-out” value, are inaccurate. Instead, more complex and acknowledged notions of value from economics and game theory, such as the Shapley value, need to be employed if one wishes to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms. However, the Shapley value entails serious computational challenges. Its exact calculation requires repetitively training and evaluating every combination of data sources and hence O(N!) or O(2N) computational time, which is unfeasible for complex models or thousands of individuals. Moreover, our work paves the way to new methods of measuring the value of spatio-temporal data. We identify heuristics such as entropy or similarity to the average that show a significant correlation with the Shapley value and therefore can be used to overcome the significant computational challenges posed by Shapley approximation algorithms in this specific context. We conclude with a number of open issues and propose further research directions that leverage the contributions and findings of this dissertation. These include monitoring data transactions to better measure data markets, and complementing market data with actual transaction prices to build a more accurate data pricing tool. A human-centric data economy would also require that the contributions of thousands of individuals to machine learning tasks are calculated daily. For that to be feasible, we need to further optimise the efficiency of data purchasing and payoff calculation processes in data marketplaces. In that direction, we also point to some alternatives to repetitively training and evaluating a model to select data based on Try Before You Buy and approximate the Shapley value. Finally, we discuss the challenges and potential technologies that help with building a federation of standardised data marketplaces. The data economy will develop fast in the upcoming years, and researchers from different disciplines will work together to unlock the value of data and make the most out of it. Maybe the proposal of getting paid for our data and our contribution to the data economy finally flies, or maybe it is other proposals such as the robot tax that are finally used to balance the power between individuals and tech firms in the digital economy. Still, we hope our work sheds light on the value of data, and contributes to making the price of data more transparent and, eventually, to moving towards a human-centric data economy.This work has been supported by IMDEA Networks InstitutePrograma de Doctorado en Ingeniería Telemática por la Universidad Carlos III de MadridPresidente: Georgios Smaragdakis.- Secretario: Ángel Cuevas Rumín.- Vocal: Pablo Rodríguez Rodrígue

    Ride-hailing services: Competition or complement to public transport to reduce accident rates. The case of Madrid

    Get PDF
    Introduction: The transport and mobility sector is experiencing profound transformations. These changes are mainly due to: environmental awareness, the increase in the population of large urban areas and the size of cities, the aging of the population and the emergence of relevant technological innovations that have changed consumption habits, such as electronic commerce or the sharing economy. The introduction of new services such as Uber or Cabify is transforming urban and metropolitan mobility, which has to adapt to this new scenario and the very concept of mobility. Objective: Thus, the purpose of this study was to evaluate whether ride-hailing platforms substitute or complement public transport to reduce accident rates, considering the two basic transport zones of Madrid: “The Central Almond” and the periphery. Methods: The data were collected from the 21 districts of Madrid for the period 2013–2019, and they were analyzed by a Random Effects Negative Binominal model. Results: The results obtained in this study suggest that since the arrival of Uber and Cabify to the municipality of Madrid the number of fatalities and serious injuries in traffic accidents has been reduced. Traffic accidents on weekends and holidays, with at least one serious injury or death, have also been reduced. However, the number of minor injuries has increased in the central districts of Madrid. Conclusion: Overall, what was found in this study supports the hypothesis that these services replace the urban buses. However, these services improve the supply to users with greater difficulties to access taxis or public transport, constituting an alternative mode of transport for high-risk drivers. Therefore, such findings may be quite useful for policy makers to better define regulatory policies for these services.María Flor García is currently developing her doctoral thesis on sharing economy and mobility and received an FPU grant from the University of Alicante: UAFPU2018–028

    Mobile networks and internet of things infrastructures to characterize smart human mobility

    Get PDF
    The evolution of Mobile Networks and Internet of Things (IoT) architectures allows one to rethink the way smart cities infrastructures are designed and managed, and solve a number of problems in terms of human mobility. The territories that adopt the sensoring era can take advantage of this disruptive technology to improve the quality of mobility of their citizens and the rationalization of their resources. However, with this rapid development of smart terminals and infrastructures, as well as the proliferation of diversified applications, even current networks may not be able to completely meet quickly rising human mobility demands. Thus, they are facing many challenges and to cope with these challenges, different standards and projects have been proposed so far. Accordingly, Artificial Intelligence (AI) has been utilized as a new paradigm for the design and optimization of mobile networks with a high level of intelligence. The objective of this work is to identify and discuss the challenges of mobile networks, alongside IoT and AI, to characterize smart human mobility and to discuss some workable solutions to these challenges. Finally, based on this discussion, we propose paths for future smart human mobility researches.This work has been supported by FCT–Fundação para a Ciência e Tecnologia within the R&D Units Project Scope: UIDB/00319/2020. This work has also been supported by national funds through FCT–Fundação para a Ciência e Tecnologia through project UIDB/04728/202
    corecore