    HDBSCAN: Density based Clustering over Location Based Services

    Location Based Services (LBS) have become extremely popular and are used by millions of users. Popular LBS run the entire gamut from mapping services (such as Google Maps) to restaurants (such as Yelp) and real estate (such as Redfin). The public query interfaces of LBS can be abstractly modeled as a kNN interface over a database of two-dimensional points: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point. Often, k is set to a small value such as 20 or 50. In this paper, we consider the novel problem of enabling density-based clustering over an LBS with only a limited kNN query interface. Due to the query rate limits imposed by LBS, even retrieving every tuple once is infeasible. Hence, we seek to construct a cluster assignment function f(.) by issuing a small number of kNN queries, such that for any given tuple t in the database, which may or may not have been accessed, f(.) outputs the cluster assignment of t with high accuracy. We conduct a comprehensive set of experiments over benchmark datasets and popular real-world LBS such as Yahoo! Flickr, Zillow, Redfin and Google Maps.
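The kNN interface model described in this abstract can be illustrated with a minimal sketch. The class name `KnnInterface`, the `budget` parameter, and the brute-force search are assumptions made for illustration only; they are not taken from the paper and do not reflect how any real LBS is implemented.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


class KnnInterface:
    """Toy stand-in for an LBS query interface (hypothetical, for illustration)."""

    def __init__(self, points: List[Point], k: int = 20, budget: int = 100):
        self._points = list(points)  # hidden database of 2D points
        self.k = k                   # number of neighbours returned per query
        self.budget = budget         # assumed query-rate limit
        self.queries_used = 0

    def query(self, q: Point) -> List[Point]:
        """Return the k database points nearest to q, nearest first."""
        if self.queries_used >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries_used += 1
        # Brute-force scan; a real service would use a spatial index.
        return sorted(self._points, key=lambda p: math.dist(p, q))[: self.k]
```

A client that wants to cluster the hidden database only ever sees the output of `query`, and each call consumes part of the budget, which is why retrieving every tuple is treated as infeasible.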

    Identifying Points of Interest and Similar Individuals from Raw GPS Data

    Smartphones and portable devices have become ubiquitous and part of everyone's life. Because of their portability, these devices are well suited to recording individuals' traces and life-logs, generating vast amounts of data at low cost. These data are emerging as a new source for studies of human mobility patterns, increasing the number of research projects and techniques that aim to analyze them and retrieve useful information. The aim of this paper is to explore raw GPS data from different individuals in a community and apply data mining algorithms to identify meaningful places in a region and to describe users' profiles and their similarities. We evaluate the proposed method with a real-world dataset. The experimental results show that the steps performed to identify points of interest (POIs), and subsequently the similarity between users, are quite satisfactory and can serve as a supplement for urban planning and social networks. Comment: Conference paper at Mobility IoT 2018 - http://mobilityiot2018.eai-conferences.org/full-program

    Spatial Outlier Detection from GSM Mobility Data

    This paper has been withdrawn by the authors. With the rapid growth of cellular networks, many mobility datasets have become publicly available, attracting researchers to study human mobility as a spatio-temporal phenomenon. Building mobility profiles is the main task in spatio-temporal trend analysis, and such profiles can be extracted from the location information available in a dataset. Location information is usually gathered through GPS, service-provider-assisted faux GPS, or Cell Global Identity (CGI). Because GPS-related methods have high power consumption and require extra resources to be installed, CGI is the least expensive and most readily available source of location information. CGI location information is a set of four fields, i.e., Mobile Country Code (MCC), Mobile Network Code (MNC), Location Area Code (LAC) and Cell ID; it is converted into longitude and latitude coordinates through any of the publicly available Cell ID databases, e.g., the Google location API. However, because of the fast growth of GSM networks, topology changes made by GSM service providers, and the technology shift toward 3G, exact spatial extraction is problematic, so location extraction must first deal with spatial outliers before mobility profiles can be built. In this paper we propose a methodology for detecting spatial outliers in GSM CGI data; the methodology is based on hierarchical clustering and uses basic properties of the GSM network architecture.

    Spatio-Temporal Modeling of Wireless Users Internet Access Patterns Using Self-Organizing Maps

    User online behavior and interests will play a central role in future mobile networks. We introduce a systematic method for large-scale multi-dimensional analysis of online activity for thousands of mobile users across 79 buildings over a variety of web domains. We propose a modeling approach based on self-organizing maps (SOM) for discovering, organizing and visualizing different mobile users' trends from billions of WLAN records. Surprisingly, we find that users' trends based on domains and locations can be accurately modeled using a self-organizing map with clearly distinct characteristics. We also find many non-trivial correlations between different types of web domains and locations. Based on our analysis, we introduce a mixture model as an initial step towards realistic simulation of wireless network usage.

    Taxi demand forecasting: A HEDGE based tessellation strategy for improved accuracy

    A key problem in location-based modeling and forecasting lies in identifying suitable spatial and temporal resolutions. In particular, judicious spatial partitioning can play a significant role in enhancing the performance of location-based forecasting models. In this work, we investigate two widely used tessellation strategies for partitioning city space, in the context of real-time taxi demand forecasting. Our study compares (i) Geohash tessellation, and (ii) Voronoi tessellation, using two distinct taxi demand datasets, over multiple time scales. For the purpose of comparison, we employ classical time-series tools to model the spatio-temporal demand. Our study finds that the performance of each tessellation strategy is highly dependent on the city geography, spatial distribution of the data, and the time of the day, and that neither strategy is found to perform optimally across the forecast horizon. We propose a hybrid tessellation algorithm that picks the best tessellation strategy at each instant, based on their performance in the recent past. Our hybrid algorithm is a non-stationary variant of the well-known HEDGE algorithm for choosing the best advice from multiple experts. We show that the hybrid tessellation strategy performs consistently better than either of the two strategies across the data sets considered, at multiple time scales, and with different performance metrics. We achieve an average accuracy of above 80% per km^2 for both data sets considered at 60 minute aggregation levels. Comment: Under revision in Special Issue on Knowledge Discovery from Mobility Data for Intelligent Transportation Systems (Transactions on ITS
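For readers unfamiliar with HEDGE, the following is a minimal sketch of the standard (stationary) exponential-weights update, not the non-stationary variant the paper proposes. The function name `hedge_weights`, the learning rate `eta`, and the reading of the two experts as "Geohash" and "Voronoi" forecasters are assumptions for illustration; losses are assumed to lie in [0, 1].

```python
import math
from typing import List, Sequence


def hedge_weights(losses: Sequence[Sequence[float]], eta: float = 0.5) -> List[List[float]]:
    """Return the normalized expert weights used at the start of each round.

    losses[t][i] is the loss of expert i in round t (assumed to be in [0, 1]).
    """
    if not losses:
        return []
    weights = [1.0] * len(losses[0])  # start with uniform trust in every expert
    history = []
    for round_losses in losses:
        total = sum(weights)
        history.append([w / total for w in weights])
        # Multiplicative update: experts with larger loss are down-weighted.
        weights = [w * math.exp(-eta * loss) for w, loss in zip(weights, round_losses)]
    return history


# Hypothetical example: expert 0 ("Geohash") has loss 0 and expert 1 ("Voronoi")
# has loss 1 in every round, so weight shifts steadily toward expert 0.
example = hedge_weights([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])
```

A non-stationary variant would additionally need some way to forget old losses, so that the preferred expert can change over the day; how the paper does this is not stated in the abstract.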

    Functionally Fractal Urban Networks: Geospatial Co-location and Homogeneity of Infrastructure

    Just as natural river networks are known to be globally self-similar, recent research has shown that human-built urban networks, such as road networks, are also functionally self-similar, and have fractal topology with power-law node-degree distributions (p(k) = a k^{-\gamma}). Here we show, for the first time, that other urban infrastructure networks (sanitary and storm-water sewers), which sustain flows of critical services for urban citizens, also show scale-free functional topologies. For roads and drainage networks, we compared functional topological metrics, derived from high-resolution data (70,000 nodes) for a large US city providing services to about 900,000 citizens over an area of about 1,000 km^2. For the whole city and for different sized subnets, we also examined these networks in terms of geospatial co-location (roads and sewers). Our analyses reveal functional topological homogeneity among all the subnets within the city, in spite of differences in several urban attributes. The functional topologies of all subnets of both infrastructure types resemble power-law distributions, with tails becoming increasingly power-law as the subnet area increases. Our findings hold implications for assessing the vulnerability of these critical infrastructure networks to cascading shocks based on spatial interdependency, and for improved design and maintenance of urban infrastructure networks.
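As a rough illustration of what a power-law node-degree distribution means in practice, the sketch below estimates the prefactor and exponent of an assumed form p(k) = a k^(-gamma) by least squares on log-transformed values. The function name `fit_power_law` is hypothetical, and log-log regression is only a crude estimator chosen here for brevity; the abstract does not say which fitting method the authors used.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(ks: Sequence[float], ps: Sequence[float]) -> Tuple[float, float]:
    """Fit p(k) = a * k**(-gamma) by ordinary least squares in log-log space.

    Returns (a, gamma). All ks and ps must be strictly positive.
    """
    xs = [math.log(k) for k in ks]
    ys = [math.log(p) for p in ps]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of log p against log k
    intercept = mean_y - slope * mean_x    # equals log a
    return math.exp(intercept), -slope


# Synthetic check data generated from a = 2, gamma = 2.5 (not from the paper).
degrees = [1, 2, 4, 8, 16]
probs = [2.0 * k ** -2.5 for k in degrees]
a_hat, gamma_hat = fit_power_law(degrees, probs)
```

On real degree data the tail is noisy, so the fitted exponent depends strongly on which range of k is included.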

    Market Mechanism Design for Profitable On-Demand Transport Services

    On-demand transport services in the form of dial-a-ride and taxis are crucial parts of the transport infrastructure in all major cities. However, not all on-demand transport services are equal. In particular, not-for-profit dial-a-ride services with coordinated drivers significantly differ from profit-motivated taxi services with uncoordinated drivers. As such, there are two key threads of research for efficient scheduling, routing, and pricing for passengers: dial-a-ride services (first thread); and taxi services (second thread). Unfortunately, there has been only limited development of algorithms for joint optimization of scheduling, routing, and pricing, largely due to the widespread assumption of fixed pricing. In this paper, we introduce another thread: profit-motivated on-demand transport services with coordinated drivers. To maximize provider profits and the efficiency of the service, we propose a new market mechanism for this new thread of on-demand transport services, where passengers negotiate with the service provider. In contrast to previous work, our mechanism jointly optimizes scheduling, routing, and pricing. Ultimately, we demonstrate that our approach can lead to higher profits, compared with standard fixed price approaches, while maintaining comparable efficiency. Comment: 34 pages

    Semantic Place Descriptors for Classification and Map Discovery

    Urban environments develop complex, non-obvious structures that are often hard to represent in the form of maps or guides. Finding the right place to go often requires intimate familiarity with the location in question and cannot easily be deduced by visitors. In this work, we exploit large-scale samples of usage information, in the form of mobile phone traces and geo-tagged Twitter messages, in order to automatically explore and annotate city maps via kernel density estimation. Our experiments are based on one year's worth of mobile phone activity collected by Nokia's Mobile Data Challenge (MDC). We show that usage information can be a strong predictor of semantic place categories, allowing us to automatically annotate maps based on the behavior of the local user base. Comment: 13 pages, 1 figure, 1 table
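Kernel density estimation, mentioned in this abstract, can be sketched as follows for two-dimensional points. The isotropic Gaussian kernel, the single `bandwidth` parameter, and the function name `gaussian_kde_2d` are assumptions for illustration; the abstract does not specify the kernel or bandwidth the authors used, and treating latitude and longitude as planar coordinates is a simplification.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


def gaussian_kde_2d(samples: Sequence[Point], bandwidth: float) -> Callable[[float, float], float]:
    """Build a 2D density estimate from samples using an isotropic Gaussian kernel."""
    n = len(samples)
    # Normalizing constant of a 2D Gaussian with standard deviation `bandwidth`,
    # divided by n so the estimate integrates to one.
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth ** 2)

    def density(x: float, y: float) -> float:
        total = 0.0
        for sx, sy in samples:
            sq_dist = (x - sx) ** 2 + (y - sy) ** 2
            total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
        return norm * total

    return density


# Hypothetical activity points: three close together and one isolated.
activity = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
estimate = gaussian_kde_2d(activity, bandwidth=0.5)
```

Evaluating `estimate` on a grid gives a smooth activity surface, and the estimate is higher near the three clustered points than near the isolated one.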

    Toward a Distributed Knowledge Discovery system for Grid systems

    During the last decade or so, we have had a deluge of data not only from science but also from industry and commerce. Although the amount of data available to us is constantly increasing, processing it is becoming more and more difficult. Efficient discovery of useful knowledge from these datasets is therefore becoming a challenge and a massive economic need. This has created a need for large-scale data mining (DM) techniques that can deal with these huge datasets, whether from scientific or economic applications. In this chapter, we present a new distributed data mining (DDM) system that combines dataset-driven and architecture-driven strategies. Dataset-driven strategies consider the size and heterogeneity of the data, while architecture-driven strategies focus on the distribution of the datasets. The system is based on Grid middleware tools that integrate appropriate large data manipulation operations. This allows more dynamicity and autonomicity during the mining, integrating and processing phases.

    Analysis of Location Data Leakage in the Internet Traffic of Android-based Mobile Devices

    In recent years we have witnessed a shift towards personalized, context-based applications and services for mobile device users. A key component of many of these services is the ability to infer the current location and predict the future location of users based on location sensors embedded in the devices. Such knowledge enables service providers to present relevant and timely offers to their users and better manage traffic congestion control, thus increasing customer satisfaction and engagement. However, such services suffer from location data leakage, which has become one of today's most concerning privacy issues for smartphone users. In this paper we focus specifically on location data that is exposed by Android applications via Internet network traffic in plaintext (i.e., without encryption) without the user's awareness. We present an empirical evaluation, involving the network traffic of real mobile device users, aimed at: (1) measuring the extent of location data leakage in the Internet traffic of Android-based smartphone devices; and (2) understanding the value of this data by inferring users' points of interest (POIs). This was achieved by analyzing the Internet traffic recorded from the smartphones of a group of 71 participants for an average period of 37 days. We also propose a procedure for mining and filtering location data from raw network traffic and utilize geolocation clustering methods to infer users' POIs. The key findings of this research center on the extent of this phenomenon in terms of both ubiquity and severity; we found that over 85% of users' devices are leaking location data, and the exposure rate of users' POIs, derived from the relatively sparse leakage indicators, is around 61%. Comment: 11 pages, 10 figures