15,150 research outputs found

    An Investigation of Parallel Road Map Inference from Big GPS Traces Data

    Get PDF
    AbstractWith the increased use of GPS sensors in several everyday devices, persons trip data are be- coming very abundant. Many opportunities for exploration of the wealth GPS data and in this paper, we inferred, the geometry of road maps in Tunisia and the connectivity between them. This phenomenon is known as map generation and also map inference procedure. For that, we gathered big GPS data from about ten thousands of vehicles equipped with GPS receivers and circulating in Tunisia, which does not have a road map like other developing countries. We collected a big database with approximately 100 gigabytes. After preprocessing it, we were obliged to partition data in order to facilitate handling an unstructured database with a such size. In fact, we used for that K-means with its sequential mode and the parallel mode based on Mapreduce, which is one of the most famous proposed solution to analyse the rapidly growing data. The proposed parallel k-means algorithm was tested with our GPS data and the results are efficient in processing large datasets. It is a parallel data processing tool which is gathering significant importance from industry and academia especially with appearance of a new term to describe massive datasets having large-volume, high-complexity and growing data from different sources, “big data”

    A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data

    Full text link
    The increased availability of large-scale trajectory data around the world provides rich information for the study of urban dynamics. For example, New York City Taxi Limousine Commission regularly releases source-destination information about trips in the taxis they regulate. Taxi data provide information about traffic patterns, and thus enable the study of urban flow -- what will traffic between two locations look like at a certain date and time in the future? Existing big data methods try to outdo each other in terms of complexity and algorithmic sophistication. In the spirit of "big data beats algorithms", we present a very simple baseline which outperforms state-of-the-art approaches, including Bing Maps and Baidu Maps (whose APIs permit large scale experimentation). Such a travel time estimation baseline has several important uses, such as navigation (fast travel time estimates can serve as approximate heuristics for A search variants for path finding) and trip planning (which uses operating hours for popular destinations along with travel time estimates to create an itinerary).Comment: 12 page

    Data analytics 2016: proceedings of the fifth international conference on data analytics

    Get PDF

    Discovering private trajectories using background information

    Get PDF
    Trajectories are spatio-temporal traces of moving objects which contain valuable information to be harvested by spatio-temporal data mining techniques. Applications like city traffic planning, identification of evacuation routes, trend detection, and many more can benefit from trajectory mining. However, the trajectories of individuals often contain private and sensitive information, so anyone who possess trajectory data must take special care when disclosing this data. Removing identifiers from trajectories before the release is not effective against linkage type attacks, and rich sources of background information make it even worse. An alternative is to apply transformation techniques to map the given set of trajectories into another set where the distances are preserved. This way, the actual trajectories are not released, but the distance information can still be used for data mining techniques such as clustering. In this paper, we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes. The background knowledge is in the form of known trajectories and extra information such as the speed limit. We provide analytical results which bound the number of the known trajectories needed to reconstruct private trajectories. Experiments performed on real trajectory data sets show that the number of known samples is surprisingly smaller than the actual theoretical bounds

    FogGIS: Fog Computing for Geospatial Big Data Analytics

    Full text link
    Cloud Geographic Information Systems (GIS) has emerged as a tool for analysis, processing and transmission of geospatial data. The Fog computing is a paradigm where Fog devices help to increase throughput and reduce latency at the edge of the client. This paper developed a Fog-based framework named Fog GIS for mining analytics from geospatial data. We built a prototype using Intel Edison, an embedded microprocessor. We validated the FogGIS by doing preliminary analysis. including compression, and overlay analysis. Results showed that Fog computing hold a great promise for analysis of geospatial data. We used several open source compression techniques for reducing the transmission to the cloud.Comment: 6 pages, 4 figures, 1 table, 3rd IEEE Uttar Pradesh Section International Conference on Electrical, Computer and Electronics (09-11 December, 2016) Indian Institute of Technology (Banaras Hindu University) Varanasi, Indi

    Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow

    Get PDF
    Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach by considering the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to the coming flow distribution probabilities, the inliers are stored to enrich the FDP databases, while the outliers are excluded from the FDP databases. Moreover, a k-nearest neighbor for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment has been carried out on Beijing data, the results show that our approach outperforms the baseline algorithms for high-urban traffic flow
    corecore