15,150 research outputs found
An Investigation of Parallel Road Map Inference from Big GPS Traces Data
With the increased use of GPS sensors in everyday devices, personal trip data are becoming very abundant. This wealth of GPS data opens many opportunities for exploration, and in this paper we infer the geometry of roads in Tunisia and the connectivity between them, a process known as map generation or map inference. To that end, we gathered big GPS data from about ten thousand vehicles equipped with GPS receivers and circulating in Tunisia, which, like many other developing countries, lacks a detailed road map. We collected a database of approximately 100 gigabytes. After preprocessing it, we had to partition the data in order to make an unstructured database of such a size manageable. For this we used K-means, both in its sequential mode and in a parallel mode based on MapReduce, one of the best-known frameworks for analysing rapidly growing data. The proposed parallel K-means algorithm was tested on our GPS data and proved efficient in processing large datasets. MapReduce is a parallel data-processing tool that is gaining significant attention from industry and academia, especially since the appearance of the term "big data" to describe massive, high-volume, high-complexity datasets growing from different sources
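The abstract does not include the implementation, but the map/reduce decomposition of K-means it alludes to can be sketched as follows (an illustrative sketch, not the authors' code; the 2-D point representation and helper names are assumptions):

```python
from collections import defaultdict

def assign(points, centroids):
    """Map step: emit (nearest-centroid-index, point) pairs."""
    pairs = []
    for x, y in points:
        idx = min(range(len(centroids)),
                  key=lambda i: (x - centroids[i][0]) ** 2 + (y - centroids[i][1]) ** 2)
        pairs.append((idx, (x, y)))
    return pairs

def update(pairs):
    """Reduce step: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    return {i: (sum(x for x, _ in pts) / len(pts),
                sum(y for _, y in pts) / len(pts))
            for i, pts in groups.items()}

def kmeans(points, centroids, iters=10):
    """Driver: alternate the map and reduce steps for a fixed number of rounds."""
    for _ in range(iters):
        new = update(assign(points, centroids))
        centroids = [new.get(i, c) for i, c in enumerate(centroids)]
    return centroids
```

In a real MapReduce deployment, `assign` runs on partitions of the GPS points in parallel and `update` runs per centroid key; only the centroids travel between rounds, which is what makes the partitioning of a 100 GB dataset tractable.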
A Simple Baseline for Travel Time Estimation using Large-Scale Trip Data
The increased availability of large-scale trajectory data around the world
provides rich information for the study of urban dynamics. For example, New
York City Taxi and Limousine Commission regularly releases source-destination
information about trips in the taxis they regulate. Taxi data provide
information about traffic patterns, and thus enable the study of urban flow --
what will traffic between two locations look like at a certain date and time in
the future? Existing big data methods try to outdo each other in terms of
complexity and algorithmic sophistication. In the spirit of "big data beats
algorithms", we present a very simple baseline which outperforms
state-of-the-art approaches, including Bing Maps and Baidu Maps (whose APIs
permit large scale experimentation). Such a travel time estimation baseline has
several important uses, such as navigation (fast travel time estimates can
serve as approximate heuristics for A* search variants for path finding) and
trip planning (which uses operating hours for popular destinations along with
travel time estimates to create an itinerary). Comment: 12 pages
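The baseline itself is not spelled out in this summary; one plausible nearest-neighbour-style sketch (illustrative only — the field names, radius, and averaging rule are assumptions, not the paper's exact method) is to average the durations of historical trips with a similar origin, destination, and departure time:

```python
from math import hypot

def estimate_travel_time(trips, origin, dest, hour,
                         radius=0.5, hour_window=1):
    """Hypothetical baseline: average the durations of historical trips
    whose endpoints fall within `radius` of the query endpoints and
    whose departure hour is within `hour_window` of the query hour."""
    durations = [
        t["duration"] for t in trips
        if hypot(t["origin"][0] - origin[0], t["origin"][1] - origin[1]) <= radius
        and hypot(t["dest"][0] - dest[0], t["dest"][1] - dest[1]) <= radius
        and abs(t["hour"] - hour) <= hour_window
    ]
    return sum(durations) / len(durations) if durations else None
```

With millions of taxi trips, even such a crude average over matching historical trips can be a strong predictor, which is the "big data beats algorithms" point the abstract makes.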
Discovering private trajectories using background information
Trajectories are spatio-temporal traces of moving objects which contain valuable information to be harvested by spatio-temporal data mining techniques. Applications such as city traffic planning, identification of evacuation routes, trend detection, and many more can benefit from trajectory mining. However, the trajectories of individuals often contain private and sensitive information, so anyone who possesses trajectory data must take special care when disclosing it. Removing identifiers from trajectories before release is not effective against linkage-type attacks, and rich sources of background information make matters even worse. An alternative is to apply transformation techniques that map the given set of trajectories into another set in which the distances are preserved. This way, the actual trajectories are not released, but the distance information can still be used by data mining techniques such as clustering. In this paper, we show that an unknown private trajectory can be reconstructed using the available background information together with the mutual distances released for data mining purposes. The background knowledge is in the form of known trajectories and extra information such as the speed limit. We provide analytical results that bound the number of known trajectories needed to reconstruct private trajectories. Experiments performed on real trajectory data sets show that the number of known samples required is surprisingly smaller than the theoretical bounds
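The core geometric step behind such a reconstruction — recovering an unknown point from its distances to known points — can be illustrated with plain 2-D trilateration (a simplified sketch, not the paper's full attack; the function name and setup are ours):

```python
def trilaterate(p1, p2, p3, d1, d2, d3):
    """Recover the point at distances d1, d2, d3 from the known
    points p1, p2, p3 by linearizing the three circle equations."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Subtracting the first circle equation from the other two
    # gives a 2x2 linear system: A*x + B*y = C and D*x + E*y = F.
    A, B = 2 * (x2 - x1), 2 * (y2 - y1)
    C = d1 ** 2 - d2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    D, E = 2 * (x3 - x1), 2 * (y3 - y1)
    F = d1 ** 2 - d3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = A * E - B * D  # zero when the known points are collinear
    return ((C * E - F * B) / det, (A * F - D * C) / det)
```

Applied pointwise along a trajectory, distances to a handful of known trajectories pin down each unknown position, which is why the number of known samples needed can be small.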
FogGIS: Fog Computing for Geospatial Big Data Analytics
Cloud Geographic Information Systems (GIS) have emerged as a tool for
analysis, processing, and transmission of geospatial data. Fog computing is
a paradigm in which Fog devices help to increase throughput and reduce latency
at the edge, near the client. This paper develops a Fog-based framework named
FogGIS for mining analytics from geospatial data. We built a prototype using
the Intel Edison, an embedded microprocessor, and validated FogGIS through
preliminary analyses, including compression and overlay analysis. Results
showed that Fog computing holds great promise for the analysis of geospatial
data. We used several open-source compression techniques to reduce
transmission to the cloud. Comment: 6 pages, 4 figures, 1 table, 3rd IEEE Uttar
Pradesh Section International Conference on Electrical, Computer and
Electronics (09-11 December, 2016), Indian Institute of Technology (Banaras
Hindu University), Varanasi, India
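The compression step the abstract mentions — shrinking geospatial payloads at the Fog node before uploading to the cloud — can be sketched with a standard open-source codec (gzip is used here purely as an illustration; the abstract does not name the exact techniques):

```python
import gzip
import json

def pack_for_upload(features, level=6):
    """Fog side: serialize geospatial features and gzip-compress them
    before transmission; returns the payload plus raw/packed sizes."""
    raw = json.dumps(features).encode("utf-8")
    packed = gzip.compress(raw, compresslevel=level)
    return packed, len(raw), len(packed)

def unpack(payload):
    """Cloud side: decompress and deserialize the received payload."""
    return json.loads(gzip.decompress(payload).decode("utf-8"))
```

Because GeoJSON-style feature collections are highly repetitive, even general-purpose compression cuts the bytes sent over the uplink substantially, which is where the throughput and latency gains at the edge come from.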
Adapted K-Nearest Neighbors for Detecting Anomalies on Spatio–Temporal Traffic Flow
Outlier detection is an extensive research area, which has been intensively studied in several domains such as biological sciences, medical diagnosis, surveillance, and traffic anomaly detection. This paper explores advances in the outlier detection area by finding anomalies in spatio-temporal urban traffic flow. It proposes a new approach that considers the distribution of the flows in a given time interval. The flow distribution probability (FDP) databases are first constructed from the traffic flows by considering both spatial and temporal information. The outlier detection mechanism is then applied to incoming flow distribution probabilities: inliers are stored to enrich the FDP databases, while outliers are excluded from them. Moreover, a k-nearest-neighbor method for distance-based outlier detection is investigated and adopted for FDP outlier detection. To validate the proposed framework, real data from the Odense traffic flow case are evaluated at ten locations. The results reveal that the proposed framework is able to detect the real distribution of flow outliers. Another experiment carried out on Beijing data shows that our approach outperforms the baseline algorithms on high-volume urban traffic flow
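The distance-based k-NN criterion described above — score each flow-distribution vector by the distance to its k-th nearest neighbour and flag large scores as outliers — can be sketched as follows (an illustrative sketch; the FDP construction itself is omitted):

```python
def knn_outlier_scores(vectors, k=3):
    """Distance to the k-th nearest neighbour for each vector;
    larger scores indicate likelier outliers."""
    scores = []
    for i, p in enumerate(vectors):
        dists = sorted(
            sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
            for j, q in enumerate(vectors) if j != i
        )
        scores.append(dists[k - 1])
    return scores
```

A vector is declared an outlier when its score exceeds a threshold; inliers are then folded back into the FDP database to enrich it, as the framework describes.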