Spaceprint: a Mobility-based Fingerprinting Scheme for Public Spaces
In this paper, we address the problem of how automated situation-awareness
can be achieved by learning real-world situations from ubiquitously generated
mobility data. Without semantic input about the time and space where situations
take place, this turns out to be a fundamentally challenging problem.
Uncertainties also introduce technical challenges when data is generated at
irregular time intervals and is mixed with noise and errors. Relying purely on
temporal patterns observable in mobility data, we propose Spaceprint, a fully
automated algorithm for finding repetitive patterns of similar situations in
spaces. We evaluate this technique by showing how the latent variables
describing the category and the actual identity of a space can be discovered
from the extracted situation patterns. To do so, we use different real-world
mobility datasets with data about the presence of mobile entities in a variety
of spaces. We also evaluate the performance of this technique by showing its
robustness against uncertainties.
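The abstract does not describe how Spaceprint extracts its patterns. As a rough illustration of the general idea it builds on (repetitive temporal patterns of presence in a space, recovered from irregularly timed observations), the sketch below builds a hypothetical hour-of-week presence profile per space and compares two spaces by cosine similarity. This is a stand-in under stated assumptions, not the Spaceprint algorithm; `hour_of_week`, `weekly_profile`, `similarity` and the example data are all invented for illustration.

```python
import math
from collections import Counter
from datetime import datetime

HOURS_PER_WEEK = 7 * 24  # 168 hour-of-week slots


def hour_of_week(ts: datetime) -> int:
    """Map a timestamp to a slot in 0..167, where Monday 00:00-00:59 is slot 0."""
    return ts.weekday() * 24 + ts.hour


def weekly_profile(timestamps) -> list[float]:
    """Unit-length histogram of presence observations over the 168 weekly hours.

    Hypothetical helper: observations arriving at irregular intervals are simply
    counted in the slot they fall into; scaling to unit length lets spaces with
    very different traffic volumes be compared by the shape of their rhythm.
    """
    counts = Counter(hour_of_week(ts) for ts in timestamps)
    vec = [float(counts.get(h, 0)) for h in range(HOURS_PER_WEEK)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def similarity(p: list[float], q: list[float]) -> float:
    """Cosine similarity of two profiles (already unit length, or all zeros)."""
    return sum(a * b for a, b in zip(p, q))


# Invented example: a space busy on weekday mornings, observed over two weeks,
# versus a space busy late on Friday nights. 1 January 2024 was a Monday.
cafe_week1 = [datetime(2024, 1, d, 8, 15) for d in range(1, 6)]
cafe_week2 = [datetime(2024, 1, d, 8, 40) for d in range(8, 13)]
bar = [datetime(2024, 1, 5, 22, 0), datetime(2024, 1, 12, 23, 30)]

print(round(similarity(weekly_profile(cafe_week1), weekly_profile(cafe_week2)), 6))  # 1.0
print(similarity(weekly_profile(cafe_week1), weekly_profile(bar)))  # 0.0
```

The two weeks of the same space produce identical profiles even though the exact arrival minutes differ, while the two different spaces share no active hours. A real system would need to cope with noise and partial overlap, which this toy comparison ignores.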
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and, more recently, Bayesian Decision Networks have found application in environmental modelling, while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogeneous.
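Association rule extraction is named above as a technique not yet widely used in environmental modelling. A minimal sketch of its two standard measures, support and confidence, is shown here on invented presence/absence records from a hypothetical monitoring site. The records, item names and thresholds are assumptions for illustration only, and the rule search is a brute-force enumeration of single-item rules rather than a full miner such as Apriori.

```python
from itertools import combinations


def support(itemset, transactions) -> float:
    """Fraction of transactions (sets of items) containing every item in itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """Estimate of P(consequent | antecedent) = support(A and C) / support(A)."""
    a = frozenset(antecedent)
    s_a = support(a, transactions)
    return support(a | frozenset(consequent), transactions) / s_a if s_a else 0.0


def pairwise_rules(transactions, min_support=0.4, min_confidence=0.7):
    """Brute-force search for one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for x, y in combinations(items, 2):
        for a, c in ((x, y), (y, x)):
            s = support({a, c}, transactions)
            conf = confidence({a}, {c}, transactions)
            if s >= min_support and conf >= min_confidence:
                found.append((a, c, s, conf))
    return found


# Invented daily records from a hypothetical air-quality monitoring site.
days = [
    frozenset({"high_temp", "low_wind", "high_ozone"}),
    frozenset({"high_temp", "low_wind", "high_ozone"}),
    frozenset({"high_temp", "high_ozone"}),
    frozenset({"low_wind"}),
    frozenset({"rain"}),
]

for a, c, s, conf in pairwise_rules(days):
    print(f"{a} -> {c}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy data only the pair `high_temp` and `high_ozone` clears both thresholds; `low_wind` co-occurs often enough to pass the support threshold but its rules stay near 0.67 confidence. With many items, dedicated algorithms prune candidates instead of scoring every pair.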
High-Resolution Road Vehicle Collision Prediction for the City of Montreal
Road accidents are a major issue in modern societies, responsible for millions
of deaths and injuries worldwide every year. In Quebec alone, road accidents
were responsible for 359 deaths and 33,000 injuries in 2018. In this paper, we
show how one can leverage open datasets of a city like Montreal, Canada, to
create high-resolution accident prediction models using big data analytics.
Compared to other studies in road accident prediction, our prediction
resolution is much higher: our models predict whether an accident will occur
within a given hour on a road segment delimited by intersections. Such models
could be used for road accident prevention, but also to identify key factors
that can lead to a road accident and, consequently, to help develop new
policies.
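The prediction unit described above is a (road segment, hour) pair. A minimal sketch of how such a labelled grid might be laid out is shown here, assuming accidents arrive as hypothetical `(segment_id, timestamp)` pairs; the function name, input format and example data are invented, and the actual predictors (weather, past accident counts, and so on) are omitted.

```python
from datetime import datetime, timedelta


def build_examples(segment_ids, start, n_hours, accidents):
    """One (segment, hour_start, label) row per road segment and hour.

    accidents: iterable of (segment_id, timestamp) pairs (hypothetical format).
    start must fall exactly on the hour. label is 1 when at least one accident
    on that segment falls within that hour, else 0.
    """
    # Truncate each accident time to the start of its hour.
    hit = {(seg, ts.replace(minute=0, second=0, microsecond=0)) for seg, ts in accidents}
    rows = []
    for h in range(n_hours):
        hour_start = start + timedelta(hours=h)
        for seg in segment_ids:
            rows.append((seg, hour_start, int((seg, hour_start) in hit)))
    return rows


# Three hypothetical segments over one day, with a single recorded collision.
rows = build_examples(["A", "B", "C"], datetime(2018, 1, 1), 24,
                      [("B", datetime(2018, 1, 1, 17, 42))])
positives = sum(label for _, _, label in rows)
print(len(rows), positives)  # 72 1
```

Even this toy grid yields one positive in 72 rows. Scaled to every segment of a city over several years, positives become rarer still, which illustrates why class imbalance is inherent to this formulation.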
We tested various machine learning methods to deal with the severe class
imbalance inherent to accident prediction problems. In particular, we
implemented the Balanced Random Forest algorithm, a variant of the Random
Forest machine learning algorithm, in Apache Spark. Interestingly, we found
that in our case, Balanced Random Forest does not perform significantly better
than Random Forest.
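What distinguishes Balanced Random Forest from Random Forest is how each tree's training sample is drawn: as the method is usually described, every tree gets a bootstrap sample from the minority class plus an equally sized sample, drawn with replacement, from the majority class. The sketch below shows only that sampling step in plain Python, under the assumption of 0/1 labels; it is not the authors' Spark implementation, and the tree-growing step is omitted. `balanced_bootstrap` and the example labels are hypothetical.

```python
import random


def balanced_bootstrap(labels, rng):
    """Training-sample indices for one tree of a Balanced Random Forest.

    Draw a bootstrap sample (with replacement) from the minority class, then
    draw the same number of cases, also with replacement, from the majority
    class. labels are assumed to be 0/1.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    n = len(minority)
    return ([rng.choice(minority) for _ in range(n)]
            + [rng.choice(majority) for _ in range(n)])


# 5 collisions among 100 segment-hours (invented): each tree sees a 50/50 sample.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
print(len(sample), sum(labels[i] for i in sample))  # 10 5
```

Each tree therefore trains on a class-balanced sample while the forest as a whole still sees many different majority-class examples. For Python users, the imbalanced-learn library provides a ready-made `BalancedRandomForestClassifier`.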
Experimental results show that 85% of road vehicle collisions are detected by
our model with a false positive rate of 13%. The examples identified as
positive are likely to correspond to high-risk situations. In addition, we
identify the most important predictors of vehicle collisions for the area of
Montreal: the count of accidents on the same road segment during previous
years, the temperature, the day of the year, the hour, and the visibility.
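The headline result pairs a detection rate (true positive rate) of 85% with a false positive rate of 13%. The sketch below shows how those two rates are computed from binary labels and predictions. The counts are invented so that they reproduce the reported figures; they are not the paper's data, and the function name is hypothetical.

```python
def detection_and_false_positive_rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for 0/1 labels.

    TPR = TP / (TP + FN): share of actual collisions flagged by the model.
    FPR = FP / (FP + TN): share of collision-free examples wrongly flagged.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp / (tp + fn), fp / (fp + tn)


# Invented counts chosen to reproduce the headline figures:
# 100 collisions (85 detected) and 10,000 collision-free examples (1,300 flagged).
y_true = [1] * 100 + [0] * 10_000
y_pred = [1] * 85 + [0] * 15 + [1] * 1_300 + [0] * 8_700
tpr, fpr = detection_and_false_positive_rates(y_true, y_pred)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")  # TPR=0.85 FPR=0.13
```

Under these made-up counts, precision (TP / (TP + FP) = 85 / 1,385) would be only about 6%: when negatives vastly outnumber positives, even a modest false positive rate means most flagged examples are not collisions. This is consistent with reading positive predictions as high-risk situations rather than certain accidents.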
- …