A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection
approach with the local kernel density estimation (KDE). A Relative
Density-based Outlier Score (RDOS) is introduced to measure the local
outlierness of objects, in which the density distribution at the location of an
object is estimated with a local KDE method based on extended nearest neighbors
of the object. Instead of using only nearest neighbors, we further consider
reverse nearest neighbors and shared nearest neighbors of an object for density
distribution estimation. Some theoretical properties of the proposed RDOS
including its expected value and false alarm probability are derived. A
comprehensive experimental study on both synthetic and real-life data sets
demonstrates that our approach is more effective than state-of-the-art outlier
detection methods.
Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letter
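The idea described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the fixed bandwidth `h`, and the exact form of the score (average density of the extended neighbors divided by the object's own density) are assumptions, since the abstract does not state them. All function names are hypothetical.

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = (j for j in range(len(points)) if j != i)
    order = sorted(others, key=lambda j: math.dist(points[i], points[j]))
    return set(order[:k])

def extended_neighbors(points, k):
    """For each point, the union of its nearest, reverse nearest, and shared nearest neighbors."""
    n = len(points)
    nn = [knn(points, i, k) for i in range(n)]
    ext = []
    for i in range(n):
        # Reverse nearest neighbors: points that count i among their own kNN.
        reverse = {j for j in range(n) if j != i and i in nn[j]}
        # Shared nearest neighbors: points with at least one kNN in common with i.
        shared = {j for j in range(n) if j != i and nn[i] & nn[j]}
        ext.append(nn[i] | reverse | shared)
    return ext

def local_density(points, i, neighbors, h):
    """Local Gaussian KDE at points[i], using only i and its extended neighbors (assumed kernel)."""
    d = len(points[i])
    norm = (2 * math.pi) ** (d / 2) * h ** d
    support = neighbors | {i}
    total = sum(math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * h * h))
                for j in support)
    return total / (len(support) * norm)

def rdos(points, k=3, h=1.0):
    """Relative density score: mean neighbor density over own density (assumed form)."""
    ext = extended_neighbors(points, k)
    dens = [local_density(points, i, ext[i], h) for i in range(len(points))]
    return [sum(dens[j] for j in ext[i]) / (len(ext[i]) * dens[i])
            for i in range(len(points))]
```

Under these assumptions, an object in a sparse region relative to its neighbors receives a score well above 1, while objects inside a uniform cluster score close to 1.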
Analysis Based on SVM for Untrusted Mobile Crowd Sensing
Mobile crowdsensing, which collects environmental information from mobile phone users, is growing in popularity. Companies can use these data for marketing surveys or decision making. However, collecting sensing data from other users may violate their privacy. Moreover, the data aggregator and/or the participants in crowdsensing may be untrusted entities. Recent studies have proposed randomized response schemes for anonymized data collection. We have developed a vehicle survey mobile application for decision making and marketing survey prediction. This kind of data collection makes it possible to analyze users' sensing data statistically without precise information about other users' sensing results. In the proposed work, we use an SVM classifier to classify the data so that companies can use it for marketing surveys or decision making. We work on the parameters of a city, which helps in analyzing vehicle counts as well as vehicle availability by vehicle type, vehicle model, etc. The analysis of the results will directly help in predicting result-oriented strategies.
Hashing-Based-Estimators for Kernel Density in High Dimensions
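Given a set of points $P \subset \mathbb{R}^d$ and a kernel $k$, the Kernel
Density Estimate at a point $x \in \mathbb{R}^d$ is defined as
$\mathrm{KDE}_P(x) = \frac{1}{|P|} \sum_{y \in P} k(x, y)$. We study the problem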
of designing a data structure that given a data set and a kernel function,
returns *approximations to the kernel density* of a query point in *sublinear
time*. We introduce a class of unbiased estimators for kernel density
implemented through locality-sensitive hashing, and give general theorems
bounding the variance of such estimators. These estimators give rise to
efficient data structures for estimating the kernel density in high dimensions
for a variety of commonly used kernels. Our work is the first to provide
data-structures with theoretical guarantees that improve upon simple random
sampling in high dimensions.
Comment: A preliminary version of this paper appeared in FOCS 201
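To make the problem setting concrete, the sketch below computes a kernel density estimate exactly (linear time per query) and with the simple random-sampling baseline that the abstract says the hashing-based estimators improve upon. It does not implement the locality-sensitive-hashing estimators themselves. It assumes the usual definition of the estimate as the average kernel value between the query and the data points, and a Gaussian kernel with bandwidth `sigma`; the function names are hypothetical.

```python
import math
import random

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel value between two points (assumed choice of kernel)."""
    return math.exp(-math.dist(x, y) ** 2 / (2 * sigma ** 2))

def kde_exact(points, q, kernel=gaussian_kernel):
    """Exact kernel density at q: average kernel value over all data points."""
    return sum(kernel(q, p) for p in points) / len(points)

def kde_random_sampling(points, q, m, rng, kernel=gaussian_kernel):
    """Unbiased estimate from m points drawn uniformly with replacement."""
    sample = [rng.choice(points) for _ in range(m)]
    return sum(kernel(q, p) for p in sample) / m
```

The sampling estimator is unbiased for any kernel, but when the true density at the query is small, many samples are needed before a point with a non-negligible kernel value is drawn, which is the regime where estimators with lower variance are useful.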
On Skewed Multi-dimensional Distributions: the Fusion
How do we model and find outliers in Twitter data? Given the number of retweets of each person on a social network, what is their expected number of comments? Real-life data are often very skewed, exhibiting power-law-like behavior. For such skewed multi-dimensional discrete data, the existing models are not general enough to capture various realistic scenarios, and the data often need to be discretized because these models describe continuous quantities. We propose FusionRP, short for Fusion Restaurant Process, a simple and intuitive model for skewed multi-dimensional discrete distributions, such as number of retweets vs. comments in Twitter-like data. Our model is discrete by design, has a provably asymptotic log-logistic sum of marginals, is general enough to capture varied relationships, and, most importantly, fits the real data very well. We give an effective and scalable maximum-likelihood-based fitting approach that is linear in the number of unique observed values and the input dimension. We test FusionRP on a Twitter-like social network with 2.2M users, a phone call network with 1.9M call records, game data with 45M users, and Facebook data with 2.5M posts. Our results show that FusionRP significantly outperforms several alternative methods and can detect outliers, such as bot-like behaviors in the Facebook data.