21 research outputs found

    A Local Density-Based Approach for Local Outlier Detection

    This paper presents a simple but effective density-based outlier detection approach based on local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on the extended nearest neighbors of the object. Instead of using only the $k$ nearest neighbors, we further consider the reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS, including its expected value and false alarm probability, are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods. Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letter
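
As a rough illustration of the idea in this abstract, the sketch below estimates a Gaussian KDE at each point over its extended neighborhood (k-nearest, reverse-nearest, and shared-nearest neighbors) and scores each point by the ratio of its neighbors' mean density to its own. The abstract does not give the RDOS formula, so the ratio, the fixed bandwidth `h`, and all function names here are assumptions made for illustration, not the authors' implementation.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return set(sorted(others, key=lambda j: sq_dist(points[i], points[j]))[:k])

def extended_neighbors(points, k):
    """Union of k-nearest, reverse-nearest, and shared-nearest neighbors per point."""
    n = len(points)
    nn = [knn(points, i, k) for i in range(n)]
    ext = []
    for i in range(n):
        reverse = {j for j in range(n) if i in nn[j]}
        shared = {j for j in range(n) if j != i and nn[i] & nn[j]}
        ext.append(nn[i] | reverse | shared)
    return ext

def local_density(points, i, neighbors, h):
    """Gaussian KDE at points[i] over its neighbors plus the point itself."""
    d = len(points[i])
    norm = (2 * math.pi) ** (d / 2) * h ** d
    members = neighbors | {i}
    total = sum(math.exp(-sq_dist(points[i], points[j]) / (2 * h * h))
                for j in members)
    return total / (len(members) * norm)

def relative_density_scores(points, k=3, h=1.0):
    """Assumed score: mean density of the neighbors divided by own density."""
    ext = extended_neighbors(points, k)
    dens = [local_density(points, i, ext[i], h) for i in range(len(points))]
    return [sum(dens[j] for j in ext[i]) / (len(ext[i]) * dens[i])
            for i in range(len(points))]

# Five clustered points and one distant point (index 5).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (3, 3)]
scores = relative_density_scores(points, k=3, h=1.0)
```

Under these assumptions, a point in a sparser region than its neighbors receives a score above 1, while points inside the cluster score near or below 1.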

    Analysis Based on SVM for Untrusted Mobile Crowd Sensing

    Mobile crowdsensing, which collects environmental information from mobile phone users, is growing in popularity. These data can be used by companies for marketing surveys or decision making. However, collecting sensing data from other users may violate their privacy. Moreover, the data aggregator and/or the participants of crowdsensing may be untrusted entities. Recent studies have proposed randomized response schemes for anonymized data collection. We have developed a vehicle survey mobile application for decision making and marketing survey prediction. This kind of data collection allows the sensing data of users to be analyzed statistically without precise information about other users' sensing results. In this work, we use an SVM classifier to classify the data so that companies can use it for marketing surveys or decision making. We work with the parameters of a city, which helps in analyzing vehicle counts as well as vehicle availability by vehicle type, vehicle model, etc. The resulting analyses directly inform the prediction of result-oriented strategies.
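
The abstract mentions randomized response schemes without specifying one. As a minimal sketch, the classic binary variant (each user reports the true bit with probability `p` and the flipped bit otherwise) shows how an aggregate can be estimated without knowing any individual's true value. The function names, the value of `p`, and the simulated data are assumptions for illustration and are not taken from the paper.

```python
import random

def randomize(bit, p, rng):
    """Report the true bit with probability p, otherwise the flipped bit."""
    return bit if rng.random() < p else 1 - bit

def estimate_proportion(reports, p):
    """Invert the expected report rate E[report] = p*pi + (1 - p)*(1 - pi)."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)

rng = random.Random(42)
# Simulated population in which roughly 30% of users hold the sensitive value 1.
true_bits = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
reports = [randomize(b, 0.75, rng) for b in true_bits]
estimate = estimate_proportion(reports, 0.75)
```

The estimator requires `p != 0.5`; the closer `p` is to 0.5, the stronger the deniability for each user and the larger the variance of the estimate.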

    Hashing-Based-Estimators for Kernel Density in High Dimensions

    Given a set of points $P\subset \mathbb{R}^{d}$ and a kernel $k$, the Kernel Density Estimate at a point $x\in\mathbb{R}^{d}$ is defined as $\mathrm{KDE}_{P}(x)=\frac{1}{|P|}\sum_{y\in P} k(x,y)$. We study the problem of designing a data structure that, given a data set $P$ and a kernel function, returns *approximations to the kernel density* of a query point in *sublinear time*. We introduce a class of unbiased estimators for kernel density implemented through locality-sensitive hashing, and give general theorems bounding the variance of such estimators. These estimators give rise to efficient data structures for estimating the kernel density in high dimensions for a variety of commonly used kernels. Our work is the first to provide data structures with theoretical guarantees that improve upon simple random sampling in high dimensions. Comment: A preliminary version of this paper appeared in FOCS 201
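
To make the definition concrete, the sketch below computes the exact kernel density estimate in time linear in $|P|$ and compares it with the simple random sampling baseline that the abstract says the hashing-based estimators improve upon. It does not implement the locality-sensitive hashing estimators themselves. The Gaussian kernel, the bandwidth, and the sample size `m` are assumptions chosen for illustration.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel exp(-||x - y||^2 / (2 h^2)); the bandwidth h is assumed."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def kde_exact(points, x, kernel=gaussian_kernel):
    """KDE_P(x) = (1/|P|) * sum over y in P of k(x, y); O(|P|) per query."""
    return sum(kernel(x, y) for y in points) / len(points)

def kde_sampled(points, x, m, rng, kernel=gaussian_kernel):
    """Unbiased estimate from m points drawn uniformly with replacement."""
    sample = [rng.choice(points) for _ in range(m)]
    return sum(kernel(x, y) for y in sample) / m

rng = random.Random(0)
P = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
query = (0.5, -0.25)
exact = kde_exact(P, query)
approx = kde_sampled(P, query, m=200, rng=rng)
```

Uniform sampling is unbiased, but its relative error grows when the true density at the query is small, because few sampled points contribute meaningfully to the sum.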

    On Skewed Multi-dimensional Distributions: the Fusion

    How do we model and find outliers in Twitter data? Given the number of retweets of each person on a social network, what is their expected number of comments? Real-life data are often very skewed, exhibiting power-law-like behavior. For such skewed multi-dimensional discrete data, the existing models are not general enough to capture various realistic scenarios, and the data often need to be discretized because these models typically describe continuous quantities. We propose FusionRP, short for Fusion Restaurant Process, a simple and intuitive model for skewed multi-dimensional discrete distributions, such as number of retweets vs. comments in Twitter-like data. Our model is discrete by design, has a provably asymptotic log-logistic sum of marginals, is general enough to capture varied relationships, and, most importantly, fits the real data very well. We give an effective and scalable maximum-likelihood based fitting approach that is linear in the number of unique observed values and the input dimension. We test FusionRP on a Twitter-like social network with 2.2M users, a phone call network with 1.9M call records, game data with 45M users and Facebook data with 2.5M posts. Our results show that FusionRP significantly outperforms several alternative methods and can detect outliers, such as bot-like behaviors in the Facebook data.