
A Local Density-Based Approach for Local Outlier Detection

Authors: He Haibo; Tang Bo
Publication date: 27/06/2016

This paper presents a simple but effective density-based outlier detection approach using local kernel density estimation (KDE). A Relative Density-based Outlier Score (RDOS) is introduced to measure the local outlierness of objects, in which the density distribution at the location of an object is estimated with a local KDE method based on the extended nearest neighbors of the object. Instead of using only k nearest neighbors, we further consider the reverse nearest neighbors and shared nearest neighbors of an object for density distribution estimation. Some theoretical properties of the proposed RDOS, including its expected value and false alarm probability, are derived. A comprehensive experimental study on both synthetic and real-life data sets demonstrates that our approach is more effective than state-of-the-art outlier detection methods.

Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letters
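The abstract describes the scoring idea only in words, so the following is a rough sketch under stated assumptions rather than the paper's method: a Gaussian kernel with a fixed bandwidth `h`, Euclidean distance, and a score defined as the mean neighbor density divided by the object's own density. The function names, the kernel, and the exact score formula are all illustrative choices that the abstract does not confirm.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_indices(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    order = sorted((j for j in range(len(points)) if j != i),
                   key=lambda j: euclidean(points[i], points[j]))
    return set(order[:k])

def extended_neighbors(points, k):
    """Union of k-nearest, reverse-nearest, and shared-nearest neighbors."""
    n = len(points)
    knn = [knn_indices(points, i, k) for i in range(n)]
    ext = []
    for i in range(n):
        # Reverse neighbors: points that list i among their own k nearest.
        reverse = {j for j in range(n) if j != i and i in knn[j]}
        # Shared neighbors: points that have a k-nearest neighbor in common with i.
        shared = {j for j in range(n) if j != i and knn[i] & knn[j]}
        ext.append(knn[i] | reverse | shared)
    return ext

def local_density(points, i, neighbors, h):
    """Gaussian KDE at points[i] using only its extended neighbors (assumed kernel)."""
    d = len(points[i])
    norm = (2 * math.pi) ** (d / 2) * h ** d
    total = sum(math.exp(-euclidean(points[i], points[j]) ** 2 / (2 * h * h))
                for j in neighbors)
    # The +1 terms count the point itself (kernel value 1 at distance 0).
    return (total + 1.0) / (norm * (len(neighbors) + 1))

def relative_density_scores(points, k=3, h=1.0):
    """Assumed score: mean density of the extended neighbors over own density."""
    ext = extended_neighbors(points, k)
    dens = [local_density(points, i, ext[i], h) for i in range(len(points))]
    return [sum(dens[j] for j in nbrs) / len(nbrs) / dens[i]
            for i, nbrs in enumerate(ext)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = relative_density_scores(points, k=2, h=1.0)
```

In this toy data set the isolated point at (10, 10) receives the largest score, while the clustered points score below 1. The sketch assumes the data set has more than `k` points.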

arXiv.org e-Print Archive

Crossref

DigitalCommons@URI

FSMJ: Feature Selection with Maximum Jensen-Shannon Divergence for Text Categorization

Authors: He Haibo; Tang Bo
Publication date: 20/06/2016

In this paper, we present a new wrapper feature selection approach based on Jensen-Shannon (JS) divergence, termed feature selection with maximum JS-divergence (FSMJ), for text categorization. Unlike most existing feature selection approaches, the proposed FSMJ approach is based on real-valued features, which provide more information for discrimination than the binary-valued features used in conventional approaches. We show that FSMJ is a greedy approach and that the JS-divergence increases monotonically as more features are selected. We conduct several experiments on real-life data sets, comparing FSMJ with state-of-the-art feature selection approaches for text categorization. The superior performance of the proposed FSMJ approach demonstrates its effectiveness and further indicates its wide potential for applications in data mining.

Comment: 8 pages, 6 figures, World Congress on Intelligent Control and Automation, 201
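For readers unfamiliar with the quantity being maximized, here is a minimal sketch of the standard two-distribution Jensen-Shannon divergence with equal weights and natural logarithms. It illustrates the divergence only; the weighting, the multi-class form, and the greedy selection loop used by FSMJ are not given in the abstract and are not reproduced here.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    # Terms with p_i == 0 contribute nothing by convention.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Equal-weight JS divergence: average KL of p and q to their midpoint."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

identical = js_divergence([0.2, 0.8], [0.2, 0.8])  # zero for equal inputs
disjoint = js_divergence([1.0, 0.0], [0.0, 1.0])   # reaches the maximum, log 2
```

Because the midpoint distribution is positive wherever either input is, the divergence stays finite even when the two distributions do not overlap, which is one reason it is convenient as a selection criterion.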

arXiv.org e-Print Archive

DigitalCommons@URI

On the Efficiency of the Proportional Allocation Mechanism for Divisible Resources

Authors: Christodoulou George; Sgouritsa Alkmini; Tang Bo
Publication date: 24/07/2015

We study the efficiency of the proportional allocation mechanism, which is widely used to allocate divisible resources. Each agent submits a bid for each divisible resource and receives a fraction proportional to her bid. We quantify the inefficiency of Nash equilibria by studying the Price of Anarchy (PoA) of the induced game under complete and incomplete information. When agents' valuations are concave, we show that the Bayesian Nash equilibria can be arbitrarily inefficient, in contrast to the well-known 4/3 bound for pure equilibria. Next, we upper bound the PoA over Bayesian equilibria by 2 when agents' valuations are subadditive, generalizing and strengthening previous bounds on lattice submodular valuations. Furthermore, we show that this bound is tight and cannot be improved by any simple or scale-free mechanism. We then switch to settings with budget constraints and show an improved upper bound on the PoA over coarse-correlated equilibria. Finally, we prove that the PoA is exactly 2 for pure equilibria in the polyhedral environment.

Comment: To appear in SAGT 201
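The allocation rule described above can be sketched as follows: for each resource, an agent's share is her bid divided by the sum of all bids on that resource. The function name and the convention that a resource with no positive bids stays unallocated are assumptions made for this illustration; payments and valuations are not modeled.

```python
def proportional_allocation(bids):
    """Return shares[i][j], the fraction of resource j given to agent i.

    bids[i][j] is agent i's non-negative bid on resource j.
    """
    n_resources = len(bids[0])
    # Total bid placed on each resource across all agents.
    totals = [sum(row[j] for row in bids) for j in range(n_resources)]
    # Assumption: a resource that attracts no bids is left unallocated.
    return [[row[j] / totals[j] if totals[j] > 0 else 0.0
             for j in range(n_resources)]
            for row in bids]

# Two agents, two resources: agent 0 bids (1, 0), agent 1 bids (3, 2).
shares = proportional_allocation([[1, 0], [3, 2]])
# shares == [[0.25, 0.0], [0.75, 1.0]]
```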

arXiv.org e-Print Archive

University of Liverpool Repository

Crossref

Springer - Publisher Connector

Toward Optimal Feature Selection in Naive Bayes for Text Categorization

Authors: He Haibo; Kay Steven; Tang Bo
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 08/02/2016

Automated feature selection is important for text categorization because it reduces the feature size and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, Kullback-Leibler divergence and Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties in relation to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods, termed the maximum discrimination (MD) and MD-$\chi^2$ methods, for text categorization. The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.

Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figures
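As a small illustration of the two binary-hypothesis measures the abstract revisits, the sketch below computes the Kullback-Leibler divergence and the Jeffreys divergence (the symmetrized sum of the two KL directions) for discrete distributions. It assumes strictly positive probabilities and natural logarithms. The JMH-divergence and the MD ranking procedure are not defined in the abstract, so they are not attempted here.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jeffreys_divergence(p, q):
    """Jeffreys divergence: KL(p || q) + KL(q || p), symmetric in p and q."""
    return kl_divergence(p, q) + kl_divergence(q, p)

p = [0.5, 0.5]
q = [0.25, 0.75]
forward = kl_divergence(p, q)          # differs from kl_divergence(q, p)
symmetric = jeffreys_divergence(p, q)  # equals jeffreys_divergence(q, p)
```

The asymmetry of KL divergence is what ties each direction to a different error type in hypothesis testing, while the Jeffreys form treats both hypotheses evenly.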

arXiv.org e-Print Archive

DigitalCommons@URI

Probabilistic Human Mobility Model in Indoor Environment

Authors: Guo Yi; He Haibo; Jiang Chao; Tang Bo
Publication date: 27/06/2016

Understanding human mobility is important for the development of intelligent mobile service robots, as it can provide prior knowledge and predictions of human distribution for robot-assisted activities. In this paper, we propose a probabilistic method to model human motion behaviors, which are determined by both internal and external factors in an indoor environment. The internal factors are represented by individual preferences, aims, and interests, while the external factors are indicated by the stimulation of the environment. We model the randomness of human macro-level movement, e.g., the probability of visiting a specific place and the staying time, under the Bayesian framework, considering the influence of both internal and external variables. We use two case studies, one in a shopping mall and one in a college student dorm building, to show the effectiveness of our proposed probabilistic human mobility model. Real surveillance camera data are used together with survey data to validate the proposed model in the student dorm case study.

Comment: 8 pages, 9 figures, International Joint Conference on Neural Networks (IJCNN) 201

arXiv.org e-Print Archive

Crossref

DigitalCommons@URI