FSMJ: Feature Selection with Maximum Jensen-Shannon Divergence for Text Categorization
In this paper, we present a new wrapper feature selection approach based on
Jensen-Shannon (JS) divergence, termed feature selection with maximum
JS-divergence (FSMJ), for text categorization. Unlike most existing feature
selection approaches, the proposed FSMJ approach is based on real-valued
features, which provide more information for discrimination than the
binary-valued features used in conventional approaches. We show that FSMJ is a greedy
approach and that the JS-divergence monotonically increases as more features are
selected. We conduct several experiments on real-life data sets, comparing with
state-of-the-art feature selection approaches for text categorization. The
superior performance of the proposed FSMJ approach demonstrates its
effectiveness and further indicates its wide potential for applications in data
mining.
Comment: 8 pages, 6 figures, World Congress on Intelligent Control and Automation, 201
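For readers unfamiliar with the divergence used above, the sketch below (Python/NumPy, with hypothetical per-feature class-conditional histograms hist_pos and hist_neg) shows how a Jensen-Shannon score can be computed and used to rank features. It only illustrates the measure itself, not the paper's greedy, wrapper-style FSMJ procedure.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    """Jensen-Shannon divergence JS(P, Q) = 0.5*KL(P||M) + 0.5*KL(Q||M), M = (P+Q)/2."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rank_features_by_js(hist_pos, hist_neg):
    """Rank features by the JS divergence between their class-conditional
    histograms (a filter-style illustration; FSMJ itself adds features greedily
    so that the overall JS-divergence keeps increasing)."""
    scores = [js_divergence(hist_pos[f], hist_neg[f]) for f in range(len(hist_pos))]
    return list(np.argsort(scores)[::-1])
```

Because JS(P, Q) is symmetric and bounded above by log 2, the per-feature scores are directly comparable.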
On the Efficiency of the Proportional Allocation Mechanism for Divisible Resources
We study the efficiency of the proportional allocation mechanism, which is
widely used to allocate divisible resources. Each agent submits a bid for each
divisible resource and receives a fraction proportional to her bids. We
quantify the inefficiency of Nash equilibria by studying the Price of Anarchy
(PoA) of the induced game under complete and incomplete information. When
agents' valuations are concave, we show that the Bayesian Nash equilibria can
be arbitrarily inefficient, in contrast to the well-known 4/3 bound for pure
equilibria. Next, we upper bound the PoA over Bayesian equilibria by 2 when
agents' valuations are subadditive, generalizing and strengthening previous
bounds on lattice submodular valuations. Furthermore, we show that this bound
is tight and cannot be improved by any simple or scale-free mechanism. Then we
switch to settings with budget constraints, and we show an improved upper bound
on the PoA over coarse-correlated equilibria. Finally, we prove that the PoA is
exactly 2 for pure equilibria in the polyhedral environment.
Comment: To appear in SAGT 201
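As a quick illustration of the mechanism itself (not of the equilibrium analysis above), the following sketch computes the proportional allocation of a single divisible resource for a hypothetical bid vector: each agent pays her bid and receives the fraction b_i / sum_j b_j.

```python
def proportional_allocation(bids):
    """Each agent pays her bid and receives the fraction of the (unit) resource
    proportional to it: x_i = b_i / sum_j b_j (zero allocation if nobody bids)."""
    total = sum(bids)
    if total == 0:
        return [0.0 for _ in bids]
    return [b / total for b in bids]

# Example: three agents bidding 1, 2 and 5 on a single divisible resource.
print(proportional_allocation([1.0, 2.0, 5.0]))   # [0.125, 0.25, 0.625]
```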
Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
information theory, which aims to rank the features by their
discriminative capacity for classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods, termed the
maximum discrimination (MD) and MD-χ² methods, for text categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches.
Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data Engineering. 14 pages, 5 figures
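The two classical measures revisited above have simple closed forms; a minimal NumPy sketch for discrete distributions is given below. The multi-class JMH-divergence and the MD selection criteria built on it are paper-specific and are not reproduced here.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(P || Q) between discrete distributions."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def jeffreys(p, q):
    """Jeffreys divergence: the symmetrized KL divergence KL(P||Q) + KL(Q||P)."""
    return kl(p, q) + kl(q, p)

# Hypothetical class-conditional distributions of a single feature.
print(jeffreys([0.7, 0.2, 0.1], [0.2, 0.3, 0.5]))
```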
A Local Density-Based Approach for Local Outlier Detection
This paper presents a simple but effective density-based outlier detection
approach based on local kernel density estimation (KDE). A Relative
Density-based Outlier Score (RDOS) is introduced to measure the local
outlierness of objects, in which the density distribution at the location of an
object is estimated with a local KDE method based on extended nearest neighbors
of the object. Instead of using only nearest neighbors, we further consider
reverse nearest neighbors and shared nearest neighbors of an object for density
distribution estimation. Some theoretical properties of the proposed RDOS
including its expected value and false alarm probability are derived. A
comprehensive experimental study on both synthetic and real-life data sets
demonstrates that our approach is more effective than state-of-the-art outlier
detection methods.
Comment: 22 pages, 14 figures, submitted to Pattern Recognition Letters
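The following sketch (Python/NumPy, Gaussian kernel, plain k-nearest neighbors) illustrates the general idea of a relative, KDE-based outlier score: the average estimated density of an object's neighbors divided by its own estimated density. The paper's actual RDOS additionally builds the neighbor set from reverse and shared nearest neighbors, which this sketch omits.

```python
import numpy as np

def kde_density(x, neighbors, h):
    """Gaussian kernel density estimate at point x from a set of neighbor points."""
    d = x.shape[0]
    diffs = neighbors - x
    norm = (2.0 * np.pi * h ** 2) ** (-d / 2.0)
    return float(np.mean(norm * np.exp(-np.sum(diffs ** 2, axis=1) / (2.0 * h ** 2))))

def relative_density_score(X, i, knn_idx, h=1.0):
    """Relative density-based outlier score of object i: mean neighbor density
    over the object's own density; values well above 1 suggest an outlier.
    knn_idx[j] holds the indices of the k nearest neighbors of object j."""
    dens_i = kde_density(X[i], X[knn_idx[i]], h)
    dens_nbrs = [kde_density(X[j], X[knn_idx[j]], h) for j in knn_idx[i]]
    return float(np.mean(dens_nbrs) / dens_i)
```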
Tight Bounds for the Price of Anarchy of Simultaneous First Price Auctions
We study the Price of Anarchy of simultaneous first-price auctions for buyers
with submodular and subadditive valuations. The current best upper bounds for
the Bayesian Price of Anarchy of these auctions are e/(e-1) [Syrgkanis and
Tardos 2013] and 2 [Feldman et al. 2013], respectively. We provide matching
lower bounds for both cases even for the case of full information and for mixed
Nash equilibria via an explicit construction.
We present an alternative proof of the upper bound of e/(e-1) for first-price
auctions with fractionally subadditive valuations which reveals the worst-case
price distribution, that is used as a building block for the matching lower
bound construction.
We generalize our results to a broad class of item-bidding auctions that we
call bid-dependent auctions (including first-price auctions and all-pay
auctions) where the winner is always the highest bidder and each bidder's
payment depends only on his own bid.
Finally, we apply our techniques to discriminatory price multi-unit auctions.
We complement the results of [de Keijzer et al. 2013] for the case of
subadditive valuations, by providing a matching lower bound of 2. For the case
of submodular valuations, we provide a lower bound of 1.109. For the same class
of valuations, we were able to reproduce the upper bound of e/(e-1) using our
non-smooth approach.
Comment: 37 pages, 5 figures, ACM Transactions on Economics and Computation
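The bid-dependent auction format introduced above has a compact description: the highest bidder wins, and each bidder's payment depends only on her own bid and on whether she won. The sketch below (plain Python, single item, hypothetical bid vector) shows how first-price and all-pay auctions arise as special cases of the same rule.

```python
def bid_dependent_auction(bids, payment):
    """Single-item bid-dependent auction: the highest bidder wins, and each
    bidder pays payment(own_bid, won), independent of the other bids."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    payments = [payment(b, i == winner) for i, b in enumerate(bids)]
    return winner, payments

first_price = lambda b, won: b if won else 0.0   # winner pays her bid
all_pay = lambda b, won: b                       # everyone pays her bid

print(bid_dependent_auction([0.3, 0.9, 0.5], first_price))  # (1, [0.0, 0.9, 0.0])
print(bid_dependent_auction([0.3, 0.9, 0.5], all_pay))      # (1, [0.3, 0.9, 0.5])
```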
Detection of False Data Injection Attacks in Smart Grid under Colored Gaussian Noise
In this paper, we consider the problems of state estimation and false data
injection detection in smart grid when the measurements are corrupted by
colored Gaussian noise. By modeling the noise with an autoregressive process,
we estimate the state of the power transmission networks and develop a
generalized likelihood ratio test (GLRT) detector for the detection of false
data injection attacks. We show that the conventional approach with the
assumption of Gaussian noise is a special case of the proposed method, and thus
the new approach has broader applicability. The proposed detector is also tested
on an independent component analysis (ICA) based unobservable false data attack
scheme that relies on similar assumptions about the sample observations. We evaluate
the performance of the proposed state estimator and attack detector on the IEEE
30-bus power system, in comparison with a conventional detector based on the
Gaussian noise assumption. The superior detection performance on both observable
and unobservable false data attacks demonstrates the effectiveness of the proposed
approach and indicates wide applicability in power signal processing.
Comment: 8 pages, 4 figures, IEEE Conference on Communications and Network Security (CNS) 201
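As a rough illustration of the pipeline described above (and not the paper's exact estimator or GLRT), the sketch below whitens measurements under a hypothetical AR(1) noise model with coefficient rho, solves the whitened linear measurement model z = Hx + e by least squares, and flags an attack when the residual energy exceeds a user-chosen threshold.

```python
import numpy as np

def whiten_ar1(z, H, rho):
    """Whiten a measurement vector z = H x + e whose noise e follows an AR(1)
    process with coefficient rho, by differencing consecutive rows."""
    z, H = np.asarray(z, float), np.asarray(H, float)
    zw, Hw = z.copy(), H.copy()
    zw[1:] -= rho * z[:-1]
    Hw[1:] -= rho * H[:-1]
    return zw, Hw

def residual_attack_detector(z, H, rho, threshold):
    """Illustrative residual-energy detector (not the paper's exact GLRT):
    estimate the state on the whitened model by least squares and flag an
    attack when the residual energy exceeds the threshold."""
    zw, Hw = whiten_ar1(z, H, rho)
    x_hat, *_ = np.linalg.lstsq(Hw, zw, rcond=None)
    residual = zw - Hw @ x_hat
    stat = float(residual @ residual)
    return stat > threshold, x_hat, stat
```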
Probabilistic Human Mobility Model in Indoor Environment
Understanding human mobility is important for the development of intelligent
mobile service robots as it can provide prior knowledge and predictions of
human distribution for robot-assisted activities. In this paper, we propose a
probabilistic method to model human motion behaviors, which are determined by
both internal and external factors in an indoor environment. While the internal
factors are represented by the individual preferences, aims and interests, the
external factors are indicated by the stimulation of the environment. We model
the randomness of human macro-level movement, e.g., the probability of visiting
a specific place and the staying time there, under a Bayesian framework, considering
the influence of both internal and external variables. We use two case studies
in a shopping mall and in a college student dorm building to show the
effectiveness of our proposed probabilistic human mobility model. Real
surveillance camera data, together with survey data, are used to validate the
proposed model in the case study of the student dorm.
Comment: 8 pages, 9 figures, International Joint Conference on Neural Networks (IJCNN) 201
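One way to read the internal/external factorization described above is as a product of an individual preference prior and an environmental stimulation term, normalized over places. The sketch below is only a guess at such a combination, using hypothetical preference and stimulation vectors; the paper's actual Bayesian model of visit probabilities and staying times is not reproduced here.

```python
import numpy as np

def visit_distribution(preference, stimulation):
    """Combine an (unnormalized) internal preference prior over places with an
    external stimulation factor into a probability of visiting each place."""
    w = np.asarray(preference, float) * np.asarray(stimulation, float)
    return w / w.sum()

# Hypothetical example: three places, the second one strongly stimulated.
print(visit_distribution([0.5, 0.3, 0.2], [1.0, 3.0, 1.0]))
```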
Fast Low-rank Representation based Spatial Pyramid Matching for Image Classification
Spatial Pyramid Matching (SPM) and its variants have achieved considerable
success in image classification. The main difference among them is their
encoding schemes. For example, ScSPM incorporates Sparse Code (SC) instead of
Vector Quantization (VQ) into the framework of SPM. Although the methods
achieve a higher recognition rate than the traditional SPM, they consume more
time to encode the local descriptors extracted from the image. In this paper,
we propose using Low Rank Representation (LRR) to encode the descriptors under
the framework of SPM. Different from SC, LRR considers the group effect among
data points instead of sparsity. Benefiting from this property, the proposed
method (i.e., LrrSPM) can offer better performance. To further improve the
generalizability and robustness, we reformulate the rank-minimization problem
as a truncated projection problem. Extensive experimental studies show that
LrrSPM is more efficient than its counterparts (e.g., ScSPM) while achieving
competitive recognition rates on nine image data sets.
Comment: accepted by Knowledge-Based Systems, 201
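For context, all SPM variants mentioned above share the same pooling stage: per-descriptor codes are pooled over a spatial pyramid and concatenated into the image representation. The NumPy sketch below illustrates that shared step with max pooling and hypothetical codes/positions arrays; the LRR encoding and its truncated-projection reformulation are the paper's contribution and are not reproduced here.

```python
import numpy as np

def spatial_pyramid_pool(codes, positions, levels=(1, 2, 4)):
    """Max-pool per-descriptor codes over a spatial pyramid and concatenate.
    codes: (n, d) array, one code vector per local descriptor (VQ, SC, LRR, ...).
    positions: (n, 2) array of descriptor locations normalized to [0, 1]."""
    codes = np.asarray(codes, float)
    positions = np.asarray(positions, float)
    pooled = []
    for cells in levels:
        # Assign each descriptor to a cell of the cells x cells grid.
        ix = np.minimum((positions[:, 0] * cells).astype(int), cells - 1)
        iy = np.minimum((positions[:, 1] * cells).astype(int), cells - 1)
        for cx in range(cells):
            for cy in range(cells):
                mask = (ix == cx) & (iy == cy)
                cell = codes[mask].max(axis=0) if mask.any() else np.zeros(codes.shape[1])
                pooled.append(cell)
    return np.concatenate(pooled)
```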
