1,467 research outputs found
Spectral Clustering with Imbalanced Data
Spectral clustering is sensitive to how graphs are constructed from data
particularly when proximal and imbalanced clusters are present. We show that
Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced data since they tend to emphasize cut sizes over cut values. We
propose a graph partitioning problem that seeks minimum cut partitions under
minimum size constraints on partitions to deal with imbalanced data. Our
approach parameterizes a family of graphs, by adaptively modulating node
degrees on a fixed node set, to yield a set of parameter dependent cuts
reflecting varying levels of imbalance. The solution to our problem is then
obtained by optimizing over these parameters. We present rigorous limit cut
analysis results to justify our approach. We demonstrate the superiority of our
method through unsupervised and semi-supervised experiments on synthetic and
real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with
arXiv:1302.513
Clustering and Community Detection with Imbalanced Clusters
Spectral clustering methods which are frequently used in clustering and
community detection applications are sensitive to the specific graph
constructions particularly when imbalanced clusters are present. We show that
ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to
imbalanced cluster sizes since they tend to emphasize cut sizes over cut
values. We propose a graph partitioning problem that seeks minimum cut
partitions under minimum size constraints on partitions to deal with imbalanced
cluster sizes. Our approach parameterizes a family of graphs by adaptively
modulating node degrees on a fixed node set, yielding a set of parameter
dependent cuts reflecting varying levels of imbalance. The solution to our
problem is then obtained by optimizing over these parameters. We present
rigorous limit cut analysis results to justify our approach and demonstrate the
superiority of our method through experiments on synthetic and real datasets
for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted
to IEEE TSIP
Applications of incidence bounds in point covering problems
In the Line Cover problem a set of n points is given and the task is to cover
the points using either the minimum number of lines or at most k lines. In
Curve Cover, a generalization of Line Cover, the task is to cover the points
using curves with d degrees of freedom. Another generalization is the
Hyperplane Cover problem where points in d-dimensional space are to be covered
by hyperplanes. All these problems have kernels of polynomial size, where the
parameter is the minimum number of lines, curves, or hyperplanes needed. First
we give a non-parameterized algorithm for both problems in O*(2^n) (where the
O*(.) notation hides polynomial factors of n) time and polynomial space,
beating a previous exponential-space result. Combining this with incidence
bounds similar to the famous Szemeredi-Trotter bound, we present a Curve Cover
algorithm with running time O*((Ck/log k)^((d-1)k)), where C is some constant.
Our result improves the previous best times O*((k/1.35)^k) for Line Cover
(where d=2), O*(k^(dk)) for general Curve Cover, as well as a few other bounds
for covering points by parabolas or conics. We also present an algorithm for
Hyperplane Cover in R^3 with running time O*((Ck^2/log^(1/5) k)^k), improving
on the previous time of O*((k^2/1.3)^k).Comment: SoCG 201
BSP-fields: An Exact Representation of Polygonal Objects by Differentiable Scalar Fields Based on Binary Space Partitioning
The problem considered in this work is to find a dimension independent algorithm for the generation of signed scalar fields exactly representing polygonal objects and satisfying the following requirements: the defining real function takes zero value exactly at the polygonal object boundary; no extra zero-value isosurfaces should be generated; C1 continuity of the function in the entire domain. The proposed algorithms are based on the binary space partitioning (BSP) of the object by the planes passing through the polygonal faces and are independent of the object genus, the number of disjoint components, and holes in the initial polygonal mesh. Several extensions to the basic algorithm are proposed to satisfy the selected optimization criteria. The generated BSP-fields allow for applying techniques of the function-based modeling to already existing legacy objects from CAD and computer animation areas, which is illustrated by several examples
Ensemble Committees for Stock Return Classification and Prediction
This paper considers a portfolio trading strategy formulated by algorithms in
the field of machine learning. The profitability of the strategy is measured by
the algorithm's capability to consistently and accurately identify stock
indices with positive or negative returns, and to generate a preferred
portfolio allocation on the basis of a learned model. Stocks are characterized
by time series data sets consisting of technical variables that reflect market
conditions in a previous time interval, which are utilized produce binary
classification decisions in subsequent intervals. The learned model is
constructed as a committee of random forest classifiers, a non-linear support
vector machine classifier, a relevance vector machine classifier, and a
constituent ensemble of k-nearest neighbors classifiers. The Global Industry
Classification Standard (GICS) is used to explore the ensemble model's efficacy
within the context of various fields of investment including Energy, Materials,
Financials, and Information Technology. Data from 2006 to 2012, inclusive, are
considered, which are chosen for providing a range of market circumstances for
evaluating the model. The model is observed to achieve an accuracy of
approximately 70% when predicting stock price returns three months in advance.Comment: 15 pages, 4 figures, Neukom Institute Computational Undergraduate
Research prize - second plac
- …