Transductive-Inductive Cluster Approximation Via Multivariate Chebyshev Inequality
Approximating an adequate number of clusters in multidimensional data is an open
area of research, given a level of compromise made on the quality of acceptable
results. The manuscript addresses the issue by formulating a transductive-inductive
learning algorithm that uses the multivariate Chebyshev inequality.
Considering the clustering problem in imaging, theoretical proofs for a particular
level of compromise are derived to show that the reconstruction error converges
to a finite value as (a) the number of unseen examples and (b) the number of
clusters increase. Upper bounds for these error rates are
also proved. Non-parametric estimates of these errors from a random sample of
sequences empirically point to a stable number of clusters. Finally, the
algorithm can be generalized to multidimensional data sets from
different fields. Comment: 16 pages, 5 figures
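The abstract does not spell out the algorithm. As a hedged illustration of how the multivariate Chebyshev inequality can serve as a distribution-free cluster-membership test, the sketch below uses the standard bound P[(X - mu)^T Sigma^{-1} (X - mu) >= t] <= d / t for a d-dimensional random vector with mean mu and covariance Sigma. Choosing t = d / eps for a tolerated fraction eps (standing in for the "level of compromise") gives an acceptance radius. The function names, the ridge term, the use of sample estimates in place of the true mean and covariance, and the rule "reject a point that no cluster accepts" are all assumptions for illustration, not the manuscript's method.

```python
from statistics import fmean

# Hypothetical helpers; sample estimates stand in for the true mean/covariance.


def mean_vec(points):
    """Component-wise mean of d-dimensional points."""
    return [fmean(col) for col in zip(*points)]


def covariance(points, mu, ridge=1e-6):
    """Sample covariance (needs >= 2 points) plus a small ridge so it stays invertible."""
    n, d = len(points), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for p in points:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (p[i] - mu[i]) * (p[j] - mu[j])
    return [[cov[i][j] / (n - 1) + (ridge if i == j else 0.0) for j in range(d)]
            for i in range(d)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(x, mu, cov):
    """Squared Mahalanobis distance (x - mu)^T cov^{-1} (x - mu)."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * si for di, si in zip(diff, solve(cov, diff)))


def chebyshev_assign(x, clusters, eps):
    """Index of the closest cluster whose Chebyshev region accepts x, else None.

    For the true mean and covariance, at most a fraction eps of a cluster's
    distribution lies at squared Mahalanobis distance >= d / eps.
    """
    threshold = len(x) / eps
    best, best_dist = None, float("inf")
    for idx, pts in enumerate(clusters):
        mu = mean_vec(pts)
        dist = mahalanobis_sq(x, mu, covariance(pts, mu))
        if dist < threshold and dist < best_dist:
            best, best_dist = idx, dist
    return best


def rejection_rate(unseen, clusters, eps):
    """Fraction of unseen points that no cluster accepts."""
    rejected = sum(1 for x in unseen if chebyshev_assign(x, clusters, eps) is None)
    return rejected / len(unseen)
```

A rising `rejection_rate` on unseen examples at a fixed `eps` would suggest that the current clusters are inadequate; this is one hypothetical way to turn the bound into a signal for adding clusters.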
Conceptual Foundations and Methodology for Creating an Inductive Technology of Objective Clustering
The paper presents theoretical work toward a methodology for the objective clustering of objects of complex nature, based on inductive modeling methods for complex systems. It develops an architecture for the inductive technology of objective clustering in the form of a detailed scheme for the step-by-step implementation of the inductive modeling procedures used to cluster objects of complex nature.
Nearly maximally predictive features and their dimensions
Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process's mixed-state distribution. These results, in turn, show how widely used finite-order Markov models can fail as predictors and that mixed-state predictive features can offer a substantial improvement. Funding: United States Army Research Office (W911NF-13-1-0390; W911NF-12-1-0288).
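In computational mechanics, a mixed state is the distribution over a generating hidden Markov model's (HMM's) hidden states, conditioned on the observed past; after each symbol x it is updated by eta' = eta T^(x) / (eta T^(x) 1), where T^(x)[i][j] is the probability of emitting x while moving from state i to state j. The minimal sketch below assumes a hypothetical two-state nonunifilar HMM (the matrices `T0`, `T1` are made up for illustration) and uses exact fractions so that distinct mixed states can be counted. In this toy case the mixed states never close into a finite set: after a 0 followed by k consecutive 1s the mixed state is (1/(k+1), k/(k+1)). This illustrates why a finite feature set can only be nearly maximally predictive; the abstract's uncountable case and its fractal-dimension bounds are not reproduced here.

```python
from fractions import Fraction

# Hypothetical two-state nonunifilar HMM over states (A, B).
# T0[i][j] = P(emit 0 and move to state j | state i); T1 likewise for symbol 1.
HALF = Fraction(1, 2)
T0 = [[Fraction(0), Fraction(0)],
      [HALF, Fraction(0)]]          # B --0--> A
T1 = [[HALF, HALF],                 # A --1--> A and A --1--> B
      [Fraction(0), HALF]]          # B --1--> B
ETA0 = (HALF, HALF)                 # stationary distribution of T0 + T1


def unnormalized(eta, T):
    """Row vector eta times matrix T."""
    n = len(eta)
    return [sum(eta[i] * T[i][j] for i in range(n)) for j in range(n)]


def update(eta, T):
    """Mixed-state update eta' = eta T / (eta T 1); raises if the symbol is impossible."""
    v = unnormalized(eta, T)
    z = sum(v)
    if z == 0:
        raise ValueError("symbol has zero probability from this mixed state")
    return tuple(x / z for x in v)


def reachable_mixed_states(eta0, Ts, depth):
    """All distinct mixed states reachable from eta0 in at most `depth` symbols."""
    seen = {eta0}
    frontier = [eta0]
    for _ in range(depth):
        nxt = []
        for eta in frontier:
            for T in Ts:
                if sum(unnormalized(eta, T)) == 0:
                    continue  # this symbol cannot be observed next
                new = update(eta, T)
                if new not in seen:
                    seen.add(new)
                    nxt.append(new)
        frontier = nxt
    return seen
```

For this toy HMM, `len(reachable_mixed_states(ETA0, [T0, T1], depth))` equals `depth + 2` for every `depth >= 1`, so each extra symbol of history adds one new predictive feature.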
Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data
We present the extension and application of a new unsupervised statistical
learning technique, the Partition Decoupling Method (PDM), to gene expression
data. Because it can reveal non-linear and non-convex geometries present in the
data, the PDM improves on typical gene expression analysis algorithms,
permitting a multi-gene analysis that can reveal phenotypic differences even
when the individual genes do not exhibit differential expression. Here, we apply
the PDM to publicly available gene expression data sets and demonstrate that it
identifies cell types and treatments with higher accuracy than other approaches.
By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM can
be used to find sets of mechanistically related genes that discriminate
phenotypes. Comment: Revised
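The pathway-by-pathway analysis described above can be outlined as a loop: restrict the expression data to one pathway's genes, cluster the samples, and score how well the clusters match the known phenotypes. The sketch below shows only that outer loop. It swaps in plain 2-means clustering where the PDM would go (this is not the PDM), and the gene names, pathway sets, and purity score are hypothetical.

```python
from collections import Counter
from statistics import fmean


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=20):
    """Plain 2-means (Lloyd's algorithm); a stand-in, NOT the PDM."""
    # Deterministic init: the first sample and the sample farthest from it.
    centers = [points[0], max(points, key=lambda p: sq_dist(p, points[0]))]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(fmean(col) for col in zip(*members))
    return labels


def purity(labels, phenotypes):
    """Fraction of samples that share the majority phenotype of their cluster."""
    total = 0
    for c in set(labels):
        counts = Counter(ph for lab, ph in zip(labels, phenotypes) if lab == c)
        total += counts.most_common(1)[0][1]
    return total / len(labels)


def rank_pathways(expr, pathways, phenotypes):
    """Score each pathway's gene set by how well it separates phenotypes."""
    n = len(phenotypes)
    scores = {}
    for name, genes in pathways.items():
        # One point per sample, using only this pathway's genes.
        points = [tuple(expr[g][s] for g in genes) for s in range(n)]
        scores[name] = purity(two_means(points), phenotypes)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Here `expr` maps each gene to its per-sample values. A pathway whose genes jointly split the samples along phenotype lines ranks first even if no single gene was tested for differential expression.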
Image Segmentation using Sparse Subset Selection
In this paper, we present a new image segmentation method based on the
concept of sparse subset selection. Starting with an over-segmentation, we
adopt local spectral histogram features to encode the visual information of the
small segments into high-dimensional vectors, called superpixel features. Then,
the superpixel features are fed into a novel convex model that efficiently
leverages the features to group the superpixels into an appropriate number of
coherent regions. Our model automatically determines the optimal number of
coherent regions and the assignment of superpixels that shapes the final segments.
To solve our model, we propose a numerical algorithm based on the alternating
direction method of multipliers (ADMM), whose iterations consist of two highly
parallelizable sub-problems. We show that each sub-problem has a closed-form
solution, which makes the ADMM iterations computationally very efficient.
Extensive experiments on benchmark image segmentation datasets demonstrate that
our proposed method, in combination with an over-segmentation, can provide
high-quality results that are competitive with existing state-of-the-art
methods. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV),
201
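The abstract states that both ADMM sub-problems have closed-form solutions but does not give the model. Convex subset-selection formulations commonly pair a row-sparsity penalty (which keeps few representatives) with simplex constraints on each point's assignment vector. Assuming that structure, which the abstract does not confirm, the two closed-form steps would be a Euclidean projection onto the probability simplex and block soft-thresholding of a row. The function names below are hypothetical.

```python
import math


def project_simplex(v):
    """Euclidean projection of v onto {z : z >= 0, sum(z) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        cumsum += uk
        t = (cumsum - 1.0) / k
        if uk - t > 0:
            theta = t  # the last k passing this test fixes the threshold
    return [max(x - theta, 0.0) for x in v]


def row_shrink(v, tau):
    """Proximal operator of tau * ||v||_2 (block soft-thresholding)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= tau:
        return [0.0] * len(v)  # the whole row is switched off
    scale = 1.0 - tau / norm
    return [scale * x for x in v]
```

Under this assumed formulation, an ADMM iteration would apply `project_simplex` independently to each superpixel's assignment vector and `row_shrink` independently to each candidate representative's row. Because each call touches only one vector, both steps parallelize naturally.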