28 research outputs found

    Transductive-Inductive Cluster Approximation Via Multivariate Chebyshev Inequality

    Approximating an adequate number of clusters in multidimensional data is an open area of research, given a level of compromise on the quality of acceptable results. The manuscript addresses the issue by formulating a transductive-inductive learning algorithm that uses the multivariate Chebyshev inequality. Considering a clustering problem in imaging, theoretical proofs for a particular level of compromise are derived to show that the reconstruction error converges to a finite value as (a) the number of unseen examples and (b) the number of clusters increase, respectively. Upper bounds for these error rates are also proved. Non-parametric estimates of these errors from a random sample of sequences empirically point to a stable number of clusters. Lastly, the algorithm generalizes to multidimensional data sets from different fields. Comment: 16 pages, 5 figures
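For readers unfamiliar with the inequality named in this abstract: for a random vector X in R^d with mean mu and nonsingular covariance Sigma, the multivariate Chebyshev inequality states that P((X - mu)^T Sigma^{-1} (X - mu) >= t^2) <= d / t^2. The sketch below only checks this bound empirically. It is not the paper's clustering algorithm, and the function names, the standard normal test distribution, and the sample size are illustrative assumptions.

```python
import random


def chebyshev_bound(dim, t):
    """Upper bound d / t^2 on P(squared Mahalanobis distance >= t^2), capped at 1."""
    return min(1.0, dim / (t * t))


def empirical_tail(dim, t, n_samples=20000, seed=0):
    """Estimate the tail probability for a zero-mean, identity-covariance Gaussian.

    With identity covariance the squared Mahalanobis distance reduces to the
    squared Euclidean norm, so no matrix inversion is needed in this toy case.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        squared_distance = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        if squared_distance >= t * t:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    for t in (2.0, 3.0, 4.0):
        print(t, empirical_tail(2, t), chebyshev_bound(2, t))
```

The bound is distribution-free, so it is typically loose for well-behaved data such as the Gaussian used here; that looseness is the price of making no distributional assumption.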

    Conceptual foundations and methodology for creating an inductive technology of objective clustering

    The paper presents theoretical developments toward a methodology for the objective clustering of objects of complex nature, based on methods of inductive modeling of complex systems. An architecture for the inductive technology of objective clustering is developed in the form of a detailed scheme for the step-by-step implementation of the inductive modeling procedure for clustering objects of complex nature.

    Nearly maximally predictive features and their dimensions

    Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process' mixed-state distribution. These results, in turn, show how widely used finite-order Markov models can fail as predictors and that mixed-state predictive features can offer a substantial improvement. United States Army Research Office (W911NF-13-1-0390); United States Army Research Office (W911NF-12-1-0288)

    Partition Decoupling for Multi-gene Analysis of Gene Expression Profiling Data

    We present the extension and application of a new unsupervised statistical learning technique, the Partition Decoupling Method (PDM), to gene expression data. Because it can reveal non-linear and non-convex geometries present in the data, the PDM is an improvement over typical gene expression analysis algorithms, permitting a multi-gene analysis that can reveal phenotypic differences even when the individual genes do not exhibit differential expression. Here, we apply the PDM to publicly available gene expression data sets and demonstrate that we are able to identify cell types and treatments with higher accuracy than is obtained through other approaches. By applying it in a pathway-by-pathway fashion, we demonstrate how the PDM may be used to find sets of mechanistically related genes that discriminate phenotypes. Comment: Revise

    Image Segmentation using Sparse Subset Selection

    In this paper, we present a new image segmentation method based on the concept of sparse subset selection. Starting with an over-segmentation, we adopt local spectral histogram features to encode the visual information of the small segments into high-dimensional vectors, called superpixel features. Then, the superpixel features are fed into a novel convex model which efficiently leverages the features to group the superpixels into a proper number of coherent regions. Our model automatically determines the optimal number of coherent regions and the assignment of superpixels that shapes the final segments. To solve our model, we propose a numerical algorithm based on the alternating direction method of multipliers (ADMM), whose iterations consist of two highly parallelizable sub-problems. We show that each sub-problem admits a closed-form solution, which makes the ADMM iterations computationally very efficient. Extensive experiments on benchmark image segmentation datasets demonstrate that our proposed method, in combination with an over-segmentation, can provide high-quality results that are competitive with existing state-of-the-art methods. Comment: IEEE Winter Conference on Applications of Computer Vision (WACV), 201
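To illustrate what "ADMM iterations with closed-form sub-problems" means in general, here is a minimal sketch on a toy scalar problem: minimize 0.5 * (x - a)^2 + lam * |z| subject to x = z. This is not the segmentation model from the abstract; the problem, the function names, and the parameters `rho` and `n_iter` are assumptions chosen so that both updates have one-line closed forms.

```python
def soft_threshold(v, kappa):
    """Closed-form minimizer of kappa * |z| + 0.5 * (z - v)^2."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_scalar_lasso(a, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min 0.5*(x - a)^2 + lam*|z| subject to x = z."""
    x = z = u = 0.0
    for _ in range(n_iter):
        # x-update: quadratic sub-problem, solved exactly.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # z-update: l1 sub-problem, solved exactly by soft-thresholding.
        z = soft_threshold(x + u, lam / rho)
        # Scaled dual update accumulates the constraint violation x - z.
        u = u + x - z
    return z


if __name__ == "__main__":
    print(admm_scalar_lasso(3.0, 1.0), soft_threshold(3.0, 1.0))
```

For this toy problem the exact answer is known to be `soft_threshold(a, lam)`, which gives a direct check that the iterations converge to the right point.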