
249 research outputs found

Rival penalized competitive learning for content-based indexing.

Author: Lau Tak Kan
Publication date: 01/01/1998

by Lau Tak Kan.
Thesis (M.Phil.)--Chinese University of Hong Kong, 1998.
Includes bibliographical references (leaves 100-108).
Abstract also in Chinese.

Chapter 1 --- Introduction
Chapter 1.1 --- Background
Chapter 1.2 --- Problem Defined
Chapter 1.3 --- Contributions
Chapter 1.4 --- Thesis Organization
Chapter 2 --- Content-based Retrieval Multimedia Database Background and Indexing Problem
Chapter 2.1 --- Feature Extraction
Chapter 2.2 --- Nearest-neighbor Search
Chapter 2.3 --- Content-based Indexing Methods
Chapter 2.4 --- Indexing Problem
Chapter 3 --- Data Clustering Methods for Indexing
Chapter 3.1 --- Proposed Solution to Indexing Problem
Chapter 3.2 --- Brief Description of Several Clustering Methods
Chapter 3.2.1 --- K-means
Chapter 3.2.2 --- Competitive Learning (CL)
Chapter 3.2.3 --- Rival Penalized Competitive Learning (RPCL)
Chapter 3.2.4 --- General Hierarchical Clustering Methods
Chapter 3.3 --- Why RPCL?
Chapter 4 --- Non-hierarchical RPCL Indexing
Chapter 4.1 --- The Non-hierarchical Approach
Chapter 4.2 --- Performance Experiments
Chapter 4.2.1 --- Experimental Setup
Chapter 4.2.2 --- Experiment 1: Test for Recall and Precision Performance
Chapter 4.2.3 --- Experiment 2: Test for Different Sizes of Input Data Sets
Chapter 4.2.4 --- Experiment 3: Test for Different Numbers of Dimensions
Chapter 4.2.5 --- Experiment 4: Compare with Actual Nearest-neighbor Results
Chapter 4.3 --- Chapter Summary
Chapter 5 --- Hierarchical RPCL Indexing
Chapter 5.1 --- The Hierarchical Approach
Chapter 5.2 --- The Hierarchical RPCL Binary Tree (RPCL-b-tree)
Chapter 5.3 --- Insertion
Chapter 5.4 --- Deletion
Chapter 5.5 --- Searching
Chapter 5.6 --- Experiments
Chapter 5.6.1 --- Experimental Setup
Chapter 5.6.2 --- Experiment 5: Test for Different Node Sizes
Chapter 5.6.3 --- Experiment 6: Test for Different Sizes of Data Sets
Chapter 5.6.4 --- Experiment 7: Test for Different Data Distributions
Chapter 5.6.5 --- Experiment 8: Test for Different Numbers of Dimensions
Chapter 5.6.6 --- Experiment 9: Test for Different Numbers of Database Objects Retrieved
Chapter 5.6.7 --- Experiment 10: Test with VP-tree
Chapter 5.7 --- Discussion
Chapter 5.8 --- A Relationship Formula
Chapter 5.9 --- Chapter Summary
Chapter 6 --- Conclusion
Chapter 6.1 --- Future Works
Chapter 6.2 --- Conclusion
Bibliography

CUHK Digital Repository

Fuzzy clustering for content-based indexing in multimedia databases.

Author: Yue Ho-Yin
Publication date: 01/01/2001

Yue Ho-Yin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2001.
Includes bibliographical references (leaves 129-137).
Abstracts in English and Chinese.

Abstract
Acknowledgement
Chapter 1 --- Introduction
Chapter 1.1 --- Problem Definition
Chapter 1.2 --- Contributions
Chapter 1.3 --- Thesis Organization
Chapter 2 --- Literature Review
Chapter 2.1 --- Content-based Retrieval, Background and Indexing Problem
Chapter 2.1.1 --- Feature Extraction
Chapter 2.1.2 --- Nearest-neighbor Search
Chapter 2.1.3 --- Content-based Indexing Methods
Chapter 2.2 --- Indexing Problems
Chapter 2.3 --- Data Clustering Methods for Indexing
Chapter 2.3.1 --- Probabilistic Clustering
Chapter 2.3.2 --- Possibilistic Clustering
Chapter 3 --- Fuzzy Clustering Algorithms
Chapter 3.1 --- Fuzzy Competitive Clustering
Chapter 3.2 --- Sequential Fuzzy Competitive Clustering
Chapter 3.3 --- Experiments
Chapter 3.3.1 --- Experiment 1: Data set with different number of samples
Chapter 3.3.2 --- Experiment 2: Data set on different dimensionality
Chapter 3.3.3 --- Experiment 3: Data set with different number of natural clusters inside
Chapter 3.3.4 --- Experiment 4: Data set with different noise level
Chapter 3.3.5 --- Experiment 5: Clusters with different geometry size
Chapter 3.3.6 --- Experiment 6: Clusters with different number of data instances
Chapter 3.3.7 --- Experiment 7: Performance on real data set
Chapter 3.4 --- Discussion
Chapter 3.4.1 --- Differences Between FCC, SFCC, and Others Clustering Algorithms
Chapter 3.4.2 --- Variations on SFCC
Chapter 3.4.3 --- Why SFCC?
Chapter 4 --- Hierarchical Indexing based on Natural Clusters Information
Chapter 4.1 --- The Hierarchical Approach
Chapter 4.2 --- The Sequential Fuzzy Competitive Clustering Binary Tree (SFCC-b-tree)
Chapter 4.2.1 --- Data Structure of SFCC-b-tree
Chapter 4.2.2 --- Tree Building of SFCC-b-Tree
Chapter 4.2.3 --- Insertion of SFCC-b-tree
Chapter 4.2.4 --- Deletion of SFCC-b-Tree
Chapter 4.2.5 --- Searching in SFCC-b-Tree
Chapter 4.3 --- Experiments
Chapter 4.3.1 --- Experimental Setting
Chapter 4.3.2 --- Experiment 8: Test for different leaf node sizes
Chapter 4.3.3 --- Experiment 9: Test for different dimensionality
Chapter 4.3.4 --- Experiment 10: Test for different sizes of data sets
Chapter 4.3.5 --- Experiment 11: Test for different data distributions
Chapter 4.4 --- Summary
Chapter 5 --- A Case Study on SFCC-b-tree
Chapter 5.1 --- Introduction
Chapter 5.2 --- Data Collection
Chapter 5.3 --- Data Pre-processing
Chapter 5.4 --- Experimental Results
Chapter 5.5 --- Summary
Chapter 6 --- Conclusion
Chapter 6.1 --- An Efficiency Formula
Chapter 6.1.1 --- Motivation
Chapter 6.1.2 --- Regression Model
Chapter 6.1.3 --- Discussion
Chapter 6.2 --- Future Directions
Chapter 6.3 --- Conclusion
Bibliography

CUHK Digital Repository

A perceptual learning model to discover the hierarchical latent structure of image collections

Author: Bacciu Davide
Publication venue: IMT Alti Studi Lucca
Publication date: 01/01/2008

Biology has been an unparalleled source of inspiration for the work of researchers in several scientific and engineering fields, including computer vision. The starting point of this thesis is the neurophysiological properties of the human early visual system, in particular the cortical mechanism that mediates learning by exploiting information about stimuli repetition. Repetition has long been considered a fundamental correlate of skill acquisition and memory formation in biological as well as computational learning models. However, recent studies have shown that biological neural networks have different ways of exploiting repetition in forming memory maps. The thesis focuses on a perceptual learning mechanism called repetition suppression, which exploits the temporal distribution of neural activations to drive an efficient neural allocation for a set of stimuli. This explores the neurophysiological hypothesis that repetition suppression serves as an unsupervised perceptual learning mechanism that can drive efficient memory formation by reducing the overall size of stimuli representation while strengthening the responses of the most selective neurons. This interpretation of repetition differs from its traditional role in computational learning models, where it serves mainly to induce convergence and reach training stability, without using this information to focus the neural representations of the data.

The first part of the thesis introduces a novel computational model with repetition suppression, which forms an unsupervised competitive system termed CoRe, for Competitive Repetition-suppression learning. The model is applied to general problems in the fields of computational intelligence and machine learning. Particular emphasis is placed on validating the model as an effective tool for the unsupervised exploration of bio-medical data. In particular, it is shown that the repetition suppression mechanism efficiently addresses the issues of automatically estimating the number of clusters within the data, as well as filtering noise and irrelevant input components in highly dimensional data, e.g. gene expression levels from DNA Microarrays. The CoRe model produces relevance estimates for each covariate, which is useful, for instance, to discover the best discriminating bio-markers. The description of the model includes a theoretical analysis using Huber’s robust statistics to show that the model is robust to outliers and noise in the data. The convergence properties of the model are also studied. It is shown that, besides its biological underpinning, the CoRe model has useful properties in terms of asymptotic behavior. By exploiting a kernel-based formulation for the CoRe learning error, a theoretically sound motivation is provided for the model’s ability to avoid local minima of its loss function. To do this, a necessary and sufficient condition for global error minimization in vector quantization is generalized by extending it to distance metrics in generic Hilbert spaces. This leads to the derivation of a family of kernel-based algorithms that address the local minima issue of unsupervised vector quantization in a principled way. The experimental results show that the algorithm can achieve a consistent performance gain compared with state-of-the-art learning vector quantizers, while retaining a lower computational complexity (linear with respect to the dataset size).

Bridging the gap between the low-level representation of the visual content and the underlying high-level semantics is a major research issue of current interest. The second part of the thesis focuses on this problem by introducing a hierarchical and multi-resolution approach to visual content understanding. On a spatial level, CoRe learning is used to pool together the local visual patches by organizing them into perceptually meaningful intermediate structures. On the semantic level, it provides an extension of the probabilistic Latent Semantic Analysis (pLSA) model that allows discovery and organization of the visual topics into a hierarchy of aspects. The proposed hierarchical pLSA model is shown to effectively address the unsupervised discovery of relevant visual classes from pictorial collections, at the same time learning to segment the image regions containing the discovered classes. Furthermore, by drawing on a recent pLSA-based image annotation system, the hierarchical pLSA model is extended to process and represent multi-modal collections comprising textual and visual data. The results of the experimental evaluation show that the proposed model learns to attach textual labels (available only at the level of the whole image) to the discovered image regions, while increasing the precision/recall performance with respect to a flat pLSA annotation model.

IMT E-Theses

Archivio della Ricerca - Università di Pisa

Warped K-Means: An algorithm to cluster sequentially-distributed data

Author: Luis A. Leiva
Enrique Vidal
Publication venue: Elsevier BV
Publication date: 10/07/2013

Many devices generate large amounts of data that follow some sort of sequentiality, e.g., motion sensors, e-pens, and eye trackers, and often these data need to be compressed for classification, storage, and/or retrieval tasks. Traditional clustering algorithms can be used for this purpose, but they do not cope with the sequential information implicitly embedded in such data. Thus, we revisit the well-known K-means algorithm and provide a general method to properly cluster sequentially-distributed data. We present Warped K-Means (WKM), a multi-purpose partitional clustering procedure that minimizes the sum of squared error criterion while imposing a hard sequentiality constraint in the classification step. We illustrate the properties of WKM in three applications, one being the segmentation and classification of human activity. In simplifying data trajectories, WKM outperformed five state-of-the-art clustering techniques, achieving a recognition accuracy of nearly 97%, an improvement of around 66% over its peers. Moreover, this improvement came with a reduction in computational cost of more than one order of magnitude.

This work has been partially supported by Casmacat (FP7-ICT-2011-7, Project 287576), tranScriptorium (FP7-ICT-2011-9, Project 600707), STraDA (MINECO, TIN2012-37475-0O2-01), and ALMPR (GVA, Prometeo/20091014) projects.

Leiva Torres, LA.; Vidal, E. (2013). Warped K-Means: An algorithm to cluster sequentially-distributed data. Information Sciences. 237:196-210. https://doi.org/10.1016/j.ins.2013.02.042
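To make the idea of a hard sequentiality constraint concrete, the following is a minimal sketch, not the WKM procedure itself (the abstract does not give its update rules). It assumes the constraint means every cluster is a contiguous run of samples, and it finds the contiguous partition with the smallest sum of squared error by plain dynamic programming. The function names `segment_sse` and `contiguous_clusters` are hypothetical.

```python
import numpy as np

def segment_sse(prefix, prefix_sq, i, j):
    """Sum of squared error of samples i..j-1 around their own mean."""
    n = j - i
    s = prefix[j] - prefix[i]          # vector sum of the segment
    sq = prefix_sq[j] - prefix_sq[i]   # sum of squared norms of the segment
    return float(sq - s @ s / n)

def contiguous_clusters(X, k):
    """Split an ordered sequence into k contiguous clusters minimizing total SSE.

    Illustrative assumption: this exhaustive O(k * n^2) dynamic program is a
    stand-in for the sequentiality constraint, not the authors' algorithm.
    Requires 1 <= k <= len(X). Returns the start index of each cluster.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, d = X.shape
    prefix = np.vstack([np.zeros((1, d)), np.cumsum(X, axis=0)])
    prefix_sq = np.concatenate([[0.0], np.cumsum((X ** 2).sum(axis=1))])

    cost = np.full((k + 1, n + 1), np.inf)   # cost[c, j]: best SSE for first j samples in c clusters
    back = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):        # last cluster covers samples i..j-1
                v = cost[c - 1, i] + segment_sse(prefix, prefix_sq, i, j)
                if v < cost[c, j]:
                    cost[c, j] = v
                    back[c, j] = i

    # Walk the back-pointers from the end to recover cluster start indices.
    starts, j = [], n
    for c in range(k, 0, -1):
        j = back[c, j]
        starts.append(int(j))
    return sorted(starts)

if __name__ == "__main__":
    trajectory = [0.0, 0.1, 0.2, 5.0, 5.1, 9.0, 9.2]
    print(contiguous_clusters(trajectory, 3))
```

Unlike ordinary K-means, samples can never be assigned to a cluster that would break the ordering, so the result is a segmentation of the sequence rather than a free partition.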

Crossref

RiuNet

Projection-embedded BYY learning algorithm for Gaussian mixture-based clustering

Author: Guangyong Chen
Lei Xu
Pheng-Ann Heng
Publication venue: Springer Nature
Publication date: 01/01/2014

Springer - Publisher Connector

Internetsuche und Neuronale Netze: Stand der Technik

Author: Heuser Udo
Rosenstiel Wolfgang
Publication venue: Universität Tübingen
Publication date: 11/10/2012

Publication of the Wilhelm-Schickard-Institut für Informatik, Universität Tübingen

Publikationsserver der Universität Tübingen

Unsupervised Selection and Estimation of Non-Gaussian Mixtures for High Dimensional Data Analysis

Author: Elguebaly Tarek
Publication date: 09/09/2014

Lately, the enormous generation of databases in almost every aspect of life has created a great demand for new, powerful tools for turning data into useful information. Therefore, researchers were encouraged to explore and develop new machine learning ideas and methods. Mixture models are one of the machine learning techniques receiving considerable attention due to their ability to handle multidimensional data efficiently and effectively. Generally, four critical issues have to be addressed when adopting mixture models in high dimensional spaces: (1) choice of the probability density functions, (2) estimation of the mixture parameters, (3) automatic determination of the number of components M in the mixture, and (4) determination of what features best discriminate among the different components. The main goal of this thesis is to summarize all these challenging interrelated problems in one unified model.

In most applications, the Gaussian density is used in mixture modeling of data. Although a Gaussian mixture may provide a reasonable approximation to many real-world distributions, it is certainly not always the best approximation, especially in computer vision and image processing applications where we often deal with non-Gaussian data. Therefore, we propose to use three highly flexible distributions: the generalized Gaussian distribution (GGD), the asymmetric Gaussian distribution (AGD), and the asymmetric generalized Gaussian distribution (AGGD). We are motivated by the fact that these distributions are able to fit many distributional shapes and can therefore be considered a useful class of flexible models for problems and applications involving measurements and features with a well-known, marked deviation from the Gaussian shape.

Recent research has shown that model selection and parameter learning are highly dependent and should be performed simultaneously. For this purpose, many approaches have been suggested. The vast majority of these approaches can be classified, from a computational point of view, into two classes: deterministic and stochastic methods. Deterministic methods estimate the model parameters for a set of candidate models using the Expectation-Maximization (EM) framework, then choose the model that maximizes a model selection criterion. Stochastic methods such as Markov chain Monte Carlo (MCMC) can be used to sample from the full a posteriori distribution with M considered unknown. Hence, in this thesis, we propose three learning techniques capable of automatically determining model complexity while learning its parameters. First, we incorporate a Minimum Message Length (MML) penalty in the model learning step performed using the EM algorithm. Our second approach employs the Rival Penalized EM (RPEM) algorithm, which is able to select an appropriate number of densities by fading out the redundant densities from a density mixture. Last but not least, we incorporate the nonparametric aspect of mixture models by assuming a countably infinite number of components and using MCMC simulations for the estimation of the posterior distributions. Hence, the difficulty of choosing the appropriate number of clusters is sidestepped by assuming that there are an infinite number of mixture components.

Another essential issue in statistical modeling in general, and finite mixtures in particular, is feature selection (i.e., identification of the relevant or discriminative features describing the data), especially in the case of high-dimensional data. Indeed, feature selection has been shown to be a crucial step in several image processing, computer vision and pattern recognition applications, not only because it speeds up learning but also because it improves model accuracy and generalization. Moreover, the learning of the mixture parameters (i.e., both model selection and parameter estimation) is greatly affected by the quality of the features used. Hence, in this thesis, we try to solve the feature selection problem in unsupervised learning by casting it as an estimation problem, thus avoiding any combinatorial search. Finally, the effectiveness of our approaches is evaluated by applying them to different computer vision and image processing applications.

Concordia University Research Repository

A Clustering Method for Data in Cylindrical Coordinates

Author: Kazuhisa Fujita
Publication venue: Hindawi Limited
Publication date: 01/01/2017

We propose a new clustering method for data in cylindrical coordinates based on k-means. The goal of the k-means family is to maximize an optimization function, which requires a similarity. Thus, we need a new similarity to obtain a clustering method for data in cylindrical coordinates. In this study, we first derive this similarity by assuming a particular probabilistic model. A data point in cylindrical coordinates has a radius, an azimuth, and a height. We assume that the azimuth is sampled from a von Mises distribution and that the radius and the height are independently generated from isotropic Gaussian distributions. We derive the new similarity from the log likelihood of the assumed probability distribution. Our experiments demonstrate that the proposed method using the new similarity can appropriately partition synthetic data defined in cylindrical coordinates. Furthermore, we apply the proposed method to color image quantization and show that the method successfully quantizes a color image with respect to the hue element.
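The probabilistic model described in this abstract can be sketched as a log-likelihood score. This is an illustration under assumptions, not the paper's exact similarity: the abstract does not state the parameterization, so the concentration `kappa`, the shared standard deviation `sigma`, and the function names are all hypothetical. The azimuth term uses the von Mises log-density, and the radius and height terms use Gaussian log-densities.

```python
import numpy as np

def cylindrical_log_likelihood(point, center, kappa=4.0, sigma=1.0):
    """Log-likelihood of a (radius, azimuth, height) point under one cluster.

    Assumed model (illustrative): azimuth ~ von Mises(center azimuth, kappa);
    radius and height ~ independent Gaussians with shared std `sigma`.
    """
    r, theta, z = point
    mu_r, mu_theta, mu_z = center
    # von Mises log-density; np.i0 is the modified Bessel function of order 0.
    log_vm = kappa * np.cos(theta - mu_theta) - np.log(2.0 * np.pi * np.i0(kappa))

    def log_gauss(x, mu):
        return -0.5 * ((x - mu) / sigma) ** 2 - 0.5 * np.log(2.0 * np.pi * sigma ** 2)

    return float(log_vm + log_gauss(r, mu_r) + log_gauss(z, mu_z))

def assign_clusters(points, centers, kappa=4.0, sigma=1.0):
    """Assignment step: each point goes to the center with the highest score."""
    return [
        int(np.argmax([cylindrical_log_likelihood(p, c, kappa, sigma) for c in centers]))
        for p in points
    ]

if __name__ == "__main__":
    centers = [(1.0, 2.0 * np.pi - 0.1, 0.0), (1.0, np.pi, 0.0)]
    points = [(1.0, 0.1, 0.0), (1.0, 3.0, 0.2)]
    print(assign_clusters(points, centers))
```

Because the azimuth enters only through a cosine, a point at angle 0.1 is scored as close to a center at angle 2π − 0.1, which a Euclidean distance on raw angles would treat as far apart.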

Crossref

Directory of Open Access Journals

Novel Application of Neutrosophic Logic in Classifiers Evaluated under Region-Based Image Categorization System

Author: Ju Wen
Publication venue: DigitalCommons@USU
Publication date: 01/05/2011

Neutrosophic logic is a relatively new logic that is a generalization of fuzzy logic. In this dissertation, for the first time, neutrosophic logic is applied to the field of classifiers, where a support vector machine (SVM) is adopted as the example to validate the feasibility and effectiveness of neutrosophic logic. The proposed neutrosophic set is integrated into a reformulated SVM, and the performance of the resulting classifier, N-SVM, is evaluated under an image categorization system. Image categorization is an important yet challenging research topic in computer vision. In this dissertation, images are first segmented by a hierarchical two-stage self-organizing map (HSOM), using color and texture features. A novel approach is proposed to select the training samples of HSOM based on homogeneity properties. A diverse density support vector machine (DD-SVM) framework that extends the multiple-instance learning (MIL) technique is then applied to the image categorization problem by viewing an image as a bag of instances corresponding to the regions obtained from the image segmentation. Using the instance prototype, every bag is mapped to a point in the new bag space, and the categorization is transformed into a classification problem. Then, the proposed N-SVM based on the neutrosophic set is used as the classifier in the new bag space. N-SVM treats samples differently according to the weighting function, which helps reduce the effects of outliers. Experimental results on a COREL dataset of 1000 general purpose images and a Caltech 101 dataset of 9000 images demonstrate the validity and effectiveness of the proposed method.

DigitalCommons@USU


Scene Analysis Using Scale Invariant Feature Extraction and Probabilistic Modeling

Author: Shen Yao
Publication venue: University of North Texas Libraries
Publication date: 01/08/2011

Conventional pattern recognition systems have two components: feature analysis and pattern classification. For any object in an image, features can be considered the major characteristics of the object, whether for object recognition or object tracking. Features extracted from a training image can be used to identify the object when attempting to locate it in a test image containing many other objects. To perform reliable scene analysis, it is important that the features extracted from the training image are detectable even under changes in image scale, noise and illumination. Scale-invariant features have wide applications in the image processing area, such as image classification, object recognition and object tracking. In this thesis, color features and SIFT (scale invariant feature transform) are treated as scale-invariant features. The classification, recognition and tracking results were evaluated with a novel evaluation criterion and compared with some existing methods. I also studied different types of scale-invariant features for the purpose of solving scene analysis problems. I propose probabilistic models as the foundation for analyzing scene scenarios in images. To differentiate image content, I develop novel algorithms for the adaptive combination of multiple features extracted from images. I demonstrate the performance of the developed algorithms on several scene analysis tasks, including object tracking, video stabilization, medical video segmentation and scene classification.

UNT Digital Library