7 research outputs found

    Mixture Models for Multidimensional Positive Data Clustering with Applications to Image Categorization and Retrieval

    Model-based approaches have become important tools for modeling data and inferring knowledge. Such approaches are often used for clustering and object recognition, which are crucial steps in many applications, including but not limited to recommendation systems, search engines, cyber security, surveillance and object tracking. Many of these applications urgently need to reduce the semantic gap in data representation between the system level and the level understandable by human beings. Indeed, the low-level features extracted to represent a given object can be confusing to machines, which cannot differentiate between very similar objects that human beings distinguish trivially (e.g. apple vs. tomato). Such a semantic gap between the system's and the user's perception of data makes it hard to design the modeling process based on the feature space only. Moreover, those models should be flexible and updatable when new data are introduced to the system. Thus, apart from estimating the model parameters, the system should be informed of how new data should be perceived according to some criteria in order to establish model updates. In this thesis we propose a methodology for data representation using a hierarchical mixture model based on the inverted Dirichlet and the generalized inverted Dirichlet distributions. The proposed approach models a given object class by a set of components deduced by the system and grouped according to labeled training data representing human-level semantics. We propose an update strategy for the system components that takes into account adjustable metrics representing users' perception. We also consider the "page zero" problem in image retrieval systems, in which a user does not possess adequate tools and semantics to express what he/she is looking for but can identify it visually. We propose a statistical framework that enables users to start a search process and interact with the system in order to find their target "mental image". Finally, we propose to improve our models by using variational Bayesian inference to learn generalized inverted Dirichlet mixtures with feature selection. The merit of our approaches is evaluated using extensive simulations and real-life applications.

    Positive data clustering using finite inverted Dirichlet mixture models

    In this thesis we present an unsupervised algorithm for learning finite mixture models from multivariate positive data. This kind of data appears naturally in many applications, yet it has not been adequately addressed in the past. The mixture model is based on the inverted Dirichlet distribution, which offers a good representation and modeling of positive non-Gaussian data. The proposed approach for estimating the parameters of an inverted Dirichlet mixture is based on maximum likelihood (ML) estimation using the Newton-Raphson method. We also develop an approach, based on the Minimum Message Length (MML) criterion, to select the optimal number of clusters to represent the data using such a mixture. Experimental results are presented using artificial histograms and real data sets. The challenging problem of software module classification is also investigated within the proposed statistical framework.
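
    As a minimal sketch of the density underlying such a mixture, the inverted Dirichlet log-density and the mixture log-likelihood can be written as below. The abstract does not state the formula; the parameterization used here (a D-dimensional positive vector x with D+1 positive parameters alpha) is the commonly used one and is an assumption about the thesis, as are the function names `inverted_dirichlet_logpdf` and `mixture_loglik`. The Newton-Raphson updates and the MML criterion are not shown.

```python
import math

def inverted_dirichlet_logpdf(x, alpha):
    """Log-density of an inverted Dirichlet at a positive vector x.

    Assumed parameterization: len(alpha) == len(x) + 1, all entries > 0,
    p(x) = Gamma(sum(alpha)) / prod(Gamma(alpha_d))
           * prod(x_d ** (alpha_d - 1)) * (1 + sum(x)) ** (-sum(alpha)).
    """
    if len(alpha) != len(x) + 1:
        raise ValueError("alpha must have one more entry than x")
    if min(x) <= 0 or min(alpha) <= 0:
        raise ValueError("x and alpha must be strictly positive")
    total = sum(alpha)
    log_norm = math.lgamma(total) - sum(math.lgamma(a) for a in alpha)
    # zip stops at len(x), so only the first D parameters pair with x.
    log_kernel = sum((a - 1.0) * math.log(v) for a, v in zip(alpha, x))
    return log_norm + log_kernel - total * math.log1p(sum(x))

def mixture_loglik(data, weights, alphas):
    """Log-likelihood of data under a finite mixture (log-sum-exp per point)."""
    loglik = 0.0
    for x in data:
        terms = [math.log(w) + inverted_dirichlet_logpdf(x, a)
                 for w, a in zip(weights, alphas)]
        top = max(terms)
        loglik += top + math.log(sum(math.exp(t - top) for t in terms))
    return loglik
```

    In one dimension this density reduces to the beta prime distribution, which gives a quick sanity check: with alpha = (1, 1) the density at x = 1 is 1/(1+1)^2 = 0.25. A maximum likelihood fit would maximize `mixture_loglik` over the weights and the alpha vectors.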

    Visual object categorization with new keypoint-based adaBoost features

    We present promising results for visual object categorization, obtained with adaBoost using new original "keypoints-based features". These weak classifiers produce a boolean response based on the presence or absence in the tested image of a "keypoint" (a kind of SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on the test set. Preliminary tests on a small subset of a pedestrian database also give a promising 97% recall with 92% precision, which shows the generality of our new family of features. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to a specific part of the object category (such as "wheel" or "side skirt" in the case of lateral cars) and thus have a "semantic" meaning. We also made a first test on video for detecting vehicles from adaBoost-selected keypoints filtered in real time from all detected keypoints.
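
    The weak classifier described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: Euclidean distance stands in for the unspecified "given distance", descriptors are plain tuples of floats rather than real SURF output, and the names `keypoint_presence` and `strong_classifier`, the (reference, max_distance, weight) triples and the +1/-1 weighted vote are hypothetical choices following the usual discrete AdaBoost form.

```python
import math

def keypoint_presence(descriptors, reference, max_distance):
    """Return True if any descriptor lies within max_distance of reference.

    Euclidean distance is an assumption; the abstract only says
    "within a given distance".
    """
    return any(math.dist(d, reference) <= max_distance for d in descriptors)

def strong_classifier(descriptors, features, bias=0.0):
    """AdaBoost-style weighted vote over keypoint-presence weak classifiers.

    features: list of (reference, max_distance, weight) triples, assumed to
    have been selected by a boosting run that is not shown here.
    Each weak classifier votes +1 (keypoint present) or -1 (absent).
    """
    score = sum(weight * (1 if keypoint_presence(descriptors, ref, dist) else -1)
                for ref, dist, weight in features)
    return score > bias
```

    Each image is reduced to the list of descriptors of its detected keypoints, so a weak classifier costs one pass over that list, and the reference descriptors retained by boosting indicate which object parts drive the decision.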

    Bayesian learning of inverted Dirichlet mixtures for SVM kernels generation

    We describe approaches for positive data modeling and classification using both finite inverted Dirichlet mixture models and support vector machines (SVMs). Inverted Dirichlet mixture models are used to tackle an outstanding challenge in SVMs, namely the generation of accurate kernels. The kernel generation approaches that we consider, grounded in ideas from information theory, allow the incorporation of the data structure and its structural constraints. Inverted Dirichlet mixture models are learned within a principled Bayesian framework using both the Gibbs sampler and Metropolis-Hastings for parameter estimation and the Bayes factor for model selection (i.e., determining the number of mixture components). Our Bayesian learning approach uses priors over the model parameters, which we derive by showing that the inverted Dirichlet distribution belongs to the exponential family of distributions, and then combines these priors with information from the data to build posterior distributions. We illustrate the merits and the effectiveness of the proposed method with two challenging real-world applications, namely object detection and visual scene analysis and classification.

    AdaBoost with "keypoint presence features" for real-time vehicle visual detection

    We present promising results for real-time vehicle visual detection, obtained with adaBoost using new original “keypoints presence features”. These weak classifiers produce a boolean response based on the presence or absence in the tested image of a “keypoint” (approximately a SURF interest point) with a descriptor sufficiently similar (i.e. within a given distance) to a reference descriptor characterizing the feature. A first experiment was conducted on a public image dataset containing lateral-viewed cars, yielding 95% recall with 95% precision on the test set. Moreover, analysis of the positions of adaBoost-selected keypoints shows that they correspond to a specific part of the object category (such as “wheel” or “side skirt”) and thus have a “semantic” meaning.