84 research outputs found

    Next Generation of Product Search and Discovery

    Online shopping has become an important part of people's daily lives with the rapid development of e-commerce. In some domains, such as books, electronics, and CDs/DVDs, online shopping has surpassed or even replaced traditional shopping. Compared with traditional retailing, e-commerce is information intensive, and one of the key factors for success in e-business is how well consumers are helped to discover products. Conventionally, a product search engine based on keyword search or category browsing is provided to help users find the product information they need. The general goal of a product search system is to enable users to quickly locate information of interest and to minimize their effort in search and navigation. Human factors play a significant role in this process. Finding product information can be a tricky task, requiring intelligent use of search engines and non-trivial navigation of multilayer categories, and searching for useful product information can be frustrating for many users, especially inexperienced ones. This dissertation focuses on developing a new visual product search system that effectively extracts the properties of unstructured products and presents potentially attractive items so that users can quickly locate the ones they are most likely interested in. We designed and developed a feature extraction algorithm that retains product color and local pattern features; experimental evaluation on a benchmark dataset demonstrated that it is robust against common geometric and photometric visual distortions. In addition, rather than ignoring product text information, we developed a ranking model learned via a unified probabilistic hypergraph that captures correlations between product visual content and textual content. Moreover, we proposed a fuzzy hierarchical co-clustering algorithm for collaborative filtering product recommendation. With this method, users are automatically grouped into different interest communities based on their behaviors, and a customized recommendation can then be made according to these implicitly detected relations. In summary, the developed search system performs much better in visual unstructured product search than state-of-the-art approaches. With the comprehensive ranking scheme and the collaborative filtering recommendation module, the user's overhead in locating information of value is reduced, and the experience of seeking useful product information is improved.
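
    The abstract does not spell out the feature extraction algorithm itself. As a rough, hypothetical illustration of combining color and local-pattern cues into a single product descriptor, the sketch below concatenates an HSV color histogram with a local binary pattern histogram; the function name, parameters, and the choice of OpenCV/scikit-image are assumptions for illustration, not the dissertation's actual method.

```python
# Hypothetical sketch of a color + local-pattern product descriptor
# (illustrative only; not the dissertation's algorithm).
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def product_descriptor(image_bgr, color_bins=(8, 8, 8), lbp_points=8, lbp_radius=1):
    """Concatenate an HSV color histogram with an LBP texture histogram."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    color_hist = cv2.calcHist([hsv], [0, 1, 2], None, list(color_bins),
                              [0, 180, 0, 256, 0, 256]).flatten()
    color_hist /= (color_hist.sum() + 1e-8)          # normalize against image size

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    lbp = local_binary_pattern(gray, lbp_points, lbp_radius, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=lbp_points + 2,
                               range=(0, lbp_points + 2), density=True)

    return np.concatenate([color_hist, lbp_hist])    # joint color + pattern feature
```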

    Solving Correspondences for Non-Rigid Deformations

    Final degree project carried out in collaboration with the IR

    Feature Encoding of Spectral Descriptors for 3D Shape Recognition

    Feature descriptors have become a ubiquitous tool in shape analysis. Features can be extracted and subsequently used to design discriminative signatures for solving a variety of 3D shape analysis problems. In particular, shape classification and retrieval are intriguing and challenging problems that lie at the crossroads of computer vision, geometry processing, machine learning, and medical imaging. In this thesis, we propose spectral graph wavelet approaches for the classification and retrieval of deformable 3D shapes. First, we review recent shape descriptors based on the spectral decomposition of the Laplace-Beltrami operator, which provides a rich set of eigenbases that are invariant to intrinsic isometries. We then provide a detailed overview of spectral graph wavelets. In an effort to capture both local and global characteristics of a 3D shape, we propose a three-step feature description framework. Local descriptors are first extracted via the spectral graph wavelet transform with the Mexican hat wavelet as the generating kernel. Then, mid-level features are obtained by embedding the local descriptors into the visual vocabulary space using the soft-assignment coding step of the bag-of-features model. A global descriptor is subsequently constructed by aggregating mid-level features weighted by a geodesic exponential kernel, resulting in a matrix representation that describes the frequency of appearance of nearby codewords in the vocabulary. To analyze the performance of the proposed algorithms on 3D shape classification, support vector machines and deep belief networks are applied to the mid-level features. To assess the performance of the proposed approach for nonrigid 3D shape retrieval, we compare the global descriptor of a query to the global descriptors of the rest of the shapes in the dataset using a dissimilarity measure and find the closest shape. Experimental results on three standard 3D shape benchmarks demonstrate the effectiveness of the proposed classification and retrieval approaches in comparison with state-of-the-art methods.
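
    As a hedged illustration of the mid-level coding step described above (not code from the thesis), the sketch below soft-assigns local descriptors to a k-means visual vocabulary with a Gaussian kernel and average-pools them into one mid-level feature; the vocabulary size and kernel width are arbitrary illustrative choices.

```python
# Minimal sketch of soft-assignment coding in a bag-of-features pipeline;
# n_words and sigma are illustrative, not values from the thesis.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_words=32, seed=0):
    """Learn a visual vocabulary by clustering local descriptors."""
    km = KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(descriptors)
    return km.cluster_centers_

def soft_assign(descriptors, codebook, sigma=1.0):
    """Map each local descriptor to a soft membership over the codewords."""
    # Squared Euclidean distances between descriptors (N x d) and codewords (K x d).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)          # each row sums to one

# Example: 500 synthetic local descriptors of dimension 16, pooled into one mid-level feature.
rng = np.random.default_rng(0)
local = rng.normal(size=(500, 16))
codebook = build_codebook(local)
mid_level = soft_assign(local, codebook).mean(axis=0)   # average pooling over the shape
```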

    Spectral Geometric Methods for Deformable 3D Shape Retrieval

    As 3D applications ranging from medical imaging to industrial design continue to grow, so does the importance of developing robust 3D shape retrieval systems. A key issue in developing an accurate shape retrieval algorithm is to design an efficient shape descriptor for which an index can be built and similarity queries can be answered efficiently. While the overwhelming majority of prior work on 3D shape analysis has concentrated primarily on rigid shape retrieval, many real objects, such as articulated human bodies in motion, are nonrigid and hence can exhibit a variety of poses and deformations. In this thesis, we present novel spectral geometric methods for analyzing and distinguishing between deformable 3D shapes. First, we comprehensively review recent shape descriptors based on the spectral decomposition of the Laplace-Beltrami operator, which provides a rich set of eigenbases that are invariant to intrinsic isometries. Then we provide a general and flexible framework for the analysis and design of shape signatures from the spectral graph wavelet perspective. In a bid to capture both global and local geometry, we propose a multiresolution shape signature based on a cubic spline wavelet generating kernel; this signature delivers best-in-class shape retrieval performance. Second, we investigate ambiguity modeling of the codebook for densely distributed low-level shape descriptors. Inspired by the ability of spatial cues to improve discrimination between shapes, we also propose to use the isocontours of the second eigenfunction of the Laplace-Beltrami operator to partition the surface, which significantly improves the retrieval performance of the time-scaled local descriptors. To further enhance shape retrieval accuracy, we introduce an intrinsic spatial pyramid matching approach. Extensive experiments are carried out on two 3D shape benchmarks to assess the performance of the proposed spectral geometric approaches in comparison with state-of-the-art methods.
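
    The spectral machinery behind these signatures can be sketched with a toy example. The snippet below uses a plain graph Laplacian in place of a proper cotangent Laplace-Beltrami discretization (a simplification, not the thesis's construction) and extracts the second eigenvector, whose isocontours correspond to the surface-partitioning eigenfunction mentioned above.

```python
# Sketch of the spectral step: eigenvectors of a Laplacian built on mesh vertices.
# A plain graph Laplacian stands in for the cotangent Laplace-Beltrami discretization;
# the toy connectivity and names are illustrative only.
import numpy as np
import scipy.sparse as sp
from scipy.linalg import eigh

def graph_laplacian(n_vertices, edges):
    """Unnormalized graph Laplacian L = D - A from an undirected edge list."""
    i, j = np.array(edges).T
    A = sp.coo_matrix((np.ones(len(edges)), (i, j)), shape=(n_vertices, n_vertices))
    A = (A + A.T).tocsr()                               # symmetrize adjacency
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())     # vertex degrees
    return D - A

# Tiny two-triangle strip standing in for a mesh.
edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
L = graph_laplacian(4, edges)

# Dense eigendecomposition is fine for a toy example; a large mesh would use
# scipy.sparse.linalg.eigsh to compute only the first few eigenpairs.
eigvals, eigvecs = eigh(L.toarray())
second_eigenfunction = eigvecs[:, 1]   # its level sets (isocontours) partition the surface
```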

    Robust and Efficient Camera-based Scene Reconstruction

    For the simultaneous reconstruction of 3D scene geometry and camera poses from images or videos, there are two major approaches. On the one hand, a sparse reconstruction can be obtained by extracting recognizable features from multiple images that correspond to the same 3D points in the scene; from those features, the positions of the 3D points as well as the camera poses can be estimated such that they best explain the positions of the features in the images. On the other hand, on video data, a dense reconstruction can be obtained by alternating between tracking the camera pose and updating a depth map representing the scene for each frame of the video. In this dissertation, we introduce several improvements to both reconstruction strategies. We start by improving the reliability of image feature matches, which leads to faster and more robust subsequent processing. Then, we present a sparse reconstruction pipeline completely optimized for high-resolution and high-frame-rate video, exploiting the redundancy in the data to gain efficiency. For (semi-)dense reconstruction on camera rigs, which is prone to calibration inaccuracies, we show how to model and recover the rig calibration online in the reconstruction process. Finally, we explore the applicability of machine learning based on neural networks to the relative camera pose problem, focusing mainly on generating optimal training data. Robust and fast 3D reconstruction of the environment is in demand in several emerging applications, ranging from set scanning for movies and computer games, through inside-out-tracking-based augmented reality devices, to autonomous robots, drones, and self-driving cars.
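
    As a generic illustration of the sparse pipeline's front end (feature matching and relative pose, not the dissertation's specific improvements), the sketch below uses ORB features, Lowe's ratio test to keep reliable matches, and a RANSAC essential-matrix estimate in OpenCV; the ratio threshold and feature count are arbitrary choices.

```python
# Generic two-view front end: detect features, keep reliable matches via the ratio
# test, and recover the relative camera pose from the essential matrix.
import cv2
import numpy as np

def relative_pose(img1, img2, K, ratio=0.75):
    """Estimate the relative pose between two calibrated grayscale views."""
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)

    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    good = [m for m, n in matcher.knnMatch(d1, d2, k=2)
            if m.distance < ratio * n.distance]          # Lowe's ratio test

    p1 = np.float32([k1[m.queryIdx].pt for m in good])
    p2 = np.float32([k2[m.trainIdx].pt for m in good])

    E, inliers = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K, mask=inliers)
    return R, t   # rotation and unit-scale translation of camera 2 w.r.t. camera 1
```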

    Action recognition in depth videos using nonparametric probabilistic graphical models

    Action recognition involves automatically labelling videos that contain human motion with action classes. It has applications in diverse areas such as smart surveillance, human-computer interaction, and content retrieval. The recent advent of depth sensing technology that produces depth image sequences has offered opportunities to solve the challenging action recognition problem. Depth images facilitate robust estimation of a human skeleton's 3D joint positions, and a high-level action can be inferred from a sequence of these joint positions. A natural way to model a sequence of joint positions is to use a graphical model that describes probabilistic dependencies between the observed joint positions and some hidden state variables. A problem with these models is that the number of hidden states must be fixed a priori, even though for many applications this number is not known in advance. This thesis proposes nonparametric variants of graphical models in which the number of hidden states is automatically inferred from data. The inference is performed in a fully Bayesian setting by using the Dirichlet process as a prior over the model's infinite-dimensional parameter space. The thesis describes three original constructions of nonparametric graphical models that are applied to the classification of actions in depth videos. First, the action classes are represented by a hidden Markov model (HMM) with an unbounded number of hidden states; the formulation enables information sharing and discriminative learning of parameters. Second, a hierarchical HMM with an unbounded number of actions and poses is used to represent activities; this construction yields a simplified model for activity classification by using logistic regression to capture the relationship between action states and activity labels. Finally, the action classes are modelled by a hidden conditional random field (HCRF) with the number of intermediate hidden states learned from data. Tractable inference procedures based on Markov chain Monte Carlo (MCMC) techniques are derived for all these constructions. Experiments with multiple benchmark datasets confirm the efficacy of the proposed approaches for action recognition.
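
    A minimal sketch of the nonparametric ingredient, assuming a truncated stick-breaking construction of the Dirichlet process (the truncation level and concentration are illustrative, and this is not the thesis's full inference scheme):

```python
# Truncated stick-breaking construction of a Dirichlet process prior, the kind of
# device that leaves the number of hidden states effectively unbounded.
import numpy as np

def stick_breaking_weights(alpha, truncation, rng):
    """Draw mixture weights beta ~ GEM(alpha), truncated to a finite level."""
    v = rng.beta(1.0, alpha, size=truncation)                 # stick proportions
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining                                      # weights over the states

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)
# Most of the mass concentrates on a few states; the data decide how many are used.
print(weights[:5], weights.sum())
```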

    Sparse Modeling for Image and Vision Processing

    In recent years, a large amount of multi-disciplinary research has been conducted on sparse models and their applications. In statistics and machine learning, the sparsity principle is used to perform model selection, that is, to automatically select a simple model from among a large collection of candidates. In signal processing, sparse coding consists of representing data with linear combinations of a few dictionary elements. The corresponding tools have subsequently been widely adopted by several scientific communities such as neuroscience, bioinformatics, and computer vision. The goal of this monograph is to offer a self-contained view of sparse modeling for visual recognition and image processing. More specifically, we focus on applications where the dictionary is learned and adapted to data, yielding a compact representation that has been successful in various contexts. (Comment: 205 pages, to appear in Foundations and Trends in Computer Graphics and Vision.)
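
    A minimal sketch of sparse coding with a learned dictionary, using scikit-learn as a stand-in implementation; the dictionary size, sparsity penalty, and synthetic data are illustrative assumptions rather than choices from the monograph.

```python
# Sparse coding: learn a dictionary and represent each signal as a sparse
# combination of its atoms (illustrative settings only).
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))            # 200 signals (e.g. 8x8 image patches)

dico = DictionaryLearning(n_components=32, alpha=1.0, max_iter=100,
                          transform_algorithm="lasso_lars", random_state=0)
codes = dico.fit_transform(X)             # sparse coefficients: most entries are zero
D = dico.components_                      # learned dictionary, one atom per row

reconstruction = codes @ D                # each signal is a few atoms combined
print("avg. nonzeros per signal:", np.count_nonzero(codes) / len(codes))
```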