
    Novel color and local image descriptors for content-based image search

    Content-based image classification, search and retrieval is a rapidly expanding research area. With the advent of inexpensive digital cameras, cheap data storage, fast computing speeds and ever-increasing data transfer rates, millions of images are stored and shared over the Internet every day. This necessitates the development of systems that can classify these images into various categories without human intervention and, when presented with a query image, can identify its contents in order to retrieve similar images. Towards that end, this dissertation focuses on investigating novel image descriptors based on texture, shape, color, and local information for advancing content-based image search. Specifically, first, a new color multi-mask Local Binary Patterns (mLBP) descriptor is presented to improve upon the traditional Local Binary Patterns (LBP) texture descriptor for better image classification performance. Second, the mLBP descriptors from different color spaces are fused to form the Color LBP Fusion (CLF) and Color Grayscale LBP Fusion (CGLF) descriptors that further improve image classification performance. Third, a new HaarHOG descriptor, which integrates the Haar wavelet transform and the Histograms of Oriented Gradients (HOG), is presented for extracting both shape and local information for image classification. Next, a novel three-dimensional Local Binary Patterns (3D-LBP) descriptor is proposed for color images by encoding both color and texture information for image search. Furthermore, the novel 3DLH and 3DLH-fusion descriptors are proposed, which combine the HaarHOG and the 3D-LBP descriptors by means of Principal Component Analysis (PCA) and improve upon the individual HaarHOG and 3D-LBP descriptors for image search. Subsequently, the innovative H-descriptor and the H-fusion descriptor are presented, which improve upon the 3DLH descriptor.
Finally, the innovative Bag of Words-LBP (BoWL) descriptor is introduced, which combines the idea of LBP with a bag-of-words representation to further improve image classification performance. To assess the feasibility of the proposed new image descriptors, two classification frameworks are used. In one, the PCA and the Enhanced Fisher Model (EFM) are applied for feature extraction and the nearest-neighbor rule for classification. In the other, a Support Vector Machine (SVM) is used for classification. The classification performance is tested on several widely used and publicly available image datasets. The experimental results show that the proposed new image descriptors achieve image classification performance better than or comparable to other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), the Spatial Envelope (SE), the Color SIFT four Concentric Circles (C4CC), the Object Bank (OB), the Hierarchical Matching Pursuit (HMP), the Kernel Spatial Pyramid Matching (KSPM), the SIFT Sparse-coded Spatial Pyramid Matching (ScSPM), the Kernel Codebook (KC) and the LBP.
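The classic 8-neighbour LBP operator that the mLBP, CLF/CGLF and 3D-LBP descriptors build on can be sketched in a few lines. This is a minimal illustration of the standard operator, not the dissertation's exact implementation; the function names and the toy image are illustrative only.

```python
# Minimal sketch of the classic 8-neighbour Local Binary Patterns (LBP)
# texture operator. Each pixel is described by thresholding its neighbours
# at the centre value and packing the results into an 8-bit code.

def lbp_code(img, r, c):
    """LBP code of pixel (r, c): one bit per neighbour, set if neighbour >= centre."""
    center = img[r][c]
    # Neighbour offsets, clockwise starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code

def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels (the texture descriptor)."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist

# Toy 3x3 grayscale patch with a single interior pixel.
img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
print(lbp_code(img, 1, 1))  # -> 120 (bits 3..6 set: the four neighbours >= 50)
```

Color variants such as mLBP apply this per channel (or per color space) and concatenate or fuse the resulting histograms.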

    Investigation of new feature descriptors for image search and classification

    Content-based image search, classification and retrieval is an active and important research area due to its broad applications as well as the complexity of the problem. Understanding the semantics and contents of images for recognition remains one of the most difficult and prevailing problems in the machine intelligence and computer vision community. With large variations in size, pose, illumination and occlusions, image classification is a very challenging task. A good classification framework should address the key issues of discriminatory feature extraction as well as efficient and accurate classification. Towards that end, this dissertation focuses on exploring new image descriptors by incorporating cues from the human visual system, and integrating local, texture, shape as well as color information to construct robust and effective feature representations for advancing content-based image search and classification. Based on the Gabor wavelet transformation, whose kernels are similar to the 2D receptive field profiles of the mammalian cortical simple cells, a series of new image descriptors is developed. Specifically, first, a new color Gabor-HOG (GHOG) descriptor is introduced by concatenating the Histograms of Oriented Gradients (HOG) of the component images produced by applying Gabor filters in multiple scales and orientations to encode shape information. Second, the GHOG descriptor is analyzed in six different color spaces and grayscale to propose different color GHOG descriptors, which are further combined to present a new Fused Color GHOG (FC-GHOG) descriptor. Third, a novel GaborPHOG (GPHOG) descriptor is proposed which improves upon the Pyramid Histograms of Oriented Gradients (PHOG) descriptor, and subsequently a new FC-GPHOG descriptor is constructed by combining the multiple color GPHOG descriptors and employing the Principal Component Analysis (PCA). 
Next, the Gabor-LBP (GLBP) descriptor is derived by accumulating the Local Binary Patterns (LBP) histograms of the local Gabor-filtered images to encode texture and local information of an image. Furthermore, a novel Gabor-LBP-PHOG (GLP) image descriptor is proposed which integrates the GLBP and the GPHOG descriptors as a feature set, and an innovative Fused Color Gabor-LBP-PHOG (FC-GLP) descriptor is constructed by fusing the GLP from multiple color spaces. The GLBP and the GHOG descriptors are then combined to produce the Gabor-LBP-HOG (GLH) feature vector, which performs well on different object and scene image categories. The six color GLH vectors are further concatenated to form the Fused Color GLH (FC-GLH) descriptor. Finally, the Wigner-based Local Binary Patterns (WLBP) descriptor is proposed, which combines multi-neighborhood LBP, the pseudo-Wigner distribution of images and the popular bag-of-words model to effectively classify scene images. To assess the feasibility of the proposed new image descriptors, two classification methods are used: one applies the PCA and the Enhanced Fisher Model (EFM) for feature extraction and the nearest-neighbor rule for classification, while the other employs the Support Vector Machine (SVM). The classification performance of the proposed descriptors is tested on several publicly available popular image datasets.
The experimental results show that the proposed new image descriptors achieve image search and classification results better than or on par with other popular image descriptors, such as the Scale Invariant Feature Transform (SIFT), the Pyramid Histograms of visual Words (PHOW), the Pyramid Histograms of Oriented Gradients (PHOG), the Spatial Envelope (SE), the Color SIFT four Concentric Circles (C4CC), the Object Bank (OB), the Context Aware Topic Model (CA-TM), the Hierarchical Matching Pursuit (HMP), the Kernel Spatial Pyramid Matching (KSPM), the SIFT Sparse-coded Spatial Pyramid Matching (Sc-SPM), the Kernel Codebook (KC) and the LBP.
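The multi-scale, multi-orientation Gabor filtering that underlies the GHOG family can be sketched as a small filter bank. This is a generic textbook Gabor kernel (real part only), not the dissertation's parameterization; the kernel size, scales and wavelengths below are illustrative assumptions.

```python
import math

def gabor_kernel(size, sigma, theta, lam):
    """Real part of a Gabor kernel: a Gaussian envelope times a cosine carrier
    oriented at angle theta with wavelength lam."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates into the filter's orientation.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / lam))
        kernel.append(row)
    return kernel

# A small bank: 2 scales x 4 orientations. A GHOG-style descriptor would
# convolve the image with each filter and concatenate the HOG of each response.
bank = [gabor_kernel(7, sigma=s, theta=t * math.pi / 4, lam=2 * s)
        for s in (1.0, 2.0) for t in range(4)]
print(len(bank))  # -> 8 component filters
```

Each filtered "component image" is then summarized by HOG (for GHOG/GPHOG) or by an LBP histogram (for GLBP), and the per-color-space results are fused.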

    Local depth patterns for fine-grained activity recognition in depth videos

    © 2016 IEEE. Fine-grained activities are human activities involving small objects and small movements. Automatic recognition of such activities can prove useful for many applications, including detailed diarization of meetings and training sessions, assistive human-computer interaction and robotics interfaces. Existing approaches to fine-grained activity recognition typically leverage the combined use of multiple sensors, including cameras, RFID tags, gyroscopes and accelerometers borne by the monitored people and target objects. Although effective, the downside of these solutions is that they require minute instrumentation of the environment, which is intrusive and hard to scale. To avoid this, this paper investigates fine-grained activity recognition in a kitchen setting using only a depth camera. The primary contribution of this work is an aggregated depth descriptor that effectively captures the shape of the objects and the actors. Experimental results over the challenging '50 Salads' dataset of kitchen activities show an accuracy comparable to that of a state-of-the-art approach based on multiple sensors, thereby validating a less intrusive and more practical way of monitoring fine-grained activities.

    Object detection for big data

    "May 2014." Dissertation supervisor: Dr. Tony X. Han. Includes vita. We have observed significant advances in object detection over the past few decades and have gladly seen the related research begin to contribute to the world: vehicles can automatically stop before hitting a pedestrian; face detectors have been integrated into smart phones and tablets; video surveillance systems can locate suspects and stop crimes. All these applications demonstrate the substantial research progress on object detection. However, learning a robust object detector is still quite challenging due to the fact that object detection is a very unbalanced big data problem. In this dissertation, we aim at improving the object detector's performance from different aspects. For object detection, the state-of-the-art performance is achieved through supervised learning. The performance of object detectors of this kind is mainly determined by two factors: features and the underlying classification algorithms. We have done thorough research on both of these factors. Our contribution involves model adaptation, local learning, contextual boosting, template learning and feature development.
Since object detection is an unbalanced problem, in which positive examples are hard to collect, we propose to adapt a general object detector to a specific scenario with a few positive examples. To handle the large intra-class variation inherent in the object detection task, we propose a local adaptation method to learn a set of efficient and effective detectors for a single object category. To extract effective context from the huge amount of negative data in object detection, we introduce a novel contextual descriptor to iteratively improve the detector. To detect objects with a depth sensor, we design an effective depth descriptor. To distinguish object categories with similar appearance, we propose a local feature embedding and template selection algorithm, which has been successfully incorporated into a real-world fine-grained object recognition application. All the proposed algorithms and features contribute to improving the object detector's performance from these different aspects. Includes bibliographical references (pages 117-130).

    Image-based food classification and volume estimation for dietary assessment: a review.

    A daily dietary assessment method named 24-hour dietary recall has commonly been used in nutritional epidemiology studies to capture detailed information about the food eaten by participants in order to help understand their dietary behaviour. However, in this self-reporting technique, the food types and portion sizes reported depend heavily on the users' subjective judgement, which may lead to biased and inaccurate dietary analysis results. As a result, a variety of visual-based dietary assessment approaches have been proposed recently. While these methods show promise in tackling issues in nutritional epidemiology studies, several challenges and forthcoming opportunities, as detailed in this study, still exist. This study provides an overview of the computing algorithms, mathematical models and methodologies used in the field of image-based dietary assessment. It also provides a comprehensive comparison of the state-of-the-art approaches in food recognition and volume/weight estimation in terms of their processing speed, model accuracy, efficiency and constraints. This is followed by a discussion of deep learning methods and their efficacy in dietary assessment. After a comprehensive exploration, we found that integrated dietary assessment systems combining different approaches could be the potential solution to tackling the challenges in accurate dietary intake assessment.

    Low-dimensional spatial-textural descriptors of multispectral images.

    Recognition of visual and semantic content in images using computer programs has gained importance in various fields of agriculture, industry, medicine, the military, etc. Most practical applications use methods that determine image content from the numerical values of digital images. In many cases, it is important to determine the level of visual or semantic similarity between two different images: whether they show the same object or the same event. At the same time, the development of modern technology causes an unstoppable increase in the number of digital images generated daily. Modern clinical centers equipped with digital radiology generate up to tens of thousands of new images per day, so their manual annotation presents a practical problem. Moreover, each day brings a large amount of remotely sensed images, and many specific applications require fast identification of their visual content. The need to recognize visual and semantic content in images initiated the development of a large number of methods for describing image content in a manner suitable for use in computer systems. Images are associated with appropriate descriptors that should "describe" their visual or semantic content.
These descriptors can be vectors of numerical values, calculated so that they can distinguish between images with different visual or semantic content or recognize images with similar content. Since the human visual system effectively relies on texture to identify objects, texture descriptors are often used in practical applications. Technology development has enabled the widespread use of cheap multispectral cameras, which can capture data beyond the visible spectrum. Thus, it is necessary to investigate how to represent and describe the content of multispectral images in a way suitable for practical applications based on image classification. Simple extension of descriptors can increase classification accuracy, but at the cost of more memory resources and computational complexity. In this dissertation, different methods for the extraction of low-dimensional descriptors for multispectral images are proposed and used for automatic image classification. Moreover, the use of the spatial positions of local textural features is discussed as well. It is concluded that extending the texture descriptor of grayscale images with additional data providing spatial texture features can increase classification accuracy.
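The dimensionality problem described above can be made concrete with numbers. The sketch below shows why naive per-band extension of a 256-bin LBP histogram grows linearly with the number of spectral bands, and illustrates one standard reduction (not necessarily the dissertation's method): "uniform" LBP patterns with at most two 0/1 transitions, which map 256 codes to 59 bins. The band count and all names are illustrative.

```python
# Why naive per-band descriptor extension blows up, and a classic remedy:
# "uniform" LBP patterns (at most two circular 0<->1 transitions) compress
# the 256 possible 8-bit codes into 58 uniform bins + 1 catch-all bin.

def is_uniform(code):
    """True if the circular 8-bit pattern has at most two 0<->1 transitions."""
    bits = [(code >> i) & 1 for i in range(8)]
    transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
    return transitions <= 2

uniform_codes = [c for c in range(256) if is_uniform(c)]
n_bins = len(uniform_codes) + 1      # one extra bin for all non-uniform codes

bands = 6                            # e.g. a six-band multispectral image (illustrative)
naive_dim = bands * 256              # plain per-band histogram concatenation
reduced_dim = bands * n_bins         # uniform-pattern histograms per band
print(naive_dim, reduced_dim)        # -> 1536 354
```

Even this simple mapping cuts the descriptor to under a quarter of its size; the dissertation's low-dimensional spatial-textural descriptors pursue the same goal more aggressively.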

    Parallel Tracking and Mapping for Manipulation Applications with Golem Krang

    Implementing a simultaneous localization and mapping (SLAM) system and an image semantic segmentation method on a mobile manipulator. The SLAM component works towards navigating among obstacles in unknown environments. The object detection method will be integrated for future manipulation tasks such as grasping. This work will be demonstrated on a real robotic hardware system in the lab.

    Characterizing Objects in Images using Human Context

    Humans have an unmatched capability of interpreting detailed information about the objects present in a scene by just looking at an image. In particular, they can effortlessly perform the following tasks: 1) localizing various objects in the image and 2) assigning functionalities to the parts of the localized objects. This dissertation addresses the problem of helping vision systems accomplish these two goals. The first part of the dissertation concerns object detection in a Hough-based framework. To this end, the independence assumption between features is addressed by grouping them in a local neighborhood. We study the complementary nature of individual and grouped features and combine them to achieve improved performance. Further, we consider the challenging case of detecting small and medium-sized household objects under human-object interactions. We first evaluate appearance-based star and tree models. While the tree model is slightly better, appearance-based methods continue to suffer from deficiencies caused by human interactions. To this end, we successfully incorporate automatically extracted human pose as a form of context for object detection. The second part of the dissertation addresses the tedious process of manually annotating objects to train fully supervised detectors. We observe that videos of human-object interactions with activity labels can serve as weakly annotated examples of household objects. Since such objects cannot be localized through appearance or motion alone, we propose a framework that includes human-centric functionality to retrieve the common object. Designed to maximize data utility by detecting multiple instances of an object per video, the framework achieves performance comparable to its fully supervised counterpart. The final part of the dissertation concerns localizing functional regions or affordances within objects by casting the problem as one of semantic image segmentation.
To this end, we introduce a dataset involving human-object interactions with strong (i.e., pixel-level) and weak (i.e., click-point and image-level) affordance annotations. We propose a framework that utilizes both forms of weak labels and demonstrate that the effort required for weak annotation can be further reduced using human context.
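The Hough-based detection framework mentioned above rests on a simple voting idea: each local feature, matched to a codebook entry, casts votes for a hypothesized object center via learned offsets, and detections are maxima of the vote map. The sketch below is a generic toy illustration of that voting step, not the dissertation's grouped-feature model; the codebook, offsets and feature positions are made up.

```python
# Minimal sketch of Hough-style voting for object detection: each local
# feature votes for candidate object centres through its codebook offsets,
# and the most-voted location is the detection hypothesis.

from collections import Counter

def hough_votes(features, offsets):
    """features: list of (x, y, word_id); offsets: word_id -> list of (dx, dy)
    displacements from feature to object centre. Returns the vote map."""
    votes = Counter()
    for x, y, word in features:
        for dx, dy in offsets.get(word, []):
            votes[(x + dx, y + dy)] += 1
    return votes

# Toy codebook: visual word 0 tends to lie left of the centre, word 1 below it.
offsets = {0: [(5, 0)], 1: [(0, 5)]}
features = [(10, 20, 0), (15, 15, 1), (9, 20, 0)]

votes = hough_votes(features, offsets)
print(votes.most_common(1))  # -> [((15, 20), 2)]: two features agree on this centre
```

Grouping features in a local neighborhood, as the dissertation does, replaces the independent per-feature votes above with joint votes from feature groups.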

    Structural learning for large scale image classification

    To leverage large-scale collaboratively tagged (loosely tagged) images for training a large number of classifiers to support large-scale image classification, we need to develop new frameworks that deal with the following issues: (1) spam tags, i.e., tags that are not relevant to the semantics of the images; (2) loose object tags, i.e., multiple object tags loosely given at the image level without their locations in the images; (3) missing object tags, i.e., some object tags are missing due to incomplete tagging; (4) inter-related object classes, i.e., some object classes are visually correlated and their classifiers need to be trained jointly instead of independently; (5) large-scale object classes, which require limiting the computational time complexity of classifier training algorithms as well as the storage space for intermediate results. To deal with these issues, we propose a structural learning framework which consists of the following key components: (1) cluster-based junk image filtering to address the issue of spam tags; (2) automatic tag-instance alignment to address the issue of loose object tags; (3) automatic missing object tag prediction to address the issue of missing tags; (4) an object correlation network for inter-class visual correlation characterization to address the issue of inter-related object classes; (5) large-scale structural learning with the object correlation network to enhance the discrimination power of the object classifiers. To obtain a sufficient number of labeled training images, our proposed framework leverages abundant web images and their social tags. To make those web images usable, tag cleansing has to be done to neutralize the noise from user tagging preferences, in particular junk tags, loose tags and missing tags.
Then a discriminative learning algorithm is developed to train a large number of inter-related classifiers for achieving large-scale image classification, e.g., learning a large number of classifiers for categorizing large-scale image collections into a large number of inter-related object classes and image concepts. A visual concept network is first constructed for organizing the enormous number of object classes and image concepts according to their inter-concept visual correlations. The visual concept network is further used to: (a) identify inter-related learning tasks for classifier training; (b) determine groups of visually similar object classes and image concepts; and (c) estimate the learning complexity for classifier training. A large-scale discriminative learning algorithm is developed for supporting multi-class classifier training and achieving accurate inter-group discrimination and effective intra-group separation. Our discriminative learning algorithm can significantly enhance the discrimination power of the classifiers and dramatically reduce the computational cost of large-scale classifier training.
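One way to picture how a concept network yields groups of visually similar classes is to threshold pairwise visual correlations and take connected components of the resulting graph. This is an illustrative sketch of that generic idea only, not the dissertation's actual algorithm; the class names and correlation values are toy data.

```python
# Illustrative grouping of inter-related classes: build a graph whose edges
# are class pairs with visual correlation above a threshold, then return the
# connected components as joint-training groups.

def concept_groups(classes, corr, threshold):
    """classes: list of class names; corr: dict frozenset({a, b}) -> correlation."""
    adjacency = {c: set() for c in classes}
    for pair, weight in corr.items():
        if weight >= threshold:
            a, b = tuple(pair)
            adjacency[a].add(b)
            adjacency[b].add(a)
    groups, seen = [], set()
    for c in classes:
        if c in seen:
            continue
        stack, component = [c], set()      # depth-first search over the graph
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        groups.append(sorted(component))
    return groups

corr = {frozenset({"cat", "dog"}): 0.8,
        frozenset({"car", "truck"}): 0.9,
        frozenset({"cat", "car"}): 0.1}
print(concept_groups(["cat", "dog", "car", "truck"], corr, 0.5))
# -> [['cat', 'dog'], ['car', 'truck']]
```

Classifiers within a group would then be trained jointly, while groups below the correlation threshold can be trained independently.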