9 research outputs found

    Scale Selective Extended Local Binary Pattern for Texture Classification

    Full text link
    In this paper, we propose a new texture descriptor, scale selective extended local binary pattern (SSELBP), to characterize texture images with scale variations. We first utilize multi-scale extended local binary patterns (ELBP) with rotation-invariant and uniform mappings to capture robust local micro- and macro-features. Then, we build a scale space using Gaussian filters and calculate the histogram of multi-scale ELBPs for the image at each scale. Finally, we select the maximum values from the corresponding bins of the multi-scale ELBP histograms at different scales as scale-invariant features. A comprehensive evaluation on public texture databases (KTH-TIPS and UMD) shows that the proposed SSELBP achieves accuracy comparable to state-of-the-art texture descriptors on gray-scale-, rotation-, and scale-invariant texture classification while using only one-third of the feature dimension.
    Comment: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 201
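    The scale-selection step is straightforward to prototype. The sketch below approximates the pipeline described above, substituting skimage's rotation-invariant uniform LBP for the paper's extended LBP (ELBP); the radii, scale levels, and Gaussian sigmas are illustrative assumptions, not the authors' exact settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import local_binary_pattern

def multiscale_lbp_hist(image, radii=(1, 2, 3)):
    """Concatenate rotation-invariant uniform LBP histograms over several radii
    (a stand-in for the paper's multi-scale ELBP)."""
    hists = []
    for r in radii:
        n_points = 8 * r
        codes = local_binary_pattern(image, n_points, r, method="uniform")
        # "uniform" codes range over 0..n_points+1, hence n_points+2 bins.
        hist, _ = np.histogram(codes, bins=n_points + 2,
                               range=(0, n_points + 2), density=True)
        hists.append(hist)
    return np.concatenate(hists)

def sselbp_descriptor(image, sigmas=(0.5, 1.0, 2.0, 4.0)):
    """Build a Gaussian scale space, compute the histogram at each scale,
    and keep the bin-wise maximum as the scale-invariant feature."""
    per_scale = [multiscale_lbp_hist(gaussian_filter(image, s)) for s in sigmas]
    return np.max(np.stack(per_scale), axis=0)
```

    Taking the bin-wise maximum across scales is what makes the final histogram insensitive to the scale at which a texture pattern appears.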

    Gabor Contrast Patterns: A Novel Framework to Extract Features From Texture Images

    Get PDF
    In this paper, a novel rotation- and scale-invariant approach to texture classification based on Gabor filters is proposed. The filters are designed to capture the visual content of images through their impulse responses, which are sensitive to rotation and scaling. The filter responses are rearranged to start from the filter exhibiting the largest-amplitude response, and patterns are then calculated by binarizing the responses against a threshold, taken as the average energy of the Gabor filter responses at each pixel. The binary patterns are converted to decimal numbers, whose histograms are used as texture features. The proposed features are used to classify images from three well-known texture datasets: the Brodatz, CUReT, and UMD texture albums. Experiments show that the proposed feature extraction method compares favorably with several other state-of-the-art methods considered in this paper and is more robust to noise.
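    As a rough illustration of this scheme, the sketch below builds a small Gabor magnitude bank with skimage, thresholds each pixel's responses against their average energy, and circularly shifts the resulting bits to start at the dominant filter before encoding them as decimal codes. The bank size, frequencies, and the per-pixel circular-shift reading of the rearrangement step are assumptions, not the paper's exact design.

```python
import numpy as np
from skimage.filters import gabor

def gabor_contrast_pattern(image, frequencies=(0.1, 0.25), n_orient=4):
    # Magnitude responses of a small Gabor filter bank (8 filters here).
    mags = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(image, frequency=f, theta=k * np.pi / n_orient)
            mags.append(np.hypot(real, imag))
    mags = np.stack(mags)                    # shape: (n_filters, H, W)
    # Per-pixel threshold: the average energy of the Gabor responses.
    thresh = mags.mean(axis=0)
    bits = (mags >= thresh).astype(np.int64)
    # Rearrange the bits at each pixel to start from the filter with the
    # largest-amplitude response (a circular shift per pixel).
    dominant = mags.argmax(axis=0)
    n = mags.shape[0]
    idx = (np.arange(n)[:, None, None] + dominant[None]) % n
    bits = np.take_along_axis(bits, idx, axis=0)
    # Binary pattern -> decimal code per pixel, then a normalized histogram.
    weights = (2 ** np.arange(n))[:, None, None]
    codes = (bits * weights).sum(axis=0)
    hist, _ = np.histogram(codes, bins=2 ** n, range=(0, 2 ** n), density=True)
    return hist
```

    The per-pixel shift makes the code depend only on the relative arrangement of strong responses, which is what buys rotation invariance.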

    Towards Realistic Facial Expression Recognition

    Get PDF
    Automatic facial expression recognition has attracted significant attention over the past decades. Although substantial progress has been achieved for certain scenarios (such as frontal faces in strictly controlled laboratory settings), accurate recognition of facial expressions in realistic environments remains largely unsolved. The main objective of this thesis is to investigate facial expression recognition in unconstrained environments. Since one major problem in the literature is the lack of realistic training and testing data, this thesis presents a web-search-based framework for collecting a realistic facial expression dataset from the Web. By adopting an active-learning-based method to remove noisy images from text-based image search results, the proposed approach minimizes human effort during dataset construction and maximizes scalability for future research. Several novel facial expression features are then proposed to address the challenges posed by the newly collected dataset. Finally, a spectral-embedding-based feature fusion framework is presented to combine the proposed facial expression features into a more descriptive representation. This thesis also systematically investigates how the number of frames in a facial expression sequence affects the performance of facial expression recognition algorithms, since facial expression sequences may be captured at different frame rates in realistic scenarios. A facial expression keyframe selection method is proposed based on a keypoint-based frame representation. Comprehensive experiments demonstrate the effectiveness of the presented methods.
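    The abstract does not spell out the keyframe selection algorithm, but a simple diversity-based variant is easy to sketch. Below, each frame is summarized by the mean of its ORB keypoint descriptors (a stand-in for the thesis's keypoint-based frame representation), and keyframes are picked greedily to maximize their mutual distance; both choices are illustrative assumptions rather than the thesis's method.

```python
import cv2
import numpy as np

def frame_signature(frame, n_keypoints=100):
    """Summarize a frame as the mean of its ORB keypoint descriptors
    (an assumed stand-in for the keypoint-based frame representation)."""
    orb = cv2.ORB_create(nfeatures=n_keypoints)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, desc = orb.detectAndCompute(gray, None)
    if desc is None:                      # no keypoints detected
        return np.zeros(32)
    return desc.mean(axis=0)

def select_keyframes(frames, k=5):
    """Greedy farthest-point selection: repeatedly keep the frame whose
    signature differs most from those already chosen."""
    sigs = np.array([frame_signature(f) for f in frames])
    chosen = [0]
    while len(chosen) < min(k, len(frames)):
        dists = np.min(
            [np.linalg.norm(sigs - sigs[c], axis=1) for c in chosen], axis=0)
        chosen.append(int(dists.argmax()))
    return sorted(chosen)
```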

    Radon Projections as Image Descriptors for Content-Based Retrieval of Medical Images

    Get PDF
    Clinical analysis and medical diagnosis of diverse diseases adopt medical imaging techniques that let specialists visualize internal organs and tissues, so that diseases can be classified and treated at an early stage. Content-Based Image Retrieval (CBIR) systems are a set of computer vision techniques for retrieving similar images from a large database based on proper image representations. Particularly in radiology and histopathology, CBIR is a promising approach to effectively screen, understand, and retrieve images with a similar level of semantic description from a database of previously diagnosed cases, providing physicians with reliable assistance for diagnosis, treatment planning, and research. Over the past decade, the development of CBIR systems in medical imaging has accelerated due to the increase in digitized modalities, greater computational efficiency (e.g., availability of GPUs), and progress in computer vision and artificial intelligence algorithms. Hence, medical specialists may use CBIR prototypes to query similar cases from a large image database based solely on image content (and no text). Understanding the semantics of an image requires an expressive descriptor that can capture and represent its unique and invariant features. The Radon transform, one of the oldest techniques widely used in medical imaging, captures the shape of organs as a one-dimensional histogram by projecting parallel rays through a two-dimensional object of concern at a specific angle. In this work, the Radon transform is re-designed to (i) extract features and (ii) generate a descriptor for content-based retrieval of medical images. Radon projections, rather than raw images, are fed to a deep neural network in order to improve the generalization of the network. Specifically, the framework provides the Radon projections of an image to a deep autoencoder, from which the deepest layer is isolated and fed into a multi-layer perceptron for classification. This approach enables the network to (a) train much faster, as Radon projections are computationally inexpensive compared to raw input images, and (b) perform more accurately, as Radon projections present more pronounced and salient features to the network than raw images. The framework is validated on a publicly available radiography dataset called "Image Retrieval in Medical Applications" (IRMA), consisting of 12,677 training and 1,733 test images, on which a classification accuracy of approximately 82% is achieved, outperforming all autoencoder strategies reported on the IRMA dataset. The classification accuracy is calculated by dividing the total IRMA error, a measure defined by the authors of the dataset, by the total number of test images.
    Finally, a compact handcrafted image descriptor based on the Radon transform, called "Forming Local Intersections of Projections" (FLIP), is designed in this work, through numerous experiments, for representing histopathology images. FLIP applies parallel projections in local 3×3 neighborhoods, with a 2-pixel overlap, of gray-level images (the staining of histopathology images is ignored). Using four equidistant projection directions in each window, the characteristics of the neighborhood are quantified by taking an element-wise minimum between each pair of adjacent projections. The FLIP histogram (descriptor) for each image is then constructed. A multi-resolution FLIP (mFLIP) scheme is also proposed, which is observed to outperform many state-of-the-art methods, including deep features, on the histopathology dataset KIMIA Path24. Experiments show a total classification accuracy of approximately 72% using SVM classification, surpassing the current benchmark of approximately 66% on KIMIA Path24.
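    The FLIP windowing step can be sketched directly from this description. In the sketch below, the four projections of each 3×3 window at 0°, 45°, 90°, and 135° are taken as row, diagonal, column, and anti-diagonal sums; keeping only the three central diagonal sums (so every projection has length 3), the cyclic pairing of adjacent projections, and the bin count are implementation assumptions.

```python
import numpy as np

def window_projections(w):
    """Four equidistant Radon-like projections of a 3x3 window. For the
    diagonal directions only the three central diagonal sums are kept so
    every projection has length 3 (an implementation assumption)."""
    p0 = w.sum(axis=1)                                     # 0 deg: row sums
    p45 = np.array([np.trace(w, k) for k in (-1, 0, 1)])   # 45 deg
    p90 = w.sum(axis=0)                                    # 90 deg: column sums
    fl = np.fliplr(w)
    p135 = np.array([np.trace(fl, k) for k in (-1, 0, 1)]) # 135 deg
    return [p0, p45, p90, p135]

def flip_histogram(image, n_bins=32):
    """Slide a 3x3 window with stride 1 (2-pixel overlap), take the
    element-wise minimum of each cyclic pair of adjacent projections,
    and histogram the resulting values over the whole image."""
    img = image.astype(np.float64)        # assumes an 8-bit gray-level image
    mins = []
    for i in range(img.shape[0] - 2):
        for j in range(img.shape[1] - 2):
            p = window_projections(img[i:i + 3, j:j + 3])
            for a, b in zip(p, p[1:] + p[:1]):   # adjacent pairs, cyclically
                mins.append(np.minimum(a, b))
    mins = np.concatenate(mins)
    # A 3-pixel ray sum of an 8-bit image is at most 3 * 255.
    hist, _ = np.histogram(mins, bins=n_bins, range=(0, 3 * 255), density=True)
    return hist
```

    A multi-resolution variant in the spirit of mFLIP would simply repeat this at several downsampled versions of the image and concatenate the histograms.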

    HandSight: A Touch-Based Wearable System to Increase Information Accessibility for People with Visual Impairments

    Get PDF
    Many activities of daily living such as getting dressed, preparing food, wayfinding, or shopping rely heavily on visual information, and the inability to access that information can negatively impact the quality of life for people with vision impairments. While numerous researchers have explored solutions for assisting with visual tasks that can be performed at a distance, such as identifying landmarks for navigation or recognizing people and objects, few have attempted to provide access to nearby visual information through touch. Touch is a highly attuned means of acquiring tactile and spatial information, especially for people with vision impairments. By supporting touch-based access to information, we may help users to better understand how a surface appears (e.g., document layout, clothing patterns), thereby improving the quality of life. To address this gap in research, this dissertation explores methods to augment a visually impaired user’s sense of touch with interactive, real-time computer vision to access information about the physical world. These explorations span three application areas: reading and exploring printed documents, controlling mobile devices, and identifying colors and visual textures. At the core of each application is a system called HandSight that uses wearable cameras and other sensors to detect touch events and identify surface content beneath the user’s finger. To create HandSight, we designed and implemented the physical hardware, developed signal processing and computer vision algorithms, and designed real-time feedback that enables users to interpret visual or digital content. We involve visually impaired users throughout the design and development process, conducting several user studies to assess usability and robustness and to improve our prototype designs. The contributions of this dissertation include: (i) developing and iteratively refining HandSight, a novel wearable system to assist visually impaired users in their daily lives; (ii) evaluating HandSight across a diverse set of tasks, and identifying tradeoffs of a finger-worn approach in terms of physical design, algorithmic complexity and robustness, and usability; and (iii) identifying broader design implications for future wearable systems and for the fields of accessibility, computer vision, augmented and virtual reality, and human-computer interaction

    A Study on the Behavior of Cooperative Robots for Object Transportation

    Get PDF
    Thesis (Ph.D.) -- Graduate School, Seoul National University, Department of Electrical and Computer Engineering, February 2016 (advisor: Beom Hee Lee). This dissertation presents two cooperative object transportation techniques according to the characteristics of the object: passive and active. A passive object is a typical object, which can neither communicate with nor detect other robots. An active object, however, can communicate with robots and can measure its distance from other robots using proximity sensors. Typical areas of research in cooperative object transportation include grasping, pushing, and caging techniques, but these require, respectively, precise grasping behaviors, iterative motion correction according to the object pose, and real-time acquisition of the object shape. To overcome these problems, we propose two new object transportation techniques that take the properties of the object into account. First, this dissertation presents a multi-agent behavior for cooperatively transporting an active object using a sound signal and interactive communication. We first developed a sound localization method that estimates the sound source of an active object using three microphone sensors. Next, since the active object cannot be transported by a single robot alone, the robots organized themselves into a heterogeneous team with a pusher, a puller, and a supervisor. This self-organized team succeeded in moving the active object to a goal through the cooperation of neighboring robots and interactive communication between the object and the robots. Second, this dissertation presents a new cooperative passive object transportation technique using cyclic shift motion. The proposed technique does not need to consider the shape or pose of the objects, and no equipped tools are necessary for transportation. Multiple robots create a parallel row formation using a virtual electric dipole field and then push multiple objects into the formation. This parallel row is extended toward the goal using cyclic motion by the robots. These processes are decentralized and activated based on the finite state machine of each robot.
Simulations and practical experiments are presented to verify the proposed techniques.
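    Of the building blocks above, the virtual electric dipole field is easy to illustrate in isolation. The sketch below generates a path by following the direction of a point-dipole field with fixed-step Euler updates; the dipole placement, moment, step size, and the omission of the dissertation's bang-bang path-following controller are all simplifying assumptions.

```python
import numpy as np

def dipole_field(pos, dipole_pos, moment):
    """Direction of a virtual electric dipole field at `pos`:
    E ~ (3(m . r_hat) r_hat - m) / |r|^3, with the physical constant
    dropped since only the direction is followed."""
    r = np.asarray(pos, float) - np.asarray(dipole_pos, float)
    d = np.linalg.norm(r)
    r_hat = r / d
    m = np.asarray(moment, float)
    e = (3.0 * np.dot(m, r_hat) * r_hat - m) / d ** 3
    return e / np.linalg.norm(e)

def follow_field(start, dipole_pos, moment, step=0.05, n_steps=400):
    """Integrate a path along the field lines with fixed-step Euler updates;
    in the dissertation such paths are then tracked by a bang-bang controller."""
    path = [np.asarray(start, float)]
    for _ in range(n_steps):
        path.append(path[-1] + step * dipole_field(path[-1], dipole_pos, moment))
    return np.array(path)
```

    Because the field lines of a dipole converge smoothly onto its axis, robots starting from scattered positions end up aligned along a common direction, which is the property the row-formation step exploits.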

    Actas de las XXXIV Jornadas de Automática

    Get PDF
    Postprint (published version)