5 research outputs found

    Knock-Knock: Acoustic Object Recognition using Stacked Denoising Autoencoders

    This paper presents a successful application of deep learning to object recognition from acoustic data. Previous approaches relied on handcrafted features to describe the acoustic data, which limits how widely the resulting representation can be applied and risks capturing only characteristics that are insignificant for the task. In contrast, deep (multilayer) learning architectures need no predefined feature representation: features can be learned from raw sensor data without defining discriminative characteristics a priori. In this paper, stacked denoising autoencoders are applied to train a deep learning model. Thirty different objects were successfully classified in our experiment; each object was knocked 120 times with a marker pen to obtain the auditory data. With the proposed deep learning framework, an accuracy of 91.50% was achieved. A traditional method using handcrafted features with a shallow classifier, taken as a benchmark, attained a recognition rate of only 58.22%. Interestingly, a recognition rate of 82.00% was achieved when using a shallow classifier with raw acoustic data as input. In addition, we show that classifying one object with deep learning took far less time (by a factor of more than 6) than with the traditional method. We also explored how different model parameters in our deep architecture affect recognition performance. Comment: 6 pages, 10 figures, Neurocomputin
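    The abstract above names stacked denoising autoencoders but gives no training details, so the following is only a minimal sketch of one denoising autoencoder layer. Masking noise, tied weights, sigmoid units, a squared-error loss, and every name and hyperparameter here (`mask_corrupt`, `train_dae_layer`, `drop_prob`, `lr`) are assumptions for illustration, not the authors' configuration.

```python
import numpy as np

def mask_corrupt(x, drop_prob, rng):
    """Masking noise: zero each input value independently with probability drop_prob."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae_layer(x, n_hidden, drop_prob=0.3, lr=0.1, epochs=200, seed=0):
    """Train one tied-weight denoising autoencoder layer with batch gradient descent.

    Returns (W, b_hidden, b_visible, losses), where losses holds the mean
    squared reconstruction error per epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_visible = x.shape
    W = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    losses = []
    for _ in range(epochs):
        x_noisy = mask_corrupt(x, drop_prob, rng)   # encode the corrupted input
        h = sigmoid(x_noisy @ W + b_h)
        x_rec = sigmoid(h @ W.T + b_v)              # decode with the transposed weights
        err = x_rec - x                             # compare against the CLEAN input
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error (constant factors are absorbed into lr).
        d_rec = err * x_rec * (1.0 - x_rec) / n_samples
        d_h = (d_rec @ W) * h * (1.0 - h)
        grad_W = x_noisy.T @ d_h + d_rec.T @ h      # encoder term + decoder term
        W -= lr * grad_W
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_rec.sum(axis=0)
    return W, b_h, b_v, losses
```

    Stacking would then mean training a second layer on the hidden codes `sigmoid(x @ W + b_h)` of the first, and so on, before fine-tuning the whole network with a classifier on top.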

    3D Shape Perception from Monocular Vision, Touch, and Shape Priors

    Perceiving accurate 3D object shape is important for robots to interact with the physical world. Current research in this direction has relied primarily on visual observations. Vision, however useful, has inherent limitations due to occlusions and 2D-3D ambiguities, especially for perception with a monocular camera. In contrast, touch provides precise local shape information, though its efficiency for reconstructing the entire shape can be low. In this paper, we propose a novel paradigm that efficiently perceives accurate 3D object shape by incorporating visual and tactile observations, as well as prior knowledge of common object shapes learned from large-scale shape repositories. We use vision first, applying neural networks with learned shape priors to predict an object's 3D shape from a single-view color image. We then use tactile sensing to refine the shape; the robot actively touches the object regions where the visual prediction has high uncertainty. Our method efficiently builds the 3D shape of common objects from a color image and a small number of tactile explorations (around 10). Our setup is easy to apply and has the potential to help robots better perform grasping or manipulation tasks on real-world objects. Comment: IROS 2018. The first two authors contributed equally to this wor
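    The abstract above says the robot touches regions where the visual prediction is most uncertain, without stating how uncertainty is measured. As a rough sketch only, the snippet below assumes the visual prediction is a voxel grid of occupancy probabilities and uses binary entropy as the uncertainty score; both the representation and the hypothetical helper `pick_touch_targets` are assumptions, not the paper's method.

```python
import numpy as np

def pick_touch_targets(occupancy_prob, k=10):
    """Return a (k, 3) array of voxel indices with the highest binary entropy.

    occupancy_prob: 3D array of predicted occupancy probabilities in [0, 1].
    """
    p = np.clip(occupancy_prob, 1e-6, 1.0 - 1e-6)           # avoid log(0)
    entropy = -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))
    flat = np.argsort(entropy, axis=None)[::-1][:k]          # most uncertain first
    return np.stack(np.unravel_index(flat, p.shape), axis=1)
```

    Voxels predicted near 0.5 score highest and would be touched first; the default `k=10` simply mirrors the "around 10" tactile explorations mentioned in the abstract.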

    Infrared Distance Sensor for Workspace Characterization

    Autonomous navigation is a recurring topic today, from smart vacuum cleaners to autonomous cars. Such systems must be able to compute the distance to potential obstacles, which requires characterizing the workspace so that it can be navigated. Solutions to these problems exist, but they tend to be expensive and do not always allow the workspace to be characterized. This research uses an infrared distance-measuring sensor to determine the profile of the workspace, with particular focus on characterizing how the sensor operates. There are some studies of this type of sensor, but information on the characterization of their operation is lacking. This work combines the Sharp GP2Y0A60SZ0F device with a processing unit to characterize the workspace. The solution uses neural networks to estimate the distance to an obstacle as well as to detect its presence. The result is a system capable of measuring the distance to a potential obstacle and tracing the profile of the workspace.

    Model Compression Using a Multi-Trimmed Network Structure for Image Classification in Embedded Systems

    Much effort has gone into developing smart robots, for which perception and manipulation are among the most fundamental and challenging problems. Embedded systems (ESs) are critical components of robots. However, as an embedded system, a robot brain has a fixed resource budget and is poorly suited to modern convolutional neural networks (CNNs). CNN compression therefore plays an important role in reducing computational cost to produce models suitable for embedded systems. Current CNN compression approaches fall into two groups: hand-crafted approaches and model compression (MC) approaches. The hand-crafted approach involves factorization and manual compression, but it is time consuming and usually requires significant manual effort and domain knowledge. The MC approach instead takes advantage of pre-trained models and avoids these drawbacks: it squeezes an existing model into one that is smaller and requires less computation. Although most MC methods can achieve either low latency or high accuracy, they yield a non-optimal accuracy–latency trade-off, are complex, and leave certain dimensions of the model (e.g., width, resolution, and depth) unaffected. To overcome these problems, the thesis presents a simple model-compression approach that optimizes the accuracy–latency trade-off of the model. The multi-trimmed network structure (MTNS) is a robust combination of MC techniques that provides a lightweight model with trade-off optimization. The thesis describes a number of significant advances. Firstly, a new simple and efficient MC technique is introduced, which takes width, resolution, and depth compression into account. Secondly, a new multi-objective function is devised, which uses the accuracy–latency trade-off of compressed models to optimize the performance of a target model. Thirdly, a new training accelerator is developed, which integrates pruning of convolutional kernels into shrinking the model structure to reduce training time when compressing the width dimension. Finally, a new search strategy is developed, which combines Neural Architecture Search (NAS) with shrinking the model structure to explore more complex shrinking conditions within a relatively short training period. In an experimental evaluation, the thesis compares the performance of the proposed MTNS approach with that of CNN filter pruning, model quantization, an adaptive mixture of low-rank factorizations, and knowledge distillation. The MTNS resolved the accuracy–latency trade-off in image classification better than these modern MC methods. A model compressed with MTNS at the best trade-off is lightweight, computationally cheap, and fast, which makes it well suited to embedded systems. The main contribution of the thesis is that MTNS techniques, which are simple and achieve an optimal accuracy–latency trade-off, solve these model compression problems.
    Kyushu Institute of Technology doctoral dissertation. Degree number: 工博甲第532号. Date conferred: September 24, 2021 (Reiwa 3).
    Contents: 1 Introduction | 2 Literature Reviews | 3 Preliminary Knowledge and Technique for Model Compression | 4 Shrinking Structure of Models | 5 Shrinking Structure of Models with Training Accelerator | 6 Trim Neural Architecture Search | 7 Conclusions
    Kyushu Institute of Technology, 2021
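    The thesis abstract mentions a multi-objective function over accuracy and latency but does not give its form. Purely as an illustration, the sketch below uses a weighted-product score of the kind common in latency-aware architecture search; the formula, the exponent `w`, the helper names, and the candidate dictionary layout are all assumptions and are not taken from the thesis.

```python
def tradeoff_score(accuracy, latency_ms, target_ms, w=-0.07):
    """Score = accuracy * (latency / target) ** w, with w < 0.

    Models slower than the target are penalised, faster ones get a mild bonus.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latencies must be positive")
    return accuracy * (latency_ms / target_ms) ** w

def best_candidate(candidates, target_ms):
    """Pick the compressed model with the highest trade-off score.

    candidates: iterable of dicts with hypothetical keys
    'name', 'accuracy' (0..1) and 'latency_ms'.
    """
    return max(
        candidates,
        key=lambda c: tradeoff_score(c["accuracy"], c["latency_ms"], target_ms),
    )
```

    With such a score, a slightly less accurate model that meets the latency target can outrank a more accurate one that misses it; how strongly latency counts is set by `w`.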