
    Kohteentunnistus kuvista konvoluutioneuroverkoilla (Object Detection from Images Using Convolutional Neural Networks)

    Object detection is a subfield of computer vision that currently relies heavily on machine learning. For the past decade, the field of machine learning has been dominated by so-called deep neural networks, which exploit improvements in computing power and data availability. A subtype of neural network called a convolutional neural network (CNN) is well suited to image-related tasks. The network is trained to look for features such as edges, corners and colour differences across the image and to combine these into more complex shapes. For object detection, the system must both estimate the locations of probable objects and classify them. For this master's thesis, we reviewed the current literature on convolutional object detection and tested the implementability of one of the methods. We found that convolutional object detection is still evolving as a technology, even though it already outperforms other object detection methods. Thanks to the free availability of datasets and pretrained networks, it is possible to create a functional implementation of a deep neural network without access to specialist hardware. Pretrained networks can also serve as a starting point for training new networks, reducing costly training time. For the experimental part, we implemented Fast R-CNN using MATLAB and MatConvNet and tested a general object detector on two different traffic-related datasets. We found that Fast R-CNN is relatively precise and considerably faster than the original convolutional object detection method, R-CNN, and can be implemented on a home computer. More advanced methods, such as Faster R-CNN and SSD, are faster still, though hardly more accurate. We also experimented with a geometry-based scene estimation model, which was reported to improve the precision of a previous-generation object detection method. We found no such improvement with our implementation of Fast R-CNN, although further adjustments are possible. Combining whole-scene modelling with convolutional networks is a potential subject for further study.
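    To make the speed difference concrete: Fast R-CNN runs the convolutional backbone once per image and pools a fixed-size feature for each region proposal, whereas R-CNN runs the network once per region. The thesis implementation uses MATLAB and MatConvNet; the following is only a minimal PyTorch sketch of that mechanism, with an illustrative backbone, image size and head that are assumptions rather than the thesis's configuration:

```python
# Minimal PyTorch sketch of Fast R-CNN's core mechanism (the thesis used
# MATLAB/MatConvNet; backbone, image size and head sizes here are assumptions).
import torch
import torchvision
from torchvision.ops import roi_pool

# Shared backbone up to conv5_3 (drop VGG-16's final max-pool): stride 16.
backbone = torchvision.models.vgg16(weights="IMAGENET1K_V1").features[:-1]

image = torch.randn(1, 3, 600, 800)          # dummy RGB image
features = backbone(image)                   # (1, 512, 37, 50), computed ONCE

# Region proposals in image coordinates: (batch_index, x1, y1, x2, y2).
# Fast R-CNN takes these from an external proposer such as selective search.
rois = torch.tensor([[0.,  50.,  60., 250., 300.],
                     [0., 400., 100., 700., 450.]])

# RoI pooling crops a fixed 7x7 feature from the shared map per proposal;
# reusing one forward pass is what makes this much faster than per-crop R-CNN.
pooled = roi_pool(features, rois, output_size=(7, 7), spatial_scale=1.0 / 16)

# Per-region head: classification only (the real model adds bbox regression).
head = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(512 * 7 * 7, 4096), torch.nn.ReLU(),
    torch.nn.Linear(4096, 21),               # e.g. 20 classes + background
)
print(head(pooled).shape)                    # torch.Size([2, 21])
```

    Loading ImageNet-pretrained weights, as above, mirrors the abstract's point that pretrained networks make a working detector feasible without specialist hardware.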

    Deep Detection of People and their Mobility Aids for a Hospital Robot

    Robots operating in populated environments encounter many different types of people, some of whom may have a heightened need for cautious interaction because of physical impairments or advanced age. Robots therefore need to recognize such demands in order to provide appropriate assistance, guidance or other forms of support. In this paper, we propose a depth-based perception pipeline that estimates the position and velocity of people in the environment and categorizes them according to the mobility aids they use: pedestrian, person in a wheelchair, person in a wheelchair with a person pushing them, person with crutches, and person using a walker. We present a fast region proposal method that feeds a Region-based Convolutional Network (Fast R-CNN). With this, we speed up the object detection process by a factor of seven compared to a dense sliding-window approach. We furthermore propose a probabilistic position, velocity and class estimator to smooth the CNN's detections and account for occlusions and misclassifications. In addition, we introduce a new hospital dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm that our pipeline successfully keeps track of people and their mobility aids, even in challenging situations with multiple people from different categories and frequent occlusions. Videos of our experiments and the dataset are available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids. Comment: 7 pages, ECMR 2017.
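    The abstract does not spell out the estimator, so the sketch below only illustrates the general pattern: a constant-velocity Kalman filter that smooths noisy detections and coasts through occlusions. The motion model and noise parameters are assumptions, and the mobility-aid class filter is omitted:

```python
# Constant-velocity Kalman filter sketch in the spirit of the paper's
# position/velocity estimator; the motion model, noise values and the
# omission of the class estimator are all assumptions, not the authors' code.
import numpy as np

DT = 0.1                                    # assumed time between frames [s]
F = np.array([[1, 0, DT, 0],                # state transition for [x, y, vx, vy]
              [0, 1, 0, DT],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],                 # the detector observes position only
              [0, 1, 0, 0]], dtype=float)
Q = 0.05 * np.eye(4)                        # process noise (hand-picked)
R = 0.10 * np.eye(2)                        # measurement noise (hand-picked)

class Track:
    def __init__(self):
        self.x = np.zeros(4)                # state estimate
        self.P = np.eye(4)                  # state covariance

    def step(self, z):
        """One predict/update cycle; z is a detected (x, y) or None when the
        person is occluded, in which case we coast on the motion model."""
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + Q
        if z is not None:
            S = H @ self.P @ H.T + R                  # innovation covariance
            K = self.P @ H.T @ np.linalg.inv(S)       # Kalman gain
            self.x = self.x + K @ (np.asarray(z, float) - H @ self.x)
            self.P = (np.eye(4) - K @ H) @ self.P
        return self.x

track = Track()
for z in [(0.0, 0.0), (0.1, 0.05), None, (0.3, 0.15)]:   # None = occlusion
    print(track.step(z))
```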

    CNN for IMU Assisted Odometry Estimation using Velodyne LiDAR

    We introduce a novel method for odometry estimation from 3D LiDAR scans using convolutional neural networks. The original sparse data are encoded into 2D matrices, which are used both for training the proposed networks and for prediction. Our networks show significantly better precision in the estimation of translational motion parameters compared with the state-of-the-art method LOAM, while achieving real-time performance. Together with IMU support, high-quality odometry estimation and LiDAR data registration are realized. Moreover, we propose alternative CNNs trained for the prediction of rotational motion parameters, achieving results that are also comparable with the state of the art. The proposed method can replace wheel encoders in odometry estimation or supplement missing GPS data when the GNSS signal is absent (e.g. during indoor mapping). Our solution brings the real-time performance and precision needed to provide an online preview of mapping results and to verify map completeness in real time.
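    The paper's starting point is turning an unordered, sparse point cloud into a dense matrix a CNN can consume. A common approach is a spherical range-image projection, sketched below under an assumed sensor geometry; the paper's own encoding and channels may differ:

```python
# Sketch of one common way to encode a sparse 3D LiDAR scan as a dense 2D
# matrix for a CNN: a spherical (range-image) projection. The grid size,
# vertical field of view and the single range channel are assumptions.
import numpy as np

H, W = 64, 360                               # e.g. 64 laser rings, 1 deg azimuth bins
points = np.random.randn(100_000, 3) * 10.0  # dummy (x, y, z) scan

r = np.linalg.norm(points, axis=1)                       # range per point
azimuth = np.arctan2(points[:, 1], points[:, 0])         # [-pi, pi)
elevation = np.arcsin(points[:, 2] / np.maximum(r, 1e-6))

# Column from azimuth, row from elevation within an assumed vertical FOV of
# [-24.8, +2.0] degrees (roughly Velodyne HDL-64E geometry).
u = ((azimuth + np.pi) / (2 * np.pi) * W).astype(int) % W
fov_down, fov_up = np.radians(-24.8), np.radians(2.0)
v = ((fov_up - elevation) / (fov_up - fov_down) * H).astype(int)
valid = (v >= 0) & (v < H)

depth_image = np.zeros((H, W), dtype=np.float32)
depth_image[v[valid], u[valid]] = r[valid]   # later points overwrite earlier
                                             # ones; a real encoder would keep
                                             # e.g. the nearest return per cell
print(depth_image.shape)                     # (64, 360), ready for a 2D CNN
```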

    Cross Modal Distillation for Supervision Transfer

    In this work we propose a technique that transfers supervision between images from different modalities. We use learned representations from a large labeled modality as a supervisory signal for training representations for a new unlabeled paired modality. Our method enables learning of rich representations for unlabeled modalities and can be used as a pre-training procedure for new modalities with limited labeled data. We show experimental results where we transfer supervision from labeled RGB images to unlabeled depth and optical flow images, and demonstrate large improvements for both of these cross-modal supervision transfers. Code, data and pre-trained models are available at https://github.com/s-gupta/fast-rcnn/tree/distillation. Comment: the updated version (v2) contains additional experiments and results.
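    The mechanism, in outline: freeze a network trained on the labeled modality, then train a network on the paired unlabeled modality to reproduce its mid-level features, so no labels for the new modality are needed. Below is a minimal sketch under assumed backbones, feature layer and loss; the authors' actual Fast R-CNN-based setup lives in the repository above:

```python
# Sketch of the supervision-transfer idea: a frozen teacher on the labeled
# modality (RGB) supervises a student on the paired unlabeled modality (depth)
# through a feature-regression loss. Backbones, layer choice and the L2 loss
# are assumptions, not the authors' configuration.
import torch
import torchvision

teacher = torchvision.models.resnet18(weights="IMAGENET1K_V1")
teacher_feats = torch.nn.Sequential(*list(teacher.children())[:-2]).eval()
for p in teacher_feats.parameters():
    p.requires_grad = False                   # the teacher stays fixed

student = torchvision.models.resnet18(weights=None)       # learned from scratch
student_feats = torch.nn.Sequential(*list(student.children())[:-2])
opt = torch.optim.SGD(student_feats.parameters(), lr=0.01, momentum=0.9)

# Paired images of the same scenes; depth rendered as 3 channels (e.g. an
# HHA-style encoding) so a stock backbone can consume it.
rgb = torch.randn(4, 3, 224, 224)
depth = torch.randn(4, 3, 224, 224)

with torch.no_grad():
    target = teacher_feats(rgb)               # supervisory signal, no labels used
opt.zero_grad()
loss = torch.nn.functional.mse_loss(student_feats(depth), target)
loss.backward()                               # gradients flow only into student
opt.step()
print(float(loss))
```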

    Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition

    Two approaches are proposed for cross-pose face recognition: one based on the 3D reconstruction of facial components and the other on a deep convolutional neural network (CNN). Unlike most 3D approaches, which consider holistic faces, the proposed approach considers 3D facial components. It segments a 2D gallery face into components, reconstructs the 3D surface for each component, and recognizes a probe face by component features. The segmentation is based on landmarks located by a hierarchical algorithm that combines the Faster R-CNN for face detection with the Reduced Tree Structured Model for landmark localization. The core of the CNN-based approach is a revised VGG network. We study performance with different settings of the training set, including synthesized data from 3D reconstruction, real-life data from an in-the-wild database, and both types of data combined. We also investigate the performance of the network when it is employed as a classifier or designed as a feature extractor. The two recognition approaches and the fast landmark localization are evaluated in extensive experiments and compared to state-of-the-art methods to demonstrate their efficacy. Comment: 14 pages, 12 figures, 4 tables.
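    The classifier-versus-feature-extractor distinction the paper investigates can be sketched generically: as a classifier the network scores a fixed set of training identities, while as a feature extractor its penultimate activations serve as descriptors for faces never seen in training. A hedged sketch, with a stock VGG-16 standing in for the paper's revised network:

```python
# Sketch of the two roles studied for the recognition network, with a stock
# VGG-16 standing in for the paper's revised VGG (an assumption): (a) a
# closed-set classifier over training identities, (b) a feature extractor
# whose descriptors are compared for identities unseen in training.
import torch
import torchvision

net = torchvision.models.vgg16(weights=None)
num_ids = 1000                                # assumed number of training identities

# (a) Classifier: swap the final layer for an identity softmax head.
net.classifier[6] = torch.nn.Linear(4096, num_ids)
logits = net(torch.randn(1, 3, 224, 224))     # per-identity scores

# (b) Feature extractor: take the 4096-d penultimate activation as the face
# descriptor and match gallery and probe by cosine similarity.
extractor = torch.nn.Sequential(
    net.features, net.avgpool, torch.nn.Flatten(),
    *list(net.classifier.children())[:-1],    # drop the classification layer
).eval()
with torch.no_grad():
    gallery = extractor(torch.randn(1, 3, 224, 224))
    probe = extractor(torch.randn(1, 3, 224, 224))
print(logits.shape, torch.nn.functional.cosine_similarity(gallery, probe).item())
```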