Kohteentunnistus kuvista konvoluutioneuroverkoilla (Object Detection in Images with Convolutional Neural Networks)
Object detection is a subfield of computer vision that is currently heavily based on machine learning. For the past decade, the field of machine learning has been dominated by so-called deep neural networks, which take advantage of improvements in computing power and data availability. A subtype of neural network called a convolutional neural network (CNN) is well-suited for image-related tasks. The network is trained to look for different features, such as edges, corners and colour differences, across the image and to combine these into more complex shapes. For object detection, the system has to both estimate the locations of probable objects and classify them.
For this master's thesis, we reviewed the current literature on convolutional object detection and tested the implementability of one of the methods. We found that convolutional object detection is still evolving as a technology, despite outranking other object detection methods. By virtue of free availability of datasets and pretrained networks, it is possible to create a functional implementation of a deep neural network without access to specialist hardware. Pretrained networks can also be used as a starting point for training new networks, decreasing costly training time.
For the experimental part, we implemented Fast R-CNN using MATLAB and MatConvNet and tested a general object detector on two different traffic-related datasets. We found that Fast R-CNN is relatively precise and considerably faster than the original convolutional object detection method, R-CNN, and can be implemented on a home computer. More advanced methods, such as Faster R-CNN and SSD, improve on the speed of Fast R-CNN. We also experimented with a geometry-based scene estimation model, which had been reported to improve the precision of a previous-generation object detection method. We found no such improvement with our implementation of Fast R-CNN, although further adjustments are possible. Combining whole-scene modelling with convolutional networks is a potential subject of further study.
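The core mechanism that makes Fast R-CNN fast is pooling shared convolutional features over each region proposal into a fixed-size grid (RoI pooling). The following is an illustrative NumPy sketch of that step, not the thesis's MATLAB/MatConvNet implementation; the function name, the (x0, y0, x1, y1) coordinate convention and the 2x2 output grid are assumptions for the example.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(2, 2)):
    """Max-pool an arbitrary region of interest to a fixed-size grid.

    feature_map: (H, W) array of convolutional features.
    roi: (x0, y0, x1, y1) in feature-map coordinates, end-exclusive.
    Assumes the region is at least as large as output_size.
    """
    x0, y0, x1, y1 = roi
    region = feature_map[y0:y1, x0:x1]
    oh, ow = output_size
    # Split the region into an oh x ow grid of roughly equal cells.
    h_edges = np.linspace(0, region.shape[0], oh + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], ow + 1).astype(int)
    out = np.empty(output_size)
    for i in range(oh):
        for j in range(ow):
            cell = region[h_edges[i]:h_edges[i + 1],
                          w_edges[j]:w_edges[j + 1]]
            out[i, j] = cell.max()  # max-pool each cell
    return out
```

Because every proposal is reduced to the same grid shape, a single shared feature map can feed one fixed-size classifier head for all proposals, which is where the speedup over per-region R-CNN comes from.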
Deep Detection of People and their Mobility Aids for a Hospital Robot
Robots operating in populated environments encounter many different types of
people, some of whom might need particularly cautious interaction because of
physical impairments or advanced age. Robots therefore need to recognize such
needs to provide appropriate assistance, guidance or
other forms of support. In this paper, we propose a depth-based perception
pipeline that estimates the position and velocity of people in the environment
and categorizes them according to the mobility aids they use: pedestrian,
person in a wheelchair, person in a wheelchair with a person pushing them, person
with crutches and person using a walker. We present a fast region proposal
method that feeds a Region-based Convolutional Network (Fast R-CNN). With this,
we speed up the object detection process by a factor of seven compared to a
dense sliding window approach. We furthermore propose a probabilistic position,
velocity and class estimator to smooth the CNN's detections and account for
occlusions and misclassifications. In addition, we introduce a new hospital
dataset with over 17,000 annotated RGB-D images. Extensive experiments confirm
that our pipeline successfully keeps track of people and their mobility aids,
even in challenging situations with multiple people from different categories
and frequent occlusions. Videos of our experiments and the dataset are
available at http://www2.informatik.uni-freiburg.de/~kollmitz/MobilityAids
Comment: 7 pages, ECMR 2017
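The probabilistic position and velocity estimator described above can be illustrated with a constant-velocity Kalman filter. This 1D sketch omits the class estimation the paper also performs; the noise parameters and the occlusion handling (skipping the update when a measurement is missing) are illustrative assumptions, not the authors' values.

```python
import numpy as np

def kalman_track(measurements, dt=0.1, q=1.0, r=0.5):
    """Constant-velocity Kalman filter over 1D position measurements.

    A `None` measurement models an occlusion: the filter predicts
    but skips the update, so the track coasts through the gap.
    Returns a (position, velocity) estimate per time step.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    H = np.array([[1.0, 0.0]])              # only position is observed
    Q = q * np.array([[dt**4 / 4, dt**3 / 2],
                      [dt**3 / 2, dt**2]])  # process noise
    R = np.array([[r]])                     # measurement noise
    x = np.array([[measurements[0]], [0.0]])
    P = np.eye(2)
    track = []
    for z in measurements:
        # Predict.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update, unless the detection is missing (occlusion).
        if z is not None:
            y = np.array([[z]]) - H @ x
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
        track.append((float(x[0, 0]), float(x[1, 0])))
    return track
```

Smoothing per-frame CNN detections this way gives continuous position and velocity estimates even when individual frames yield no detection or a misclassification.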
CNN for IMU Assisted Odometry Estimation using Velodyne LiDAR
We introduce a novel method for odometry estimation using convolutional
neural networks from 3D LiDAR scans. The original sparse data are encoded into
2D matrices for the training of proposed networks and for the prediction. Our
networks show significantly better precision in the estimation of translational
motion parameters compared with the state-of-the-art method LOAM, while achieving
real-time performance. Together with IMU support, high quality odometry
estimation and LiDAR data registration is realized. Moreover, we propose
alternative CNNs trained for the prediction of rotational motion parameters
while achieving results also comparable with state of the art. The proposed
method can replace wheel encoders in odometry estimation or supplement missing
GPS data when the GNSS signal is absent (e.g. during indoor mapping). Our
solution brings real-time performance and precision that are useful for
providing an online preview of the mapping results and for verifying map
completeness in real time.
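Encoding a sparse 3D scan into a 2D matrix, as the abstract describes, is commonly done by projecting each point into an azimuth-elevation grid of ranges. The sketch below is a generic spherical projection, not the paper's exact encoding; the bin counts and vertical field of view are illustrative assumptions rather than Velodyne calibration values.

```python
import numpy as np

def scan_to_range_image(points, h_bins=360, v_bins=64,
                        v_fov=(-np.radians(25), np.radians(3))):
    """Project an (N, 3) LiDAR point cloud onto a 2D range image.

    Rows index elevation (akin to laser rings), columns index azimuth;
    each cell stores the range of the nearest point that fell into it
    (0 marks an empty cell).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.sqrt(x**2 + y**2 + z**2)
    azimuth = np.arctan2(y, x)                       # [-pi, pi]
    elevation = np.arcsin(z / np.maximum(r, 1e-9))   # angle above horizon
    col = ((azimuth + np.pi) / (2 * np.pi) * h_bins).astype(int) % h_bins
    lo, hi = v_fov
    row = ((elevation - lo) / (hi - lo) * v_bins).astype(int)
    row = np.clip(row, 0, v_bins - 1)                # clamp out-of-FOV points
    img = np.zeros((v_bins, h_bins))
    for rr, cc, dist in zip(row, col, r):
        if img[rr, cc] == 0 or dist < img[rr, cc]:   # keep nearest return
            img[rr, cc] = dist
    return img
```

A dense 2D encoding like this lets standard convolutional layers consume inherently sparse 3D scans, which is what makes CNN-based odometry regression from LiDAR tractable.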
Cross Modal Distillation for Supervision Transfer
In this work we propose a technique that transfers supervision between images
from different modalities. We use learned representations from a large labeled
modality as a supervisory signal for training representations for a new
unlabeled paired modality. Our method enables learning of rich representations
for unlabeled modalities and can be used as a pre-training procedure for new
modalities with limited labeled data. We show experimental results where we
transfer supervision from labeled RGB images to unlabeled depth and optical
flow images and demonstrate large improvements for both these cross modal
supervision transfers. Code, data and pre-trained models are available at
https://github.com/s-gupta/fast-rcnn/tree/distillation
Comment: Updated version (v2) contains additional experiments and results
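The supervision-transfer idea, a frozen teacher trained on the labeled modality providing feature targets for a student on the unlabeled paired modality, can be sketched with linear "networks" and plain gradient descent. Everything here (the dimensions, the synthetic paired data, the learning rate) is an illustrative assumption, not the paper's setup, which distills between deep CNNs on real RGB/depth pairs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "teacher": a fixed linear feature extractor for the labeled
# modality (standing in for a CNN trained on labeled RGB).
W_teacher = rng.normal(size=(8, 4))

def features(W, x):
    return x @ W.T

# Paired, spatially aligned inputs: the same scenes seen by both
# modalities. Here the "depth" input is a deterministic transform of
# the "RGB" input, a toy stand-in for real paired data.
x_rgb = rng.normal(size=(256, 4))
x_depth = 0.5 * x_rgb

# Student for the unlabeled modality, trained with no labels at all:
# its only objective is to match the teacher's features on pairs.
W_student = rng.normal(size=(8, 4))
lr = 0.05
for _ in range(500):
    pred = features(W_student, x_depth)
    target = features(W_teacher, x_rgb)        # teacher is never updated
    grad = 2 * (pred - target).T @ x_depth / len(x_depth)
    W_student -= lr * grad

loss = float(np.mean((features(W_student, x_depth)
                      - features(W_teacher, x_rgb)) ** 2))
```

After training, the student's representation can serve as initialization for fine-tuning on whatever limited labeled data the new modality has, which is the pre-training use the abstract describes.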
Fast Landmark Localization with 3D Component Reconstruction and CNN for Cross-Pose Recognition
Two approaches are proposed for cross-pose face recognition, one is based on
the 3D reconstruction of facial components and the other is based on the deep
Convolutional Neural Network (CNN). Unlike most 3D approaches that consider
holistic faces, the proposed approach considers 3D facial components. It
segments a 2D gallery face into components, reconstructs the 3D surface for
each component, and recognizes a probe face by component features. The
segmentation is based on the landmarks located by a hierarchical algorithm that
combines the Faster R-CNN for face detection and the Reduced Tree Structured
Model for landmark localization. The core part of the CNN-based approach is a
revised VGG network. We study the performances with different settings on the
training set, including the synthesized data from 3D reconstruction, the
real-life data from an in-the-wild database, and both types of data combined.
We investigate the performances of the network when it is employed as a
classifier or designed as a feature extractor. The two recognition approaches
and the fast landmark localization are evaluated in extensive experiments, and
compared to state-of-the-art methods to demonstrate their efficacy.
Comment: 14 pages, 12 figures, 4 tables