Event cameras, such as dynamic vision sensors (DVS), and dynamic and
active-pixel vision sensors (DAVIS) can supplement other autonomous driving
sensors by providing a concurrent stream of standard active pixel sensor (APS)
images and DVS temporal contrast events. The APS stream is a sequence of
standard grayscale global-shutter image sensor frames. The DVS events represent
brightness changes occurring at a particular moment, with a jitter of about a
millisecond under most lighting conditions. They have a dynamic range of >120
dB and effective frame rates >1 kHz at data rates comparable to 30 fps
(frames/second) image sensors. To overcome some of the limitations of current
image acquisition technology, we investigate in this work the use of the
combined DVS and APS streams in end-to-end driving applications. The dataset
DDD17 accompanying this paper is the first open dataset of annotated DAVIS
driving recordings. DDD17 has over 12 h of a 346x260 pixel DAVIS sensor
recording highway and city driving in daytime, evening, night, dry and wet
weather conditions, along with vehicle speed, GPS position, driver steering,
throttle, and brake captured from the car's on-board diagnostics interface. As
an example application, we performed a preliminary end-to-end learning study
using a convolutional neural network that is trained to predict the
instantaneous steering angle from DVS and APS visual data.

Comment: Presented at the ICML 2017 Workshop on Machine Learning for Autonomous Vehicles

Binas, Jonathan

Delbruck, Tobi

Liu, Shih-Chii

Neil, Daniel

English

arXiv


ZORA

Zurich Open Repository and Archive, University of Zurich
Main Library, Strickhofstrasse 39, CH-8057 Zurich
www.zora.uzh.ch

Year: 2017

Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-149345
Published Version

Originally published at:
Binas, Jonathan; Neil, Daniel; Liu, Shih-Chii; Delbruck, Tobi (2017). DDD17: End-To-End DAVIS Driving Dataset. ArXiv Computer Vision and Pattern Recognition:0.

DDD17: End-To-End DAVIS Driving Dataset

Jonathan Binas, Daniel Neil, Shih-Chii Liu, and Tobi Delbruck∗
Institute of Neuroinformatics, University of Zurich and ETH Zurich, Switzerland
November 7, 2017

1 Introduction

The rapid improvement of machine learning and computer vision systems has spurred the development of self-driving vehicles, which have already covered millions of kilometers in real-world scenarios.
It appears that the development of processing technology and algorithms currently advances at greater speed than the development of sensing hardware for capturing the necessary information from the surroundings of the vehicle, such as obstacles, traffic, marks, and signs. Automotive image sensors are being intensively developed to deal with the conflicting requirements for low cost, high dynamic range, high sensitivity, and resistance to artifacts from flickering light sources such as LED traffic signs and car taillights. Operation under bad weather and/or lighting conditions is a primary requirement for automotive self-driving or automatic driver assistance systems (ADAS); however, current ADAS sensors and systems still face many problems compared to human driver performance in challenging situations. Since event cameras have been proposed as possible ADAS sensors (Posch et al., 2014), we collected data to study the use of an event camera to augment conventional imager technology.

∗Corresponding author: tobi@ini.uzh.ch
arXiv:1711.01458v1 [cs.CV] 4 Nov 2017

Rather than providing frame-based video as output, the event camera dynamic vision sensor (DVS) detects local changes in the brightness of individual pixels and asynchronously outputs those changes at the time of occurrence (Lichtsteiner et al., 2008; Posch et al., 2014). Thus, only the parts of the scene that change produce data, lowering the output data rate, increasing the temporal resolution, and reducing the latency in comparison to frame-based systems, since changes in pixel brightness are streamed out of the camera as they occur. The local instantaneous gain control increases usability under uncontrolled lighting conditions. The higher temporal resolution and limited data rate make the DVS well suited for autonomous driving applications, where both latency and power consumption are important.
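
The event generation described above can be illustrated with a minimal single-pixel sketch. It assumes the commonly used idealization in which a pixel fires an ON or OFF event each time its log intensity moves by a fixed contrast threshold away from the level memorized at the last event. The function name `dvs_events` and the threshold value of 0.2 are illustrative assumptions and are not taken from this paper or from any camera firmware.

```python
import math


def dvs_events(samples, threshold=0.2):
    """Return (time, polarity) events for one pixel.

    samples: iterable of (time, intensity) pairs with intensity > 0.
    threshold: assumed contrast threshold in log-intensity units (hypothetical value).
    """
    events = []
    it = iter(samples)
    _, first_intensity = next(it)
    ref = math.log(first_intensity)  # level memorized at the last event
    for t, intensity in it:
        delta = math.log(intensity) - ref
        # Emit one event per threshold crossing; a large change yields several events.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            ref += polarity * threshold
            delta = math.log(intensity) - ref
    return events


# A static pixel produces no data; a doubling of brightness produces ON events.
print(dvs_events([(0, 50.0), (1, 50.0), (2, 50.0)]))
print(dvs_events([(0, 100.0), (1, 100.0), (2, 200.0)]))
```

Because the sketch works on log intensity, the same relative change produces the same events at very different absolute brightness levels, which is one way to picture the local gain control mentioned above.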
A dynamic and active-pixel vision sensor (DAVIS) has pixels that concurrently output DVS events and standard image sensor intensity frames (Brandli et al., 2014).

Recent studies have shown the utility of using DVS in data-driven convolutional neural network (CNN) real-time applications (Moeys et al., 2016; Lungu et al., 2017). In these applications, DVS input frames typically consist of a 2D histogram image of a constant number of a few thousand DVS events. Because the DVS event rate is proportional to the rate of change of brightness, i.e. scene reflectance (Lichtsteiner et al., 2008), the CNN frame rate is variable, ranging from about 1 fps up to 1000 fps. Moeys et al. (2016) showed that combining the standard image sensor frames from the sensor with the DVS frames resulted in higher accuracy and lower average reaction time. Here we extend this work to real-world driving in the first published end-to-end dataset of DVS or DAVIS driving data.

2 DAVIS Driving Dataset 2017 (DDD17)

DDD17 is available from sensors.ini.uzh.ch/databases. The data was collected during driving on Swiss and German roads under various conditions. It includes DAVIS data and car data. Since the main aim of this dataset is to enable studying the fusion of APS and DVS data for ADAS, we did not include other sensors such as LIDAR.

2.1 DAVIS data

Visual data was captured using a DAVIS346B prototype camera, containing a DAVIS APS+DVS camera, such that event-based and traditional frame-based data could be recorded at the same time, through the same optics. The camera resolution is 346×260 pixels. The camera architecture is similar to Brandli et al. (2014), but the sensor has 2.1× more pixels and includes on-chip column-parallel analog-to-digital converters (ADCs) for frame-based APS output up to 50 fps. The DAVIS346B also has optimized buried photodiodes with microlenses that increase fill factor and reduce dark current, thereby improving operation at low light intensities by a factor of about 4 compared with the Brandli et al. (2014) DAVIS240C.

A fixed focal length lens (C-mount, 6 mm) was used for all recordings, providing a horizontal field of view of 56°. The aperture was set manually, depending on lighting conditions. The APS frame rate depended on the exposure duration and ranged between 10 fps and 50 fps; in some recordings it varied depending on the auto-exposure duration algorithm. The frames were captured using the DAVIS global shutter mode to minimize motion artifacts.

The camera was mounted using a glass suction tripod mount behind the windshield, just below the rear-view mirror, and aligned to point to the center of the hood. Markers on the car hood were used to initially align the camera for the first recording session and the camera was never moved from this position. These markers were left on the hood throughout the entire recording period for control. A polarization filter was used in some of the recordings to reduce windshield and hood glare. The camera was powered by and connected to a laptop computer through high-speed USB 2.0. The raw data was read out using inilabs cAER software [1] and streamed to the custom recording framework described in Sec. 2.3 for further processing.

2.2 Vehicle control and diagnostic data

Data was acquired using a Ford Mondeo MK 3 European Model. We used the OpenXC Ford Reference vehicle interface, which plugs into the passenger compartment OBDII port, to read out control and diagnostic data from the car's CAN bus. The vehicle interface connects to a host USB port [2].

The vehicle interface was programmed with the vendor-provided firmware for the Ford Mondeo MK 3 car model ("type 3" firmware) and read out using the OpenXC Python library. The raw data was passed to the custom recording software described in Sec. 2.3. The following quantities were read out at rates of about 10 Hz each.
Likely targets for experiments in end-to-end learning are in boldface.

• steering wheel angle (degrees, up to 720°),
• accelerator pedal position (% pressed),
• brake pedal status (pressed/not pressed),
• engine speed (rpm),
• vehicle speed (km/h),
• latitude,
• longitude,
• headlamp status (on/off),
• high beam status (on/off),
• windshield wiper status (on/off),
• odometer (km),
• torque at transmission,
• transmission gear position (gear no.),
• fuel consumed since restart,
• fuel level (%),
• ignition status,
• parking brake status (on/off).

[1] cAER support
[2] OpenXC vehicle interface

2.3 Recording and viewing software

A Python software framework [3] for recording, viewing, and exporting the data was created for the main purpose of combining and synchronizing the data from the different input devices and storing it in a standardized file format. In particular, since the APS frames and DVS data are microsecond time-stamped on the camera using its own local clock, whereas the data provided by the vehicle interface is not, both data streams were augmented with the millisecond system time of the recording computer, which could then be used for synchronization. With the vehicle interface streaming data at rates of only around 10 Hz per recorded variable, such off-device time-stamping is justified. The computer time was synchronized to a standard time server before recordings. The data was stored in HDF5 format, for which widely used libraries for various environments exist. Each data type (e.g. DVS events, steering wheel angle, vehicle speed, ...) was stored in a separate container, each containing one container for the system timestamp and one for the data. In this way, the system timestamp can be used for fast indexing and for synchronizing the data when reading. With data being provided at irregular intervals by the recording devices, each data type was stored in an event-driven fashion, such that different containers contain different numbers of samples.
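
One way to use the shared system timestamp when reading is a sample-and-hold lookup: for each frame time, take the most recent vehicle sample recorded at or before it. The sketch below is a minimal illustration under that assumption. It uses plain Python lists in place of the HDF5 containers, and the names `latest_sample_at` and `align` as well as the example values are hypothetical; it is not the implementation used by the recording framework.

```python
from bisect import bisect_right


def latest_sample_at(timestamps, values, query_time):
    """Return the most recent value at or before query_time, or None.

    timestamps must be sorted in ascending order (system time, e.g. in ms).
    """
    idx = bisect_right(timestamps, query_time)
    if idx == 0:
        return None  # no sample has been recorded yet at query_time
    return values[idx - 1]


def align(frame_times, signal_times, signal_values):
    """Pair every frame time with the latest available signal value."""
    return [latest_sample_at(signal_times, signal_values, t) for t in frame_times]


# Hypothetical example: steering angle at about 10 Hz, frames at irregular times.
steering_times = [1000, 1100, 1200]   # ms system time
steering_angles = [0.0, -2.5, 1.0]    # degrees
frame_times = [990, 1000, 1050, 1130, 1250]
print(align(frame_times, steering_times, steering_angles))
```

Binary search keeps each lookup logarithmic in the number of samples, which matters because the containers hold different numbers of samples and cannot be indexed by a common row number.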
The DAVIS data was stored in its native cAER AER-DAT3.1 format [4] in each HDF5 container.

In addition to the recording framework, a Python-based viewer view.py visualizes the recorded DAVIS data together with selected vehicle data such as the steering angle or speed (Fig. 1). The script export.py exports the data into frames for preparing data for further processing by machine learning algorithms.

[3] ddd17-utils
[4] inilabs file formats

3 Recorded data

In total, over 12 h of data were recorded under various weather, driving, road, and lighting conditions on six consecutive days, covering over 1000 km of different types of roads in Switzerland and Germany. Recordings were started and stopped manually and typically have durations of between a minute and an hour. The resulting recordings are summarized in Table 1. Fig. 2 shows the distributions of several recorded variables over the whole dataset. Steering angles are dominated by straight driving and small deviations of ±10°. Speed is uniformly distributed over the range 0-160 km/h. The automatically controlled headlight is on about half the time, indicating a substantial fraction of the data was captured in low-light conditions.

File (.hdf5)  Scene      Cond.      T (s)     GB
1487339175    cty        wet          347    2.8
1487349453    campus     dark         192    1.7
1487350455    fwy        ngt, rain   1404   11.2
1487354030    cty        ngt, wet     377      3
1487354811    cty        ngt, wet     190    1.4
1487355025    cty        ngt, wet      57    0.4
1487355090    cty, hwy   ngt, wet     984    5.9
1487356509    fwy        ngt, wet    2233   12.4
1487417411    fwy        day         2096   18.2
1487419513    fwy        day         1976   18.3
1487424147    m. fwy     day         3040   30.3
1487427200    fwy        day         1947   17.6
1487430438    fwy        day         3135   26.2
1487433587    fwy+cty    ngt         2355   18.5
1487593224    hwy        day          586    5.3
1487594667    fwy        day         2985   29.7
1487597945    cty        evening       50    0.5
1487598202    fwy        day         1882   15.1
1487600962    fwy        day         2143   15.1
1487608147    fwy        evening     1208      9
1487609463    fwy        evening     1458    6.3
1487778564    campus     day          101    1.1
1487779465    cty+hwy    day         1170   11.2
1487781509    campus     evening      127    0.6
1487782014    cty+hwy    evening     1118    7.3
1487839456    cty        day, sun     406    5.7
1487842276    cty        day, sun     625    9.5
1487844247    cty        day, sun     523    7.5
1487846842    twn+hwy    day, sun    1799   20.6
1487849151    twn        day, sun     429    5.5
1487849663    twn+hwy    day, sun    2863   34.7
1487856408    twn        day, sun     817   13.2
1487857941    twn        day, sun      99    1.4
1487858093    cty        day, sun    2421   34.7
1487860613    cty        day, sun    1065   17.4
1487864316    cty+fwy    evening     1087   12.9

Table 1: Summary of the acquired data. Keys: hwy=highway, fwy=freeway, cty=city, twn=town, ngt=night. GB=size of recording in gigabytes. T=duration of recording in seconds.

Figure 1: Example scenario visualized by the recording file viewer. The top panels show the DAVIS frames (left; overlaid with some driving data) and events (right), the bottom panel shows a progress bar as well as visualizations of different vehicle data (headlamp status at the top, steering angle in the middle, speed at the bottom).

4 Experiments: Steering prediction network

End-to-end learning of a control model is an attractive approach for self-driving applications, since it eliminates the need for tedious hand-labeling of the data or features, a task that is prohibitive in the face of the enormous amounts of data acquired by today's vehicles (Bojarski et al., 2016). The presented dataset has clear limitations, since it does not include other sensors such as LIDAR, does not include route information that would allow better prediction of user intentions, and the data tends to be unbalanced.
Nevertheless, under certain conditions such as highway driving, driving along roads without turns onto other roads, or unpredictable user actions, it can be used to study the utility of the data for prediction of measured user actions.

We trained simple steering prediction networks. These networks take input APS and/or DVS data and attempt to predict the instantaneous steering wheel angle. They are inspired by LeCun's early work (LeCun et al., 2005), the seminal open dataset from comma.ai (Santana & Hotz, 2016), and by recent Nvidia (Bojarski et al., 2016) and unpublished VW studies.

Our results compare the steering prediction accuracy of networks operating on pure APS data to those operating on pure DVS data. Our example implementation should be regarded as a preliminary study to validate the usability of the data and associated software. In particular, the experiments presented here are based on a small subset of the whole dataset (recordings 1487858093 and 1487433587 in Table 1). Work is ongoing to train more architectures using more of the data.

Fig. 3 shows our first results, obtained from a CNN with 4 convolutional layers, each with 8 feature maps and using 3x3 kernels, and trained on a single 1.5 h recording. Each layer is followed by a 2x2 max pooling layer. The final feature map layer is mapped to a 64-unit fully connected (FC) layer. The FC layer is mapped to an output steering angle in the range ±180°. The DVS and APS inputs were subsampled to 80x60 images. Input frame normalization was done as in Moeys et al. (2016).

Figure 2: Statistical distribution of various recorded signals. Panels: steering angle (deg), vehicle speed (km/h), accelerator pedal position (% pressed), brake pedal (released/pressed), and headlamp status (off/on); the vertical axes show time in seconds on a logarithmic scale.

Our quantitative accuracy results are too inconclusive to report but we have verified the usability of the dataset and tools.
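
To make the size of the network described above concrete, the following sketch walks an 80x60 input through the four convolution and pooling stages and counts trainable parameters. Several details are not stated in the text and are assumptions here: a single input channel, stride-1 convolutions with "same" padding, pooling that rounds down, biases on every layer, and a single linear output unit. The function names are hypothetical.

```python
def conv_pool_shape(height, width, pool=2):
    """Spatial size after a 'same'-padded 3x3 convolution (size unchanged)
    followed by pool x pool max pooling that rounds down (assumed)."""
    return height // pool, width // pool


def describe_network(height=60, width=80, in_channels=1,
                     n_conv=4, n_maps=8, kernel=3, fc_units=64):
    """Return per-stage output shapes, flattened size, and parameter count."""
    shapes = []
    n_params = 0
    channels = in_channels
    for _ in range(n_conv):
        # Convolution weights plus one bias per feature map.
        n_params += kernel * kernel * channels * n_maps + n_maps
        height, width = conv_pool_shape(height, width)
        channels = n_maps
        shapes.append((height, width, channels))
    flat = height * width * channels
    n_params += flat * fc_units + fc_units  # fully connected layer
    n_params += fc_units + 1                # single steering-angle output (assumed)
    return shapes, flat, n_params


shapes, flat, n_params = describe_network()
print(shapes)    # (height, width, feature maps) after each conv + pool stage
print(flat)      # inputs to the 64-unit FC layer
print(n_params)  # total trainable parameters under the stated assumptions
```

Under these assumptions the model has fewer than ten thousand parameters, most of them in the fully connected layer; a different padding or pooling convention would change the flattened size and therefore the count.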
Further analysis is necessary and the subject of ongoing work.

5 Conclusion

The main result of this paper is to introduce DDD17, the first open dataset of DAVIS driving data with end-to-end labeling, along with the necessary software tools. A preliminary study on end-to-end steering angle prediction by a CNN shows the usability of the data.

Acknowledgements

We thank Dimitri Rettig and Anna Stockklauser for their help with recording some of the data, and inilabs and the INI Sensors Group for device support. This work was made possible by funding from the EU projects SeeBetter and Visualise and by Samsung.

Figure 3: Steering prediction initial result. Comparison of our first APS and DVS steering prediction experiments. A: DVS frame and CNN output. B: APS frame and CNN output. C: segment of time history.

References

Bojarski, Mariusz, Del Testa, Davide, Dworakowski, Daniel, Firner, Bernhard, Flepp, Beat, Goyal, Prasoon, Jackel, Lawrence D, Monfort, Mathew, Muller, Urs, Zhang, Jiakai, et al. End to end learning for self-driving cars. arXiv preprint arXiv:1604.07316, 2016.

Brandli, C., Berner, R., Yang, M., Liu, S-C., and Delbruck, T. A 240×180 130 dB 3 µs latency global shutter spatiotemporal vision sensor. IEEE Journal of Solid-State Circuits, 49(10):2333–2341, 2014.

LeCun, Yann, Muller, Urs, Ben, Jan, Cosatto, Eric, and Flepp, Beat. Off-Road Obstacle Avoidance through End-to-End Learning. In Advances in Neural Information Processing Systems, pp. 739–746, 2005.

Lichtsteiner, P., Posch, C., and Delbruck, T. A 128x128 120 dB 15 µs latency asynchronous temporal contrast vision sensor. IEEE Journal of Solid-State Circuits, 43(2):566–576, Feb 2008.

Lungu, Iulia-Alexandra, Corradi, Federico, and Delbruck, Tobias. Live Demonstration: Convolutional Neural Network Driven by Dynamic Vision Sensor Playing RoShamBo. In 2017 IEEE Symposium on Circuits and Systems (ISCAS 2017), Baltimore, MD, USA, 2017.

Moeys, D. P., Corradi, F., Kerr, E., Vance, P., Das, G., Neil, D., Kerr, D., and Delbrück, T. Steering a predator robot using a mixed frame/event-driven convolutional neural network. In 2016 Second International Conference on Event-based Control, Communication, and Signal Processing (EBCCSP), pp. 1–8, June 2016.

Posch, C., Serrano-Gotarredona, T., Linares-Barranco, B., and Delbruck, T. Retinomorphic Event-Based Vision Sensors: Bioinspired Cameras With Spiking Output. Proceedings of the IEEE, 102(10):1470–1484, October 2014.

Santana, Eder and Hotz, George. Learning a driving simulator. arXiv preprint arXiv:1608.01230, 2016.
