TinyTracker: Ultra-Fast and Ultra-Low-Power Edge Vision In-Sensor for Gaze Estimation
Intelligent edge vision tasks encounter the critical challenge of ensuring
power and latency efficiency due to the typically heavy computational load they
impose on edge platforms. This work leverages one of the first "AI in sensor"
vision platforms, IMX500 by Sony, to achieve ultra-fast and ultra-low-power
end-to-end edge vision applications. We evaluate the IMX500 and compare it to
other edge platforms, such as the Google Coral Dev Micro and Sony Spresense, by
exploring gaze estimation as a case study. We propose TinyTracker, a highly
efficient, fully quantized model for 2D gaze estimation designed to maximize
the performance of the edge vision systems considered in this study.
TinyTracker achieves a 41x size reduction (600 KB) compared to iTracker [1]
without significant loss in gaze estimation accuracy (a maximum of 0.16 cm when
fully quantized). TinyTracker's deployment on the Sony IMX500 vision sensor
results in an end-to-end latency of around 19 ms. The camera takes around 17.9 ms to
read, process, and transmit the pixels to the accelerator. The inference time of
the network is 0.86 ms, with an additional 0.24 ms for retrieving the results
from the sensor. The overall energy consumption of the end-to-end system is 4.9
mJ, including 0.06 mJ for inference. The end-to-end study shows that the IMX500 is
1.7x faster than the Coral Dev Micro (19 ms vs. 34.4 ms) and 7x more power efficient
(4.9 mJ vs. 34.2 mJ).
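The headline numbers above hinge on full-integer quantization. As a rough illustration of how such a model can be prepared (a generic post-training quantization sketch using TensorFlow Lite, not the authors' actual export pipeline; the Keras model and calibration images are hypothetical placeholders):

```python
import numpy as np
import tensorflow as tf

def quantize_gaze_model(model: tf.keras.Model, calib_images: np.ndarray) -> bytes:
    """Post-training full-integer quantization (illustrative sketch)."""
    def representative_dataset():
        # A few hundred calibration samples fix the activation ranges.
        for img in calib_images[:200]:
            yield [img[np.newaxis].astype(np.float32)]

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.representative_dataset = representative_dataset
    # Force int8 weights and activations end to end, as edge accelerators require.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()
```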
Event-based Vision: A Survey
Event cameras are bio-inspired sensors that differ from conventional frame
cameras: Instead of capturing images at a fixed rate, they asynchronously
measure per-pixel brightness changes, and output a stream of events that encode
the time, location and sign of the brightness changes. Event cameras offer
attractive properties compared to traditional cameras: high temporal resolution
(in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low
power consumption, and high pixel bandwidth (on the order of kHz) resulting in
reduced motion blur. Hence, event cameras have a large potential for robotics
and computer vision in challenging scenarios for traditional cameras, such as
low-latency, high speed, and high dynamic range. However, novel methods are
required to process the unconventional output of these sensors in order to
unlock their potential. This paper provides a comprehensive overview of the
emerging field of event-based vision, with a focus on the applications and the
algorithms developed to unlock the outstanding properties of event cameras. We
present event cameras from their working principle, the actual sensors that are
available and the tasks that they have been used for, from low-level vision
(feature detection and tracking, optic flow, etc.) to high-level vision
(reconstruction, segmentation, recognition). We also discuss the techniques
developed to process events, including learning-based techniques, as well as
specialized processors for these novel sensors, such as spiking neural
networks. Additionally, we highlight the challenges that remain to be tackled
and the opportunities that lie ahead in the search for a more efficient,
bio-inspired way for machines to perceive and interact with the world.
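To make the event representation concrete, the following minimal sketch (our illustration, not taken from the survey) accumulates an asynchronous (t, x, y, polarity) event stream into a signed frame, a common first step before applying frame-based algorithms:

```python
import numpy as np

def events_to_frame(events: np.ndarray, height: int, width: int) -> np.ndarray:
    """Accumulate events into a signed image.

    `events` is an (N, 4) array of (timestamp, x, y, polarity) rows,
    with polarity +1/-1 for brightness increase/decrease. Real pipelines
    often weight events by timestamp instead of plain counting.
    """
    frame = np.zeros((height, width), dtype=np.int32)
    ys = events[:, 2].astype(int)
    xs = events[:, 1].astype(int)
    np.add.at(frame, (ys, xs), events[:, 3].astype(int))  # handles repeated pixels
    return frame
```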
A high speed Tri-Vision system for automotive applications
Purpose: Cameras are excellent ways of non-invasively monitoring the interior and exterior of vehicles. In particular, high speed stereovision and multivision systems are important for transport applications such as driver eye tracking or collision avoidance. This paper addresses the synchronisation problem which arises when multivision camera systems are used to capture the high speed motion common in such applications.
Methods: An experimental, high-speed tri-vision camera system intended for real-time driver eye-blink and saccade measurement was designed, developed, implemented and tested using prototype, ultra-high dynamic range, automotive-grade image sensors specifically developed by E2V (formerly Atmel) Grenoble SA as part of the European FP6 project SENSATION (Advanced Sensor Development for Attention, Stress, Vigilance and Sleep/Wakefulness Monitoring).
Results: The developed system can sustain frame rates of 59.8 Hz at the full stereovision resolution of 1280 × 480, but this can reach 750 Hz when a 10 kpixel Region of Interest (ROI) is used, with a maximum global shutter speed of 1/48000 s and a shutter efficiency of 99.7%. The data can be reliably transmitted uncompressed over standard copper Camera-Link® cables over 5 metres. The synchronisation error between the left and right stereo images is less than 100 ps and this has been verified both electrically and optically. Synchronisation is automatically established at boot-up and maintained during resolution changes. A third camera in the set can be configured independently. The dynamic range of the 10-bit sensors exceeds 123 dB with a spectral sensitivity extending well into the infra-red range.
Conclusion: The system was subjected to a comprehensive testing protocol, which confirms that the salient requirements for the driver monitoring application are adequately met and in some respects exceeded. The synchronisation technique presented may also benefit several other automotive stereovision applications, including near- and far-field obstacle detection and collision avoidance, road condition monitoring and others. Partially funded by the EU FP6 through the IST-507231 SENSATION project.
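A quick back-of-the-envelope check of the reported rates (the figures come from the abstract; the linear pixel-readout model is our simplifying assumption):

```python
# Figures from the abstract; the scaling model below is an assumption.
full_res_pixels = 1280 * 480                         # full stereovision resolution
full_rate_hz = 59.8
pixel_throughput = full_res_pixels * full_rate_hz    # ~36.7 Mpixel/s

roi_pixels = 10_000                                  # 10 kpixel ROI
naive_roi_rate = pixel_throughput / roi_pixels       # ~3674 Hz
reported_roi_rate = 750                              # Hz, from the abstract

# The reported ROI rate sits well below the naive pixel-limited estimate,
# which suggests per-frame overheads (exposure, synchronisation, transfer
# setup) dominate at small ROI sizes.
print(f"throughput {pixel_throughput / 1e6:.1f} Mpixel/s; "
      f"naive ROI rate {naive_roi_rate:.0f} Hz vs reported {reported_roi_rate} Hz")
```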
Leveraging Eye Structure and Motion to Build a Low-Power Wearable Gaze Tracking System
Clinical studies have shown that features of a person's eyes can function as an effective proxy for cognitive state and neurological function. Technological advances in recent decades have allowed us to deepen this understanding and discover that the actions of the eyes are in fact very tightly coupled to the operation of the brain. Researchers have used camera-based eye monitoring technology to exploit this connection and analyze mental state across many different metrics of interest. These range from simple things like attention and scene processing, to impairments such as fatigue or substance use, and even significant mental disorders such as Parkinson's, autism, and schizophrenia.
While there is a wealth of knowledge and social benefit to be gained from eye tracking, the field has historically been restricted to laboratory use by crippling technological limitations, most notably device size and power consumption. These issues primarily stem from the use of high-resolution cameras and heavyweight video-processing algorithms, both of which induce extremely high performance overhead on the eye tracker. To address this problem, we have constructed a lightweight, ultra-low-power eye monitoring device in the form factor of a pair of eyeglasses. The key guiding design principle for its construction was saliency-aware resource minimization. Specifically, our design leverages the fact that close-up images of the eye are characterized by large salient features which provide a high degree of redundant information; we exploit this to heavily subsample the eye image and reduce resource utilization while performing effective eye tracking.
In the first part of this thesis, we present an initial design of a wearable system to enable ubiquitous eye tracking. By exploiting the fact that the eye has several large, visually redundant features such as the iris and pupil, we were able to develop a neural-network-based adaptive-sampling algorithm for predicting the gaze point while sampling a minimal number of pixels from the image. This enabled us to realize power savings using specialized imaging hardware that would sample only those most salient pixels, which proportionally reduced the power and time cost of reading images for eye tracking. With these optimizations we were able to build a first-of-its-kind wearable eye tracker that consumed 40 mW of power and demonstrated a gaze tracking error of only 3 degrees across multiple subjects. We refer to this device as the iShadow platform.
The second contribution and section of this thesis is a significant improvement upon the original iShadow design, improving both power utilization and eye tracking performance. We constructed a new pupil-tracking algorithm based on lightweight computer vision features, which leverages the smoothness of the eye's motion to reduce even further the amount of camera sampling needed. To guard against the very infrequent discontinuities resulting from blinks or reflections off the eye, we integrated this model with the previously used one-shot neural network algorithm. Because the common case (smooth, uninterrupted eye motion) occurs 90% of the time, we were able to realize a dramatic increase in performance due to the efficiency of the smooth tracking algorithm. The new and improved system, labeled CIDER, enabled much more accurate eye tracking (0.4 degrees of error) with power consumption as low as 7 mW. This design also enabled a tradeoff between power consumption and eye tracking rate, in which the system could draw higher power of ~30 mW in order to track at rates of up to 240 frames per second.
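The control flow of this hybrid scheme can be sketched as follows (the two predictors and the residual test are hypothetical stand-ins for the thesis's smooth-tracking and one-shot neural network models):

```python
def track_gaze(frames, predict_smooth, predict_one_shot, residual_thresh=2.0):
    """Hybrid pupil tracking sketch.

    `predict_smooth(frame, prev_gaze)` returns (gaze, residual) from the
    cheap feature-based tracker; `predict_one_shot(frame)` is the full
    neural network. Both are assumed interfaces, not the thesis's code.
    """
    gaze = None
    for frame in frames:
        if gaze is not None:
            candidate, residual = predict_smooth(frame, gaze)
            if residual < residual_thresh:
                gaze = candidate      # common case (~90%): smooth eye motion
                yield gaze
                continue
        # First frame, blink, or reflection: fall back to the full model.
        gaze = predict_one_shot(frame)
        yield gaze
```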
The final contribution of this thesis is a re-designed version of the iShadow glasses hardware that is suitable for "in-the-wild" studies on subjects in their daily living environment. A wearable device, especially one that is worn on the head, must be minimally obtrusive in order to be accepted and used in the field by subjects. This design goal conflicts with the ideal placement of cameras that is needed for achieving consistent eye tracking fidelity. We present multiple possible methods we explored for addressing these competing design challenges, and discuss the reasons that many proved infeasible. To conclude, we present a working design solution that appears to optimally trade off user comfort and convenience against the technical requirements of the system.
Towards Energy Efficient Mobile Eye Tracking for AR Glasses through Optical Sensor Technology
After the introduction of smartphones and smartwatches, Augmented Reality (AR) glasses
are considered the next breakthrough in the field of wearables. While the transition from
smartphones to smartwatches was based mainly on established display technologies, the display
technology of AR glasses presents a technological challenge. Many display technologies,
such as retina projectors, are based on continuous adaptive control of the display based on
the user’s pupil position. Furthermore, head-mounted systems require an adaptation and
extension of established interaction concepts to provide the user with an immersive experience.
Eye-tracking is a crucial technology to help AR glasses achieve a breakthrough through
optimized display technology and gaze-based interaction concepts. Available eye-tracking
technologies, such as Video Oculography (VOG), do not meet the requirements of AR glasses,
especially regarding power consumption, robustness, and integrability. To further overcome
these limitations and push mobile eye-tracking for AR glasses forward, novel laser-based
eye-tracking sensor technologies are researched in this thesis. The thesis contributes to a significant
scientific advancement towards energy-efficient mobile eye-tracking for AR glasses.
In the first part of the thesis, novel scanned laser eye-tracking sensor technologies for AR
glasses with retina projectors as display technology are researched. The goal is to solve the
disadvantages of VOG systems and to enable eye-tracking that is efficient and
robust to ambient light and slippage through optimized sensing methods and algorithms.
The second part of the thesis researches the use of static Laser Feedback Interferometry (LFI)
sensors as a low-power, always-on sensor modality for detection of user interaction by gaze
gestures and context recognition through Human Activity Recognition (HAR) for AR glasses.
The static LFI sensors can measure the distance to the eye and the eye’s surface velocity with
an outstanding sampling rate. Furthermore, they offer high integrability regardless of the
display technology.
In the third part of the thesis, a model-based eye-tracking approach is researched based on
the static LFI sensor technology. The approach leads to eye-tracking with an extremely high
sampling rate by fusing multiple LFI sensors, which enables methods for display resolution
enhancement such as foveated rendering for AR glasses and Virtual Reality (VR) systems.
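One way to picture the fusion step is as a small least-squares problem over a rigid-eye model (the geometry and interfaces below are our illustrative assumptions, not the thesis's actual formulation):

```python
import numpy as np

def fuse_lfi_velocities(beam_dirs: np.ndarray, beam_points: np.ndarray,
                        radial_velocities: np.ndarray) -> np.ndarray:
    """Estimate eye angular velocity omega from several LFI beams.

    For a rigid eyeball rotating about its center, the surface velocity at
    point p is v = omega x p, and each sensor measures its projection onto
    the beam direction d: r = d . (omega x p) = (p x d) . omega. Stacking
    one row per sensor gives an overdetermined linear system in omega.
    Requires at least three well-conditioned beams.
    """
    rows = np.cross(beam_points, beam_dirs)            # (N, 3) design matrix
    omega, *_ = np.linalg.lstsq(rows, radial_velocities, rcond=None)
    return omega
```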
The scientific contributions of this work lead to a significant advance in the field of mobile
eye-tracking for AR glasses through the introduction of novel sensor technologies that enable
robust eye tracking in uncontrolled environments in particular. Furthermore, the scientific
contributions of this work have been published in internationally renowned journals and
conferences.
A portable device for time-resolved fluorescence based on an array of CMOS SPADs with integrated microfluidics
Traditionally, molecular analysis is performed in laboratories equipped with desktop instruments operated by specialized technicians. This paradigm has been changing in recent decades, as biosensor technology has become as accurate as desktop instruments, providing results in much shorter times and miniaturizing the instrumentation, gradually moving diagnostic tests out of the central laboratory. However, despite the inherent advantages of time-resolved fluorescence spectroscopy applied to molecular diagnosis, it is only in the last decade that POC (Point-of-Care) devices based on fluorescence detection have begun to be developed, owing to the challenge of building high-performance, portable and low-cost spectroscopic sensors. This thesis presents the development of a compact, robust and low-cost system for molecular diagnosis based on time-resolved fluorescence spectroscopy, which serves as a general-purpose platform for the optical detection of a variety of biomarkers, bridging the gap between the laboratory and the POC for fluorescence-lifetime-based bioassays. In particular, two systems with different levels of integration have been developed, combining a one-dimensional array of SPAD (Single-Photon Avalanche Diode) pixels capable of detecting single photons with an interchangeable microfluidic cartridge used to insert the sample and a low-cost pulsed UV laser diode as the excitation source. The contact-based design coupling the sensor to the microfluidics, together with the time-gated operation of the sensors, makes it possible to dispense with lenses and filters. In turn, custom packaging of the sensor chip allows the microfluidic cartridge to be positioned directly on the sensor array without any alignment procedure. Both systems have been validated by determining the fluorescence decay time of quantum dots in 20 nl of solution at different concentrations, emulating a molecular assay on a POC device.
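For intuition, lifetime extraction from SPAD photon timestamps can be sketched as a maximum-likelihood fit of a single-exponential decay (a toy model that ignores the instrument response function and background counts present in real measurements):

```python
import numpy as np

def estimate_lifetime(timestamps_ns: np.ndarray) -> float:
    """ML lifetime estimate for arrival times drawn from
    p(t) = exp(-t / tau) / tau: the estimator is the sample mean."""
    return float(np.mean(timestamps_ns))

# Synthetic quantum-dot-like data with an assumed 20 ns lifetime.
rng = np.random.default_rng(0)
photons = rng.exponential(scale=20.0, size=5_000)
print(f"estimated lifetime: {estimate_lifetime(photons):.1f} ns")
```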
Optimizing the Direction of a Camera System in a Navigation Task
Navigation in an unknown environment consists of multiple separable subtasks, such as collecting information about the surroundings and navigating to the current goal. In the case of pure visual navigation, all these subtasks need to utilize the same vision system, and therefore a way to optimally control the direction of focus is needed. This thesis presents a case study, where the active sensing problem of directing the gaze of a mobile robot with three machine vision cameras is modeled as a partially observable Markov decision process (POMDP) using a mutual information (MI) based reward function. The key aspect of the solution is that the cameras are dynamically used either in monocular or stereo configuration.
The algorithms are implemented on the Robot Operating System (ROS), and the benefits of using the proposed active sensing implementation over fixed stereo cameras are demonstrated with simulation experiments. The proposed active sensing outperforms the fixed camera solution when prior information about the environment is highly uncertain, and performs just as well in the other tested scenarios.
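The MI-based reward can be illustrated with a discretized toy version (the occupancy-grid setup and the ideal-sensor assumption are ours, not the thesis's exact model): with a noiseless observation, the mutual information between a measurement and the map equals the entropy of the observed cells, so the greedy choice is the direction whose view is most uncertain.

```python
import numpy as np

def bernoulli_entropy(p: np.ndarray) -> np.ndarray:
    """Elementwise entropy of occupancy probabilities (in nats)."""
    q = np.clip(p, 1e-9, 1 - 1e-9)
    return -(q * np.log(q) + (1 - q) * np.log(1 - q))

def best_direction(occupancy: np.ndarray, visible_cells: dict):
    """Greedy MI-based choice of camera direction over an occupancy grid.

    `visible_cells` maps each candidate direction to the indices of grid
    cells it would observe; for an ideal sensor the mutual information
    reduces to the summed entropy of those cells.
    """
    scores = {d: bernoulli_entropy(occupancy[idx]).sum()
              for d, idx in visible_cells.items()}
    return max(scores, key=scores.get)

# Tiny example with assumed geometry: three directions over a flat grid.
occ = np.array([0.5, 0.5, 0.9, 0.1, 0.5, 0.05])
views = {"left": [0, 1], "center": [2, 3], "right": [4, 5]}
print(best_direction(occ, views))  # "left": the two most uncertain cells
```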