256 research outputs found

    Object detection and recognition with event driven cameras

    This thesis presents the study, analysis and implementation of algorithms for object detection and recognition using an event-based camera. This sensor represents a novel paradigm which opens a wide range of possibilities for future developments of computer vision. In particular, it produces a fast, compressed, illumination-invariant output, which can be exploited for robotic tasks, where fast dynamics and significant illumination changes are frequent. The experiments are carried out on the neuromorphic version of the iCub humanoid platform. The robot is equipped with a novel dual camera setup mounted directly in the robot’s eyes, used to generate data with a moving camera. The motion causes the presence of background clutter in the event stream. In such a scenario, the detection problem has been addressed with an attention mechanism, specifically designed to respond to the presence of objects while discarding clutter. The proposed implementation takes advantage of the nature of the data to simplify the original proto-object saliency model which inspired this work. Subsequently, the recognition task was first tackled with a feasibility study to demonstrate that the event stream carries sufficient information to classify objects, and then with the implementation of a spiking neural network. The feasibility study provides the proof of concept that events are informative enough in the context of object classification, whereas the spiking implementation improves the results by employing an architecture specifically designed to process event data. The spiking network was trained with a three-factor local learning rule which overcomes the weight transport, update locking and non-locality problems. The presented results show that both detection and classification can be carried out in the target application using the event data

    Human Action Recognition from Various Data Modalities: A Review

    Human Action Recognition (HAR), which aims to understand human behaviors and assign category labels to them, has a wide range of applications and has thus been attracting increasing attention in the field of computer vision. Generally, human actions can be represented using various data modalities, such as RGB, skeleton, depth, infrared sequence, point cloud, event stream, audio, acceleration, radar, and WiFi, which encode different sources of useful yet distinct information and have various advantages and application scenarios. Consequently, many existing works have investigated different types of approaches for HAR using various modalities. In this paper, we give a comprehensive survey of HAR from the perspective of the input data modalities. Specifically, we review both the hand-crafted feature-based and deep learning-based methods for single data modalities, and also review the methods based on multiple modalities, including the fusion-based frameworks and the co-learning-based approaches. The current benchmark datasets for HAR are also introduced. Finally, we discuss some potentially important research directions in this area

    Novel computational methods for in vitro and in situ cryo-electron microscopy

    Over the past decade, advances in microscope hardware and image data processing algorithms have made cryo-electron microscopy (cryo-EM) a dominant technique for protein structure determination. Near-atomic resolution can now be obtained for many challenging in vitro samples using single-particle analysis (SPA), while sub-tomogram averaging (STA) can obtain sub-nanometer resolution for large protein complexes in a crowded cellular environment. Reaching high resolution requires large amounts of image data. Modern transmission electron microscopes (TEMs) automate the acquisition process and can acquire thousands of micrographs or hundreds of tomographic tilt series over several days without intervention. In a first step, the data must be pre-processed: micrographs acquired as movies are corrected for stage and beam-induced motion. For tilt series, additional alignment of all micrographs in 3D is performed using gold- or patch-based fiducials. Parameters of the contrast-transfer function (CTF) are estimated to enable its reversal during SPA refinement. Finally, individual protein particles must be located and extracted from the aligned micrographs. Current pre-processing algorithms, especially those for particle picking, are not robust enough to enable fully unsupervised operation. Thus, pre-processing is started after data collection, and takes several days due to the amount of supervision required. Pre-processing the data in parallel to acquisition with more robust algorithms would save time and allow bad samples and microscope settings to be discovered early on. Warp is new software for cryo-EM data pre-processing. It implements new algorithms for motion correction, CTF estimation, tomogram reconstruction, as well as deep learning-based approaches to particle picking and image denoising. The algorithms are more accurate and robust, enabling unsupervised operation.
Warp integrates all pre-processing steps into a pipeline that is executed on-the-fly during data collection. Integrated with SPA tools, the pipeline can produce 2D and 3D classes less than an hour into data collection for favorable samples. Here I describe the implementation of the new algorithms, and evaluate them on various movie and tilt series data sets. I show that unsupervised pre-processing of a tilted influenza hemagglutinin trimer sample with Warp and refinement in cryoSPARC can improve previously published resolution from 3.9 Å to 3.2 Å. Warp’s algorithms operate in a reference-free manner to improve the image resolution at the pre-processing stage when no high-resolution maps are available for the particles yet. Once 3D maps have been refined, they can be used to go back to the raw data and perform reference-based refinement of sample motion and CTF in movies and tilt series. M is a new tool I developed to solve this task in a multi-particle framework. Instead of following the SPA assumption that every particle is single and independent, M models all particles in a field of view as parts of a large, physically connected multi-particle system. This allows M to optimize hyper-parameters of the system, such as sample motion and deformation, or higher-order aberrations in the CTF. Because M models these effects accurately and optimizes all hyper-parameters simultaneously with particle alignments, it can surpass previous reference-based frame and tilt series alignment tools. Here I describe the implementation of M, evaluate it on several data sets, and demonstrate that the new algorithms achieve equally high resolution with movie and tilt series data of the same sample. Most strikingly, the combination of Warp, RELION and M can resolve 70S ribosomes bound to an antibiotic at 3.5 Å inside vitrified Mycoplasma pneumoniae cells, marking a major advance in resolution for in situ imaging

    Sensory coding of complex visual motion in the locust (Locusta migratoria)

    The visual environment of any animal is a complex amalgamation of sensory information (Lochmann and Deneve, 2011); however, it is adaptive for an animal to only react to salient cues (Zupanc, 2010). For many organisms, the detection of an approaching object, such as an oncoming conspecific or a predator, is particularly important. An approaching object with constant velocity is called looming, and has been widely studied for evoking avoidance behaviours in a number of animal species (Gibson, 1958). The migratory locust, Locusta migratoria, has been used extensively as a model system for visually guided behaviour, due to its robust collision-avoidance behaviours and its tractable nervous system (Schlotterer, 1977). The Lobula Giant Movement Detector (LGMD) and the Descending Contralateral Movement Detector (DCMD) constitute one pathway in the locust visual system that integrates the entire field of view that has been implicated in coordinating these types of behaviours (Santer et al., 2006). Previous studies have found that the LGMD/DCMD pathway responds to many visual stimuli, including complex scenes (Rind and Simmons, 1992), approaching paired objects (Guest and Gray, 2006), objects with compound shapes (Guest and Gray, 2006), and objects that follow compound trajectories (McMillan and Gray, 2012). These findings suggest that this pathway is capable of encoding complex motion such as exists in the locust’s natural environment. In my first objective (Chapter 2), I tested the response of the locust DCMD to increasingly complex motion. Using computer generated disks that followed compound trajectories with different velocities, I demonstrate that the DCMD is capable of encoding the location, trajectory, and velocity of an approaching object through aspects of the response profile over time. The motor systems of invertebrates are often controlled by ensembles of neurons working together (Dubuc et al., 2008; Hedrich et al., 2011; Gonzalez-Bellido et al., 2013). 
The locust visual system has at least five identified descending neurons, beyond the DCMD, that respond to visual motion (Rowell, 1971; Griss and Rowell, 1986; Gray et al., 2010). Due to the tractability of extracellular recordings of the DCMD, these neurons remain relatively little studied. Furthermore, their responses to stimuli have not been investigated concurrently. With recent advancements in multichannel recordings and spike sorting algorithms, it is now possible to explore the responses of multiple neurons in the locust system together. In my second objective (Chapter 3), I recorded from the connective of the locust using multichannel electrodes while challenging it with a wide array of visual stimuli. Preliminary results of these experiments identified as many as five neuronal units with distinctive firing patterns, some of which appear to be novel. Together, these results illustrate that the locust visual system is more complex than previously thought, through both the ability of a single neuron to encode many aspects of visual motion and the presence of multiple unique, visually sensitive neurons

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation [Landman et al, 2003 Vision Research 43 149–164]. Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task

    Synaptic Learning for Neuromorphic Vision - Processing Address Events with Spiking Neural Networks

    The brain outperforms conventional computer architectures in terms of energy efficiency, robustness and adaptability. These aspects are also important for new technologies. It is therefore worthwhile to investigate which biological processes enable the brain to compute and how they can be implemented in silicon. Drawing inspiration from how the brain performs computation requires a paradigm shift compared to conventional computer architectures. Indeed, the brain consists of nerve cells, called neurons, which are connected to one another via synapses and form self-organized networks. Neurons and synapses are complex dynamical systems governed by biochemical and electrical reactions. As a consequence, they can base their computations only on local information. In addition, neurons communicate with each other through short electrical pulses, so-called spikes, which travel across synapses. Computational neuroscientists attempt to model these computations with spiking neural networks. When implemented on dedicated neuromorphic hardware, spiking neural networks can, like the brain, perform fast, energy-efficient computations. Until recently, the benefits of this technology were limited by the lack of functional methods for programming spiking neural networks. Learning is a paradigm for programming spiking neural networks in which neurons organize themselves into functional networks. As in the brain, learning in neuromorphic hardware is based on synaptic plasticity. Synaptic plasticity rules characterize weight updates in terms of information locally available at the synapse. Learning therefore happens continuously and online, while sensory input is streamed into the network.
Conventional deep neural networks are usually trained by gradient descent. However, the constraints imposed by biological learning dynamics prevent the use of conventional backpropagation to compute the gradients. For example, continuous updates hinder the synchronous alternation between forward and backward phases. Moreover, memory constraints prevent the history of neural activity from being stored in the neuron, so that methods such as backpropagation-through-time are not possible. Novel solutions to these problems were proposed by computational neuroscientists within the time frame of this thesis. In this thesis, spiking neural networks are developed to solve visuomotor neurorobotics tasks. Indeed, biological neural networks originally evolved to control the body. Robotics thus provides the artificial body for the artificial brain. On the one hand, this work contributes to current efforts to understand the brain by providing challenging closed-loop benchmarks, similar to what the biological brain faces. On the other hand, new ways of solving traditional robotics problems based on brain-inspired paradigms are presented. The research is carried out in two steps. First, promising synaptic plasticity rules are identified and compared on real-world event-based vision benchmarks. Second, novel methods for mapping visual representations to motor commands are presented. Neuromorphic vision sensors represent an important step towards brain-inspired paradigms. In contrast to conventional cameras, these sensors emit address events that correspond to local changes in light intensity.
The event-based paradigm enables energy-efficient and fast vision processing but requires the derivation of new asynchronous algorithms. Spiking neural networks are a subset of asynchronous algorithms that are inspired by the brain and suited to neuromorphic hardware technology. In close collaboration with computational neuroscientists, successful methods for learning spatio-temporal abstractions from the address-event representation are reported. It is shown that top-down synaptic plasticity rules, derived to optimize an objective function, outperform bottom-up rules based solely on observations in the brain. With this insight, a new synaptic plasticity rule named "Deep Continuous Local Learning" is introduced, which currently achieves the state of the art on event-based vision benchmarks. This rule was jointly derived, implemented and evaluated during a stay at the University of California, Irvine. In the second part of this thesis, the visuomotor loop is closed by mapping the learned visual representations to motor commands. Three approaches for obtaining a visuomotor mapping are discussed: manual coupling, reward coupling and minimization of the prediction error. It is shown how these approaches, implemented as synaptic plasticity rules, can be used to learn simple policies and movements. This work paves the way towards integrating brain-inspired computational paradigms into the field of robotics. It is even predicted that advances in neuromorphic technologies and plasticity rules will enable the development of high-performance, low-power learning robots

    Visual Cortex

    The neurosciences have experienced tremendous and wonderful progress in many areas, and the spectrum encompassing the neurosciences is expansive. Suffice it to mention a few classical fields: electrophysiology, genetics, physics, computer sciences, and more recently, social and marketing neurosciences. Of course, this large growth resulted in the production of many books. Perhaps the visual system and the visual cortex were in the vanguard because most animals do not produce their own light and thus offer the invaluable advantage of allowing investigators to conduct experiments in full control of the stimulus. In addition, the fascinating evolution of scientific techniques, the immense productivity of recent research, and the ensuing literature make it virtually impossible to publish in a single volume all worthwhile work accomplished throughout the scientific world. The days when a single individual, such as Diderot, could undertake the production of an encyclopedia are gone forever. Indeed, most approaches to studying the nervous system are valid, and neuroscientists produce an almost astronomical amount of interesting data accompanied by extremely worthy hypotheses, which in turn generate new ventures in search of brain functions. Yet it is fully justified to make an encore and to publish a book dedicated to the visual cortex and beyond. Many reasons validate a book assembling chapters written by active researchers. Each has the opportunity to bind together data and explore original ideas whose fate will not fall into the hands of uncompromising reviewers of traditional journals. This book focuses on the cerebral cortex with a large emphasis on vision. Yet it offers the reader diverse approaches employed to investigate the brain, for instance, computer simulation, cellular responses, or rivalry between various targets and goal-directed actions.
This volume thus covers a large spectrum of research even though it is impossible to include all topics in the extremely diverse field of neurosciences

    Efficient and Accurate Segmentation of Defects in Industrial CT Scans

    Industrial computed tomography (CT) is an elementary tool for the non-destructive inspection of cast light-metal or plastic parts. A comprehensive testing not only helps to ensure the stability and durability of a part, it also allows reducing the rejection rate by supporting the optimization of the casting process and to save material (and weight) by producing equivalent but more filigree structures. With a CT scan it is theoretically possible to locate any defect in the part under examination and to exactly determine its shape, which in turn helps to draw conclusions about its harmfulness. However, most of the time the data quality is not good enough to allow segmenting the defects with simple filter-based methods which directly operate on the gray-values—especially when the inspection is expanded to the entire production. In such in-line inspection scenarios the tight cycle times further limit the available time for the acquisition of the CT scan, which renders them noisy and prone to various artifacts. In recent years, dramatic advances in deep learning (and convolutional neural networks in particular) made even the reliable detection of small objects in cluttered scenes possible. These methods are a promising approach to quickly yield a reliable and accurate defect segmentation even in unfavorable CT scans. The huge drawback: a lot of precisely labeled training data is required, which is utterly challenging to obtain—particularly in the case of the detection of tiny defects in huge, highly artifact-afflicted, three-dimensional voxel data sets. Hence, a significant part of this work deals with the acquisition of precisely labeled training data. 
Firstly, we consider facilitating the manual labeling process: our experts annotate on high-quality CT scans with a high spatial resolution and a high contrast resolution and we then transfer these labels to an aligned "normal" CT scan of the same part, which holds all the challenging aspects we expect in production use. Nonetheless, due to the indecisiveness of the labeling experts about what to annotate as defective, the labels remain fuzzy. Thus, we additionally explore different approaches to generate artificial training data, for which a precise ground truth can be computed. We find an accurate labeling to be crucial for a proper training. We evaluate (i) domain randomization which simulates a super-set of reality with simple transformations, (ii) generative models which are trained to produce samples of the real-world data distribution, and (iii) realistic simulations which capture the essential aspects of real CT scans. Here, we develop a fully automated simulation pipeline which provides us with an arbitrary amount of precisely labeled training data. First, we procedurally generate virtual cast parts in which we place reasonable artificial casting defects. Then, we realistically simulate CT scans which include typical CT artifacts like scatter, noise, cupping, and ring artifacts. Finally, we compute a precise ground truth by determining for each voxel the overlap with the defect mesh. To determine whether our realistically simulated CT data is eligible to serve as training data for machine learning methods, we compare the prediction performance of learning-based and non-learning-based defect recognition algorithms on the simulated data and on real CT scans. In an extensive evaluation, we compare our novel deep learning method to a baseline of image processing and traditional machine learning algorithms. This evaluation shows how much defect detection benefits from learning-based approaches.
In particular, we compare (i) a filter-based anomaly detection method which finds defect indications by subtracting the original CT data from a generated "defect-free" version, (ii) a pixel-classification method which, based on densely extracted hand-designed features, lets a random forest decide whether an image element is part of a defect or not, and (iii) a novel deep learning method which combines a U-Net-like encoder-decoder pair of three-dimensional convolutions with an additional refinement step. The encoder-decoder pair yields a high recall, which allows us to detect even very small defect instances. The refinement step yields a high precision by sorting out the false positive responses. We extensively evaluate these models on our realistically simulated CT scans as well as on real CT scans in terms of their probability of detection, which tells us with what probability a defect of a given size can be found in a CT scan of a given quality, and their intersection over union, which tells us how precise our segmentation mask is in general. While the learning-based methods clearly outperform the image processing method, the deep learning method in particular stands out for its inference speed and its prediction performance on challenging CT scans, such as those that occur in in-line scenarios. Finally, we further explore the possibilities and the limitations of the combination of our fully automated simulation pipeline and our deep learning model. With the deep learning method yielding reliable results for CT scans of low data quality, we examine by how much we can reduce the scan time while still maintaining proper segmentation results. Then, we look at the transferability of the promising results to CT scans of parts of different materials and different manufacturing techniques, including plastic injection molding, iron casting, additive manufacturing, and composed multi-material parts.
Each of these tasks comes with its own challenges like an increased artifact-level or different types of defects which occasionally are hard to detect even for the human eye. We tackle these challenges by employing our simulation pipeline to produce virtual counterparts that capture the tricky aspects and fine-tuning the deep learning method on this additional training data. With that we can tailor our approach towards specific tasks, achieving reliable and robust segmentation results even for challenging data. Lastly, we examine if the deep learning method, based on our realistically simulated training data, can be trained to distinguish between different types of defects—the reason why we require a precise segmentation in the first place—and we examine if the deep learning method can detect out-of-distribution data where its predictions become less trustworthy, i.e. an uncertainty estimation

    Advanced Sensors for Real-Time Monitoring Applications

    It is impossible to imagine the modern world without sensors, or without real-time information about almost everything—from local temperature to material composition and health parameters. We sense, measure, and process data and act accordingly all the time. In fact, real-time monitoring and information is key to a successful business, an assistant in life-saving decisions that healthcare professionals make, and a tool in research that could revolutionize the future. To ensure that sensors address the rapidly developing needs of various areas of our lives and activities, scientists, researchers, manufacturers, and end-users have established an efficient dialogue so that the newest technological achievements in all aspects of real-time sensing can be implemented for the benefit of the wider community. This book documents some of the results of such a dialogue and reports on advances in sensors and sensor systems for existing and emerging real-time monitoring applications