988 research outputs found

    E^2TAD: An Energy-Efficient Tracking-based Action Detector

    Video action detection (spatio-temporal action localization) is usually the starting point for human-centric intelligent analysis of videos nowadays. It has high practical impact for many applications across robotics, security, healthcare, etc. The two-stage paradigm of Faster R-CNN in object detection inspires a standard paradigm for video action detection, i.e., first generating person proposals and then classifying their actions. However, none of the existing solutions can provide fine-grained action detection at the "who-when-where-what" level. This paper presents a tracking-based solution to accurately and efficiently localize predefined key actions spatially (by predicting the associated target IDs and locations) and temporally (by predicting the time in exact frame indices). This solution won first place in the UAV-Video Track of the 2021 Low-Power Computer Vision Challenge (LPCVC).
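The "who-when-where-what" output described above can be pictured with a small data structure. The field names and the temporal-overlap helper below are illustrative sketches, not the paper's actual interface:

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ActionDetection:
    """One fine-grained detection at the "who-when-where-what" level."""
    track_id: int      # who: persistent target ID assigned by the tracker
    action: str        # what: one of the predefined key actions
    start_frame: int   # when: exact frame index where the action begins
    end_frame: int     # when: exact frame index where the action ends
    boxes: List[Tuple[int, int, int, int]] = field(default_factory=list)  # where: per-frame (x, y, w, h)

def temporal_iou(a: ActionDetection, b: ActionDetection) -> float:
    """Intersection-over-union of two detections' frame intervals,
    a common way to score the 'when' part of a prediction."""
    inter = max(0, min(a.end_frame, b.end_frame) - max(a.start_frame, b.start_frame) + 1)
    union = (a.end_frame - a.start_frame + 1) + (b.end_frame - b.start_frame + 1) - inter
    return inter / union

pred = ActionDetection(track_id=1, action="wave", start_frame=10, end_frame=19)
gt = ActionDetection(track_id=1, action="wave", start_frame=15, end_frame=24)
score = temporal_iou(pred, gt)
```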

    Event-based Vision: A Survey

    Event cameras are bio-inspired sensors that differ from conventional frame cameras: Instead of capturing images at a fixed rate, they asynchronously measure per-pixel brightness changes, and output a stream of events that encode the time, location and sign of the brightness changes. Event cameras offer attractive properties compared to traditional cameras: high temporal resolution (in the order of microseconds), very high dynamic range (140 dB vs. 60 dB), low power consumption, and high pixel bandwidth (on the order of kHz) resulting in reduced motion blur. Hence, event cameras have a large potential for robotics and computer vision in challenging scenarios for traditional cameras, such as low-latency, high speed, and high dynamic range. However, novel methods are required to process the unconventional output of these sensors in order to unlock their potential. This paper provides a comprehensive overview of the emerging field of event-based vision, with a focus on the applications and the algorithms developed to unlock the outstanding properties of event cameras. We present event cameras from their working principle, the actual sensors that are available and the tasks that they have been used for, from low-level vision (feature detection and tracking, optic flow, etc.) to high-level vision (reconstruction, segmentation, recognition). We also discuss the techniques developed to process events, including learning-based techniques, as well as specialized processors for these novel sensors, such as spiking neural networks. Additionally, we highlight the challenges that remain to be tackled and the opportunities that lie ahead in the search for a more efficient, bio-inspired way for machines to perceive and interact with the world.
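The event stream described above can be pictured as rows of (timestamp, x, y, polarity). A minimal sketch of accumulating such a stream into a signed event-count image, a common first step in event processing (the array layout and polarity encoding in {-1, +1} are assumptions, not a specific camera's format):

```python
import numpy as np

def events_to_frame(events, height, width):
    """Accumulate a stream of events into a signed event-count image.

    `events` is an (N, 4) array of (t, x, y, polarity) rows: each event
    reports a per-pixel brightness change, timestamped at microsecond
    resolution, with polarity +1 (brighter) or -1 (darker).
    """
    frame = np.zeros((height, width), dtype=np.int32)
    for t, x, y, p in events:
        frame[int(y), int(x)] += int(p)
    return frame

# Three events at two pixels: two positive changes at (x=3, y=2), one negative at (0, 0).
ev = np.array([[10, 3, 2, +1], [12, 3, 2, +1], [15, 0, 0, -1]])
img = events_to_frame(ev, height=4, width=5)
```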

    A Methodology for Extracting Human Bodies from Still Images

    Monitoring and surveillance of humans is one of today's most prominent applications, and it is expected to be part of many aspects of our future life, for reasons of safety, assisted living, and many others. Many efforts have been made towards automatic and robust solutions, but the general problem is very challenging and still remains open. In this PhD dissertation we examine the problem from many perspectives. First, we study the performance of a hardware architecture designed for large-scale surveillance systems. Then, we focus on the general problem of human activity recognition, present an extensive survey of methodologies that deal with this subject, and propose a maturity metric to evaluate them. Image segmentation is one of the most popular image-processing algorithms found in the field, and we propose a blind metric to evaluate segmentation results with respect to the activity at local regions. Finally, we propose a fully automatic system for segmenting and extracting human bodies from challenging single images, which is the main contribution of the dissertation. Our methodology is a novel bottom-up approach relying mostly on anthropometric constraints and is facilitated by our research in the fields of face, skin, and hand detection. Experimental results and comparison with state-of-the-art methodologies demonstrate the success of our approach.

    Vulnerability assessment in the use of biometrics in unsupervised environments

    International Doctoral Mention. In the last few decades, we have witnessed a large-scale deployment of biometric systems in different life applications, replacing traditional recognition methods such as passwords and tokens. We have reached a time where we use biometric systems in our daily lives. On a personal scale, authentication to our electronic devices (smartphones, tablets, laptops, etc.) utilizes biometric characteristics to provide access permission. Moreover, we access our bank accounts and perform various types of payments and transactions using the biometric sensors integrated into our devices. On the other hand, different organizations, companies, and institutions use biometric-based solutions for access control. On the national scale, police authorities and border control measures use biometric recognition devices for individual identification and verification purposes. Therefore, biometric systems are relied upon to provide secure recognition where only the genuine user can be recognized as being himself. Moreover, the biometric system should ensure that an individual cannot be identified as someone else. In the literature, there is a surprising number of experiments that show the possibility of stealing someone's biometric characteristics and using them to create an artificial biometric trait that an attacker can use to claim the identity of the genuine user. There have also been real cases of people who successfully fooled the biometric recognition systems in airports and smartphones [1]–[3]. This urges the need to investigate the potential threats and propose countermeasures that ensure high levels of security and user convenience. Consequently, performing security evaluations is vital to identify: (1) the security flaws in biometric systems, (2) the possible threats that may target the defined flaws, and (3) measurements that describe the technical competence of the biometric system security.
Identifying the system vulnerabilities leads to proposing adequate security solutions that assist in achieving higher integrity. This thesis aims to investigate the vulnerability of the fingerprint modality to presentation attacks in unsupervised environments, then implement mechanisms to detect those attacks and avoid the misuse of the system. To achieve these objectives, the thesis is carried out in the following three phases. In the first phase, the generic biometric system scheme is studied by analyzing the vulnerable points, with special attention to the vulnerability to presentation attacks. The study reviews the literature on presentation attacks and the corresponding solutions, i.e. presentation attack detection mechanisms, for six biometric modalities: fingerprint, face, iris, vascular, handwritten signature, and voice. Moreover, it provides a new taxonomy for presentation attack detection mechanisms. The proposed taxonomy helps to comprehend the issue of presentation attacks and how the literature has tried to address it. The taxonomy represents a starting point for initiating new investigations that propose novel presentation attack detection mechanisms. In the second phase, an evaluation methodology is developed from two sources: (1) the ISO/IEC 30107 standard, and (2) the Common Evaluation Methodology of the Common Criteria. The developed methodology characterizes two main aspects of a presentation attack detection mechanism: (1) the resistance of the mechanism to presentation attacks, and (2) the corresponding threat of the studied attack. The first part is conducted by showing the mechanism's technical capabilities and how it influences the security and ease of use of the biometric system. The second part is done by performing a vulnerability assessment considering all the factors that affect the attack potential. Finally, a data collection is carried out, including 7128 fingerprint videos of bona fide and attack presentations.
The data is collected using two sensing technologies, two presentation scenarios, and considering seven attack species. The database is used to develop and evaluate dynamic presentation attack detection mechanisms that exploit fingerprint spatio-temporal features. In the final phase, a set of novel presentation attack detection mechanisms is developed, exploiting the dynamic features caused by natural fingerprint phenomena such as perspiration and elasticity. The evaluation results show an efficient capability to detect attacks where, in some configurations, the mechanisms are capable of eliminating some attack species and mitigating the rest while keeping user convenience at a high level.
Doctoral Programme in Electrical Engineering, Electronics and Automation, Universidad Carlos III de Madrid. President: Cristina Conde Vila. Secretary: Mariano López García. Member: Farzin Derav
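One way to picture a dynamic, perspiration-style cue of the kind this thesis exploits is a trend statistic computed over the frames of a fingerprint video. The feature, threshold, and sensor behaviour assumed below are purely illustrative and do not reproduce the thesis's actual mechanisms:

```python
import numpy as np

def moisture_trend(frames):
    """Mean-intensity trend across a fingerprint video.

    Perspiration gradually changes ridge appearance on many optical sensors,
    so a live finger tends to show a drift in mean intensity over time that a
    static artefact (e.g. a printed or moulded fake) does not. Illustrative
    feature only.
    """
    means = np.array([f.mean() for f in frames], dtype=np.float64)
    # Slope of a least-squares line fitted to the per-frame mean intensities.
    slope = np.polyfit(np.arange(len(means)), means, deg=1)[0]
    return slope

def is_bona_fide(frames, min_abs_slope=0.05):
    """Classify the video as bona fide if the temporal drift is large enough.
    The threshold is an assumption for this sketch."""
    return abs(moisture_trend(frames)) >= min_abs_slope

# Toy videos: a "live" finger drifting one grey level per frame vs. a static fake.
live = [np.full((8, 8), 100.0 - i) for i in range(10)]
fake = [np.full((8, 8), 100.0) for _ in range(10)]
```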

    Towards a High Quality Real-Time Graphics Pipeline

    Modern graphics hardware pipelines create photorealistic images with high geometric complexity in real time. The quality is constantly improving and advanced techniques from feature film visual effects, such as high dynamic range images and support for higher-order surface primitives, have recently been adopted. Visual effect techniques have large computational costs and significant memory bandwidth usage. In this thesis, we identify three problem areas and propose new algorithms that increase the performance of a set of computer graphics techniques. Our main focus is on efficient algorithms for the real-time graphics pipeline, but parts of our research are equally applicable to offline rendering. Our first focus is texture compression, which is a technique to reduce the memory bandwidth usage. The core idea is to store images in small compressed blocks which are sent over the memory bus and are decompressed on-the-fly when accessed. We present compression algorithms for two types of texture formats. High dynamic range images capture environment lighting with luminance differences over a wide intensity range. Normal maps store perturbation vectors for local surface normals, and give the illusion of high geometric surface detail. Our compression formats are tailored to these texture types and have compression ratios of 6:1, high visual fidelity, and low-cost decompression logic. Our second focus is tessellation culling. Culling is a commonly used technique in computer graphics for removing work that does not contribute to the final image, such as completely hidden geometry. By discarding rendering primitives from further processing, substantial arithmetic computations and memory bandwidth can be saved. Modern graphics processing units include flexible tessellation stages, where rendering primitives are subdivided for increased geometric detail. Images with highly detailed models can be synthesized, but the incurred cost is significant. 
We have devised a simple remapping technique that allows for better tessellation distribution in screen space. Furthermore, we present programmable tessellation culling, where bounding volumes for displaced geometry are computed and used to conservatively test whether a primitive can be discarded before tessellation. We introduce a general tessellation culling framework, and an optimized algorithm for rendering displaced Bézier patches, which is expected to be a common use case for graphics hardware tessellation. Our third and final focus is forward-looking, and relates to efficient algorithms for stochastic rasterization, a rendering technique where camera effects such as depth of field and motion blur can be faithfully simulated. We extend a graphics pipeline with stochastic rasterization in spatio-temporal space and show that stochastic motion blur can be rendered with rather modest pipeline modifications. Furthermore, backface culling algorithms for motion blur and depth of field rendering are presented, which are directly applicable to stochastic rasterization. Hopefully, our work in this field brings us closer to high-quality real-time stochastic rendering.
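The core texture-compression idea in the abstract above (decompress only the small block a texel lookup touches, so only compressed blocks cross the memory bus) can be sketched as follows. The block layout and the identity "decoder" are stand-ins for a real format such as the thesis's HDR or normal-map codecs:

```python
def texel_fetch(compressed_blocks, decompress, x, y, block_size=4):
    """Fetch one texel from a block-compressed texture.

    The texture is stored as a grid of independently compressed
    block_size x block_size tiles; only the tile containing (x, y) is
    decoded, which is what keeps memory-bus traffic low.
    """
    bx, by = x // block_size, y // block_size
    block = decompress(compressed_blocks[(bx, by)])  # block_size x block_size tile
    return block[y % block_size][x % block_size]

# Toy 8x8 texture split into 4x4 blocks whose texel value equals its linear
# index y*8 + x; "compression" here is just the raw tile (identity decoder).
blocks = {(bx, by): [[(bx * 4 + c) + (by * 4 + r) * 8 for c in range(4)] for r in range(4)]
          for bx in range(2) for by in range(2)}
value = texel_fetch(blocks, decompress=lambda b: b, x=5, y=6)
```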

    A Few Days of A Robot's Life in the Human's World: Toward Incremental Individual Recognition

    PhD thesis. This thesis presents an integrated framework and implementation for Mertz, an expressive robotic creature for exploring the task of face recognition through natural interaction in an incremental and unsupervised fashion. The goal of this thesis is to advance toward a framework which would allow robots to incrementally "get to know" a set of familiar individuals in a natural and extendable way. This thesis is motivated by the increasingly popular goal of integrating robots in the home. In order to be effective in human-centric tasks, the robots must be able not only to recognize each family member, but also to learn about the roles of various people in the household. In this thesis, we focus on two particular limitations of the current technology. Firstly, most face recognition research concentrates on the supervised classification problem. Currently, one of the biggest problems in face recognition is how to generalize the system to be able to recognize new test data that vary from the training data. Thus, until this problem is solved completely, the existing supervised approaches may require multiple manual introduction and labelling sessions to include training data with enough variations. Secondly, there is typically a large gap between research prototypes and commercial products, largely due to lack of robustness and scalability to different environmental settings. In this thesis, we propose an unsupervised approach which would allow for a more adaptive system that can incrementally update the training set with more recent data or new individuals over time. Moreover, it gives the robots a more natural social recognition mechanism to learn not only to recognize each person's appearance, but also to remember some relevant contextual information that the robot observed during previous interaction sessions.
Therefore, this thesis focuses on integrating an unsupervised and incremental face recognition system within a physical robot which interfaces directly with humans through natural social interaction. The robot autonomously detects, tracks, and segments face images during these interactions and automatically generates a training set for its face recognition system. Moreover, in order to motivate robust solutions and address scalability issues, we chose to put the robot, Mertz, in unstructured public environments to interact with naive passersby, instead of with only the researchers within the laboratory environment. While an unsupervised and incremental face recognition system is a crucial element toward our target goal, it is only a part of the story. A face recognition system typically receives either pre-recorded face images or a streaming video from a static camera. As illustrated by an ACLU review of a commercial face recognition installation, a security application which interfaces with the latter is already very challenging. In this case, our target goal is a robot that can recognize people in a home setting. The interface between robots and humans is even more dynamic: both the robots and the humans move around. We present the robot implementation and its unsupervised incremental face recognition framework. We describe an algorithm for clustering local features extracted from a large set of automatically generated face data. We demonstrate the robot's capabilities and limitations in a series of experiments at a public lobby. In a final experiment, the robot interacted with a few hundred individuals over an eight-day period and generated a training set of over a hundred thousand face images. We evaluate the clustering algorithm's performance across a range of parameters on this automatically generated training data and also on the Honda-UCSD video face database. Lastly, we present some recognition results using the self-labelled clusters.
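The incremental, unsupervised grouping of face data described above can be pictured with a greedy online clusterer: assign each incoming feature vector to the nearest existing cluster if it is close enough, otherwise open a new cluster (a new "individual"). This is a generic stand-in for the thesis's local-feature clustering; the Euclidean metric and fixed threshold are assumptions:

```python
import numpy as np

class IncrementalClusterer:
    """Greedy online clustering of feature vectors into running-mean clusters."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # running mean feature per cluster ("individual")
        self.counts = []     # number of samples absorbed by each cluster

    def add(self, feature):
        """Assign `feature` to a cluster and return the cluster ID."""
        feature = np.asarray(feature, dtype=np.float64)
        if self.centroids:
            dists = [np.linalg.norm(feature - c) for c in self.centroids]
            i = int(np.argmin(dists))
            if dists[i] <= self.threshold:
                # Update the running mean so the cluster tracks appearance drift.
                self.counts[i] += 1
                self.centroids[i] += (feature - self.centroids[i]) / self.counts[i]
                return i
        # Too far from every known cluster: treat as a new individual.
        self.centroids.append(feature.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

# Two tight groups of toy "face features" should yield two clusters.
clus = IncrementalClusterer(threshold=0.5)
ids = [clus.add(v) for v in ([0, 0], [0.1, 0], [5, 5], [5, 5.1])]
```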