376 research outputs found

    Using Prior Knowledge for Verification and Elimination of Stationary and Variable Objects in Real-time Images

    As autonomous vehicle technology evolves, passengers can now sit back instead of driving. Technologies such as object detection, object identification, and image segmentation enable an autonomous car to detect and identify objects on the road in order to drive safely. While an autonomous car drives itself, the objects surrounding it can be dynamic (e.g., cars and pedestrians), stationary (e.g., buildings and benches), or variable (e.g., trees), depending on whether an object's location or shape changes. Unlike existing image-based approaches that detect and recognize objects in the scene, this research employs a 3D virtual world to verify and eliminate stationary and variable objects, allowing the autonomous car to focus on dynamic objects that may endanger its driving. The methodology takes advantage of prior knowledge of stationary and variable objects represented in a virtual city and verifies their existence in a real-time scene by matching keypoints between the virtual and real objects. When a stationary or variable object does not exist in the virtual world because the pre-existing information is incomplete, the method uses machine learning for object detection. Verified objects are then removed from the real-time image with a combined algorithm using contour detection and class activation maps (CAM), which helps to improve efficiency and accuracy when recognizing moving objects.
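The keypoint-matching verification described in this abstract can be illustrated with a minimal sketch. The abstract does not say which descriptor or matching rule is used, so everything here is an assumption for illustration: descriptors are plain numeric tuples, matching is nearest-neighbour with a ratio test, and the names `match_keypoints`, `is_verified`, `ratio`, and `min_matches` are hypothetical.

```python
import math

def match_keypoints(virtual_desc, real_desc, ratio=0.75):
    """Return (i, j) pairs where virtual descriptor i matches real descriptor j.

    Assumption: a nearest-neighbour search with a ratio test stands in for
    whatever matcher the original work uses.
    """
    matches = []
    for i, v in enumerate(virtual_desc):
        # Distance from this virtual descriptor to every real descriptor.
        dists = sorted((math.dist(v, r), j) for j, r in enumerate(real_desc))
        if len(dists) < 2:
            continue
        (best, j), (second, _) = dists[0], dists[1]
        # Accept only if the best match is clearly better than the runner-up.
        if best < ratio * second:
            matches.append((i, j))
    return matches

def is_verified(virtual_desc, real_desc, min_matches=2, ratio=0.75):
    """Treat the object as present in the real scene if enough keypoints match."""
    return len(match_keypoints(virtual_desc, real_desc, ratio)) >= min_matches
```

An object whose virtual-world descriptors find enough unambiguous matches in the camera frame would count as verified and become a candidate for removal; ambiguous descriptors are discarded by the ratio test rather than forced into a match.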

    Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System

    Audio-visual automatic speech recognition (AVASR) is a speech recognition technique that takes both audio and video signals as input. Traditional audio-only speech recognition systems use only acoustic information from an audio source, and their recognition performance degrades significantly in acoustically noisy environments. Visual information has also been shown to help identify speech, so audio-visual automatic speech recognition has been studied as a way to improve recognition performance. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database, which was recorded in moving vehicles, so diverse lighting conditions and poor image quality are the problems we must overcome. We first propose the use of the Viola-Jones face detection algorithm, which can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach a face detection rate of 89%. By running detection separately on each color channel and integrating the results, we further improve the detection rate to 95%. To reliably localize the lips, three algorithms are studied and compared: the Gabor filter algorithm, the lip enhancement algorithm, and the modified Viola-Jones algorithm for lip features. Finally, to increase the detection rate, the modified Viola-Jones algorithm and the lip enhancement algorithm are cascaded, based on the results of the three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization.
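The idea of integrating per-channel detection results can be sketched as follows. The abstract does not state how the results are combined, so the grouping rule here (merge boxes whose intersection over union exceeds a threshold, then average each group) is an assumption, and `iou`, `merge_channel_detections`, and `iou_thresh` are hypothetical names. Boxes are `(x, y, w, h)` tuples, as a face detector would typically return.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def merge_channel_detections(per_channel, iou_thresh=0.5):
    """Group overlapping boxes from all channels; return one averaged box per group.

    Assumption: a face counts as detected if any channel finds it.
    """
    groups = []
    for boxes in per_channel.values():
        for box in boxes:
            for g in groups:
                # Compare against the first box of each existing group.
                if iou(g[0], box) >= iou_thresh:
                    g.append(box)
                    break
            else:
                groups.append([box])
    return [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]
```

With this rule, a face missed in one channel (for example, because of a color cast from in-car lighting) is still reported as long as another channel finds it, which is one plausible reason combining channels could raise the detection rate.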

    Automated Facial Anthropometry Over 3D Face Surface Textured Meshes

    Automating the measurement of the human face poses major technical and technological challenges. The use of 3D scanning technology is widely accepted in the scientific community and makes it possible to develop non-invasive measurement techniques. However, selecting the points on which the measurements are based is a task that still requires human intervention. This work introduces digital image processing methods for the automatic localization of facial features. The first goal was to examine different ways of representing 3D shapes and to evaluate whether these could serve as representative features of facial attributes, in order to locate them automatically. Based on this, a non-rigid registration procedure was developed to estimate dense point-to-point correspondence between two surfaces. The method is able to register 3D face models in the presence of facial expressions. Finally, a method that uses both the shape and the appearance information of the surface was designed for the automatic localization of a set of facial features that form the basis for determining anthropometric ratios, which are widely used in fields such as ergonomics, forensics, and surgical planning, among others.

    Self-adjusted active contours using multi-directional texture cues

    Parameterization is an open issue in active contour research, associated with the cumbersome and time-consuming process of empirical adjustment. This work introduces a novel framework for the self-adjustment of region-based active contours, based on multi-directional texture cues. These cues are mined by applying filtering transforms characterized by multi-resolution, anisotropy, localization and directionality. This process yields entropy-based image "heatmaps", which are used to weight the regularization and data fidelity terms that guide contour evolution. Experimental evaluation is performed on a large benchmark dataset as well as on textured images. The segmentation results demonstrate that the proposed framework is capable of accelerating contour convergence while maintaining a segmentation quality comparable to that obtained by empirically adjusted active contours.
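To make the notion of an entropy-based heatmap concrete, here is a minimal sketch that computes the Shannon entropy of gray levels in a small window around each pixel. This is a simplification and an assumption: the abstract derives its heatmaps from multi-directional filtering transforms, which are not reproduced here, and the names `local_entropy` and `radius` are hypothetical.

```python
import math
from collections import Counter

def local_entropy(image, radius=1):
    """Shannon entropy (bits) of gray levels in a window around each pixel.

    `image` is a list of rows of integer gray levels. The window is a
    (2*radius+1) x (2*radius+1) square, clipped at the image border.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [image[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            # Sum of p * log2(1/p) over the gray levels present in the window.
            out[y][x] = sum((c / n) * math.log2(n / c)
                            for c in Counter(window).values())
    return out
```

Flat regions produce zero entropy and textured regions or region boundaries produce higher values, so a map like this could plausibly be used to vary the per-pixel weights of the regularization and data fidelity terms, as the abstract describes.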

    High-quality face capture, animation and editing from monocular video

    Digitization of virtual faces in movies requires complex capture setups and extensive manual work to produce superb animations and video-realistic editing. This thesis pushes the boundaries of the digitization pipeline by proposing automatic algorithms for high-quality 3D face capture and animation, as well as photo-realistic face editing. These algorithms reconstruct and modify faces in 2D videos recorded in uncontrolled scenarios and under unknown illumination. In particular, advances in three main areas offer solutions for the lack of depth and the overall uncertainty in video recordings. First, contributions in capture include model-based reconstruction of detailed, dynamic 3D geometry that exploits optical and shading cues, multilayer parametric reconstruction of accurate 3D models in unconstrained setups based on inverse rendering, and regression-based 3D lip shape enhancement from high-quality data. Second, advances in animation are video-based face reenactment based on robust appearance metrics and temporal clustering, performance-driven retargeting of detailed facial models in sync with audio, and the automatic creation of personalized controllable 3D rigs. Finally, advances in plausible photo-realistic editing are dense face albedo capture and mouth interior synthesis using image warping and 3D teeth proxies. High-quality results attained on challenging application scenarios confirm the contributions and show great potential for the automatic creation of photo-realistic 3D faces.

    Multi-touch Detection and Semantic Response on Non-parametric Rear-projection Surfaces

    The ability of human beings to physically touch our surroundings has had a profound impact on our daily lives. Young children learn to explore their world by touch; likewise, many simulation and training applications benefit from natural touch interactivity. As a result, modern interfaces supporting touch input are ubiquitous. Typically, such interfaces are implemented on integrated touch-display surfaces with simple geometry that can be mathematically parameterized, such as planar surfaces and spheres; for more complicated non-parametric surfaces, such parameterizations are not available. In this dissertation, we introduce a method for generalizable optical multi-touch detection and semantic response on uninstrumented non-parametric rear-projection surfaces using an infrared-light-based multi-camera multi-projector platform. In this paradigm, touch input allows users to manipulate complex virtual 3D content that is registered to and displayed on a physical 3D object. Detected touches trigger responses with specific semantic meaning in the context of the virtual content, such as animations or audio responses. The broad problem of touch detection and response can be decomposed into three major components: determining if a touch has occurred, determining where a detected touch has occurred, and determining how to respond to a detected touch. Our fundamental contribution is the design and implementation of a relational lookup table architecture that addresses these challenges through the encoding of coordinate relationships among the cameras, the projectors, the physical surface, and the virtual content. Detecting the presence of touch input primarily involves distinguishing between touches (actual contact events) and hovers (near-contact proximity events). We present and evaluate two algorithms for touch detection and localization utilizing the lookup table architecture. 
    One of the algorithms, a bounded plane sweep, is additionally able to estimate hover-surface distances, which we explore for interactions above surfaces. The proposed method is designed to operate with low latency and to be generalizable. We demonstrate touch-based interactions on several physical parametric and non-parametric surfaces, and we evaluate both system accuracy and the accuracy of typical users in touching desired targets on these surfaces. In a formative human-subject study, we examine how touch interactions are used in the context of healthcare and present an exploratory application of this method in patient simulation. A second study highlights the advantages of touch input on content-matched physical surfaces achieved by the proposed approach, such as decreases in induced cognitive load, increases in system usability, and increases in user touch performance. In this experiment, novice users were nearly as accurate when touching targets on a 3D head-shaped surface as when touching targets on a flat surface, and their self-perception of their accuracy was higher.

    Image analysis and processing with applications in proteomics and medicine

    This thesis introduces unsupervised image analysis algorithms for the segmentation of several types of images, with an emphasis on proteomics and medical images. The presented algorithms are built upon the principles of deformable models and, more specifically, region-based active contours. Two different objectives are pursued. The first is the core issue of unsupervised parameterization in image segmentation, whereas the second is the formulation of a complete model for the segmentation of proteomics images, which is the first to exploit the appealing attributes of active contours. The first major contribution of this thesis is a novel framework for the automated parameterization of region-based active contours. The presented framework aims to endow segmentation results with objectivity and robustness, as well as to free domain users from the cumbersome and time-consuming process of empirical adjustment. It is applicable to various medical imaging modalities and remains insensitive to alterations in the settings of the acquisition devices. The experimental results demonstrate that the presented framework maintains a segmentation quality comparable to that obtained with empirical parameterization. The second major contribution of this thesis is an unsupervised active contour-based model for the segmentation of proteomics images. The presented model copes with crucial issues in 2D-GE image analysis, including streaks, artifacts, and faint and overlapping spots. In addition, it provides an alternative to the laborious, error-prone process of manual editing, which is required in state-of-the-art 2D-GE image analysis software packages. The experimental results demonstrate that the presented model outperforms 2D-GE image analysis software packages in terms of detection and segmentation quantity metrics.