
    Exploring Super-Resolution for Face Recognition

    Biometric recognition is part of many aspects of modern society. With the popularization of smartphones, facial recognition is gaining ground among biometric technologies. Given the diversity of image capture devices, of different brands and quality levels, images will not always meet the ideal standard for recognition. This article tests and compares different scenarios and situations to assess the results obtained by facial recognition in different environments, using a quantitative method of data analysis. In the first scenario, all images were submitted without changes. In the following scenarios, image resolution is reduced, which may or may not be followed by enlargement back to the original resolution, either via bicubic interpolation or via the Image Super-Resolution (ISR) algorithm; these changes can be applied to all images or only to the test images. Results indicate that the first scenario obtained the best performance, followed by the scenario in which only the test images are changed. The worst performance occurs when the properties of all images are affected. In the scenarios where enlargement after reduction is optional, enlarging performs better, with bicubic enlargement holding an advantage over ISR, while reduction without enlargement performs worst
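Bicubic enlargement, the better-performing upscaling option in the results above, is commonly implemented with Keys' cubic convolution kernel. The sketch below is an illustrative NumPy-only implementation of the 1-D case, not the paper's pipeline; 2-D bicubic applies the same kernel separably along rows and columns.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel; a=-0.5 matches common bicubic."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def cubic_resize_1d(signal, factor):
    """Resample a 1-D signal by `factor` using cubic convolution.
    Border samples are clamped (edge replication)."""
    n = len(signal)
    out = np.empty(int(n * factor))
    for i in range(len(out)):
        src = i / factor              # position in input coordinates
        base = int(np.floor(src))
        t = src - base
        out[i] = sum(                  # four nearest taps
            signal[min(max(base + k, 0), n - 1)] * cubic_kernel(k - t)
            for k in range(-1, 3)
        )
    return out
```

Because the kernel reproduces linear signals exactly, upsampling a ramp yields the exact intermediate values, which is a quick sanity check for any implementation.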

    Reconocimiento de rostros en tiempo real sobre dispositivos móviles de bajo costo (Real-time face recognition on low-cost mobile devices)

    Some of the most widely known face recognition methods are tested to determine their real usefulness for building real-time applications that can run on low-cost mobile devices. To this end, a brief description of the main algorithms used in face recognition applications is given, and it is shown that the face detection phase is critical to performance on these devices. It is also demonstrated that, with the methods considered, it is impossible to process every frame of a video stream running at a rate of 30 frames per second
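The 30 frames-per-second figure translates into a hard per-frame time budget, which makes the infeasibility claim concrete. A small back-of-the-envelope helper (illustrative numbers only, not measurements from the paper):

```python
import math

FPS = 30
FRAME_BUDGET_MS = 1000.0 / FPS  # ~33.3 ms available per frame

def frames_dropped(processing_ms, fps=FPS):
    """How many incoming frames arrive (and must be skipped) while a
    single frame is still being processed."""
    return max(0, math.ceil(processing_ms * fps / 1000.0) - 1)

# Hypothetical timing: if detection alone takes 120 ms on a low-end
# phone, three of every four frames can never be processed.
```

Any per-frame pipeline exceeding the ~33 ms budget forces frame skipping, which is the situation the abstract describes for the methods considered.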

    Robust density modelling using the student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities and show, through experiments over two well-known datasets (Weizmann, MuHAVi), a remarkable improvement in classification accuracy. © 2011 IEEE
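The robustness argument can be seen directly from the log-densities: the t-distribution's heavy tails penalize an outlier far less than the Gaussian's quadratic tail does. The snippet below is a standalone illustration, not the paper's HMM code:

```python
import math

def gaussian_logpdf(x, mu=0.0, sigma=1.0):
    """Log-density of a Gaussian; note the quadratic penalty in x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)

def student_t_logpdf(x, nu=3.0, mu=0.0, sigma=1.0):
    """Log-density of a location-scale Student's t with nu degrees of
    freedom; the penalty grows only logarithmically in x."""
    z = (x - mu) / sigma
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi) - math.log(sigma)
            - (nu + 1) / 2 * math.log1p(z * z / nu))

# An outlier at x = 10 is roughly 40 nats more plausible under t(nu=3)
# than under the standard Gaussian, so it distorts parameter estimates
# far less during likelihood-based fitting.
```

In an HMM this density would serve as the observation probability of each mixture component, which is where the robustness pays off.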

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one-, two- or three-dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small degree of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented work. The authors of these 25 works present and advocate recent achievements of their research in the field of pattern recognition

    Advancements in multi-view processing for reconstruction, registration and visualization.

    The ever-increasing diffusion of digital cameras and the advancements in computer vision, image processing and storage capabilities have led, in recent years, to the wide diffusion of digital image collections. A set of digital images is usually referred to as a multi-view image set when the pictures cover different views of the same physical object or location. In multi-view datasets, correlations between images are exploited in many different ways to increase our capability to gather enhanced understanding and information on a scene. For example, a collection can be enhanced by leveraging the camera position and orientation, or with information about the 3D structure of the scene. The range of applications of multi-view data is very wide, encompassing diverse fields such as image-based reconstruction, image-based localization, navigation of virtual environments, collective photographic retouching, computational photography, object recognition, etc. For all these reasons, the development of new algorithms to effectively create, process, and visualize this type of data is an active research trend. The thesis presents four advancements related to different aspects of multi-view data processing:
- Image-based 3D reconstruction: we present a pre-processing algorithm, a special color-to-gray conversion, developed to improve the accuracy of image-based reconstruction algorithms. In particular, we show how different dense stereo matching results can be enhanced by applying a domain-separation approach that pre-computes a single optimized numerical value for each image location.
- Image-based appearance reconstruction: we present a multi-view processing algorithm that can enhance the quality of color transfer from multi-view images to a geo-referenced 3D model of a location of interest. The proposed approach computes virtual shadows and automatically segments shadowed regions in the input images, preventing those pixels from being used in the subsequent texture synthesis.
- 2D-to-3D registration: we present an unsupervised localization and registration system that can recognize a site framed in multi-view data and calibrate it on a pre-existing 3D representation. The system has very high accuracy, validates its result in a completely unsupervised manner, and is accurate enough to view input images seamlessly and correctly super-imposed on the 3D location of interest.
- Visualization: we present PhotoCloud, a real-time client-server system for interactive exploration of high-resolution 3D models and up to several thousand photographs aligned over this 3D data. PhotoCloud supports any 3D model that can be rendered in a depth-coherent way and arbitrary multi-view image collections. Moreover, it tolerates 2D-to-2D and 2D-to-3D misalignments, and it provides scalable visualization of generic integrated 2D and 3D datasets by exploiting data duality. A set of effective 3D navigation controls, tightly integrated with innovative thumbnail bars, enhances user navigation.
These advancements were developed in tourism and cultural heritage application contexts, but they are not limited to these
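For context on the color-to-gray pre-processing step: the common baseline is a fixed luma projection applied uniformly to every pixel, which is exactly what a per-location optimized conversion aims to improve upon. A minimal baseline sketch using standard BT.601 weights (not the thesis's method):

```python
import numpy as np

# ITU-R BT.601 luma weights: the fixed global projection that an
# optimized per-pixel conversion is meant to outperform for matching.
BT601 = np.array([0.299, 0.587, 0.114])

def to_gray(rgb):
    """Project an (H, W, 3) float RGB image onto a single channel."""
    return rgb @ BT601
```

The limitation of this baseline is that two different colors with equal luma collapse to the same gray value, destroying contrast that dense stereo matching could otherwise exploit; that is the motivation for computing an optimized value per image location instead.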

    Deliverable D1.1 State of the art and requirements analysis for hypervideo

    This deliverable presents a state-of-the-art and requirements analysis report for hypervideo, authored as part of WP1 of the LinkedTV project. Initially, we present some use-case (viewer) scenarios in the LinkedTV project and, through analysis of the distinctive needs and demands of each scenario, point out the technical requirements from a user-side perspective. Subsequently, we study methods for the automatic and semi-automatic decomposition of audiovisual content in order to effectively support the annotation process. Considering that multimedia content comprises different types of information, i.e., visual, textual and audio, we report various methods for the analysis of these three streams. Finally, we present various annotation tools which could integrate the developed analysis results so as to effectively support users (video producers) in the semi-automatic linking of hypervideo content, and based on them we report on the initial progress in building the LinkedTV annotation tool. For each class of techniques discussed in the deliverable, we present evaluation results from applying one such method from the literature to a dataset well suited to the needs of the LinkedTV project, and we indicate the future technical requirements that should be addressed in order to achieve higher levels of performance (e.g., in terms of accuracy and time-efficiency), as necessary
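Temporal decomposition of the visual stream, the first analysis step mentioned above, is often bootstrapped with a simple hard-cut detector. The sketch below uses grayscale histogram differencing between consecutive frames; it is an illustrative baseline, far simpler than the methods surveyed in the deliverable, and the threshold is an assumed value:

```python
import numpy as np

def shot_boundaries(frames, bins=16, threshold=0.5):
    """Naive hard-cut detector: flag frame i when the L1 distance
    between the normalized intensity histograms of frames i-1 and i
    exceeds `threshold`. `frames` are 2-D arrays with values in [0, 1]."""
    hists = []
    for f in frames:
        h, _ = np.histogram(f, bins=bins, range=(0.0, 1.0))
        hists.append(h / h.sum())
    cuts = []
    for i in range(1, len(hists)):
        if np.abs(hists[i] - hists[i - 1]).sum() > threshold:
            cuts.append(i)
    return cuts
```

Detectors of this kind handle abrupt cuts but miss gradual transitions (fades, dissolves), which is one reason the deliverable surveys more elaborate decomposition methods.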

    Two and three dimensional segmentation of multimodal imagery

    The role of segmentation in the realms of image understanding/analysis, computer vision, pattern recognition, remote sensing and medical imaging has been significantly augmented in recent years by accelerated scientific advances in the acquisition of image data. This low-level analysis protocol is critical to numerous applications, with the primary goal of expediting and improving the effectiveness of subsequent high-level operations by providing a condensed and pertinent representation of image information. In this research, we propose a novel unsupervised segmentation framework for meaningfully segregating 2-D/3-D image data across multiple modalities (color, remote-sensing and biomedical imaging) into non-overlapping partitions using several spatial-spectral attributes. Initially, our framework exploits the information obtained from detecting edges inherent in the data. To this effect, using a vector gradient detection technique, pixels without edges are grouped and individually labeled to partition some initial portion of the input image content. Pixels that contain higher gradient densities are included by the dynamic generation of segments as the algorithm progresses to generate an initial region map. Subsequently, texture modeling is performed, and the obtained gradient, texture and intensity information, along with the aforementioned initial partition map, is used in a multivariate refinement procedure to fuse groups with similar characteristics, yielding the final output segmentation. Experimental results obtained in comparison to published state-of-the-art segmentation techniques for color as well as multi/hyperspectral imagery demonstrate the advantages of the proposed method. Furthermore, for the purpose of achieving improved computational efficiency, we propose an extension of the aforementioned methodology in a multi-resolution framework, demonstrated on color images. Finally, this research also encompasses a 3-D extension of the aforementioned algorithm, demonstrated on medical (Magnetic Resonance Imaging / Computed Tomography) volumes
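The initial edge-driven partitioning described above rests on a per-pixel vector gradient computed over all channels. A minimal stand-in (forward differences with channel energies summed, which is a crude simplification of the vector gradient detector referred to in the text):

```python
import numpy as np

def vector_gradient_magnitude(img):
    """Per-pixel gradient magnitude of an (H, W, C) image: forward
    differences along each spatial axis, squared responses summed
    over channels, then the square root taken."""
    gx = np.diff(img, axis=1, append=img[:, -1:, :])  # horizontal
    gy = np.diff(img, axis=0, append=img[-1:, :, :])  # vertical
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=2))
```

Thresholding this map low would give the edge-free pixels that seed the initial labeled regions, while high-gradient pixels would be absorbed later as segments are generated dynamically, mirroring the two-stage initialization the abstract describes.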