
    Object-based 2D-to-3D video conversion for effective stereoscopic content generation in 3D-TV applications

    Three-dimensional television (3D-TV) has gained increasing popularity in the broadcasting domain, as it enables enhanced viewing experiences compared with conventional two-dimensional (2D) TV. However, its adoption has been constrained by a shortage of essential content, i.e., stereoscopic videos. To alleviate this content shortage, an economical and practical solution is to reuse the vast media resources available in monoscopic 2D and convert them to stereoscopic 3D. Although stereoscopic video can be generated from monoscopic sequences using depth measurements extracted from cues such as focus blur, motion, and size, the quality of the resulting video may be poor, as such measurements are usually arbitrarily defined and inconsistent with the real scene. To address this problem, a novel method for object-based stereoscopic video generation is proposed, which features i) optical-flow based occlusion reasoning to determine depth ordering, ii) object segmentation using improved region growing from the masks of the determined depth layers, and iii) a hybrid depth estimation scheme combining content-based matching (against a small library of true stereo image pairs) with depth-ordering based regularization. Comprehensive experiments have validated the effectiveness of the proposed 2D-to-3D conversion method in generating stereoscopic videos with consistent depth measurements for 3D-TV applications.
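    The depth-ordering idea above can be sketched in a few lines. The following is a minimal, hypothetical Python example, not the authors' implementation: it marks occluded pixels with a forward-backward optical-flow consistency check and then orders two segmented object masks by which one is being covered up. The dense flow fields `flow_fw`/`flow_bw` and the boolean masks `mask_a`/`mask_b` are assumed to come from elsewhere (e.g. an off-the-shelf optical-flow routine and the region-growing segmentation).

```python
# Illustrative sketch only; thresholds and the ordering heuristic are assumptions.
import numpy as np

def occlusion_map(flow_fw, flow_bw, tol=1.0):
    """Forward-backward consistency check: pixels whose forward flow is not
    undone by the backward flow are treated as becoming occluded."""
    h, w = flow_fw.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x2 = np.clip(np.rint(xs + flow_fw[..., 0]), 0, w - 1).astype(int)
    y2 = np.clip(np.rint(ys + flow_fw[..., 1]), 0, h - 1).astype(int)
    bw = flow_bw[y2, x2]                       # backward flow sampled at the forward target
    err = np.hypot(flow_fw[..., 0] + bw[..., 0], flow_fw[..., 1] + bw[..., 1])
    return err > tol

def nearer_of(mask_a, mask_b, occluded):
    """The object with the larger fraction of occluded pixels is being covered
    by the other, so it is ordered behind; return the nearer object's mask."""
    frac_a = occluded[mask_a].mean() if mask_a.any() else 0.0
    frac_b = occluded[mask_b].mean() if mask_b.any() else 0.0
    return mask_b if frac_a > frac_b else mask_a
```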

    Low Power Depth Estimation of Rigid Objects for Time-of-Flight Imaging

    Depth sensing is useful in a variety of applications that range from augmented reality to robotics. Time-of-flight (TOF) cameras are appealing because they obtain dense depth measurements with minimal latency. However, for many battery-powered devices, the illumination source of a TOF camera is power hungry and can limit the battery life of the device. To address this issue, we present an algorithm that lowers the power required for depth sensing by reducing the usage of the TOF camera and estimating depth maps from concurrently collected images. Our technique also adaptively controls the TOF camera and enables it when an accurate depth map cannot be estimated. To ensure that the overall system power for depth sensing is reduced, we design our algorithm to run on a low-power embedded platform, where it outputs 640x480 depth maps at 30 frames per second. We evaluate our approach on several RGB-D datasets, where it produces depth maps with an overall mean relative error of 0.96% and reduces the usage of the TOF camera by 85%. When used with commercial TOF cameras, we estimate that our algorithm can lower the total power for depth sensing by up to 73%.
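    The adaptive duty-cycling loop described above can be illustrated with a small sketch. This is a hypothetical Python outline, not the published algorithm: `propagate_depth` is a deliberately crude stand-in for the paper's motion-based depth propagation, and the confidence threshold is an assumed tuning parameter.

```python
# Sketch of the duty-cycling idea: keep the TOF illumination off while a
# propagated depth estimate looks reliable, and fire the sensor otherwise.
import numpy as np

CONF_THRESHOLD = 0.9  # assumed confidence threshold, tuned per application

def propagate_depth(prev_depth, prev_rgb, cur_rgb):
    """Crude stand-in for motion-based propagation: reuse the previous depth
    and score confidence by how little the RGB frame has changed."""
    change = np.mean(np.abs(cur_rgb.astype(np.float32) -
                            prev_rgb.astype(np.float32))) / 255.0
    return prev_depth, float(max(0.0, 1.0 - 10.0 * change))

def depth_stream(frames, tof_capture):
    """frames: iterable of RGB images; tof_capture(): returns a measured depth map."""
    prev_rgb, prev_depth = None, None
    for rgb in frames:
        if prev_depth is None:
            depth, conf = None, 0.0
        else:
            depth, conf = propagate_depth(prev_depth, prev_rgb, rgb)
        if conf < CONF_THRESHOLD:      # estimate unreliable: enable the TOF camera
            depth = tof_capture()
        yield depth
        prev_rgb, prev_depth = rgb, depth
```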

    A depth camera motion analysis framework for tele-rehabilitation : motion capture and person-centric kinematics analysis

    With increasing importance given to telerehabilitation, there is a growing need for accurate, low-cost, and portable motion capture systems that do not require specialist assessment venues. This paper proposes a novel framework for motion capture using only a single depth camera, which is portable and cost-effective compared with most industry-standard optical systems, without compromising accuracy. Novel signal processing and computer vision algorithms are proposed to determine motion patterns of interest from infrared and depth data. To demonstrate the proposed framework's suitability for rehabilitation, we developed a gait analysis application built on the underlying motion capture sub-system. Each subject's individual kinematic parameters are calculated and stored for monitoring that subject's progress through clinical therapy. Experiments were conducted on 14 subjects, 5 healthy and 9 stroke survivors. The results show very close agreement of the relevant joint angles with a 12-camera VICON system, a mean error of at most 1.75% in detecting gait events with respect to manually generated ground truth, and significant improvements in accuracy and execution time compared to a previous Kinect-based system.
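    As an illustration of the person-centric kinematics such a system computes, the sketch below derives a knee flexion angle per frame from three tracked 3-D joint positions. The joint names and the `skeleton_frames` structure are assumptions for the example, not the paper's data format.

```python
# Minimal sketch: a joint angle time series from depth-camera skeleton data.
import numpy as np

def joint_angle(a, b, c):
    """Angle at joint b (degrees) formed by the segments b->a and b->c."""
    u, v = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cosang = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def knee_flexion_series(skeleton_frames):
    """skeleton_frames: list of dicts mapping joint name -> (x, y, z) in metres."""
    return [joint_angle(f["hip"], f["knee"], f["ankle"]) for f in skeleton_frames]
```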

    Pattern recognition in facial expressions: algorithms and applications

    Advisor: HĂ©lio Pedrini. Doctoral dissertation (Doutorado em CiĂȘncia da Computação), Universidade Estadual de Campinas, Instituto de Computação. Funding: 140532/2019-6, CNPq; CAPES.
    Emotion recognition has become a relevant research topic for the scientific community, since it plays an essential role in the continuous improvement of human-computer interaction systems. It can be applied in various areas, for instance, medicine, entertainment, surveillance, biometrics, education, social networks, and affective computing. There are open challenges related to the development of emotion systems based on facial expressions, such as data that reflect more spontaneous emotions and real scenarios. In this doctoral dissertation, we propose different methodologies for the development of emotion recognition systems based on facial expressions, as well as their applicability to other, related problems. The first is an emotion recognition methodology for occluded facial expressions based on the Census Transform Histogram (CENTRIST). Occluded facial expressions are reconstructed using an algorithm based on Robust Principal Component Analysis (RPCA). Extraction of facial expression features is then performed by CENTRIST, as well as Local Binary Patterns (LBP), Local Gradient Coding (LGC), and an LGC extension. The generated feature space is reduced by applying Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for classification. This method reached competitive accuracy rates for occluded and non-occluded facial expressions. The second proposes dynamic facial expression recognition based on Visual Rhythms (VR) and Motion History Images (MHI), such that a fusion of both descriptors encodes appearance, shape, and motion information of the video sequences. For feature extraction, the Weber Local Descriptor (WLD), CENTRIST, Histogram of Oriented Gradients (HOG), and Gray-Level Co-occurrence Matrix (GLCM) are employed. This approach offers a new direction for dynamic facial expression recognition and an analysis of the relevance of facial parts. The third is an effective method for audio-visual emotion recognition based on speech and facial expressions. The methodology involves a hybrid neural network to extract audio and visual features from videos. For audio extraction, a Convolutional Neural Network (CNN) based on the log Mel-spectrogram is used, whereas a CNN built on the Census Transform is employed for visual feature extraction. The audio and visual features are reduced by PCA and LDA, then classified with KNN, SVM, Logistic Regression (LR), and Gaussian NaĂŻve Bayes (GNB). This approach achieves competitive recognition rates, especially on a spontaneous data set. The fourth investigates the problem of detecting Down syndrome from photographs; a geometric descriptor is proposed to extract facial features, and experiments performed on a public data set show the effectiveness of the developed methodology. The last methodology addresses the recognition of genetic syndromes in photographs, extracting facial attributes using deep neural network features and anthropometric measurements; experiments conducted on a public data set achieve competitive recognition rates.
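    The shared recognition pipeline of the first methodology (hand-crafted texture features, PCA/LDA reduction, then a conventional classifier) can be sketched as follows. This is an illustrative Python example using LBP histograms and scikit-learn rather than the dissertation's CENTRIST implementation; the face crops and emotion labels are assumed to be provided elsewhere.

```python
# Illustrative pipeline sketch: texture descriptor -> PCA -> LDA -> SVM.
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def lbp_histogram(gray_face, points=8, radius=1):
    """Uniform LBP histogram as a fixed-length texture descriptor."""
    lbp = local_binary_pattern(gray_face, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=points + 2, range=(0, points + 2), density=True)
    return hist

def build_classifier(faces, labels):
    """faces: list of grayscale face crops; labels: emotion classes."""
    X = np.array([lbp_histogram(f) for f in faces])
    model = make_pipeline(PCA(n_components=0.95),
                          LinearDiscriminantAnalysis(),
                          SVC(kernel="linear"))
    return model.fit(X, labels)
```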

    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy: you can learn to perform certain tasks because you see yourself doing them, which provides the ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism, and selective mutism to sports training. However, there is an inherent difficulty associated with producing VSM material: prolonged and persistent video recording is required to capture the rare, if not entirely absent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model that accounts for various types of uncertainty in the depth measurement. Third, I developed a simple and robust bundle-adjustment based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
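    The layer-based probabilistic depth model is not reproduced here; as a much simpler stand-in, the sketch below fills invalid (zero-valued) depth readings with the median of valid neighbours, which illustrates the hole-filling problem that structured-light depth completion addresses. The window size and pass count are arbitrary example parameters.

```python
# Simple neighbourhood-median hole filling for a structured-light depth map
# (a stand-in illustration, not the dissertation's probabilistic model).
import numpy as np

def fill_depth_holes(depth, window=5, max_passes=10):
    """Iteratively replace zero-valued pixels with the median of the valid
    depths inside a (window x window) neighbourhood."""
    d = depth.astype(np.float32).copy()
    r = window // 2
    for _ in range(max_passes):
        holes = np.argwhere(d == 0)
        if holes.size == 0:
            break
        for y, x in holes:
            patch = d[max(0, y - r): y + r + 1, max(0, x - r): x + r + 1]
            valid = patch[patch > 0]
            if valid.size:
                d[y, x] = np.median(valid)
    return d
```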

    Signal Processing and Restoration

