
    Computational Multimedia for Video Self Modeling

    Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of himself or herself. This is the idea behind the psychological theory of self-efficacy: you can learn to perform certain tasks because you see yourself doing them, which provides an ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism and selective mutism to sports training. However, there is an inherent difficulty associated with producing VSM material: prolonged and persistent video recording is required to capture the rare, if not entirely absent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his or her therapist with a minimal amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model that accounts for various types of uncertainty in the depth measurement. Third, I developed a simple and robust bundle-adjustment-based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
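    As a rough illustration of the depth-completion problem the second contribution addresses, the sketch below fills holes in a structured-light depth map by averaging valid neighboring measurements. It is a deliberately simplified stand-in under assumed parameters (the invalid-value sentinel, the 3x3 neighborhood, the iteration count): the layer-based probabilistic model described above additionally reasons about per-pixel measurement uncertainty rather than just averaging.

```python
import numpy as np

def complete_depth(depth, invalid=0.0, iterations=5):
    """Fill missing depth values with the mean of valid 3x3 neighbors.

    A much-simplified sketch of depth-map completion; `invalid` marks
    pixels where the structured-light sensor returned no measurement.
    """
    d = depth.astype(float).copy()
    mask = d != invalid                      # True where depth is valid
    for _ in range(iterations):
        filled = d.copy()
        for y in range(d.shape[0]):
            for x in range(d.shape[1]):
                if not mask[y, x]:
                    ys = slice(max(y - 1, 0), min(y + 2, d.shape[0]))
                    xs = slice(max(x - 1, 0), min(x + 2, d.shape[1]))
                    vals = d[ys, xs][mask[ys, xs]]
                    if vals.size:            # fill hole from valid neighbors
                        filled[y, x] = vals.mean()
        d = filled
        mask = d != invalid
    return d
```

    Iterating lets valid depth propagate into larger holes one ring of pixels at a time.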

    Management and display of four-dimensional environmental data sets using McIDAS

    Over the past four years, great strides have been made in the management and display of 4-D meteorological data sets. A survey was conducted of available and planned 4-D meteorological data sources, and the data types were evaluated for their impact on the data management and display system. The database management requirements generated by the 4-D data display system were analyzed, and the suitability of the existing database management procedures and file structures was evaluated in light of the new requirements. Where needed, new database management tools and file procedures were designed and implemented. The quality of the basic 4-D data sets was assured. Interpolation and extrapolation techniques for the 4-D data were investigated, and 4-D data from various sources were combined into a uniform and consistent data set for display purposes. Data display software was designed to create abstract line-graphic 3-D displays, and realistic shaded 3-D displays were created. Animation routines for these displays were developed in order to produce a dynamic 4-D presentation. A prototype dynamic color stereo workstation was implemented, and a computer functional design specification was produced based on interactive studies and user feedback.
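    The temporal interpolation investigated above can be illustrated with a minimal sketch, assuming plain NumPy arrays rather than McIDAS's own data structures: a 3-D gridded field is blended linearly between the two sampled instants that bracket the requested display time.

```python
import numpy as np

def interpolate_time(fields, times, t):
    """Linearly interpolate a 3-D gridded field to time t.

    `fields` is a list of 3-D arrays sampled at the ascending instants
    in `times`. Times outside the sampled range clamp to the endpoints
    (a simple form of extrapolation).
    """
    times = np.asarray(times, dtype=float)
    if t <= times[0]:
        return fields[0]
    if t >= times[-1]:
        return fields[-1]
    i = np.searchsorted(times, t) - 1        # bracketing sample index
    w = (t - times[i]) / (times[i + 1] - times[i])
    return (1 - w) * fields[i] + w * fields[i + 1]
```

    The same blending weight can drive animation: sweeping t across the sampled interval yields the dynamic 4-D presentation described above.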

    Automated Visual Database Creation For A Ground Vehicle Simulator

    This research focuses on extracting road models from stereo video sequences taken from a moving vehicle. The proposed method combines color-histogram-based segmentation, active contours (snakes) and morphological processing to extract road boundary coordinates for conversion into Matlab or Multigen OpenFlight compatible polygonal representations. Color segmentation uses an initial truth frame to develop a color probability density function (PDF) of the road versus the terrain. Subsequent frames are segmented using a maximum a posteriori (MAP) criterion, and the resulting templates are used to update the PDFs. Color segmentation worked well where there was minimal shadowing and occlusion by other cars. A snake algorithm was used to find the road edges, which were converted to 3D coordinates using stereo disparity and vehicle position information. The resulting 3D road models were accurate to within 1 meter.
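    The MAP segmentation step above can be sketched as follows: each pixel's quantized color indexes the road and terrain histograms learned from the truth frame, and the pixel is assigned to whichever class has the larger posterior. The bin count and the uniform class prior here are illustrative assumptions, not values from the paper.

```python
import numpy as np

def map_segment(frame, road_pdf, terrain_pdf, prior_road=0.5, bins=8):
    """Classify each RGB pixel as road (True) or terrain (False) by the
    MAP criterion, given normalized (bins, bins, bins) color histograms
    learned from a hand-labeled truth frame.
    """
    idx = (frame // (256 // bins)).astype(int)          # quantize RGB to bins
    p_road = road_pdf[idx[..., 0], idx[..., 1], idx[..., 2]] * prior_road
    p_terr = terrain_pdf[idx[..., 0], idx[..., 1], idx[..., 2]] * (1 - prior_road)
    return p_road > p_terr
```

    In the full pipeline, the resulting binary template would be cleaned morphologically and fed back to update the two PDFs for the next frame.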

    Real-time human action and gesture recognition using skeleton joints information towards medical applications

    There have been significant efforts toward improving the accuracy of human action detection using skeleton joints. Recognizing human activities in a noisy environment is still challenging, since the Cartesian coordinates of the skeleton joints provided by a depth camera depend on the camera position and the skeleton position. In many human-computer interaction applications, the skeleton position and the camera position keep changing. The proposed method recommends using relative positional values instead of actual Cartesian coordinate values. Recent advancements in convolutional neural networks (CNNs) help us achieve higher prediction accuracy using input in image format. To represent skeleton joints as an image, we need to represent the skeleton information as a matrix with equal height and width. With some depth cameras, the number of skeleton joints provided is limited, and we need to depend on relative positional values to obtain a matrix representation of the skeleton joints. With the help of this new representation of skeleton joints, we achieve state-of-the-art prediction accuracy on the MSR dataset. We used frame shifting instead of interpolation between frames, which also helps us achieve state-of-the-art performance.
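    One plausible construction of a square, relative-position representation is sketched below: pairwise differences between the J joints give a (J, J, 3) "image" whose height and width are equal by construction and which is invariant to where the camera or skeleton sits. This is an assumption for illustration, not necessarily the exact encoding used in the thesis.

```python
import numpy as np

def joints_to_image(joints):
    """Turn one skeleton frame of (J, 3) Cartesian joint coordinates
    into a square (J, J, 3) image of normalized pairwise joint offsets.
    Depends only on relative positions, so it is translation-invariant.
    """
    joints = np.asarray(joints, dtype=float)
    rel = joints[:, None, :] - joints[None, :, :]   # pairwise differences
    span = np.abs(rel).max()
    if span == 0:                                   # degenerate skeleton
        span = 1.0
    return rel / span                               # scale into [-1, 1]
```

    Stacking these per-frame images over time yields a CNN-ready input without interpolating between frames.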

    Arabic cursive text recognition from natural scene images

    © 2019 by the authors. This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications in recent years reflect a shift of document image analysis researchers from recognizing optical characters to recognizing characters appearing in natural images. Scene text recognition is a challenging problem because text varies in font style, size, alignment, orientation, reflection, illumination and blur, and appears against complex backgrounds. Among cursive scripts, Arabic scene text recognition is considered an even more challenging problem due to joined writing, variations of the same character, a large number of ligatures, multiple baselines, etc. Surveys of Latin and Chinese script-based scene text recognition systems can be found, but the Arabic scene text recognition problem has yet to be addressed in detail. In this manuscript, some of the latest techniques presented for text classification are described. The presented techniques, which follow deep learning architectures, are equally suitable for developing Arabic cursive scene text recognition systems. Issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.

    Dowry Towns of Bohemian Queens - web-based 3D model viewer

    This thesis examines the possibilities of displaying augmented-reality content on mobile devices in a web-browser environment, which means only limited hardware and software resources are available. The ORB algorithm for image feature detection is implemented from scratch and optimized using parallelized image processing with the GPU.js library. Several algorithms for camera pose estimation and projection matrix generation were collected and examined. 3D models are displayed over the camera feed, and measures are examined to project them so that they match the scene's orientation. Finally, test scenarios are described that check whether the controls of the resulting widget are easy to use.
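    At the heart of ORB keypoint detection is the FAST segment test, sketched below in plain Python as a scalar reference for what the thesis parallelizes with GPU.js. A real detector adds non-maximum suppression, the orientation measure and the BRIEF descriptor on top; the threshold and arc length here are the common FAST-9 defaults, assumed for illustration.

```python
import numpy as np

# Bresenham circle of radius 3 around the candidate pixel
# (dy, dx offsets, clockwise), as used by the FAST corner test.
CIRCLE = [(-3, 0), (-3, 1), (-2, 2), (-1, 3), (0, 3), (1, 3), (2, 2),
          (3, 1), (3, 0), (3, -1), (2, -2), (1, -3), (0, -3), (-1, -3),
          (-2, -2), (-3, -1)]

def is_fast_corner(img, y, x, t=20, n=9):
    """FAST-n test: (y, x) is a corner if at least n contiguous circle
    pixels are all brighter than center + t or all darker than center - t.
    """
    c = int(img[y, x])
    flags = []
    for dy, dx in CIRCLE:
        p = int(img[y + dy, x + dx])
        flags.append(1 if p > c + t else (-1 if p < c - t else 0))
    flags = flags * 2                    # duplicate to handle wrap-around
    run = best = prev = 0
    for f in flags:
        run = run + 1 if (f != 0 and f == prev) else (1 if f != 0 else 0)
        prev = f
        best = max(best, run)
    return best >= n
```

    Because the test is independent per pixel, it maps naturally onto a GPU.js kernel that evaluates every candidate in parallel.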