Head Pose Estimation Using Histogram of Oriented Gradients and Multiclass Support Vector Machine (Estimasi Pose Kepala Menggunakan Histogram of Oriented Gradients dan Multiclass Support Vector Machine)
Head pose indicates where a person's attention and interest are directed, and it plays an important role in a wide range of applications. The large number of head-pose classes makes estimation a difficult task. This study estimates head pose using Histogram of Oriented Gradients (HOG) and a Multiclass Support Vector Machine (SVM). HOG, computed with functions from OpenCV, extracts features from the head image, and the Multiclass SVM, run with functions from Scikit-learn, estimates the head pose. The data are the Head Pose Image Database from INRIA Rhône-Alpes (2004), which contains 2,790 images divided into 93 classes, giving 30 pose images per class for training and testing. Testing with 5-fold cross-validation gave an average accuracy of 22.5%, with an average F1-score of 0.21, precision of 0.23, and recall of 0.22.
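To make the feature-extraction step more concrete, here is a minimal sketch of a HOG-style descriptor in plain Python. It is not the OpenCV implementation the abstract refers to: the 8-pixel cells, 9 unsigned orientation bins, 2 x 2-cell blocks, 64 x 64 input size, and the name `hog_features` are all assumptions chosen for illustration, and refinements found in library implementations (such as interpolated binning) are omitted.

```python
import math

def hog_features(image, cell=8, bins=9, eps=1e-6):
    """Simplified HOG for a grayscale image given as a list of pixel rows.

    Illustrative sketch only; parameter values are assumptions.
    """
    h, w = len(image), len(image[0])
    ny, nx = h // cell, w // cell
    bin_width = 180.0 / bins
    # One orientation histogram per cell.
    hist = [[[0.0] * bins for _ in range(nx)] for _ in range(ny)]
    for y in range(ny * cell):
        for x in range(nx * cell):
            # Differences between neighbouring pixels, clamped at the border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / bin_width), bins - 1)
            # Each pixel votes for its bin, weighted by gradient magnitude.
            hist[y // cell][x // cell][b] += magnitude
    features = []
    # Overlapping 2 x 2-cell blocks, each L2-normalised.
    for i in range(ny - 1):
        for j in range(nx - 1):
            block = hist[i][j] + hist[i][j + 1] + hist[i + 1][j] + hist[i + 1][j + 1]
            norm = math.sqrt(sum(v * v for v in block) + eps * eps)
            features.extend(v / norm for v in block)
    return features

# Hypothetical input: a 64 x 64 horizontal intensity ramp.
ramp = [[x for x in range(64)] for _ in range(64)]
features = hog_features(ramp)
print(len(features))  # 7 x 7 blocks x 36 values = 1764
```

Vectors of this kind, one per image, could then be passed to a multiclass SVM such as scikit-learn's `SVC` and scored with 5-fold cross-validation. With 93 balanced classes, guessing uniformly at random would be correct about 1/93 ≈ 1.1% of the time, which gives some context for the reported 22.5% accuracy.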
Linguistics As Structure In Computer Animation: Toward A More Effective Synthesis Of Brow Motion In American Sign Language
Computer-generated three-dimensional animation holds great promise for synthesizing utterances in American Sign Language (ASL) that are not only grammatical, but well tolerated by members of the Deaf community. Unfortunately, animation poses several challenges stemming from the necessity of grappling with massive amounts of data. However, the linguistics of ASL can aid in surmounting these challenges by providing structure and rules for organizing animation data. An exploration of the linguistic and extralinguistic behavior of the brows from an animator’s viewpoint yields a new approach to synthesizing nonmanuals: rather than animating anatomy, as is conventional, it animates the effects of interacting levels of linguistic function. Results of formal testing with Deaf users indicate that this is a promising approach.
Deliverable D4.7 Evaluation and final results
This deliverable covers all aspects of the evaluation of the overall LinkedTV personalization workflow, as well as re-evaluations of techniques where newer technology and/or algorithmic capacity offer new insight into general performance. The implicit contextualized personalization workflow, the implicit uncontextualized workflow within the final LinkedTV application, the advances in context tracking enabled by newly emerged technologies, and the outlook for video recommendation beyond LinkedTV are measured and analyzed in this document.
Merging Augmented Reality with Television Shows to Enhance the Viewer Experience
Nowadays, television no longer has the same effect on viewers as it had decades ago. “Traditional” television has been losing its audience over the years in favor of new technologies. The time that was formerly spent watching television is now spent on smartphones and tablets, where viewers can interact with the content provided to them and receive stimuli that television cannot offer on its own. More and more people are looking for new ways to socialize and interact outside the space they are confined to, in order to discuss certain topics and watch videos or images published by others. This makes watching television purely for the pleasure of watching an old-fashioned concept that needs to be adapted to modern times. This thesis aims to introduce innovative concepts of interactivity in television contexts. To achieve this, we explore the possibility of integrating augmented reality (AR) concepts with television shows to enhance the viewer experience. With AR, we can view objects and information that we otherwise could not, simply because they do not exist in our reality or in the original movie. This technology is gaining an important role in our day-to-day activities, particularly in entertainment. Our goal is to allow viewers to watch and interact with TV shows through a mobile device, using AR elements overlaid on the video content to present important information and amusing effects. With this approach, we hope to introduce a new way of interacting with TV shows that meets the expectations of a new generation of audiences. Based on the results we obtained, this concept can be considered a success and could be one of the next steps in user interaction with TV shows.
Correlating Visual Speaker Gestures with Measures of Audience Engagement to Aid Video Browsing
In this thesis, we argue that in the domains of educational lectures and political debates, speaker gestures can be a source of semantic cues for video browsing. We hypothesize that certain human gestures, which can be automatically identified through computer vision techniques, convey significant information that is correlated with audience engagement. We present a joint-angle descriptor, derived from an automatic upper-body pose estimation framework, to train an SVM that identifies point and spread poses in extracted video frames of an instructor giving a lecture. Ground truth is collected in the form of 2500 manually annotated frames covering 20 minutes of a video lecture. Cross-validation on the ground-truth data showed classifier F-scores of 0.54 and 0.39 for point and spread poses, respectively. From this system, we also derive a gesture attribute that measures the angular variance of the arm movements (analogous to arm waving). We present a method for tracking hands that succeeds even when the left and right hands are clasped and occluding each other. We evaluate on a ground-truth dataset of 698 images with 1301 annotated left and right hands, mostly clasped. Toward the goal of recognizing clasped hands, our method performs better than the baseline on recall (0.66 vs. 0.53) without sacrificing precision (0.65 for both). For tracking, it improves on a baseline method with an F-score of 0.59 vs. 0.48. From this, we are able to derive hand motion-based gesture attributes such as velocity, direction change, and extremal pose. In ground-truth studies, we manually annotate and analyze the gestures of two instructors, each in a 75-minute computer science lecture, using a 14-bit pose vector. We observe "pedagogical" gestures of punctuation and encouragement in addition to traditional classes of gestures such as deictic and metaphoric.
We also introduce a tool to facilitate the manual annotation of gestures in video and present results on their frequencies and co-occurrences. In particular, we find that 5 poses represent 80% of the variation in the annotated ground truth. We demonstrate a correlation between the angular variance of arm movements and the presence of conjunctions that are used to contrast connected clauses ("but", "neither", etc.) in the accompanying speech. We do this by training an AdaBoost-based binary classifier using decision trees as weak learners. On a ground-truth database of 4243 video clips totaling 3.83 hours, each with subtitles, training on sets of conjunctions indicating contrast produces classifiers capable of achieving 55% accuracy on a balanced test set. We study two different presentation methods: an attribute graph, which shows a normalized measure of the visual attributes across an entire video, and emphasized subtitles, where individual words are emphasized (resized) based on their accompanying gestures. Results from 12 subjects show supportive ratings for the browsing aids in the task of providing keywords for video under time constraints. Subjects' keywords are also compared to independent ground truth, resulting in precisions of 0.50 to 0.55, even when subjects are given less than half of real time to view the video. We demonstrate a correlation between gesture attributes and a rigorous method of measuring audience engagement: electroencephalography (EEG). Our 20 subjects watch 61 minutes of video of the 2012 U.S. Presidential Debates while under observation through EEG. After discarding corrupted recordings, we retain 47 minutes' worth of EEG data for each subject. The subjects are examined in aggregate and in subgroups according to gender and political affiliation. We find statistically significant correlations between gesture attributes (particularly extremal pose) and our feature of engagement derived from EEG.
For all subjects watching all videos, we see a statistically significant correlation between gesture and engagement, with a Spearman rank correlation of rho = 0.098 and p < 0.05, Bonferroni corrected. For some stratifications, correlations reach as high as rho = 0.297. From these results, we conclude that gestures can be used to measure engagement.
Analysis and Construction of Engaging Facial Forms and Expressions: Interdisciplinary Approaches from Art, Anatomy, Engineering, Cultural Studies, and Psychology
The topic of this dissertation is the anatomical, psychological, and cultural examination of the human face in order to effectively construct an anatomy-driven 3D virtual face customization and action model. To gain a broad perspective on all aspects of a face, theories and methodology from the fields of art, engineering, anatomy, psychology, and cultural studies have been analyzed and implemented. The computer-generated facial customization and action models were designed based on the collected data. Using this customization system, a culturally specific attractive face in Korean popular culture, the “kot-mi-nam” (flower-like beautiful guy), was modeled and analyzed as a case study. The “kot-mi-nam” phenomenon is surveyed in its textual, visual, and contextual aspects, which reveals the gender and sexuality fluidity of its masculinity. The analysis and the actual development of the model organically co-construct each other, requiring an interwoven process. Chapter 1 introduces anatomical studies of the human face, psychological theories of face recognition and facial attractiveness, and state-of-the-art face construction projects in various fields. Chapters 2 and 3 present the Bezier curve-based 3D facial customization (BCFC) and the Multi-layered Facial Action Model (MFAM), both based on the analysis of human anatomy, to achieve cost-effective yet realistic facial animation without using 3D scanned data. In the experiments, results for facial customization by gender, race, fat, and age showed that BCFC achieved enhanced performance of 25.20% compared to the existing program Facegen, and 44.12% compared to Facial Studio. The experimental results also showed the realistic quality and effectiveness of MFAM compared with the blend shape technique, enhancing 2.87% and 0.03% of facial area per second for happiness and anger expressions, respectively.
In Chapter 4, according to the analysis based on BCFC, the 3D face of an average kot-mi-nam is close to gender neutral (male: 50.38%, female: 49.62%) and Caucasian (66.42-66.40%). Culturally specific images can be misinterpreted in different cultures because of their different languages, histories, and contexts. This research demonstrates that facial images can be affected by the cultural tastes of their makers and can also be interpreted differently by viewers in different cultures.