669 research outputs found

    Photorealistic Audio-driven Video Portraits


    Visualizing Natural Image Statistics

    Natural image statistics is an important area of research in cognitive science and computer vision. Visualizing statistical results can help identify clusters and anomalies, and supports the analysis of deviation, distribution and correlation. Furthermore, such visualizations can provide visual abstractions and symbolism for categorized data. In this paper, we begin our study of visualizing image statistics by considering visual representations of power spectra, which are commonly used to characterize different categories of images. We show that they convey a limited amount of statistical information about image categories and support analytical tasks poorly. We then introduce several new visual representations that convey different or additional information about image statistics. We apply ANOVA to the image statistics to select statistically more meaningful measurements in our design process. A task-based user evaluation was carried out to compare the new visual representations with conventional power spectrum plots. Based on the results of the evaluation, we further improved the visualizations by introducing composite visual representations of image statistics.
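The power spectra conventionally plotted for image categories are typically radially averaged spectra. A minimal NumPy sketch of that computation follows; the function name, bin count and random test image are illustrative, not taken from the paper:

```python
import numpy as np

def radial_power_spectrum(image, n_bins=32):
    """Radially averaged power spectrum of a 2-D grayscale image."""
    f = np.fft.fftshift(np.fft.fft2(image))
    power = np.abs(f) ** 2
    h, w = image.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices((h, w))
    r = np.hypot(y - cy, x - cx)            # distance from the DC component
    bins = np.linspace(0, r.max(), n_bins + 1)
    idx = np.clip(np.digitize(r.ravel(), bins) - 1, 0, n_bins - 1)
    total = np.bincount(idx, weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    return total / np.maximum(counts, 1)    # mean power per radial band

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))         # stand-in for a natural image
spec = radial_power_spectrum(img)
```

Plotting `spec` on log axes gives the familiar falling power-spectrum curve that the paper argues conveys only limited category information.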

    Towards Automation and Human Assessment of Objective Skin Quantification

    The goal of this study is to provide an objective criterion for computerised skin quality assessment. Human judgements are influenced by a variety of facial features. Using eye-tracking technology to better understand human visual behaviour, this research examined the influence of facial characteristics on skin evaluation and age estimation. The results revealed that when facial features are visible, individuals perform well at age estimation. This research also examines the performance and perception of machine learning algorithms for various skin attributes, comparing a traditional machine learning technique with deep learning approaches: Support Vector Machines (SVMs) and Convolutional Neural Networks (CNNs) were evaluated as classifiers, with CNNs outperforming SVMs. The primary difficulty in training deep learning algorithms is the need for large-scale datasets. This thesis proposes two high-resolution face datasets to address the research community’s need for face images for studying faces and skin quality. Additionally, a study of machine-generated skin patches produced with Generative Adversarial Networks (GANs) is conducted. Dermatologists assessed the machine-generated images by distinguishing fake images from real ones; only 38% correctly identified the real from the fake. Lastly, human perception and machine prediction are compared using heat-maps from the eye-tracking experiment and machine learning predictions for age estimation. The findings indicate that humans and machines predict in a similar manner.

    Deep Architectures for Visual Recognition and Description

    In recent times, digital media content is inherently multimedia, taking the form of text, audio, images and video. Several outstanding Computer Vision (CV) problems are being successfully solved with the help of modern Machine Learning (ML) techniques. Plenty of research has already been carried out in Automatic Image Annotation (AIA), image captioning and video tagging. Video captioning, i.e., automatic description generation from digital video, however, is a different and complex problem altogether. This study compares various existing video captioning approaches and attempts their classification and analysis based on different parameters, viz., the type of captioning method (generation/retrieval), the type of learning model employed, the length of the output descriptions generated, etc. This dissertation also critically analyzes the existing benchmark datasets used in various video captioning models and the evaluation metrics for assessing the quality of the resulting video descriptions. A detailed study of important existing models, highlighting their comparative advantages as well as disadvantages, is also included. In this study a novel approach for video captioning on the Microsoft Video Description (MSVD) and Microsoft Video-to-Text (MSR-VTT) datasets is proposed, using supervised learning to train a deep combinational framework that achieves better-quality video captioning via predicting semantic tags. We develop simple shallow CNNs (2D and 3D) as feature extractors, Deep Neural Networks (DNNs) and Bidirectional LSTMs (BiLSTMs) as tag prediction models, and a Recurrent Neural Network (RNN/LSTM) as the language model.
The aim of the work was to provide an alternative approach to generating captions from videos via semantic tag prediction, and to deploy simpler, shallower deep architectures with lower memory requirements, so that the developed models remain stable and viable options as the scale of the data increases. This study also successfully employed deep architectures such as the Convolutional Neural Network (CNN) to speed up automated hand gesture recognition and classification for the sign language of the Indian classical dance form ‘Bharatnatyam’. This hand gesture classification is primarily aimed at 1) building a novel dataset of 2D single-hand gestures belonging to 27 classes, collected from (i) the Google search engine (Google Images), (ii) YouTube videos (dynamic, with background considered) and (iii) professional artists under staged environment constraints (plain backgrounds); 2) exploring the effectiveness of CNNs for identifying and classifying single-hand gestures by optimizing the hyperparameters; and 3) evaluating the impact of transfer learning and double transfer learning, a novel concept explored for achieving higher classification accuracy.
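The tag-prediction route to captioning can be illustrated with a deliberately minimal, NumPy-only sketch. The real framework uses CNN features, BiLSTM tag predictors and an LSTM language model; here a random linear layer stands in for the trained tag predictor and a template stands in for the language model, and every name below is illustrative:

```python
import numpy as np

# Toy tag vocabulary; the real models predict a much larger tag set.
TAGS = ["person", "dog", "running", "park"]

rng = np.random.default_rng(1)
W = rng.standard_normal((4, 8)) * 0.1      # stand-in for a trained tag predictor
video_features = rng.standard_normal(8)    # stand-in for pooled CNN features

def predict_tags(features, weights, threshold=0.0):
    """Score each tag with a linear layer and keep those above threshold."""
    scores = weights @ features
    return [tag for tag, s in zip(TAGS, scores) if s > threshold]

def caption_from_tags(tags):
    """Stand-in for the LSTM language model, conditioned on predicted tags."""
    return "a video of " + ", ".join(tags) if tags else "a video"

caption = caption_from_tags(predict_tags(video_features, W))
```

The design point survives the simplification: the caption generator never sees raw pixels, only the intermediate semantic tags, which is what keeps the language model small and the memory footprint low.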

    Interpersonal metadiscourse categories in two Egyptian newspapers concerning the 2007 constitutional amendments

    This work studies the use of metadiscourse markers by foreign learners of Arabic to enhance their writing skills. First, an introduction explains the role of metadiscourse markers in writing within the framework of the two paradigms of meaning suggested by Halliday, i.e. the textual and the interpersonal. Second, the two hypotheses on which the work is built are spelled out. The first hypothesis assumes that writers will always use the two basic paradigms of meaning suggested by Halliday as far as metadiscourse markers are concerned. The second postulates that if foreign learners of Arabic acquire solid knowledge of the four categories of markers that fall within the two paradigmatic classifications, their performance in writing will improve significantly in comparison with those who did not acquire such knowledge. Before examining these two hypotheses, the research provides a full-fledged account of four basic types of metadiscourse markers and their equivalents in Arabic. It then sets out to check the two aforementioned hypotheses, using a methodology corresponding to each: the first is empirical and the second experimental. The empirical method, used to verify the first hypothesis, involves an analysis of a sample of twenty newspaper articles on a given subject, representing writers of different styles, cultural backgrounds, and personal and political affiliations. To verify the second hypothesis, that foreign learners of Arabic can significantly improve their writing performance by mastering the use of metadiscourse markers, an experimental methodology is applied. Two groups of non-native Arabic learners were selected randomly, one serving as a control class and the other as an experimental class. 
The results show that, in a post-test given to both groups, the writing level of the experimental group, who analyzed and learned metadiscourse markers, was higher than that of the control group, who did not go through this experience.

    Local quality-based matching of faces for watchlist screening applications

    Video surveillance systems are often deployed by safety organizations for enhanced security and situational awareness. A key application of video surveillance is watchlist screening, where target individuals are enrolled in a still-to-video Face Recognition (FR) system using single still images captured a priori under controlled conditions. Watchlist screening is a very challenging application: it must provide accurate decisions and timely recognition using a limited number of reference faces for enrolment. This issue is often called the "Single Sample Per Person" (SSPP) problem. In addition, uncontrolled factors such as variations in illumination, pose and occlusion are unavoidable in real-world video surveillance and degrade the FR system’s performance. Another major problem in such applications is camera interoperability: there is a large gap, in quality and resolution, between the camera used to take the still images and the camera used to capture the video surveillance footage. This gap hinders the classification process and thus decreases the system’s performance. Controlled and uniform lighting is indispensable for obtaining good facial captures that contribute to the recognition performance of the system. In reality, however, facial captures are often poorly illuminated, which severely affects the system’s performance. It is therefore important to implement an FR system that is invariant to illumination changes. The first part of this thesis investigates different illumination normalization (IN) techniques applied at the pre-processing stage of still-to-video FR. The IN techniques are then compared with one another to pinpoint the most suitable technique for illumination invariance. 
In addition, patch-based template matching extracts facial features from different regions, which offers more discriminative information and helps deal with occlusion; local matching is therefore applied in the still-to-video FR system. This requires a careful examination of how the IN techniques are applied. Two different approaches were investigated: a global approach, which performs IN on the whole image and then performs local matching, and a local approach, which first divides the images into non-overlapping patches and then applies each IN technique individually to each patch. The results of these experiments show that Tan and Triggs (TT) and Multi-Scale Weberfaces are likely to offer better illumination invariance for the still-to-video FR system. Moreover, applying these best-performing IN techniques locally on each patch improves FR performance compared to the global approach. The performance of an FR system is good when the training data and the operational data come from the same distribution. Unfortunately, in still-to-video FR systems this is not the case: the training data are still, high-quality, high-resolution, frontal images, whereas the test data are low-quality, low-resolution video frames with varying head pose, so the two do not share the same distribution. To address this domain shift, the second part of this thesis presents a new technique of dynamic regional weighting that exploits unsupervised domain adaptation and quality-based contextual information. The main contribution consists in assigning dynamic weights that are specific to a camera domain, replacing the static, predefined manner of assigning weights. To assess the impact of applying local weights dynamically, results are compared to a baseline (no weights) and to a static weighting technique. 
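The local approach (divide the image into non-overlapping patches, then normalize each patch independently) can be sketched as follows. Simple zero-mean/unit-variance normalization stands in for the Tan and Triggs pipeline, and the function name, patch size and random test image are illustrative:

```python
import numpy as np

def normalize_patchwise(image, patch=16):
    """Divide into non-overlapping patches and normalize each one locally."""
    out = image.astype(float).copy()
    h, w = image.shape
    for y in range(0, h - h % patch, patch):
        for x in range(0, w - w % patch, patch):
            block = out[y:y + patch, x:x + patch]
            std = block.std()
            # Zero-mean, unit-variance per patch (stand-in for TT / Weberfaces).
            out[y:y + patch, x:x + patch] = (block - block.mean()) / (std if std > 0 else 1)
    return out

rng = np.random.default_rng(2)
face = rng.uniform(0, 255, size=(64, 64))   # stand-in for a face capture
norm = normalize_patchwise(face)
```

Because every patch is normalized against its own statistics, a shadow falling on one region cannot dominate the others, which is the intuition behind the local approach outperforming the global one.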
This context-based approach has proven to increase the system’s performance compared to both the static weighting, which is dataset-dependent, and the baseline technique, which uses no weights. These experiments are conducted and validated on the ChokePoint dataset. The performance of the still-to-video FR system is evaluated using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curve analysis.
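Quality-driven fusion of per-patch matching scores, contrasted with the static baseline, can be sketched as follows (all names and numbers are hypothetical; in the thesis the quality estimates would come from contextual measures on each camera domain):

```python
import numpy as np

def fuse_scores(patch_scores, patch_quality):
    """Weight each patch's matching score by its normalized quality."""
    q = np.asarray(patch_quality, dtype=float)
    weights = q / q.sum()                  # dynamic, quality-derived weights
    return float(np.dot(weights, patch_scores))

def fuse_scores_static(patch_scores):
    """Static baseline: every patch counts equally."""
    return float(np.mean(patch_scores))

scores = [0.9, 0.2, 0.8, 0.1]              # per-patch similarity scores
quality = [1.0, 0.1, 0.9, 0.1]             # per-patch quality estimates

fused = fuse_scores(scores, quality)
static = fuse_scores_static(scores)
```

Here the dynamic fusion up-weights the well-captured patches, so `fused` exceeds the equal-weight `static` score; with uniform quality the two coincide.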

    Understanding the role of visual metaphors in emerging media arts research and practice

    The concept of metaphor has made significant contributions to expanding our horizons and range of knowledge and experience over the centuries. Using associations between source and target domains, metaphors enable rich expression in arts and culture, from literature, painting and music to user interface design and media arts. Metaphors allow us to integrate disparate entities and bring new perspectives into existence by letting us understand and experience one thing in terms of another. Visual metaphors have become significant and effective tools, from visual communication to graphical user interface design, because they are created by transferring some of the properties of a source domain to a target domain using various visual grammars and design principles. This enables designers to express and enhance the meaning of design outcomes, which are the by-product of a metaphorical thinking process. Despite the significance and prevalence of metaphors in the visual domain, previous research has tended to focus on the effectiveness and efficiency that visual metaphors can generate at a surface level; our understanding of the role and impact of visual metaphor at a conceptual, cognitive level is still limited. Therefore, this dissertation proposes a conceptual framework for creating a visual metaphor with a better design rationale, transforming the significance of a verbal metaphor into a visualization to achieve various design goals. The framework was developed gradually through a series of experimental design projects serving different design goals, such as knowledge representation, aesthetic experience, and kinesthetic empathy. We further conducted user studies, using critical task analysis for validation, to examine what design considerations are involved and what types of cognitive operations are performed during the transformation process. 
With an emphasis on visual metaphor creation, this study also aims to position our work in the field of metaphor research by contributing a more comprehensive explanation of how a verbal metaphor can be transformed into a visual space. Additionally, this study relates design theories of metaphor comprehension and generation to the domain of emerging media arts and art and cultural informatics.