102 research outputs found

    A simplified and novel technique to retrieve color images from hand-drawn sketch by human

    With the increasing adoption of human-computer interaction, there is a growing trend of using hand-drawn sketches to retrieve correlated images from a storage unit. A review of existing systems shows a dominant use of sophisticated, complex mechanisms that focus more on accuracy than on system efficiency. Hence, the proposed system introduces a simplified way to extract related images using an attribute clustering process and a cost-effective training scheme. The proposed method uses K-means clustering and a bag-of-attributes to extract essential information from the sketch. The proposed system also introduces a unique indexing scheme that speeds up retrieval and returns the highest-ranked images. Implemented in MATLAB, the proposed system shows better accuracy and processing time than the existing feature extraction technique.
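
    The abstract does not specify how the K-means codebook, the bag-of-attributes or the indexing scheme are built. A minimal sketch, assuming the bag-of-attributes is a bag-of-visual-words histogram over K-means codewords and the index is an inverted file ranked by cosine similarity; the function names, the toy 2-D descriptors and the farthest-first initialisation are all hypothetical, not the authors' MATLAB implementation.

```python
import math
from collections import defaultdict


def kmeans(points, k, iters=20):
    """Lloyd's K-means on equal-length tuples, with deterministic farthest-first init."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    return centroids


def bag_of_attributes(descriptors, codebook):
    """Histogram of nearest-codeword assignments, L2-normalised (hypothetical encoding)."""
    hist = [0.0] * len(codebook)
    for d in descriptors:
        j = min(range(len(codebook)), key=lambda c: math.dist(d, codebook[c]))
        hist[j] += 1
    norm = math.sqrt(sum(h * h for h in hist)) or 1.0
    return [h / norm for h in hist]


class InvertedIndex:
    """Codeword -> images using it, so a query only scores images sharing a codeword."""

    def __init__(self):
        self.postings = defaultdict(list)  # codeword -> [(image_id, weight)]

    def add(self, image_id, hist):
        for j, w in enumerate(hist):
            if w > 0:
                self.postings[j].append((image_id, w))

    def query(self, hist, top_k=3):
        scores = defaultdict(float)
        for j, w in enumerate(hist):
            if w > 0:
                for image_id, v in self.postings[j]:
                    scores[image_id] += w * v  # cosine similarity, since vectors are unit length
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]


if __name__ == "__main__":
    # Toy 2-D descriptors; real sketches would yield stroke or edge features.
    db = {
        "img_a": [(0.0, 0.1), (0.1, 0.0), (0.2, 0.1)],
        "img_b": [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0)],
        "img_c": [(0.1, 0.2), (5.0, 5.0)],
    }
    codebook = kmeans([d for ds in db.values() for d in ds], k=2)
    index = InvertedIndex()
    for name, ds in db.items():
        index.add(name, bag_of_attributes(ds, codebook))
    sketch = [(0.05, 0.05), (0.15, 0.1)]
    print(index.query(bag_of_attributes(sketch, codebook)))
```

    Here img_a ranks first and img_c second, while img_b is never scored because it shares no codeword with the sketch; that pruning is what makes an inverted index faster than comparing the query against every stored image.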

    IEEE Access Special Section Editorial: Big Data Learning and Discovery

    Text–to–Video: Image Semantics and NLP

    When aiming to automatically translate an arbitrary text into a visual story, the main challenge lies in finding a semantically close visual representation whose displayed meaning remains the same as that of the given text. Moreover, the appearance of an image largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. In recent years, social networking has attracted great interest, leading to an enormous and still growing amount of data available online. Photo sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge knowledge source of user-generated data, which provides initial links between images and words as well as other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings of ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and use deep learning to learn aesthetic rankings directly from a broad diversity of user reactions in social online behavior. To gain even deeper insight into the influence of visual appearance on an observer, we explore how simple image processing can actually change emotional perception, and we derive a simple but effective image filter.
To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that can generate picture stories in different styles that are not only semantically relevant but also visually coherent.
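
    The thesis uses Flickr tags as initial links between images and words, but the abstract does not describe its relatedness measure. As an illustration only, the sketch below assumes a co-occurrence-based relatedness (cosine similarity of binary photo-incidence vectors) and picks the photo whose tags best cover the query words; the photo IDs, tags and function names are hypothetical, and this is not the thesis's hierarchical querying algorithm.

```python
import math
from collections import defaultdict


def build_tag_index(photos):
    """photos: photo_id -> set of user tags. Returns tag -> set of photo_ids."""
    index = defaultdict(set)
    for pid, tags in photos.items():
        for t in tags:
            index[t].add(pid)
    return index


def relatedness(index, w1, w2):
    """Cosine similarity of the binary photo-incidence vectors of two tags."""
    a, b = index.get(w1, set()), index.get(w2, set())
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def best_illustration(photos, index, words):
    """Photo whose tags are, on average over the query words, most related to them."""
    def score(pid):
        tags = photos[pid]
        if not tags or not words:
            return 0.0
        return sum(max(relatedness(index, w, t) for t in tags) for w in words) / len(words)
    return max(photos, key=score)


if __name__ == "__main__":
    photos = {
        "p1": {"beach", "sea", "sunset"},
        "p2": {"sea", "boat"},
        "p3": {"forest", "tree"},
        "p4": {"beach", "sand"},
    }
    index = build_tag_index(photos)
    print(relatedness(index, "beach", "sea"))
    print(best_illustration(photos, index, ["sea", "sunset"]))
```

    Because relatedness is computed from shared photos rather than exact string matches, a query word can still favour a photo tagged only with co-occurring terms, which is the kind of initial image-word link the abstract describes.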

    Recognizing Facial Sketches by Generating Photorealistic Faces Guided by Descriptive Attributes

    3D Face Modelling, Analysis and Synthesis

    Human faces have always been of special interest to researchers in computer vision and graphics. There has been an explosion in the number of studies on accurately modelling, analysing and synthesising realistic faces for various applications. The importance of human faces stems from the fact that they are invaluable means of effective communication, recognition, behaviour analysis, conveying emotions, etc. Therefore, efficiently addressing the automatic visual perception of human faces could open up many influential applications in various domains, e.g. virtual/augmented reality, computer-aided surgery, security and surveillance, entertainment, and many more. However, the vast variability in the geometry and appearance of human faces captured in unconstrained videos and images makes their automatic analysis and understanding very challenging even today. The primary objective of this thesis is to develop novel methodologies of 3D computer vision for human faces that go beyond the state of the art and achieve unprecedented quality and robustness. In more detail, this thesis advances the state of the art in 3D facial shape reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition and facial synthesis with the aid of 3D face modelling. We give special attention to the case where the input comes from monocular imagery captured under uncontrolled settings, a.k.a. "in-the-wild" data. Such data are now abundantly available on the internet. Analysing these data pushes the boundaries of currently available computer vision algorithms and opens up many crucial new applications in industry.
We define the four targeted vision problems (3D facial reconstruction and tracking, fine-grained 3D facial motion estimation, expression recognition, facial synthesis) in this thesis as the four essential 3D-based systems for automatic understanding of facial behaviour and show how they rely on each other. Finally, to aid the research conducted in this thesis, we collect and annotate a large-scale video dataset of monocular facial performances. All of our proposed methods demonstrate very promising quantitative and qualitative results when compared to state-of-the-art methods.
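
    The abstract does not say which face model underlies the reconstruction and synthesis work. A widely used formulation is a linear 3D morphable model, S = mean + U_id·α + U_exp·β, which separates identity from expression; the sketch below assumes that formulation plus a weak-perspective camera, with tiny hypothetical bases in place of a learned model. Fitting α, β and pose to an in-the-wild image (e.g. by minimising landmark reprojection error) is not shown.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def synthesize_shape(mean: Sequence[float],
                     id_basis: Sequence[Sequence[float]],
                     exp_basis: Sequence[Sequence[float]],
                     alpha: Sequence[float],
                     beta: Sequence[float]) -> Vector:
    """S = mean + sum_i alpha_i * U_id[i] + sum_j beta_j * U_exp[j].

    Shapes are flattened vertex lists [x0, y0, z0, x1, y1, z1, ...].
    """
    if len(alpha) != len(id_basis) or len(beta) != len(exp_basis):
        raise ValueError("coefficient count must match basis count")
    shape = [float(v) for v in mean]
    for coeff, basis in list(zip(alpha, id_basis)) + list(zip(beta, exp_basis)):
        if len(basis) != len(shape):
            raise ValueError("basis vector length must match the mean shape")
        for k, v in enumerate(basis):
            shape[k] += coeff * v
    return shape


def weak_perspective(shape: Sequence[float], scale: float,
                     tx: float, ty: float) -> List[Tuple[float, float]]:
    """Project each (x, y, z) vertex to 2-D: drop z, then scale and translate."""
    return [(scale * shape[3 * i] + tx, scale * shape[3 * i + 1] + ty)
            for i in range(len(shape) // 3)]


if __name__ == "__main__":
    mean = [0, 0, 0, 1, 0, 0]           # two toy vertices
    id_basis = [[0, 1, 0, 0, 1, 0]]     # hypothetical identity mode: shift both up in y
    exp_basis = [[0, 0, 0, 0, 0, 1]]    # hypothetical expression mode: push vertex 1 in z
    s = synthesize_shape(mean, id_basis, exp_basis, alpha=[0.5], beta=[2.0])
    print(s)
    print(weak_perspective(s, scale=2.0, tx=1.0, ty=0.0))
```

    Keeping identity and expression in separate bases is what lets tracking, expression recognition and synthesis share one model: expression coefficients can be estimated or edited while the identity coefficients stay fixed.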

    Variations and Application Conditions Of the Data Type »Image« - The Foundation of Computational Visualistics

    A few years ago, the Department of Computer Science of the University of Magdeburg created an entirely new diploma programme called 'computational visualistics', a curriculum dealing with all aspects of computational pictures. Until then, computer science had studied only isolated aspects, particularly in the independent domains of computer graphics, image processing, information visualization, and computer vision. So is there indeed a coherent domain of research behind such a curriculum? The answer to that question depends crucially on a data structure that acts as a mediator between general visualistics and computer science: the data structure "image". The present text investigates that data structure, its components, and its application conditions, and thus elaborates the very foundations of computational visualistics as a unique and homogeneous field of research. Before concentrating on that data structure, the theory of pictures in general and the definition of pictures as perceptoid signs in particular are closely examined. This includes an act-theoretic consideration of resemblance as the crucial link between image and object, the communicative function of context building as the central concept for comparing pictures and language, and several modes of reflection underlying the relation between image and image user. In the main chapter, the data structure "image" is analyzed at length from the perspectives of syntax, semantics, and pragmatics. While syntactic aspects mostly concern image processing, semantic questions form the core of computer graphics and computer vision. Pragmatic considerations are particularly involved with interactive pictures but also extend to the field of information visualization and even to computer art. Four case studies provide practical applications of various aspects of the analysis.

    Actor & Avatar: A Scientific and Artistic Catalog

    What kind of relationship do we have with artificial beings (avatars, puppets, robots, etc.)? What does it mean to mirror ourselves in them, to perform them or to play trial identity games with them? Actor & Avatar addresses these questions from artistic and scholarly angles. Contributions on the making of "technical others" and philosophical reflections on artificial alterity are flanked by neuroscientific studies on different ways of perceiving living persons and artificial counterparts. The contributors have achieved a successful artistic-scientific collaboration with extensive visual material.

    Data-driven modelling of perceptual properties of 3D shapes

    The recent surge in 3D content generation has led to massive online libraries of 3D visual content that are difficult to search, organise and re-use. We explore crowdsourcing and machine learning techniques to help alleviate these difficulties by focusing on the visual perceptual properties of 3D shapes. We study “style similarity” and “aesthetics” as two fundamental perceptual properties of 3D shapes and build data-driven models of them. We rely on crowdsourcing platforms to collect a large number of human judgements on style matching and aesthetics of 3D shapes. The judgement data collected directly from humans is used to learn metrics of style matching and aesthetics. Our style similarity measure can be used to compute the style distance between a pair of input 3D shapes. In contrast to previous work, we incorporate colour and texture in addition to geometric features to build a colour- and texture-aware style similarity metric. We also experiment with learning objective and personalised style metrics for 3D shapes. The application prototypes we build demonstrate style-based search and scene composition. Further, our style distance metric is built iteratively so that it requires less human style judgement data than previous methods. We study the problem of building a data-driven model of 3D shape aesthetics in two steps. We first design a study to crowdsource human aesthetics judgement data. We then formulate a deep learning based strategy to learn a measure of 3D shape aesthetics from the collected data. The results of the study in the first step helped us choose an appropriate shape representation, i.e. voxels, as input to deep neural networks for learning a measure of visual aesthetics. In the same crowdsourcing study, we experiment with polygonal, volumetric, and point-based shape representations to create shape stimuli for collecting and comparing human shape aesthetics judgements.
On analysing the collected data, we found that humans can reliably distinguish the more aesthetic shape in a pair even from coarser shape representations such as voxels. This observation implies that detailed shape representations are not needed to compare the aesthetics of shapes in pairs. The aesthetic value of a 3D shape has traditionally been explored in terms of specific visual features (or handcrafted features) such as curvature and symmetry. For example, more symmetric and curved shapes are considered more aesthetic than less symmetric and less curved shapes. We call such properties pre-existing notions (or rules) of aesthetics. To develop a measure of perceptual aesthetics of 3D shapes that is independent of any pre-existing notion or shape features, we train deep neural networks directly on human aesthetics judgement data. We demonstrate the usefulness of the learned measure by designing applications that rank a collection of shapes by their aesthetics scores and interactively build scenes using shapes with high aesthetics scores. The overarching goal of this thesis is to demonstrate the use of machine learning and crowdsourcing approaches to build data-driven models of visual perceptual properties of 3D shapes for search, organisation, scene composition, and visualisation of 3D shape data in ever-growing online 3D shape content libraries. We believe that our exploration of perceptual properties of 3D shapes will motivate further research into other important perceptual properties related to our vision system and will also fuel the development of techniques to automatically enhance such properties of a given 3D shape.
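
    The thesis trains deep neural networks on voxelised shapes using pairwise human judgements. To show only the pairwise-learning part, the sketch below swaps the network for a linear scorer over flattened voxel occupancy vectors and fits it with a Bradley-Terry-style logistic loss, P(i preferred over j) = σ(f(x_i) − f(x_j)); this is a simplification, and the toy four-voxel shapes and judgement pairs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear aesthetics score f(x) = w . x (stand-in for a deep network)."""
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise(shapes: Sequence[Sequence[float]],
                   pairs: Sequence[Tuple[int, int]],
                   lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Gradient ascent on the sum over (winner, loser) of log sigmoid(f(winner) - f(loser))."""
    dim = len(shapes[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for win, lose in pairs:
            diff = [a - b for a, b in zip(shapes[win], shapes[lose])]
            margin = score(w, diff)  # equals f(winner) - f(loser) because f is linear
            g = sigmoid(-margin)     # derivative of log sigmoid(m) with respect to m
            for k in range(dim):
                grad[k] += g * diff[k]
        w = [wi + lr * gk / len(pairs) for wi, gk in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical 2x2 voxel grids flattened to occupancy vectors.
    shapes = [[1, 1, 1, 1], [1, 0, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
    # Hypothetical crowdsourced judgements: (preferred shape, other shape).
    pairs = [(0, 2), (1, 3), (0, 3), (1, 2)]
    w = train_pairwise(shapes, pairs)
    ranking = sorted(range(len(shapes)), key=lambda i: -score(w, shapes[i]))
    print("ranking (most to least aesthetic):", ranking)
```

    Only score differences enter the loss, so the learned scale is arbitrary; what matters is the induced ordering, which is what an application ranking a shape collection by aesthetics score would use.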