
    Learning from Multiple Sources for Video Summarisation

    Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished by analysing imagery-based features. Relying solely on visual cues for understanding public surveillance video is unreliable, since visual observations obtained from public-space CCTV data are often not sufficiently trustworthy and events of interest can be subtle. On the other hand, non-visual data sources such as weather reports and traffic sensory signals are readily accessible but have not been explored jointly to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework that learns jointly from both visual and independently drawn non-visual data sources to discover meaningful latent structure in surveillance video data. In particular, we investigate ways to cope with discrepant dimensions and representations whilst associating these heterogeneous data sources, and derive an effective mechanism to tolerate missing and incomplete data from different sources. We show that the proposed multi-source learning framework not only achieves better video content clustering than state-of-the-art methods, but is also capable of accurately inferring missing non-visual semantics from previously unseen videos. In addition, a comprehensive user study is conducted to validate the quality of the video summaries generated using the proposed multi-source model.
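    The abstract leaves the model unspecified; purely as a minimal sketch of the two data-handling issues it names (heterogeneous sources and missing entries), the Python snippet below mean-imputes missing non-visual readings, standardises each source, and clusters an early-fused representation. The feature names and the use of k-means are assumptions for illustration; the paper's actual framework is an unsupervised latent-structure model, not plain k-means.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        def cluster_multi_source(visual, nonvisual, n_clusters=8):
            """Cluster video clips from visual and non-visual features jointly.

            visual:    (n_clips, d_v) array of imagery-based features
            nonvisual: (n_clips, d_n) array of e.g. weather/traffic signals,
                       with NaN marking missing entries
            """
            # Tolerate missing non-visual data via per-dimension mean imputation.
            col_means = np.nanmean(nonvisual, axis=0)
            nv = np.where(np.isnan(nonvisual), col_means, nonvisual)

            # Standardise each source so discrepant dimensions and scales
            # do not let one source dominate the joint representation.
            v = StandardScaler().fit_transform(visual)
            nv = StandardScaler().fit_transform(nv)

            joint = np.hstack([v, nv])  # naive early fusion of both sources
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(joint)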

    Representations and representation learning for image aesthetics prediction and image enhancement

    With continual improvements in cell-phone cameras and in the connectivity of mobile devices, we have seen an exponential increase in the number of images captured, stored, and shared on social media. For example, as of July 1st 2017 Instagram had over 715 million registered users, who had posted just shy of 35 billion images. This represented approximately a seven-fold increase in users and a nine-fold increase in photos on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), their sheer number calls for methods to determine various image properties, such as object presence or appeal, for the purpose of automatic image management and curation. One of the central problems in consumer photography centers on determining the aesthetic appeal of an image, and it motivates us to explore questions related to understanding aesthetic preferences, image enhancement, and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on exploring representations and representation-learning approaches for aesthetic inference, composition ranking, and their application to image enhancement. Firstly, we discuss early representations that mainly consisted of expert features, and their potential to enhance Convolutional Neural Networks (CNNs). Secondly, we discuss the ability of resource-constrained CNNs, and the different architecture choices (input size and layer depth), in solving various aesthetic inference tasks: binary classification, regression, and image cropping. We show that, if trained for fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers, but they fall short in comparison to models trained for composition ranking. Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement.
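    As a hedged sketch of what a resource-constrained aesthetics CNN can look like (not the dissertation's architecture, which is not detailed here), the toy PyTorch model below keeps the parameter count small and exposes a single head that could serve binary classification, score regression, or pairwise ranking depending on the loss attached.

        import torch
        import torch.nn as nn

        class TinyAestheticNet(nn.Module):
            """A deliberately small CNN for aesthetic inference on constrained devices."""
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),   # global pooling keeps the head tiny
                )
                # One output unit: sigmoid for binary aesthetics, raw for
                # score regression, or paired outputs for a ranking loss.
                self.head = nn.Linear(64, 1)

            def forward(self, x):
                return self.head(self.features(x).flatten(1))

        model = TinyAestheticNet()
        score = model(torch.randn(1, 3, 128, 128))  # one 128x128 RGB image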

    Visual complexity in human-machine interaction = Visuelle Komplexität in der Mensch-Maschine Interaktion

    Visual complexity is often defined as the degree of detail or intricacy in an image (Snodgrass & Vanderwart, 1980). It influences many areas of human life, including those involving interaction with technology. Effects of visual complexity have been demonstrated, for example, in road traffic (Edquist et al., 2012; Mace & Pollack, 1983) and in interaction with software (Alemerien & Magel, 2014) or websites (Deng & Poole, 2010; Tuch et al., 2011). Although research on visual complexity goes back to the Gestalt psychologists, who anchored the importance of simplicity and complexity in the perceptual process with the Gestalt principle of Prägnanz (Koffka, 1935; Wertheimer, 1923), neither the factors influencing visual complexity nor its relationships with eye movements and mental workload have been conclusively established. This thesis addresses these points through four empirical studies. Study 1 examines the relevance of the construct to human-machine interaction, using the complexity of videos in control rooms and its effects on subjective, physiological, and performance measures of mental workload. Study 2 takes a closer look at the dimensional structure and the importance of different factors influencing visual complexity, using varied stimulus material. Study 3 uses an experimental approach to investigate the effects of factors influencing visual complexity on subjective ratings and a selection of ocular parameters, with simple black-and-white shape patterns as stimuli; in addition, various computational and ocular parameters are used to predict complexity ratings. Study 4 transfers this approach to screenshots of websites in order to examine its explanatory power in an applied domain. Together with previous research, the observed relationships with mental workload in particular suggest that visual complexity is a relevant construct in human-machine interaction. Quantitative and structural aspects in particular, but potentially further aspects as well, influence ratings of visual complexity and viewers' gaze behaviour. The results also allow conclusions about the relationships with computational measures, which, in combination with ocular parameters, are well suited to predicting complexity ratings. The findings of the studies are discussed in the context of previous research, and an integrative research model of visual complexity in human-machine interaction is derived.
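    The abstract does not name the computational measures used to predict complexity ratings; two proxies that are common in the visual-complexity literature, offered here only as assumed examples, are edge density and compressed file size per pixel.

        import io
        import numpy as np
        from PIL import Image, ImageFilter

        def complexity_proxies(path):
            """Two simple computational proxies for visual complexity.

            These are common measures in the literature, not necessarily
            the ones used in the thesis.
            """
            img = Image.open(path).convert("L")

            # Edge density: fraction of pixels on detected edges
            # (threshold of 32 is an arbitrary heuristic).
            edges = np.asarray(img.filter(ImageFilter.FIND_EDGES), dtype=float)
            edge_density = float((edges > 32).mean())

            # Compression-based proxy: visually cluttered images compress
            # poorly, so PNG bytes per pixel rise with complexity.
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            bytes_per_pixel = buf.tell() / (img.width * img.height)

            return edge_density, bytes_per_pixel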

    Robust object detection under partial occlusion

    This thesis focuses on the problem of object detection under partial occlusion in complex scenes. It explores new bottom-up and top-down detection models that cope with the object discontinuities and ambiguity caused by partial occlusion, allowing more robust and adaptive detection of varied objects across different scenes.

    Finding Objects of Interest in Images using Saliency and Superpixels

    The ability to automatically find objects of interest in images is useful in areas such as compression, indexing and retrieval, and re-targeting. There are two classes of such algorithms – those that find any object of interest with no prior knowledge, independent of the task, and those that find specific objects of interest known a priori. The former class tries to detect objects that stand out, i.e. are salient, by virtue of being different from the rest of the image and consequently capturing our attention; the detection is generic in this case, as there is no specific object we are trying to locate. The latter class detects specific known objects of interest and often requires training using features extracted from known examples. In this thesis we address various aspects of finding objects of interest under the topics of saliency detection and object detection. We present two saliency detection algorithms that rely on the principle of center-surround contrast. These two algorithms are shown to be superior to several state-of-the-art techniques in terms of precision and recall measures with respect to a ground truth. They output full-resolution saliency maps, are simpler to implement, and are computationally more efficient than most existing algorithms. We further establish the relevance of our saliency detection algorithms by using them for the known applications of object segmentation and image re-targeting. We first present three different techniques for salient object segmentation using our saliency maps, based on clustering, graph cuts, and geodesic-distance-based labeling. We then demonstrate the use of our saliency maps for a popular technique of content-aware image resizing and compare the result with that of existing methods; our saliency maps prove to be a much more effective replacement for conventional gradient maps in providing automatic content-awareness. Just as it is important to find regions of interest in images, it is also important to find interesting images within a large collection. We therefore extend the notion of saliency detection from images to image databases and propose an algorithm for finding salient images in a database. Apart from finding such images, we also present two novel techniques for creating visually appealing summaries in the form of collages and mosaics. Finally, we address the problem of finding specific known objects of interest in images. Specifically, we deal with the feature extraction step that is a prerequisite for any technique in this domain. In this context, we first present a superpixel segmentation algorithm that outperforms previous algorithms in terms of quantitative measures of under-segmentation error and boundary recall. Our superpixel segmentation algorithm also offers several other advantages over existing algorithms, such as compactness, uniform size, control over the number of superpixels, and computational efficiency. We demonstrate the effectiveness of our superpixels by deploying them in existing algorithms, specifically an object-class detection technique and a graph-based algorithm, and improving their performance. We also present the result of using our superpixels in a technique for detecting mitochondria in noisy medical images.
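    One well-known center-surround contrast formulation, given here as a sketch rather than the thesis's exact algorithm, scores each pixel by its CIELab distance from the image's mean colour after light blurring; it is simple, full-resolution, and cheap to compute.

        import numpy as np
        from scipy.ndimage import gaussian_filter
        from skimage.color import rgb2lab

        def center_surround_saliency(rgb):
            """Full-resolution saliency map from center-surround contrast.

            Each pixel's saliency is its CIELab distance from the image-wide
            mean colour, after mild blurring to suppress noise and texture.
            A sketch of the general idea, not the thesis's exact method.
            """
            lab = rgb2lab(rgb)                                   # (H, W, 3)
            blurred = gaussian_filter(lab, sigma=(1.6, 1.6, 0))  # blur spatially only
            mean_vec = lab.reshape(-1, 3).mean(axis=0)           # mean Lab colour
            sal = np.linalg.norm(blurred - mean_vec, axis=2)     # per-pixel contrast
            return (sal - sal.min()) / (np.ptp(sal) + 1e-12)     # normalise to [0, 1]

    The superpixel properties listed in the abstract (compactness, uniform size, control over the count, efficiency) match what is available today in scikit-image as SLIC, e.g. skimage.segmentation.slic(img, n_segments=400, compactness=10).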

    Text–to–Video: Image Semantics and NLP

    When aiming to automatically translate an arbitrary text into a visual story, the main challenge consists in finding a semantically close visual representation whereby the displayed meaning remains the same as in the given text. Besides, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. Within the last years, social networking became of high interest, leading to an enormous and still increasing amount of data available online. Photo-sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge knowledge source of user-generated data, which provides initial links between images, words, and other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and make use of deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in online social behavior. To gain even deeper insights into the influence of visual appearance on an observer, we explore how simple image processing is capable of actually changing the emotional perception, and derive a simple but effective image filter. To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types, based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that is capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.
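    To make the tag-based linking between text and imagery concrete, here is a toy bag-of-words ranking of candidate images by cosine similarity between text tokens and user tags. This is a deliberately simple stand-in for the thesis's hierarchical querying algorithm, and all names in it are hypothetical.

        import math
        from collections import Counter

        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0

        def rank_illustrations(text_tokens, tagged_images):
            """Rank candidate images by tag overlap with a text fragment.

            tagged_images: list of (image_id, [user_tags]) pairs, e.g. mined
            from a photo-sharing site such as Flickr. A toy stand-in for the
            thesis's hierarchical querying algorithm.
            """
            query = Counter(text_tokens)
            scored = [(img, cosine(query, Counter(tags)))
                      for img, tags in tagged_images]
            return sorted(scored, key=lambda s: s[1], reverse=True)

        # Hypothetical usage: illustrate "a dog plays on the beach".
        images = [("img1", ["dog", "beach", "sea"]), ("img2", ["city", "night"])]
        print(rank_illustrations(["dog", "plays", "beach"], images))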

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
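    As a quick check of the reported statistic, the p-value for F(1,4) = 2.565 can be recomputed from the F distribution's survival function, e.g. with scipy.

        from scipy.stats import f

        # Survival function of the F distribution with df = (1, 4)
        # at the reported test statistic F = 2.565.
        p = f.sf(2.565, dfn=1, dfd=4)
        print(round(p, 3))  # ~0.185, matching the reported p = 0.185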

    Supporting the Editing of Graphs in Visual Representations = Unterstützung des Editierens von Graphen in Visuellen Repräsentationen

    The goal of this thesis is to provide solutions for supporting the direct editing of graphs in the visual representations used to analyze them. To that end, a conceptual view of the user's tasks is established first. On this basis, several novel approaches to "visually edit" the different data aspects of graphs - the graph's structure and its associated attribute values - are introduced. Throughout, different visual graph representations suitable for communicating the data are considered.
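    The abstract stays at the conceptual level; as a minimal sketch of the two data aspects a user edits (graph structure and attribute values), the snippet below uses networkx as an assumed stand-in for the underlying graph model, which the thesis itself does not prescribe here.

        import networkx as nx

        # A small attributed graph standing in for the data behind a
        # visual representation; positions would come from the layout.
        G = nx.Graph()
        G.add_node("a", label="Server", load=0.3)
        G.add_node("b", label="Client", load=0.1)
        G.add_edge("a", "b", weight=1.0)

        # Structural edits (what a user would perform directly in the view):
        G.add_node("c", label="Client", load=0.2)
        G.add_edge("a", "c", weight=0.5)
        G.remove_edge("a", "b")

        # Attribute-value edits on existing elements:
        G.nodes["a"]["load"] = 0.7
        G.edges["a", "c"]["weight"] = 2.0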