
    Learning from Multiple Sources for Video Summarisation

    Many visual surveillance tasks, e.g. video summarisation, are conventionally accomplished by analysing imagery-based features. Relying solely on visual cues for understanding public surveillance video is unreliable, since visual observations obtained from public-space CCTV data are often not sufficiently trustworthy and events of interest can be subtle. On the other hand, non-visual data sources such as weather reports and traffic sensory signals are readily accessible but have not been explored jointly to complement visual data for video content analysis and summarisation. In this paper, we present a novel unsupervised framework that learns jointly from both visual and independently drawn non-visual data sources to discover meaningful latent structure in surveillance video data. In particular, we investigate ways to cope with discrepant dimensions and representations whilst associating these heterogeneous data sources, and derive an effective mechanism to tolerate missing and incomplete data from different sources. We show that the proposed multi-source learning framework not only achieves better video content clustering than state-of-the-art methods, but is also capable of accurately inferring missing non-visual semantics from previously unseen videos. In addition, a comprehensive user study is conducted to validate the quality of the video summaries generated using the proposed multi-source model.
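    The abstract leaves the model unspecified; purely as a minimal sketch of the two data-handling issues it names (heterogeneous sources and missing entries), the Python snippet below mean-imputes missing non-visual readings, standardises each source, and clusters an early-fused representation. The feature names and the use of k-means are assumptions for illustration; the paper's actual framework is an unsupervised latent-structure model, not plain k-means.

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.preprocessing import StandardScaler

        def cluster_multi_source(visual, nonvisual, n_clusters=8):
            """Cluster video clips from visual and non-visual features jointly.

            visual:    (n_clips, d_v) array of imagery-based features
            nonvisual: (n_clips, d_n) array of e.g. weather/traffic signals,
                       with NaN marking missing entries
            """
            # Tolerate missing non-visual data via per-dimension mean imputation.
            col_means = np.nanmean(nonvisual, axis=0)
            nv = np.where(np.isnan(nonvisual), col_means, nonvisual)

            # Standardise each source so discrepant dimensions and scales
            # do not let one source dominate the joint representation.
            v = StandardScaler().fit_transform(visual)
            nv = StandardScaler().fit_transform(nv)

            joint = np.hstack([v, nv])  # naive early fusion of both sources
            return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(joint)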

    Representations and representation learning for image aesthetics prediction and image enhancement

    With continual improvements in cell-phone cameras and in the connectivity of mobile devices, we have seen an exponential increase in the number of images captured, stored, and shared on social media. For example, as of July 1st 2017 Instagram had over 715 million registered users, who had posted just shy of 35 billion images. This represented approximately a seven-fold increase in users and a nine-fold increase in photos on Instagram since 2012. Whether the images are stored on personal computers or reside on social networks (e.g. Instagram, Flickr), their sheer number calls for methods to determine various image properties, such as object presence or appeal, for the purpose of automatic image management and curation. One of the central problems in consumer photography centers on determining the aesthetic appeal of an image, and it motivates us to explore questions related to understanding aesthetic preferences, image enhancement, and the possibility of using such models on devices with constrained resources. In this dissertation, we present our work on exploring representations and representation-learning approaches for aesthetic inference, composition ranking, and their application to image enhancement. Firstly, we discuss early representations that mainly consisted of expert features, and their potential to enhance Convolutional Neural Networks (CNNs). Secondly, we discuss the ability of resource-constrained CNNs, and the different architecture choices (input size and layer depth), in solving various aesthetic inference tasks: binary classification, regression, and image cropping. We show that, if trained for fine-grained aesthetics inference, such models can rival the cropping performance of other aesthetics-based croppers, but they fall short in comparison to models trained for composition ranking. Lastly, we discuss our work on exploring and identifying the design choices in training composition ranking functions, with the goal of using them for image composition enhancement.
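    As a hedged sketch of what a resource-constrained aesthetics CNN can look like (not the dissertation's architecture, which is not detailed here), the toy PyTorch model below keeps the parameter count small and exposes a single head that could serve binary classification, score regression, or pairwise ranking depending on the loss attached.

        import torch
        import torch.nn as nn

        class TinyAestheticNet(nn.Module):
            """A deliberately small CNN for aesthetic inference on constrained devices."""
            def __init__(self):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),   # global pooling keeps the head tiny
                )
                # One output unit: sigmoid for binary aesthetics, raw for
                # score regression, or paired outputs for a ranking loss.
                self.head = nn.Linear(64, 1)

            def forward(self, x):
                return self.head(self.features(x).flatten(1))

        model = TinyAestheticNet()
        score = model(torch.randn(1, 3, 128, 128))  # one 128x128 RGB image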

    Visual complexity in human-machine interaction = Visuelle Komplexität in der Mensch-Maschine Interaktion

    Visual complexity is often defined as the degree of detail or intricacy in an image (Snodgrass & Vanderwart, 1980). It influences many areas of human life, including those involving interaction with technology. Effects of visual complexity have been demonstrated, for example, in road traffic (Edquist et al., 2012; Mace & Pollack, 1983) and in interaction with software (Alemerien & Magel, 2014) or websites (Deng & Poole, 2010; Tuch et al., 2011). Although research on visual complexity goes back to the Gestalt psychologists, who anchored the importance of simplicity and complexity in the perceptual process with the Gestalt principle of Prägnanz (Koffka, 1935; Wertheimer, 1923), neither the factors influencing visual complexity nor its relationships with eye movements and mental workload have been conclusively established. This thesis addresses these points through four empirical studies. Study 1 examines the relevance of the construct to human-machine interaction, using the complexity of videos in control rooms and its effects on subjective, physiological, and performance measures of mental workload. Study 2 takes a closer look at the dimensional structure and the importance of different factors influencing visual complexity, using varied stimulus material. Study 3 uses an experimental approach to investigate the effects of factors influencing visual complexity on subjective ratings and a selection of ocular parameters, with simple black-and-white shape patterns as stimuli; in addition, various computational and ocular parameters are used to predict complexity ratings. Study 4 transfers this approach to screenshots of websites in order to examine its explanatory power in an applied domain. Together with previous research, the observed relationships with mental workload in particular suggest that visual complexity is a relevant construct in human-machine interaction. Quantitative and structural aspects in particular, but potentially further aspects as well, influence ratings of visual complexity and viewers' gaze behaviour. The results also allow conclusions about the relationships with computational measures, which, in combination with ocular parameters, are well suited to predicting complexity ratings. The findings of the studies are discussed in the context of previous research, and an integrative research model of visual complexity in human-machine interaction is derived.
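    The abstract does not name the computational measures used to predict complexity ratings; two proxies that are common in the visual-complexity literature, offered here only as assumed examples, are edge density and compressed file size per pixel.

        import io
        import numpy as np
        from PIL import Image, ImageFilter

        def complexity_proxies(path):
            """Two simple computational proxies for visual complexity.

            These are common measures in the literature, not necessarily
            the ones used in the thesis.
            """
            img = Image.open(path).convert("L")

            # Edge density: fraction of pixels on detected edges
            # (threshold of 32 is an arbitrary heuristic).
            edges = np.asarray(img.filter(ImageFilter.FIND_EDGES), dtype=float)
            edge_density = float((edges > 32).mean())

            # Compression-based proxy: visually cluttered images compress
            # poorly, so PNG bytes per pixel rise with complexity.
            buf = io.BytesIO()
            img.save(buf, format="PNG")
            bytes_per_pixel = buf.tell() / (img.width * img.height)

            return edge_density, bytes_per_pixel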

    Robust object detection under partial occlusion

    This thesis focuses on the problem of object detection under partial occlusion in complex scenes. It explores new bottom-up and top-down detection models that cope with the object discontinuities and ambiguity caused by partial occlusion, allowing more robust and adaptive detection of varied objects across different scenes.

    Finding Objects of Interest in Images using Saliency and Superpixels

    The ability to automatically find objects of interest in images is useful in areas such as compression, indexing and retrieval, and re-targeting. There are two classes of such algorithms – those that find any object of interest with no prior knowledge, independent of the task, and those that find specific objects of interest known a priori. The former class tries to detect objects that stand out, i.e. are salient, by virtue of being different from the rest of the image and consequently capturing our attention; the detection is generic in this case, as there is no specific object we are trying to locate. The latter class detects specific known objects of interest and often requires training using features extracted from known examples. In this thesis we address various aspects of finding objects of interest under the topics of saliency detection and object detection. We present two saliency detection algorithms that rely on the principle of center-surround contrast. These two algorithms are shown to be superior to several state-of-the-art techniques in terms of precision and recall measures with respect to a ground truth. They output full-resolution saliency maps, are simpler to implement, and are computationally more efficient than most existing algorithms. We further establish the relevance of our saliency detection algorithms by using them for the known applications of object segmentation and image re-targeting. We first present three different techniques for salient object segmentation using our saliency maps, based on clustering, graph cuts, and geodesic-distance-based labeling. We then demonstrate the use of our saliency maps for a popular technique of content-aware image resizing and compare the result with that of existing methods; our saliency maps prove to be a much more effective replacement for conventional gradient maps in providing automatic content-awareness. Just as it is important to find regions of interest in images, it is also important to find interesting images within a large collection. We therefore extend the notion of saliency detection from images to image databases and propose an algorithm for finding salient images in a database. Apart from finding such images, we also present two novel techniques for creating visually appealing summaries in the form of collages and mosaics. Finally, we address the problem of finding specific known objects of interest in images. Specifically, we deal with the feature extraction step that is a prerequisite for any technique in this domain. In this context, we first present a superpixel segmentation algorithm that outperforms previous algorithms in terms of quantitative measures of under-segmentation error and boundary recall. Our superpixel segmentation algorithm also offers several other advantages over existing algorithms, such as compactness, uniform size, control over the number of superpixels, and computational efficiency. We demonstrate the effectiveness of our superpixels by deploying them in existing algorithms, specifically an object-class detection technique and a graph-based algorithm, and improving their performance. We also present the result of using our superpixels in a technique for detecting mitochondria in noisy medical images.
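    One well-known center-surround contrast formulation, given here as a sketch rather than the thesis's exact algorithm, scores each pixel by its CIELab distance from the image's mean colour after light blurring; it is simple, full-resolution, and cheap to compute.

        import numpy as np
        from scipy.ndimage import gaussian_filter
        from skimage.color import rgb2lab

        def center_surround_saliency(rgb):
            """Full-resolution saliency map from center-surround contrast.

            Each pixel's saliency is its CIELab distance from the image-wide
            mean colour, after mild blurring to suppress noise and texture.
            A sketch of the general idea, not the thesis's exact method.
            """
            lab = rgb2lab(rgb)                                   # (H, W, 3)
            blurred = gaussian_filter(lab, sigma=(1.6, 1.6, 0))  # blur spatially only
            mean_vec = lab.reshape(-1, 3).mean(axis=0)           # mean Lab colour
            sal = np.linalg.norm(blurred - mean_vec, axis=2)     # per-pixel contrast
            return (sal - sal.min()) / (np.ptp(sal) + 1e-12)     # normalise to [0, 1]

    The superpixel properties listed in the abstract (compactness, uniform size, control over the count, efficiency) match what is available today in scikit-image as SLIC, e.g. skimage.segmentation.slic(img, n_segments=400, compactness=10).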

    Text–to–Video: Image Semantics and NLP

    When aiming to automatically translate an arbitrary text into a visual story, the main challenge consists in finding a semantically close visual representation whereby the displayed meaning remains the same as in the given text. Besides, the appearance of an image itself largely influences how its meaningful information is conveyed to an observer. This thesis demonstrates that investigating both image semantics and the semantic relatedness between visual and textual sources enables us to tackle the challenging semantic gap and to find a semantically close translation from natural language to a corresponding visual representation. Within the last years, social networking became of high interest, leading to an enormous and still increasing amount of data available online. Photo-sharing sites like Flickr allow users to associate textual information with their uploaded imagery. This thesis therefore exploits this huge knowledge source of user-generated data, which provides initial links between images, words, and other meaningful data. To approach visual semantics, this work presents various methods to analyze the visual structure as well as the appearance of images in terms of meaningful similarities, aesthetic appeal, and emotional effect on an observer. In detail, our GPU-based approach efficiently finds visual similarities between images in large datasets across visual domains and identifies various meanings for ambiguous words by exploring similarity in online search results. Further, we investigate the highly subjective aesthetic appeal of images and make use of deep learning to directly learn aesthetic rankings from a broad diversity of user reactions in online social behavior. To gain even deeper insights into the influence of visual appearance on an observer, we explore how simple image processing is capable of actually changing the emotional perception, and derive a simple but effective image filter. To identify meaningful connections between written text and visual representations, we employ methods from Natural Language Processing (NLP). Extensive textual processing allows us to create semantically relevant illustrations for simple text elements as well as complete storylines. More precisely, we present an approach that resolves dependencies in textual descriptions to arrange 3D models correctly. Further, we develop a method that finds semantically relevant illustrations for texts of different types, based on a novel hierarchical querying algorithm. Finally, we present an optimization-based framework that is capable of generating picture stories in different styles that are not only semantically relevant but also visually coherent.
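    To make the tag-based linking between text and imagery concrete, here is a toy bag-of-words ranking of candidate images by cosine similarity between text tokens and user tags. This is a deliberately simple stand-in for the thesis's hierarchical querying algorithm, and all names in it are hypothetical.

        import math
        from collections import Counter

        def cosine(a: Counter, b: Counter) -> float:
            dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
            norm = (math.sqrt(sum(v * v for v in a.values()))
                    * math.sqrt(sum(v * v for v in b.values())))
            return dot / norm if norm else 0.0

        def rank_illustrations(text_tokens, tagged_images):
            """Rank candidate images by tag overlap with a text fragment.

            tagged_images: list of (image_id, [user_tags]) pairs, e.g. mined
            from a photo-sharing site such as Flickr. A toy stand-in for the
            thesis's hierarchical querying algorithm.
            """
            query = Counter(text_tokens)
            scored = [(img, cosine(query, Counter(tags)))
                      for img, tags in tagged_images]
            return sorted(scored, key=lambda s: s[1], reverse=True)

        # Hypothetical usage: illustrate "a dog plays on the beach".
        images = [("img1", ["dog", "beach", "sea"]), ("img2", ["city", "night"])]
        print(rank_illustrations(["dog", "plays", "beach"], images))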

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
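    As a quick check of the reported statistic, the p-value for F(1,4) = 2.565 can be recomputed from the F distribution's survival function, e.g. with scipy.

        from scipy.stats import f

        # Survival function of the F distribution with df = (1, 4)
        # at the reported test statistic F = 2.565.
        p = f.sf(2.565, dfn=1, dfd=4)
        print(round(p, 3))  # ~0.185, matching the reported p = 0.185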

    Supporting the Editing of Graphs in Visual Representations = Unterstützung des Editierens von Graphen in Visuellen Repräsentationen

    The goal of this thesis is to provide solutions for supporting the direct editing of graphs in the visual representations used to analyze them. To that end, a conceptual view of the user's tasks is established first. On this basis, several novel approaches to "visually edit" the different data aspects of graphs - the graph's structure and its associated attribute values - are introduced. Throughout, different visual graph representations suitable for communicating the data are considered.
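    The abstract stays at the conceptual level; as a minimal sketch of the two data aspects a user edits (graph structure and attribute values), the snippet below uses networkx as an assumed stand-in for the underlying graph model, which the thesis itself does not prescribe here.

        import networkx as nx

        # A small attributed graph standing in for the data behind a
        # visual representation; positions would come from the layout.
        G = nx.Graph()
        G.add_node("a", label="Server", load=0.3)
        G.add_node("b", label="Client", load=0.1)
        G.add_edge("a", "b", weight=1.0)

        # Structural edits (what a user would perform directly in the view):
        G.add_node("c", label="Client", load=0.2)
        G.add_edge("a", "c", weight=0.5)
        G.remove_edge("a", "b")

        # Attribute-value edits on existing elements:
        G.nodes["a"]["load"] = 0.7
        G.edges["a", "c"]["weight"] = 2.0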