23 research outputs found

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Get PDF
    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about the temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail because the number of training examples is too small to cover the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole bodies, body parts, or point trajectories. Detections and motion estimates provide contradictory information in the case of false-alarm detections or leaking motion affinities. We consolidate this contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections that corrects motion leakage between correctly detected objects while remaining robust to false alarms and spatially inaccurate detections.
    First, we present a motion segmentation framework that exploits the long-range motion of point trajectories and the large spatial support of image regions. We show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection to deal with motion leakage. We demonstrate how to combine dense optical-flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. We use the motion segmentability of body parts to re-rank a set of body-joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.
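    The consolidation of attractive motion affinities with repulsions from confident detections can be pictured as a signed spectral clustering step. The sketch below is a simplified illustration of that attraction/repulsion idea, not the thesis' exact graph steering algorithm; the function name, the fixed repulsion weight, and the k-means rounding are assumptions made for the example.

        # Simplified sketch: cluster point trajectories using motion
        # affinities (attraction) and confident detections (repulsion).
        import numpy as np
        from scipy.linalg import eigh
        from scipy.cluster.vq import kmeans2

        def signed_spectral_clusters(motion_affinity, detection_labels, k, repel=1.0):
            """motion_affinity: (n, n) nonnegative trajectory similarities.
            detection_labels: length-n array holding the index of the confident
            detection covering each trajectory, or -1 if uncovered."""
            A = motion_affinity.astype(float).copy()
            lab = np.asarray(detection_labels)
            # Repulsion: trajectories covered by two different confident
            # detections should not group, even if motion affinities "leak".
            covered = (lab[:, None] >= 0) & (lab[None, :] >= 0)
            A[covered & (lab[:, None] != lab[None, :])] -= repel
            # Signed Laplacian with absolute degrees stays positive semidefinite.
            L = np.diag(np.abs(A).sum(axis=1)) - A
            # Embed each trajectory with the k lowest eigenvectors, then round.
            _, vecs = eigh(L, subset_by_index=[0, k - 1])
            _, assign = kmeans2(vecs, k, minit='++')
            return assign

    Repulsions enter with a negative sign in the affinity matrix, so a false-alarm detection merely fails to attract support, while leaking affinities between two confidently detected targets are actively cancelled.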

    Quantum Cuts: A Quantum Mechanical Spectral Graph Partitioning Method for Salient Object Detection

    Get PDF
    The increasing number of cameras, their availability to end users, and the rise of social media platforms have given rise to the massive repositories of today's Big Data, the largest portion of which corresponds to unstructured image and video collections. This motivates the development of algorithms that help manage and organize Big Data efficiently. Such processing usually involves high-level Computer Vision tasks such as object detection and recognition, whose accuracy and complexity are therefore crucial. Salient object detection, which can be defined as highlighting the regions that visually stand out from the rest of the environment, can both reduce the complexity and improve the accuracy of object detection and recognition. Interest in this topic has therefore been growing recently, driven also by many other applications of salient object detection such as media compression and summarization. This thesis focuses on this crucial problem and presents novel approaches and methods for salient object detection in digital media, using the principles of Quantum Mechanics. Its contributions can be categorized chronologically into three parts.
    The first part is a direct application to salient object detection in images of ideas originally proposed for describing the wave nature of particles in Quantum Mechanics, as expressed through Schrödinger's Equation. The significance of this contribution is that, to the best of our knowledge, it is the first study to propose a realizable quantum mechanical system for salient object proposals, yielding instantaneous speed in a possible physical implementation at the quantum scale.
    The second and main contribution of this thesis is a spectral-graph-based salient object detection method, namely Quantum-Cuts. Despite the success of spectral graph methods in many Computer Vision tasks, traditional applications of spectral graph partitioning offer little for the salient object detection problem, which can be mapped to a foreground segmentation problem on graphs. Quantum-Cuts therefore adopts a novel approach to spectral graph partitioning by integrating quantum mechanical concepts into Spectral Graph Theory. In particular, combining the probabilistic interpretation of quantum mechanical wave functions and the unary potential fields of Quantum Mechanics with the pairwise graph affinities widely used in Spectral Graph Theory results in a unique optimization problem that formulates salient object detection. The optimal solution of a relaxed version of this problem is obtained via Quantum-Cuts and is proven to efficiently represent salient object regions in images.
    The third part of the contributions covers improvements on Quantum-Cuts obtained by analyzing the main factors that affect its performance in salient object detection. In particular, both unsupervised and supervised approaches are adopted to improve the exploited graph representation. These extensions of Quantum-Cuts led to computationally efficient algorithms that outperform the state of the art in salient object detection.
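    The general recipe the abstract describes, pairwise affinities (a graph Laplacian) combined with a unary potential, with saliency read from the squared amplitudes of the lowest-energy eigenvector, can be sketched in a few lines. This is a hedged illustration of that recipe, not the thesis' exact formulation; the border-based potential and the normalization are assumptions for the example.

        # Sketch of a Schrödinger-style spectral saliency step: Hamiltonian =
        # graph Laplacian (pairwise term) + diagonal potential (unary term).
        import numpy as np
        from scipy.linalg import eigh

        def ground_state_saliency(W, potential):
            """W: (n, n) symmetric nonnegative superpixel affinities.
            potential: length-n unary term, e.g. large values on superpixels
            touching the image border (a common background prior)."""
            L = np.diag(W.sum(axis=1)) - W      # kinetic term
            H = L + np.diag(potential)          # Hamiltonian
            _, vecs = eigh(H, subset_by_index=[0, 0])
            psi = vecs[:, 0]                    # ground-state wave function
            saliency = psi ** 2                 # Born rule: probability ~ |psi|^2
            return saliency / saliency.max()

    The Laplacian term favors solutions that vary smoothly over strongly connected superpixels, while the potential pushes the wave function's mass away from likely-background nodes, so the squared amplitudes concentrate on a coherent salient region.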

    Learning to segment in images and videos with different forms of supervision

    Get PDF
    Much progress has been made in image and video segmentation over the last years. To a large extent, this success can be attributed to strong appearance models learned entirely from data, in particular using deep learning methods. However, to perform best, these methods require large representative training datasets with expensive pixel-level annotations, which in the case of videos are prohibitive to obtain. There is therefore a need to relax this constraint and to consider alternative forms of supervision that are easier and cheaper to collect. In this thesis, we aim to develop algorithms for learning to segment in images and videos with different levels of supervision.
    First, we develop approaches for training convolutional networks with weaker forms of supervision, such as bounding boxes or image labels, for object boundary estimation and semantic/instance labelling tasks. We propose to generate approximate pixel-level ground truth from these weaker annotations to train a network, which achieves high-quality results comparable to full supervision without any modifications of the network architecture or the training procedure. Second, we address the excessive computational and memory costs inherent in solving video segmentation via graphs. We propose approaches that improve runtime and memory efficiency as well as output segmentation quality by learning the best representation of the graph from the available training data. In particular, we contribute learned must-link constraints, learned graph topology and edge weights, and enhanced graph nodes, the superpixels themselves. Third, we tackle pixel-level object tracking and address the limited amount of densely annotated video data available for training convolutional networks. We introduce an architecture that can be trained with static images only and propose an elaborate data-synthesis scheme that creates a large number of training examples close to the target domain from the given first-frame mask. With the proposed techniques we show that densely annotated consecutive video data is not necessary to achieve high-quality, temporally coherent video segmentation results.
    In summary, this thesis advances the state of the art in weakly supervised image segmentation, graph-based video segmentation, and pixel-level object tracking, and contributes new ways of training convolutional networks with a limited amount of pixel-level annotated training data.
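    As one concrete illustration of turning box supervision into approximate pixel-level ground truth, a GrabCut-style refinement inside each annotated box is a common choice in this line of work. The sketch below is an illustrative stand-in, not necessarily the thesis' exact procedure; the function name and iteration count are assumptions.

        # Sketch: convert a bounding-box annotation into an approximate
        # pixel-level training mask with GrabCut.
        import numpy as np
        import cv2

        def box_to_pseudo_mask(image, box, iters=5):
            """image: HxWx3 uint8 BGR array. box: (x, y, w, h) annotation.
            Returns a binary HxW mask usable as a training target."""
            mask = np.zeros(image.shape[:2], np.uint8)
            bgd_model = np.zeros((1, 65), np.float64)  # GrabCut GMM state
            fgd_model = np.zeros((1, 65), np.float64)
            cv2.grabCut(image, mask, box, bgd_model, fgd_model, iters,
                        cv2.GC_INIT_WITH_RECT)
            # GrabCut labels 1 = sure foreground, 3 = probable foreground.
            fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
            return fg.astype(np.uint8)

    The point the abstract makes is that such noisy pseudo-masks can be fed to an unmodified segmentation network and training procedure and still reach quality comparable to full supervision.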

    Cosaliency detection based on intrasaliency prior transfer and deep intersaliency mining

    Get PDF
    As an interesting and emerging topic, cosaliency detection aims to simultaneously extract the common salient objects in multiple related images. It differs from the conventional saliency detection paradigm, in which saliency is determined for each image independently, without taking advantage of the homogeneity in the data pool of multiple related images. In this paper, we propose a novel cosaliency detection approach using deep learning models, introducing and exploring two new concepts: intrasaliency prior transfer and deep intersaliency mining. For intrasaliency prior transfer, we build a stacked denoising autoencoder (SDAE) to learn saliency prior knowledge from auxiliary annotated data sets and then transfer the learned knowledge to estimate the intrasaliency of each image in cosaliency data sets. For deep intersaliency mining, we formulate it using the deep reconstruction residual obtained in the highest hidden layer of a self-trained SDAE. The obtained deep intersaliency can extract more intrinsic and general hidden patterns to discover the homogeneity of cosalient objects in terms of higher-level concepts. Finally, the cosaliency maps are generated by a weighted integration of the proposed intrasaliency prior, the deep intersaliency, and a traditional shallow intersaliency. Comprehensive experiments on diverse publicly available benchmark data sets demonstrate consistent performance gains of the proposed method over state-of-the-art cosaliency detection methods.
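    The reconstruction-residual idea behind deep intersaliency mining can be pictured as follows: an autoencoder self-trained on features pooled from the whole image group reconstructs group-typical patterns well, so the residual indicates how strongly a region follows the common pattern. The sketch below illustrates this under assumed layer sizes, noise level, and residual-to-score mapping; it is a single-layer simplification of the stacked model described in the paper.

        # Sketch: self-train a small denoising autoencoder on superpixel
        # features from the related image group; score each superpixel by
        # its reconstruction residual.
        import torch
        import torch.nn as nn

        def intersaliency_by_residual(features, hidden=64, epochs=200, noise=0.1):
            """features: (n_superpixels, d) float tensor from all images."""
            d = features.shape[1]
            model = nn.Sequential(nn.Linear(d, hidden), nn.ReLU(),
                                  nn.Linear(hidden, d))
            opt = torch.optim.Adam(model.parameters(), lr=1e-3)
            for _ in range(epochs):
                noisy = features + noise * torch.randn_like(features)  # denoising
                loss = nn.functional.mse_loss(model(noisy), features)
                opt.zero_grad(); loss.backward(); opt.step()
            with torch.no_grad():
                residual = (model(features) - features).pow(2).sum(dim=1)
            # Low residual = fits the pattern shared across the group well.
            score = 1.0 / (1.0 + residual)
            return (score - score.min()) / (score.max() - score.min() + 1e-8)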

    A brief survey of visual saliency detection

    Get PDF