DC-image for real time compressed video matching

Author: Ahmed Amr
Bekhet Saddam
Hunter Andrew
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014

This chapter presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without full decompression. The relevant arguments and supporting evidence are also discussed. Several local feature detectors are examined to select the best one for matching using the DC-image. Two experiments are carried out to support this. The first compares the DC-image and the I-frame in terms of matching performance and computational complexity. The second compares local and global features for compressed video matching on the DC-image. The results confirm that the DC-image, despite its highly reduced size, is promising, as it produces higher matching precision than the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features. Its computational complexity is relatively higher, but it remains within the real-time margin, which leaves room for further optimizations to reduce that complexity.
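
To make the idea of a DC-image concrete: MPEG I-frames are coded as 8x8 DCT blocks, and the DC coefficient of each block is proportional to that block's mean intensity, so the DC-image is effectively a thumbnail one eighth the size in each dimension. The sketch below is only an illustration under that assumption. It computes block means from an already decoded grayscale frame, whereas a compressed-domain system such as the one described would read the DC terms from the bitstream without decoding. The function name `dc_image` is hypothetical and not taken from the chapter.

```python
def dc_image(frame, block=8):
    """Return the block-mean thumbnail of a 2-D grayscale frame (list of rows).

    Illustration only: a real compressed-domain pipeline would read the DC
    coefficients from the MPEG bitstream instead of averaging decoded pixels.
    """
    # Ignore any partial blocks at the right and bottom edges.
    height = len(frame) - len(frame) % block
    width = len(frame[0]) - len(frame[0]) % block
    thumb = []
    for top in range(0, height, block):
        row = []
        for left in range(0, width, block):
            total = sum(
                frame[y][x]
                for y in range(top, top + block)
                for x in range(left, left + block)
            )
            row.append(total / (block * block))
        thumb.append(row)
    return thumb


# A 16x16 frame whose left half is 10 and right half is 20
# reduces to a 2x2 DC-image.
frame = [[10 if x < 8 else 20 for x in range(16)] for _ in range(16)]
print(dc_image(frame))  # prints [[10.0, 20.0], [10.0, 20.0]]
```

Local features such as SIFT would then be extracted from this reduced image rather than from the full I-frame.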

Video matching using DC-image and local features

Author: Ahmed Amr
Bekhet Saddam
Hunter Andrew
Publication venue: Newswood Limited/International Association of Engineers
Publication date: 01/01/2013

This paper presents a suggested framework for video matching based on local features extracted from the DC-image of MPEG compressed videos, without decompression. The relevant arguments and supporting evidence are discussed for developing video similarity techniques that work directly on compressed videos, without decompression, and especially on small images. Two experiments are carried out to support this. The first compares the DC-image and the I-frame in terms of matching performance and the corresponding computational complexity. The second compares local and global features in video matching, especially in the compressed domain and with small images. The results confirm that the DC-image, despite its highly reduced size, is promising, as it produces at least similar (if not better) matching precision compared to the full I-frame. Also, SIFT, as a local feature, outperforms most of the standard global features in precision. Its computational complexity is relatively higher, but it remains within the real-time margin. There are also various optimisations that could reduce this computational complexity.

Edge Hill University Research Information Repository

TRECVid 2011 Experiments at Dublin City University

Author: Foley Colum
Guo Jinlin
Gurrin Cathal
Hopfgartner Frank
Scott David
Smeaton Alan F.
Publication date: 01/01/2012

This year the iAd-DCU team participated in three of the assigned TRECVid 2011 tasks: Semantic Indexing (SIN), Interactive Known-Item Search (KIS) and Multimedia Event Detection (MED). For the SIN task we presented three full runs using global features, local features, and a fusion of global features, local features and relationships between concepts, respectively. The evaluation results show that local features achieve better performance, with marginal gains found when introducing global features and relationships between concepts. With regard to our KIS submission, similar to our 2010 KIS experiments, we have implemented an iPad interface to a KIS video search tool. The aim of this year’s experimentation was to evaluate different display methodologies for KIS interaction. For this work, we integrate a clustering element for keyframes, which operates over MPEG-7 features using k-means clustering. In addition, we employ concept detection, not simply for search, but as a means of choosing the most representative keyframes for ranked items. For our experiments we compare the baseline non-clustering system to a clustering system on a topic-by-topic basis. Finally, for the first time this year the iAd group at DCU has been involved in the MED task. Two techniques are compared: employing low-level features directly and using concepts as intermediate representations. Evaluation results show promising initial results when performing event detection using concepts as intermediate representations.
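
The keyframe clustering step mentioned above can be pictured with a minimal k-means sketch. It is not the team's implementation: the 2-D vectors stand in for MPEG-7 descriptors, the function name `kmeans` is hypothetical, and seeding the centroids with the first `k` points is a simplification chosen to keep the example deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on a list of equal-length feature tuples.

    Assumption: centroids are seeded with the first k points so the
    result is deterministic; real systems usually seed randomly.
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical keyframe features forming two well-separated groups.
features = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0)]
labels, centroids = kmeans(features, k=2)
print(labels)  # prints [0, 1, 0, 1, 0]
```

Keyframes that share a label would be grouped together in the display.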

Enlighten

Medical Image Modality Classification using Feature Weighted Clustering Approach.

Author: Chandra Bhavik Anil
Publication date: 01/12/2010

Medical image retrieval systems are an area of great importance to healthcare providers.

Repository@USM

Multimedia: information representation and access

Author: Brown Evan
Little Suzanne
Rüger Stefan
Publication venue: 'Facet Publishing'
Publication date: 23/06/2011

[About the book] Information retrieval (IR) is a complex human activity supported by sophisticated systems. Information science has contributed much to the design and evaluation of previous generations of IR system development and to our general understanding of how such systems should be designed and yet, due to the increasing success and diversity of IR systems, many recent textbooks concentrate on IR systems themselves and ignore the human side of searching for information. This book is the first text to provide an information science perspective on IR

Feature extraction using MPEG-CDVS and Deep Learning with application to robotic navigation and image classification

Author: PORTO BUARQUE DE GUSMAO PEDRO
Publication venue: country:Italy
Publication date: 01/01/2017

The main contributions of this thesis are the evaluation of the MPEG Compact Descriptor for Visual Search (CDVS) in the context of indoor robotic navigation and the introduction of a new method for training Convolutional Neural Networks with applications to object classification. The choice of image descriptor in a visual navigation system is not straightforward. Visual descriptors must be distinctive enough to allow for correct localisation while still offering low matching complexity and short descriptor size for real-time applications. CDVS is a low-complexity image descriptor that offers several levels of compromise between descriptor distinctiveness and size. In this work, we describe how these trade-offs can be used for efficient loop detection in a typical indoor environment. We first describe a probabilistic approach to loop detection based on the standard’s suggested similarity metric. We then evaluate the performance of CDVS compression modes in terms of matching speed, feature extraction, and storage requirements and compare them with the state-of-the-art SIFT descriptor for five different types of indoor floors. In the second part of this thesis we focus on the new paradigm in machine learning and computer vision called Deep Learning. Under this paradigm visual features are no longer extracted using fine-grained, highly engineered feature extractors, but rather using a Convolutional Neural Network (CNN) that extracts hierarchical features learned directly from data at the cost of long training periods. In this context, we propose a method for speeding up the training of CNNs by exploiting the spatial scaling property of convolutions. This is done by first pre-training a CNN with smaller kernel resolutions for a few epochs, followed by properly rescaling its kernels to the target’s original dimensions and continuing training at full resolution.
We show that the overall training time of a target CNN architecture can be reduced by exploiting the spatial scaling property of convolutions during early stages of learning. Moreover, by rescaling the kernels at different epochs, we identify a trade-off between total training time and maximum obtainable accuracy. Finally, we propose a method for choosing when to rescale kernels and evaluate our approach on recent architectures, showing savings in training time of nearly 20% while test set accuracy is preserved.
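
The abstract does not say how the kernels are rescaled, so the following is only one plausible reading, offered as a sketch. It resizes a small square kernel to a larger one by bilinear interpolation and then renormalises so that the sum of the weights is unchanged, which is an assumption motivated by the amplitude factor in the scaling property of convolution. The function name `rescale_kernel` and the normalisation rule are hypothetical, not the thesis's method.

```python
def rescale_kernel(kernel, new_size):
    """Bilinearly resize a square 2-D kernel, preserving the sum of its weights.

    Assumptions: the kernel is at least 2x2, new_size is at least 2, and
    sum preservation is an illustrative choice of normalisation.
    """
    old_size = len(kernel)
    step = (old_size - 1) / (new_size - 1)  # corner samples stay aligned
    resized = []
    for i in range(new_size):
        y = i * step
        y0 = min(int(y), old_size - 2)
        wy = y - y0
        row = []
        for j in range(new_size):
            x = j * step
            x0 = min(int(x), old_size - 2)
            wx = x - x0
            # Interpolate along x on the two bracketing rows, then along y.
            top = (1 - wx) * kernel[y0][x0] + wx * kernel[y0][x0 + 1]
            bottom = (1 - wx) * kernel[y0 + 1][x0] + wx * kernel[y0 + 1][x0 + 1]
            row.append((1 - wy) * top + wy * bottom)
        resized.append(row)
    old_sum = sum(map(sum, kernel))
    new_sum = sum(map(sum, resized))
    scale = old_sum / new_sum if new_sum else 1.0
    return [[w * scale for w in row] for row in resized]


# Grow a 3x3 kernel learned at low resolution to a 5x5 target kernel.
small = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
large = rescale_kernel(small, 5)
```

Training would then continue from `large` at full resolution instead of starting from a random initialisation.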

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)


A Novel Efficient Algorithm for Locating and Tracking Object Parts in Low Resolution Videos

Author: Agah Arvin
Johnson David O.
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/04/2011

In this paper, a novel efficient algorithm is presented for locating and tracking object parts in low-resolution videos using Lowe's SIFT keypoints with a nearest-neighbor object detection approach. Our interest lies in using this information as one step in the process of automatically programming service, household, or personal robots to perform the skills that are being taught in easily obtainable instructional videos. In the reported experiments, the system looked for 14 parts of inanimate and animate objects in 40 natural outdoor scenes. The scenes were frames from a low-resolution instructional video on cleaning golf clubs containing 2,405 frames of 180 by 240 pixels. The system was trained using 39 frames that were half-way between the test frames. Despite the low resolution of the instructional video and occluded training samples, the system achieved a recall of 49% with a precision of 71% and an F1 of 0.58, which is better than that achieved by less demanding applications. In order to verify that the reported results were not dependent on the specific video, the proposed technique was applied to another video and the results are reported.
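
The reported F1 value follows from the stated precision and recall, since F1 is their harmonic mean. The short check below uses only the two figures quoted in the abstract; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision 71% and recall 49%, as quoted in the abstract.
print(round(f1_score(0.71, 0.49), 2))  # prints 0.58
```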

Directory of Open Access Journals

COST292 experimental framework for TRECVID 2008

Author: Aginako N.
Alatan A.
Alexandre L. A.
Avrithis Y.
Benois-Pineau J.
Chandramouli K.
Corvaglia M.
Damnjanovic U.
Dimou A.
Esen E.
Fatemi N.
Goya J.
Guerrini F.
Hanjalic A.
Jarina R.
Kapsalas P.
King P.
Kompatsiaris I.
Makris L.
Mansencal B.
Mezaris V.
Migliorati P.
Moumtzidou A.
Mylonas Ph.
Naci U.
Nikolopoulos S.
Paralic M.
Piatrik T.
Pinheiro A. M. G.
Poulin F.
Raileanu L.
Saracoglu A.
Spyrou E.
Tolias G.
Vrochidis S.
Zhang Q.
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/01/2008

In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes, using the eigenvalues of the frame similarity matrix, and a redundancy removal strategy that depends on the extraction of semantic features such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos.

Archivio istituzionale della ricerca - Università di Brescia