114 research outputs found

    Photorealistic retrieval of occluded facial information using a performance-driven face model

    Facial occlusions can cause both human observers and computer algorithms to fail in a variety of important tasks, such as facial action analysis and expression classification, because the missing information is not reconstructed accurately enough for the task at hand. Most current computer methods used to tackle this problem implement complex three-dimensional polygonal face models that are generally time-consuming to produce and unsuitable for photorealistic reconstruction of missing facial features and behaviour. In this thesis, an image-based approach is adopted to solve the occlusion problem. A dynamic computer model of the face is used to retrieve the occluded facial information from the driver faces. The model consists of a set of orthogonal basis actions obtained by applying principal component analysis (PCA) to image changes and motion fields extracted from a sequence of natural facial motion (Cowe 2003). Examples of occlusion-affected facial behaviour can then be projected onto the model to compute coefficients of the basis actions and thus produce photorealistic performance-driven animations. Visual inspection shows that the PCA face model recovers aspects of expressions in those areas occluded in the driver sequence, but the expression is generally muted. To investigate this finding further, a database of test sequences affected by a considerable set of artificial and natural occlusions is created, and a number of suitable metrics are developed to measure the accuracy of the reconstructions. Regions of the face that are most important for performance-driven mimicry, and that seem to carry the best information about global facial configurations, are revealed using Bubbles, in effect identifying the facial areas that are most sensitive to occlusion. Recovery of occluded facial information is enhanced by applying an appropriate scaling factor to the respective coefficients of the basis actions obtained by PCA. This method improves the reconstruction of facial actions emanating from the occluded areas of the face. However, because PCA produces bases that encode composite, correlated actions, such an enhancement also tends to affect actions in non-occluded areas of the face. To avoid this, more localised controls for facial actions are produced using independent component analysis (ICA). Simple projection of the data onto an ICA model is not viable due to the non-orthogonality of the extracted bases. Thus, occlusion-affected mimicry is first generated using the PCA model and then enhanced by manipulating the independent components subsequently extracted from the mimicry. This combination of methods yields significant improvements and results in photorealistic reconstructions of occluded facial actions.
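    The projection-and-scaling step described above can be sketched roughly as follows. This is a minimal NumPy illustration with invented names and dimensions, not the thesis's implementation: an orthogonal basis is extracted from a training sequence of frames, a new (possibly occluded) frame is projected onto it to obtain coefficients, and the coefficients may be scaled before resynthesis.

```python
import numpy as np

def build_pca_model(frames, n_basis=20):
    """frames: (n_frames, n_pixels) training sequence (hypothetical layout)."""
    mean = frames.mean(axis=0)
    centred = frames - mean
    # SVD of the centred data yields orthonormal basis actions (principal components).
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_basis]

def reconstruct(frame, mean, basis, scale=1.0):
    # Project the frame onto the basis to obtain coefficients of the
    # basis actions, optionally scale them, then resynthesise the frame.
    coeffs = basis @ (frame - mean)
    return mean + (scale * coeffs) @ basis
```

    A scale above 1.0 on selected coefficients corresponds to the enhancement step mentioned in the abstract; a scale below 1.0 mutes the reconstructed expression.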

    Detecting semantic concepts in digital photographs: low-level features vs. non-homogeneous data fusion

    Semantic concepts, such as faces, buildings, and other real-world objects, are the instrument that humans most prefer for navigating through and retrieving visual content from large multimedia databases. Semantic annotation of visual content in large collections is therefore essential if ease of access and use is to be ensured. Classification of images into broad categories such as indoor/outdoor, building/non-building, urban/landscape, or people/no-people allows us to obtain semantic labels without full knowledge of all the objects in the scene. Inferring the presence of high-level semantic concepts from low-level visual features has lately attracted a significant amount of research interest. However, the power of low-level visual features alone has been shown to be limited when faced with the task of semantic scene classification in heterogeneous, unconstrained, broad-topic image collections. Multi-modal fusion, the combination of information from different modalities, has been identified as one possible way of overcoming the limitations of single-mode approaches. In the field of digital photography, incorporating readily available camera metadata, i.e. information about the image capture conditions stored in the EXIF header of each image, along with GPS information, offers a way to move towards a better understanding of the imaged scene. In this thesis we focus on detection of semantic concepts such as artificial text in video and large buildings in digital photographs, and examine how fusing low-level visual features with selected camera metadata, using a Support Vector Machine as the integration device, affects the performance of a building detector on a genuine personal photo collection. We implemented two approaches to building detection that combine content-based and context-based information, and an approach to indoor/outdoor classification based exclusively on camera metadata. An outdoor detection rate of 85.6% was obtained using camera metadata alone. The first approach to building detection, based on simple edge orientation-based features extracted at three different scales, was tested on a dataset of 1720 outdoor images, achieving a classification accuracy of 88.22%. The second approach integrates the edge orientation-based features with the camera metadata-based features, both at the feature level and at the decision level. The fusion approaches were evaluated on an unconstrained dataset of 8000 genuine consumer photographs. The experiments demonstrate that the fusion approaches outperform the visual-features-only approach by 2-3% on average regardless of the operating point chosen, while all performance measures remain approximately 4% below the upper limit of performance. The early fusion approach consistently improves all performance measures.
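    The two fusion levels contrasted above can be sketched in a few lines. This is an illustrative outline with invented feature names and dimensions, not the thesis's code: early fusion concatenates visual and metadata feature vectors before a single classifier; late fusion combines the scores of separately trained classifiers.

```python
import numpy as np

def early_fusion(visual_feats, metadata_feats):
    """Feature-level fusion: concatenate edge-orientation features with
    camera-metadata features into one vector per image, then train one
    classifier (e.g. an SVM) on the joint representation."""
    # visual_feats: (n_samples, d_visual); metadata_feats: (n_samples, d_meta)
    return np.hstack([visual_feats, metadata_feats])

def late_fusion(visual_scores, metadata_scores, w=0.5):
    """Decision-level fusion: weighted average of per-modality classifier
    scores; w is a hypothetical mixing weight tuned on validation data."""
    return w * visual_scores + (1 - w) * metadata_scores
```

    In this sketch the classifier itself is left out; the point is only where the modalities are merged, before or after classification.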

    Resolution enhancement of low-quality videos using a high-resolution frame


    Exploiting Textons Distributions on Spatial Hierarchy for Scene Classification

    This paper proposes a method to recognize scene categories using bags of visual words obtained by hierarchically partitioning the input images into subregions. Specifically, for each subregion the texton distribution and the extent of the subregion are taken into account. The bags of visual words computed on the subregions are weighted and used to represent the whole scene. The classification of scenes is carried out by discriminative methods (i.e., SVM, KNN). A similarity measure based on the Bhattacharyya coefficient is proposed to establish similarities between images represented as hierarchies of bags of visual words. Experimental tests on fifteen different scene categories show that the proposed approach achieves good performance with respect to state-of-the-art methods.
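    The Bhattacharyya coefficient used as the similarity measure above has a simple closed form for two normalised histograms. The following minimal NumPy sketch (variable names are illustrative) computes it for a pair of bag-of-visual-words histograms:

```python
import numpy as np

def bhattacharyya(h1, h2):
    """Bhattacharyya coefficient between two histograms: sum over bins of
    sqrt(p_i * q_i) after L1 normalisation. 1.0 for identical
    distributions, 0.0 for distributions with disjoint support."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    return float(np.sum(np.sqrt(h1 * h2)))
```

    In the hierarchical setting described in the abstract, one such coefficient would be computed per subregion and the results combined with the subregion weights.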

    Saliency for Image Description and Retrieval

    We live in a world where we are surrounded by ever-increasing numbers of images. More often than not, these images have very little metadata by which they can be indexed and searched. In order to avoid information overload, techniques need to be developed to enable these image collections to be searched by their content. Much of the previous work on image retrieval has used global features such as colour and texture to describe the content of the image. However, these global features are insufficient to accurately describe the image content when different parts of the image have different characteristics. This thesis initially discusses how this problem can be circumvented by using salient interest regions to select the areas of the image that are most interesting, and by generating local descriptors to describe the image characteristics in those regions. The thesis discusses a number of saliency detectors suitable for robust retrieval purposes and performs a comparison between several of these region detectors. It then discusses how salient regions can be used for image retrieval using a number of techniques, most importantly two techniques inspired by the field of textual information retrieval. Using these robust retrieval techniques, a new paradigm in image retrieval is discussed, whereby retrieval takes place on a mobile device using a query image captured by a built-in camera. This paradigm is demonstrated in the context of an art gallery, in which the device can be used to find more information about particular images. The final chapter of the thesis discusses some approaches to bridging the semantic gap in image retrieval, exploring ways in which un-annotated image collections can be searched by keyword. Two techniques are discussed: the first explicitly attempts to automatically annotate the un-annotated images so that the automatically applied annotations can be used for searching. The second approach does not try to explicitly annotate images but rather, through the use of linear algebra, attempts to create a semantic space in which images and keywords are positioned such that images are close to the keywords that represent them within the space.
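    The linear-algebra approach mentioned above is in the spirit of latent semantic analysis: factorise a joint keyword-image occurrence matrix with a truncated SVD so that keywords and images receive coordinates in one low-dimensional space. The sketch below is an illustration under that assumption, with invented matrix contents and dimensions, not the thesis's method verbatim:

```python
import numpy as np

def semantic_space(occurrence, k=2):
    """occurrence: (n_keywords, n_images) co-occurrence counts.
    Returns k-dimensional coordinates for keywords and images in a
    shared space; splitting the singular values symmetrically means the
    inner products of the coordinates approximate the original matrix."""
    u, s, vt = np.linalg.svd(occurrence, full_matrices=False)
    root_s = np.sqrt(s[:k])
    keyword_coords = u[:, :k] * root_s      # one row per keyword
    image_coords = vt[:k, :].T * root_s     # one row per image
    return keyword_coords, image_coords
```

    An un-annotated image can then be searched by keyword by ranking images on their distance (or cosine similarity) to the keyword's coordinates in the space.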

    Content-driven superpixels and their applications

    This thesis develops a new superpixel algorithm, content-driven superpixels (CDS), that displays excellent visual reconstruction of the original image. It achieves high stability across multiple random initialisations by producing superpixels that correspond directly to local image complexity, which is accomplished by growing superpixels and dividing them where the image varies. Existing analyses were not sufficient to take these properties into account, so new measures of oversegmentation provide fresh insight into the optimum superpixel representation. It was discovered that CDS has properties that have eluded previous attempts, such as initialisation invariance and stability. The completely unsupervised nature of CDS makes it highly suitable for tasks such as application to a database containing images of unknown complexity. These new superpixel properties have enabled new applications of superpixel pre-processing: image segmentation, image compression, scene classification, and focus detection. In addition, a new method of objectively analysing regions of focus has been developed using light-field photography.
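    The divide-on-variation idea can be illustrated with a toy recursion: a region is split when its intensity variation exceeds a threshold, so the number of regions tracks local image complexity. This quadtree-style sketch is only an analogy for the grow-and-divide behaviour described above, not the CDS algorithm itself; all names and the split rule are illustrative.

```python
import numpy as np

def split_on_variation(img, threshold, min_size=2):
    """Recursively split a 2-D intensity array into regions whose
    variance is below `threshold`. Returns (row, col, height, width)
    tuples that tile the image; flat areas stay as one large region
    while highly varying areas are divided into many small ones."""
    h, w = img.shape
    if h <= min_size or w <= min_size or img.var() <= threshold:
        return [(0, 0, h, w)]
    regions = []
    for r0, hh in ((0, h // 2), (h // 2, h - h // 2)):
        for c0, ww in ((0, w // 2), (w // 2, w - w // 2)):
            sub = split_on_variation(img[r0:r0 + hh, c0:c0 + ww],
                                     threshold, min_size)
            regions += [(r0 + r, c0 + c, rh, rw) for r, c, rh, rw in sub]
    return regions
```

    A uniform image yields a single region, while a noisy one is divided repeatedly, mirroring the claim that region count follows local image complexity.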

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective we survey the impact and legal consequences of these technical advances and point out future directions of research.

    Effective design, configuration, and use of digital CCTV

    It is estimated that there are five million CCTV cameras in use today. CCTV is used by a wide range of organisations and for an increasing number of purposes, yet there has been little research to establish whether these systems are fit for purpose. This thesis takes a socio-technical approach to determine whether CCTV is effective and, if not, how it could be made more effective. Human-computer interaction (HCI) knowledge and methods were applied to improve understanding of what is needed to make CCTV effective; this was achieved through an extensive field study and two experiments. In Study 1, contextual inquiry was used at 14 security control rooms to identify the security goals, tasks, and technology, together with the factors affecting operator performance and their causes. The findings revealed a number of factors that interfered with task performance, such as poor camera positioning, ineffective workstation setups, difficulty in locating scenes, and the use of low-quality CCTV recordings. The impact of different levels of video quality on identification and detection performance was assessed in two experiments using a task-focused methodology. In Study 2, 80 participants identified 64 face images taken from four spatially compressed video conditions (32, 52, 72, and 92 Kbps). At a bit-rate quality of 52 Kbps (MPEG-4), the effect on the number of faces correctly identified reached significance. In Study 3, 80 participants each detected 32 events from four frame-rate CCTV video conditions (1, 5, 8, and 12 fps). Below 8 frames per second, correct detections and task confidence ratings decreased significantly. These field and empirical research findings are presented in a framework built around a typical CCTV deployment scenario, which has been validated through an expert review. The contributions and limitations of this thesis are reviewed, and suggestions for further development of the framework are provided.

    MedLAN: Compact mobile computing system for wireless information access in emergency hospital wards

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. As the need for faster, safer and more efficient healthcare delivery increases, medical consultants seek new ways of implementing a high-quality telemedical system using innovative technology. Until now, teleconsultation (the most common application of telemedicine) was performed by transferring the patient from the Accident and Emergency (A&E) ward to a specially equipped room, or by moving large and heavy machinery to the place where the patient resided. Both solutions were impractical, uneconomical and potentially dangerous. At the same time, wireless networks have become increasingly useful in point-of-care areas such as hospitals because of their ease of use, low cost of installation and increased flexibility. This thesis presents an integrated system called MedLAN, dedicated for use inside A&E hospital wards. Its purpose is to wirelessly support high-quality live video, audio, high-resolution still images and general network services from anywhere there is WLAN coverage. It can transmit all of the above to a consultant residing either inside or outside the hospital, or even at an external site, through the use of the Internet. To implement this, it makes use of the existing IEEE 802.11b wireless technology. Initially, this thesis demonstrates that for specific scenarios (such as when using WLANs), the DICOM specifications should be adjusted to accommodate the reduced WLAN bandwidth. Near-lossless compression has been used to send still images over the WLAN, and the results have been evaluated by a number of consultants to decide whether the images retain their diagnostic value. The thesis further suggests improvements to the existing 802.11b protocol. In particular, as the typical hospital environment suffers from heavy RF reflections, it suggests that an alternative method of modulation (OFDM) could be embedded in the 802.11b hardware to reduce the multipath effect and increase the throughput, and thus the quality of the video sent by the MedLAN system. Finally, recognising that trust between a patient and a doctor is fundamental, this thesis proposes a series of simple actions aimed at securing the MedLAN system. Additionally, a concrete security scheme is suggested that encapsulates the existing WEP security protocol over IPSec.