55 research outputs found

    Video Image Segmentation and Object Detection Using Markov Random Field Model

    In this dissertation, the problem of video object detection is addressed. Initially, this is accomplished by the existing method of temporal segmentation. It has been observed that the Video Object Plane (VOP) generated by temporal segmentation has a strong limitation: for slow-moving video objects it either performs poorly or fails. Therefore, the problem of object detection is addressed for both slow-moving and fast-moving video objects. The object is detected by integrating spatial segmentation with temporal segmentation. To take the temporal pixel distribution into account, the spatial segmentation of frames is formulated in a spatio-temporal framework. A compound MRF model is proposed to model the video sequence. This model accounts for both the spatial and the temporal distributions. Besides taking into account the pixel distributions in the temporal direction, compound MRF models are proposed to model the edges in the temporal direction. This model is named the edge-based model. Furthermore, the differences between successive images are modeled by an MRF, and this is called the change-based model. The change-based model enhanced the performance of the proposed scheme. The spatial segmentation problem is formulated as a pixel labeling problem in the spatio-temporal framework. The pixel label estimation problem is formulated using the maximum a posteriori (MAP) criterion. The segmentation is achieved in supervised mode, where the model parameters are selected on a trial-and-error basis. The MAP estimates of the labels are obtained by a proposed hybrid algorithm, devised by integrating global and local convergence criteria. Temporal segmentation of frames is obtained without assuming the availability of a reference frame.
The spatial and temporal segmentations are integrated to obtain the Video Object Plane (VOP) and hence detect the object. In order to reduce the computational burden, an evolutionary-approach-based scheme is proposed. In this scheme the first frame is segmented, and the segmentations of the other frames are obtained from the segmentation of the first frame. The computational burden is much less than that of the previously proposed scheme. An entropy-based adaptive thresholding scheme is proposed to enhance the accuracy of temporal segmentation. Object detection is achieved by integrating the spatial and the improved temporal segmentation results.
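
The MAP pixel-labeling formulation in the abstract above can be illustrated with a small sketch. This is not the dissertation's hybrid algorithm or its compound MRF: it uses Iterated Conditional Modes (ICM), a common locally convergent optimiser, on a single frame, and it assumes a Gaussian likelihood around known class means plus a Potts smoothness prior. The function name `icm_segment` and the parameters `class_means`, `beta`, and `sigma` are hypothetical and chosen only for illustration.

```python
def icm_segment(image, class_means, beta=1.0, sigma=50.0, max_sweeps=10):
    """Approximate MAP labeling of a grayscale image (list of rows) with ICM.

    Assumed energy per pixel (illustrative, not taken from the dissertation):
      data term  = (intensity - class mean)^2 / (2 * sigma^2)
      prior term = beta * (number of 4-neighbours carrying a different label)
    """
    height, width = len(image), len(image[0])
    n_labels = len(class_means)

    # Initialise every pixel with the label of the nearest class mean.
    labels = [
        [min(range(n_labels), key=lambda k: (image[y][x] - class_means[k]) ** 2)
         for x in range(width)]
        for y in range(height)
    ]

    for _ in range(max_sweeps):
        changed = False
        for y in range(height):
            for x in range(width):
                best_label, best_energy = labels[y][x], None
                for k in range(n_labels):
                    data = (image[y][x] - class_means[k]) ** 2 / (2 * sigma ** 2)
                    prior = 0.0
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        inside = 0 <= ny < height and 0 <= nx < width
                        if inside and labels[ny][nx] != k:
                            prior += beta
                    energy = data + prior
                    if best_energy is None or energy < best_energy:
                        best_label, best_energy = k, energy
                if best_label != labels[y][x]:
                    labels[y][x] = best_label
                    changed = True
        if not changed:
            break  # a local minimum of the energy has been reached
    return labels
```

ICM only finds a local minimum, which is one reason global schemes such as simulated annealing are often combined with it; how the dissertation combines global and local criteria is not specified in the abstract.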

    Energy efficient hardware acceleration of multimedia processing tools

    The world of mobile devices is experiencing an ongoing trend of feature enhancement and general-purpose multimedia platform convergence. This trend poses many grand challenges, the most pressing being limited battery life as a consequence of delivering computationally demanding features. The envisaged mobile application features can be considered to be accelerated by a set of underpinning hardware blocks. Based on the survey that this thesis presents on modern video compression standards and their associated enabling technologies, it is concluded that tight energy and throughput constraints can still be effectively tackled at the algorithmic level in order to design re-usable optimised hardware acceleration cores. To prove these conclusions, the work in this thesis is focused on two of the basic enabling technologies that support mobile video applications, namely the Shape Adaptive Discrete Cosine Transform (SA-DCT) and its inverse, the SA-IDCT. The hardware architectures presented in this work have been designed with energy efficiency in mind. This goal is achieved by employing high-level techniques such as redundant computation elimination, parallelism and low-switching computation structures. Both architectures compare favourably against the relevant prior art in the literature. The SA-DCT/IDCT technologies are instances of a more general computation: both are Constant Matrix Multiplication (CMM) operations. Thus, this thesis also proposes an algorithm for the efficient hardware design of any general CMM-based enabling technology. The proposed algorithm leverages the effective solution search capability of genetic programming. A bonus feature of the proposed modelling approach is that it is further amenable to hardware acceleration. Another bonus feature is an early exit mechanism that achieves large search space reductions. Results show an improvement on state-of-the-art algorithms, with future potential for even greater savings.
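
The statement that the SA-DCT is a Constant Matrix Multiplication can be made concrete with a short software sketch. It is not the thesis's hardware architecture: it only shows that an N-point DCT-II is a product of a fixed coefficient matrix with the input vector, and that a shape-adaptive transform picks N from the number of active pixels in a column. The helper names (`dct_matrix`, `constant_matrix_multiply`, `sa_dct_column`) are hypothetical, and the orthonormal scaling is an assumption, since SA-DCT definitions differ in normalisation.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II coefficient matrix (constant for a given n)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def constant_matrix_multiply(matrix, vector):
    """Multiply a fixed coefficient matrix by an input vector (the CMM operation)."""
    return [sum(c * x for c, x in zip(row, vector)) for row in matrix]


def sa_dct_column(active_pixels):
    """Transform one column of an arbitrarily shaped region.

    The transform length equals the number of active pixels in the column,
    so a different constant matrix is selected for each column length.
    """
    return constant_matrix_multiply(dct_matrix(len(active_pixels)), active_pixels)
```

Because the coefficients are known in advance, each multiplication can in principle be replaced by shifts and additions, which is the kind of search space an optimisation algorithm for CMM hardware would explore.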

    Extracting semantic video objects

    Dagan Feng. 2000-2001 > Academic research: refereed > Publication in refereed journal. Version of Record. Publishe

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before the applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by first constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.

    Feature based dynamic intra-video indexing

    A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy. With the advent of digital imagery and its widespread application in all walks of life, it has become an important component in the world of communication. Video content from broadcast news, sports, personal videos, surveillance, movies, entertainment and similar domains is increasing exponentially in quantity, and it is becoming a challenge to retrieve content of interest from the corpora. This has led to increased interest among researchers in investigating video structure analysis, feature extraction, content annotation, tagging, video indexing, querying and retrieval to fulfil these requirements. However, most of the previous work is confined to a specific domain and constrained by quality, processing and storage capabilities. This thesis presents a novel framework that brings together the established approaches, from feature extraction to browsing, in one system for content-based video retrieval. The proposed framework significantly fills the identified gap while satisfying the imposed constraints of processing, storage, quality and retrieval time. The output comprises a framework, a methodology and a prototype application that allow the user to efficiently and effectively retrieve content of interest, such as age, gender and activity, by specifying the relevant query. Experiments have shown plausible results, with an average precision and recall of 0.91 and 0.92 respectively for face detection using a Haar-wavelet-based approach. Precision for age ranges from 0.82 to 0.91 and recall from 0.78 to 0.84. Gender recognition gives better precision for males (0.89) than for females, while recall is higher for females (0.92). Activity of the subject has been detected using the Hough transform and classified using a Hidden Markov Model. A comprehensive dataset to support similar studies has also been developed as part of the research process.
A Graphical User Interface (GUI) providing a friendly and intuitive interface has been integrated into the developed system to facilitate the retrieval process. The comparison results of the intraclass correlation coefficient (ICC) show that the performance of the system closely resembles that of the human annotator. The performance has been optimised for time and error rate.
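
For readers unfamiliar with the precision and recall figures quoted above, the following minimal sketch shows how the two measures are computed from detection counts. The function name `precision_recall` is hypothetical and the counts in the usage line are made-up illustrative values; they are not taken from the thesis and do not reproduce its results.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from raw detection counts.

    precision = TP / (TP + FP): fraction of reported detections that are correct.
    recall    = TP / (TP + FN): fraction of actual objects that were detected.
    A measure is reported as 0.0 when its denominator is zero.
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Illustrative counts only: 8 correct detections, 2 false alarms, 2 missed faces.
example_precision, example_recall = precision_recall(8, 2, 2)
```

A detector can trade one measure for the other by changing its decision threshold, which is why the abstract reports both.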

    Singing voice resynthesis using concatenative-based techniques

    Doctoral thesis. Informatics Engineering. Faculdade de Engenharia, Universidade do Porto. 201

    On the sample consensus robust estimation paradigm: comprehensive survey and novel algorithms with applications.

    Master of Science in Statistics and Computer Science. University of KwaZulu-Natal, Durban 2016. This study begins with a comprehensive survey of existing variants of the Random Sample Consensus (RANSAC) algorithm. Then, five new ones are contributed. RANSAC, arguably the most popular robust estimation algorithm in computer vision, has limitations in accuracy, efficiency and repeatability. Research into techniques for overcoming these drawbacks has been active for about two decades. In the last decade and a half, nearly every year has seen at least one variant published, with more than ten in the last two years. However, many existing variants compromise two attractive properties of the original RANSAC: simplicity and generality. Some introduce new operations, resulting in a loss of simplicity, while many of those that do not introduce new operations require problem-specific priors. In this way, they trade off generality and introduce some complexity, as well as dependence on other steps of the application workflow. Noting that these observations may explain the persisting trend of finding only the older, simpler variants in ‘mainstream’ computer vision software libraries, this work adopts an approach that preserves the two mentioned properties. Modification of the original algorithm is restricted to replacing the search strategy, since many drawbacks of RANSAC are consequences of the search strategy it adopts. A second constraint, serving the purpose of preserving generality, is that this ‘ideal’ strategy must require no problem-specific priors. Such a strategy is developed and reported in this dissertation. Another limitation, yet to be overcome in the literature but successfully addressed in this study, is the inherent variability of RANSAC. A few theoretical discoveries are presented, providing insights into the generic robust estimation problem.
Notably, a theorem proposed as an original contribution of this research reveals insights that are foundational to the newly proposed algorithms. Experiments on both generic and computer-vision-specific data show that all proposed algorithms are generally more accurate and more consistent than RANSAC. Moreover, they are simpler in the sense that they do not require some of the input parameters of RANSAC. Interestingly, although non-exhaustive in search like typical RANSAC-like algorithms, three of these new algorithms exhibit absolute non-randomness, a property that is not claimed by any existing variant. One of the proposed algorithms is fully automatic, eliminating all requirements for user-supplied input parameters. Two of the proposed algorithms are implemented as contributed alternatives to the homography estimation function provided in MATLAB’s computer vision toolbox, after being shown to improve on the performance of M-estimator Sample Consensus (MSAC). MSAC has been the choice in all releases of the toolbox, including the latest, 2015b. While this research is motivated by computer vision applications, the proposed algorithms, being generic, can be applied to any model-fitting problem from other scientific fields.
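
As background for the abstract above, here is a minimal sketch of the original RANSAC loop applied to 2-D line fitting. It illustrates the baseline algorithm only, not any of the dissertation's new variants. The function name `ransac_line` and the parameters `n_iters`, `threshold`, and `seed` are hypothetical, and the vertical residual is a simplification of the usual point-to-line distance.

```python
import random


def ransac_line(points, n_iters=200, threshold=1.0, seed=0):
    """Fit y = a*x + b to 2-D points with the basic RANSAC loop.

    Each iteration draws a random minimal sample (two points), fits a line
    through it, and counts the points whose vertical residual is within
    `threshold`. The model with the largest consensus set is returned.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    best_model, best_inliers = None, []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical line cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers
```

The iteration count and inlier threshold are the kind of user-supplied input parameters the abstract refers to, and the dependence on random sampling is the source of the run-to-run variability it mentions.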