    Use of Coherent Point Drift in computer vision applications

    This thesis presents the novel use of Coherent Point Drift (CPD) to improve the robustness of a number of computer vision applications. CPD offers two approaches to registering two images, rigid and non-rigid point set registration, distinguished by the transformation model used. A rigid transformation preserves the relative distances between points (the shape is kept up to a global scale), so it can be used in the presence of translation, rotation and uniform scaling. Non-rigid transformations, which here include affine transforms, allow registration under non-uniform scaling and skew. The idea is to move one point set coherently so that it aligns with the second point set. CPD estimates the correspondences between the two point sets and the transformation simultaneously, without requiring a parametric form of the non-rigid transformation to be declared a priori. The first part of this thesis focuses on speaker identification in video conferencing. A real-time, audio-coupled, video-based approach is presented that concentrates on video analysis rather than on audio analysis, which is known to be prone to errors. CPD is used for lip movement detection, and a temporal face detection approach minimises false positives when the face detection algorithm fails. The second part of the thesis focuses on multi-exposure and multi-focus image fusion with compensation for camera shake. The Scale-Invariant Feature Transform (SIFT) is first used to detect keypoints in the images being fused. Outliers are then removed from this point set using RANSAC (RANdom Sample Consensus), and finally the point sets are registered using CPD with non-rigid transformations. The registered images are then fused with a contourlet-based image fusion algorithm that uses a novel alpha blending and filtering technique to minimise artefacts.
The thesis evaluates the performance of the algorithm against a number of state-of-the-art approaches, including the key commercial products currently on the market, and shows significantly improved subjective quality in the fused images. The final part of the thesis presents a novel approach to Vehicle Make and Model Recognition (VMMR) in CCTV video footage. CPD is used to remove the skew of detected vehicles, since CCTV cameras are not specifically configured for the VMMR task and may capture vehicles at different approach angles. A LESH (Local Energy Shape Histogram) feature-based approach is used for VMMR, with the novelty that temporal processing is used to improve reliability. Several further algorithms are used to maximise the reliability of the final outcome. Experimental results show that the proposed system achieves an accuracy in excess of 95% when tested on real CCTV footage with no prior camera calibration.
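To make the registration step concrete, here is a minimal sketch of rigid CPD in 2-D, in plain Python. It follows the EM formulation published by Myronenko and Song (a Gaussian mixture centred on the transformed moving points; the E-step computes soft correspondences, the M-step solves a weighted Procrustes problem for scale, rotation and translation) under two simplifying assumptions: the uniform outlier weight w is set to 0, and the 2-D rotation is obtained in closed form instead of through an SVD. It is not the thesis's implementation, and the name `rigid_cpd_2d` is hypothetical.

```python
import math


def rigid_cpd_2d(X, Y, max_iter=500, tol=1e-10):
    """Align the moving 2-D point set Y to the fixed set X with rigid CPD.

    Returns (s, theta, t) such that each x is approximately s * R(theta) y + t.
    Simplification (assumption): the uniform outlier weight w is taken as 0.
    """
    N, M = len(X), len(Y)
    s, theta, t = 1.0, 0.0, (0.0, 0.0)
    # Initial variance: summed squared pairwise distance / (D * N * M), D = 2.
    sigma2 = sum((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
                 for x in X for y in Y) / (2 * N * M)
    for _ in range(max_iter):
        c, sn = math.cos(theta), math.sin(theta)
        TY = [(s * (c * y[0] - sn * y[1]) + t[0],
               s * (sn * y[0] + c * y[1]) + t[1]) for y in Y]
        # E-step: P[m][n] is the posterior that x_n was generated by centroid m.
        P = [[0.0] * N for _ in range(M)]
        for n, x in enumerate(X):
            d = [(x[0] - ty[0]) ** 2 + (x[1] - ty[1]) ** 2 for ty in TY]
            dmin = min(d)  # shift exponents so the largest term is exp(0) = 1
            e = [math.exp(-(dm - dmin) / (2 * sigma2)) for dm in d]
            z = sum(e)
            for m in range(M):
                P[m][n] = e[m] / z
        # M-step: a weighted Procrustes problem.
        P1 = [sum(row) for row in P]                              # row sums
        Pt1 = [sum(P[m][n] for m in range(M)) for n in range(N)]  # column sums
        Np = sum(P1)
        mux = [sum(Pt1[n] * X[n][k] for n in range(N)) / Np for k in range(2)]
        muy = [sum(P1[m] * Y[m][k] for m in range(M)) / Np for k in range(2)]
        Xh = [(x[0] - mux[0], x[1] - mux[1]) for x in X]
        Yh = [(y[0] - muy[0], y[1] - muy[1]) for y in Y]
        # A = Xh^T P^T Yh, the 2x2 weighted cross-covariance.
        A = [[sum(P[m][n] * Xh[n][i] * Yh[m][j]
                  for m in range(M) for n in range(N))
              for j in range(2)] for i in range(2)]
        # In 2-D the proper rotation maximising tr(A^T R) has a closed form.
        theta = math.atan2(A[1][0] - A[0][1], A[0][0] + A[1][1])
        trAR = math.hypot(A[0][0] + A[1][1], A[1][0] - A[0][1])
        yy = sum(P1[m] * (Yh[m][0] ** 2 + Yh[m][1] ** 2) for m in range(M))
        xx = sum(Pt1[n] * (Xh[n][0] ** 2 + Xh[n][1] ** 2) for n in range(N))
        s = trAR / yy
        c, sn = math.cos(theta), math.sin(theta)
        t = (mux[0] - s * (c * muy[0] - sn * muy[1]),
             mux[1] - s * (sn * muy[0] + c * muy[1]))
        sigma2 = max((xx - s * trAR) / (2 * Np), 1e-12)
        if sigma2 < tol:
            break
    return s, theta, t
```

Registering a point set against a rotated, scaled and shifted copy of itself should recover the applied parameters. The non-rigid variant used for the fusion and VMMR stages replaces this M-step with a smooth displacement field regularised by motion coherence, which is beyond this sketch.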

    Telemedicine

    Telemedicine is a rapidly evolving field as new technologies are adopted, for example wireless sensors and high-quality data transmission. Internet-based applications such as counseling, clinical consultation support, and home care monitoring and management are increasingly being realized, improving access to high-level medical care in underserved areas. The 23 chapters of this book present a wide range of telemedicine examples, covering both theoretical and practical foundations as well as application scenarios.

    MedLAN: Compact mobile computing system for wireless information access in emergency hospital wards

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. As the need for faster, safer and more efficient healthcare delivery increases, medical consultants seek new ways of implementing high-quality telemedical systems using innovative technology. Until recently, teleconsultation (the most common application of telemedicine) was performed by transferring the patient from the Accident and Emergency (A&E) ward to a specially equipped room, or by moving large and heavy machinery to the place where the patient resided. Both solutions were impractical, uneconomical and potentially dangerous. At the same time, wireless networks became increasingly useful in point-of-care areas such as hospitals because of their ease of use, low installation cost and increased flexibility. This thesis presents an integrated system called MedLAN, intended for use inside A&E hospital wards. Its purpose is to wirelessly support high-quality live video, audio, high-resolution still images and network access from anywhere with WLAN coverage. It can transmit all of the above to a consultant inside or outside the hospital, or to an external site, through the Internet. To achieve this, it uses existing IEEE 802.11b wireless technology. First, the thesis demonstrates that for specific scenarios (such as when using WLANs), the DICOM specifications should be adjusted to accommodate the reduced WLAN bandwidth. Near-lossless compression was used to send still images over the WLANs, and the results were evaluated by a number of consultants to decide whether the images retain their diagnostic value. The thesis further suggests improvements to the existing 802.11b protocol.
In particular, because the typical hospital environment suffers from heavy RF reflections, it suggests that an alternative modulation method (OFDM) can be embedded in the 802.11b hardware to reduce the multipath effect and increase the throughput, and thus the quality of the video sent by the MedLAN system. Finally, recognising that trust between patient and doctor is fundamental, the thesis proposes a series of simple measures to secure the MedLAN system. Additionally, a concrete security system is proposed that encapsulates the existing WEP security protocol over IPSec.
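The abstract does not name the near-lossless scheme used for the still images. As a generic illustration of what "near-lossless" means (a hard bound δ on the per-pixel reconstruction error), the sketch below quantises prediction residuals with step 2δ+1, as in the near-lossless mode of JPEG-LS, but with a simple previous-sample predictor instead of JPEG-LS's context modelling. The function names are hypothetical, and this is not the codec evaluated in the thesis.

```python
def nl_encode(samples, delta):
    """Near-lossless DPCM: quantised residuals with |x - x_rec| <= delta."""
    step = 2 * delta + 1
    codes, prev = [], 0          # prev is the *reconstructed* previous sample
    for x in samples:
        e = x - prev             # prediction residual (previous-sample predictor)
        q = (abs(e) + delta) // step
        q = -q if e < 0 else q
        codes.append(q)
        prev += q * step         # track exactly what the decoder will reconstruct
    return codes


def nl_decode(codes, delta):
    """Invert nl_encode: accumulate the dequantised residuals."""
    step = 2 * delta + 1
    out, prev = [], 0
    for q in codes:
        prev += q * step
        out.append(prev)
    return out
```

Applied row by row to an 8-bit image, δ = 0 gives lossless coding, and larger δ shrinks the residual alphabet by roughly a factor of 2δ+1 at the cost of a bounded error. Reconstructed values may overshoot the valid pixel range by up to δ; a real codec would clamp or reduce them.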

    Distributed video through telecommunication networks using fractal image compression techniques

    The research presented in this thesis investigates the use of fractal compression techniques in a real-time video distribution system. The motivation for this work was that the method has several useful properties that satisfy many requirements of video compression; in addition, as a novel technique, fractal compression has great potential. In this thesis, we first develop an understanding of the state of the art in image and video compression and describe the mathematical concepts and basic terminology of the fractal compression algorithm. Several schemes that aim to improve the algorithm for still images are then examined. Among these, two novel contributions are described. The first is the partitioning of the image into sections, which resulted in a significant reduction of the compression time. In the second, the median metric was considered as an alternative to the RMS metric, but it was not adopted because RMS proved to be the more efficient measure. The extension of the fractal compression algorithm from still images to image sequences is then examined, and three different schemes for reducing temporal redundancy in the video compression algorithm are described. The reduction in execution time obtained with these techniques is significant, although real-time execution has not yet been achieved. Finally, the basic concepts of distributed programming and networks, as basic elements of a video distribution system, are presented, and the hardware and software components of a fractal video distribution system are described. Implementing the fractal compression algorithm on a TMS320C40 is also considered for its speed benefits, and it is found that a relatively large number of processors is needed for real-time execution.
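The still-image work above builds on the basic fractal (PIFS) encoder, in which each range block is approximated by a spatially downsampled domain block with a contrast factor s and brightness offset o, and RMS is the matching metric. The following is a minimal sketch assuming grayscale images as nested lists whose sides are multiples of the range size, domain blocks of twice the range size on a grid with stride equal to the range size, and no rotations or reflections of domain blocks. It omits the thesis's image partitioning and temporal schemes, and all function names and the contrast bound `max_s` are hypothetical.

```python
import math


def block(img, x, y, size):
    """Return the size x size sub-block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]


def downsample(img, x, y, size):
    """Average 2x2 pixels of the (2*size)-square block at (x, y) to size x size."""
    return [[(img[y + 2 * i][x + 2 * j] + img[y + 2 * i][x + 2 * j + 1] +
              img[y + 2 * i + 1][x + 2 * j] + img[y + 2 * i + 1][x + 2 * j + 1]) / 4.0
             for j in range(size)] for i in range(size)]


def fit_affine(dom, rng, max_s=0.9):
    """Least-squares fit rng ~ s * dom + o; returns (s, o, rms)."""
    d = [v for row in dom for v in row]
    r = [v for row in rng for v in row]
    n = len(d)
    sd, sr = sum(d), sum(r)
    sdd = sum(v * v for v in d)
    sdr = sum(a * b for a, b in zip(d, r))
    den = n * sdd - sd * sd
    s = (n * sdr - sd * sr) / den if den > 1e-12 else 0.0
    s = max(-max_s, min(max_s, s))  # bound |s| so the intensity map contracts
    o = (sr - s * sd) / n
    rms = math.sqrt(sum((s * a + o - b) ** 2 for a, b in zip(d, r)) / n)
    return s, o, rms


def encode(img, rsize=4):
    """Return one (rx, ry, dx, dy, s, o) transform per range block."""
    h, w = len(img), len(img[0])
    domains = [(dx, dy, downsample(img, dx, dy, rsize))
               for dy in range(0, h - 2 * rsize + 1, rsize)
               for dx in range(0, w - 2 * rsize + 1, rsize)]
    code = []
    for ry in range(0, h, rsize):
        for rx in range(0, w, rsize):
            rng = block(img, rx, ry, rsize)
            best = None
            for dx, dy, dom in domains:  # exhaustive search, minimum RMS wins
                s, o, err = fit_affine(dom, rng)
                if best is None or err < best[0]:
                    best = (err, dx, dy, s, o)
            code.append((rx, ry) + best[1:])
    return code


def decode(code, h, w, rsize=4, iters=30):
    """Iterate the transforms from a flat grey image towards their fixed point."""
    img = [[128.0] * w for _ in range(h)]
    for _ in range(iters):
        new = [[0.0] * w for _ in range(h)]
        for rx, ry, dx, dy, s, o in code:
            dom = downsample(img, dx, dy, rsize)
            for i in range(rsize):
                for j in range(rsize):
                    new[ry + i][rx + j] = s * dom[i][j] + o
        img = new
    return img
```

The exhaustive domain search in `encode` is what makes fractal encoding slow, and it is the cost that speed-ups such as image partitioning or parallel DSP hardware aim to reduce. Decoding needs no codebook: because |s| < 1, iterating the transforms from any starting image converges to the same attractor.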

    Recent Advances in Signal Processing

    Signal processing is a critical component of most new technological inventions and challenges across a variety of applications in science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian, and have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is aimed primarily at students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. Its 27 chapters fall into five areas, depending on the application at hand, ordered as image processing, speech processing, communication systems, time-series analysis, and educational packages. Because the chapters are completely independent and self-contained, the interested reader can choose any chapter and skip to another without losing continuity.

    Combined Industry, Space and Earth Science Data Compression Workshop

    The sixth annual Space and Earth Science Data Compression Workshop and the third annual Data Compression Industry Workshop were held as a single combined workshop. The workshop was held April 4, 1996 in Snowbird, Utah in conjunction with the 1996 IEEE Data Compression Conference, which was held at the same location March 31 - April 3, 1996. The Space and Earth Science Data Compression sessions seek to explore opportunities for data compression to enhance the collection, analysis, and retrieval of space and earth science data. Of particular interest is data compression research that is integrated into, or has the potential to be integrated into, a particular space or earth science data information system. Preference is given to data compression research that takes into account the scientist's data requirements, and the constraints imposed by the data collection, transmission, distribution and archival systems.

    Structure out of sound

    Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1993. Includes vita and bibliographical references (pp. 155-170). Author: Michael Jerome Hawley, Ph.D.

    Automatic social role recognition and its application in structuring multiparty interactions

    Automatic processing of multiparty interactions is a research domain with important applications in content browsing, summarization and information retrieval. In recent years, several works have been devoted to finding regular patterns that speakers exhibit in a multiparty interaction, also known as social roles. Most research in the literature has focused on recognizing scenario-specific formal roles. More recently, role coding schemes based on informal social roles have been proposed, defining roles by the behavior speakers exhibit in the functioning of a small-group interaction. Informal social roles represent a flexible classification scheme that can generalize across different scenarios of multiparty interaction. In this thesis, we focus on automatic recognition of informal social roles and exploit the influence of informal social roles on speaker behavior to structure multiparty interactions. To model speaker behavior, we systematically explore various verbal and nonverbal cues extracted from turn-taking patterns, vocal expression and linguistic style. The influence of social roles on the behavior cues exhibited by a speaker is modeled using a discriminative approach based on conditional random fields. Experiments performed on several hours of meeting data reveal that classification using conditional random fields improves role recognition performance. We demonstrate the effectiveness of our approach by evaluating it on previously unseen scenarios of multiparty interaction. Furthermore, we consider whether formal and informal roles can be automatically predicted by the same verbal and nonverbal features. We exploit the influence of social roles on turn-taking patterns to improve speaker diarization under distant-microphone conditions.
Our work extends the hidden Markov model / Gaussian mixture model (HMM-GMM) speaker diarization system and is based on jointly estimating both the speaker segmentation and the social roles in an audio recording. We modify the minimum duration constraint in the HMM-GMM diarization system by using role information to model the expected duration of a speaker's turn. We also use social role n-grams as prior information to model speaker interaction patterns. Finally, we demonstrate the application of social roles to the problem of topic segmentation in meetings. We exploit our finding that social roles can change dynamically in conversations and use this information to predict topic changes in meetings. We also present an unsupervised method for topic segmentation that combines social roles and lexical cohesion. Experimental results show that social roles improve the performance of both speaker diarization and topic segmentation.
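In HMM-GMM diarization, a minimum turn duration is commonly enforced by giving each speaker a left-to-right chain of states that share one GMM and must all be traversed before a speaker change is allowed. The sketch below shows only that topology, with the chain length looked up from a role, as one possible reading of the role-dependent constraint described above. The role labels, frame counts, `p_stay` and the function name are hypothetical illustration values, not the thesis's settings, and emission models are not included.

```python
def min_duration_hmm(speaker_roles, min_frames, p_stay=0.9):
    """Transition matrix for an ergodic speaker HMM with per-role minimum duration.

    Speaker k owns a left-to-right chain of min_frames[role_k] states that would
    share one GMM (emissions are not modelled here). A speaker change is only
    possible from the last state of a chain, so each turn lasts at least that
    many frames.
    """
    K = len(speaker_roles)
    if K < 2:
        raise ValueError("need at least two speakers")
    starts, total = [], 0
    for role in speaker_roles:
        starts.append(total)
        total += min_frames[role]
    T = [[0.0] * total for _ in range(total)]
    for k, role in enumerate(speaker_roles):
        first, length = starts[k], min_frames[role]
        for i in range(length - 1):
            T[first + i][first + i + 1] = 1.0   # forced progression along the chain
        last = first + length - 1
        T[last][last] = p_stay                  # speaker keeps the floor
        for j in range(K):
            if j != k:                          # uniform switch to another speaker
                T[last][starts[j]] = (1.0 - p_stay) / (K - 1)
    return T, starts
```

A role n-gram prior would replace the uniform switching probabilities in the last loop with role-to-role transition estimates; that refinement is omitted here.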

    Leveraging audio-visual speech effectively via deep learning

    The rising popularity of neural networks, combined with the recent proliferation of online audio-visual media, has led to a revolution in the way machines encode, recognize, and generate acoustic and visual speech. Despite the ubiquity of naturally paired audio-visual data, only a limited number of works have applied recent advances in deep learning to leverage the duality between audio and video within this domain. This thesis considers the use of neural networks to learn from large unlabelled datasets of audio-visual speech to enable new practical applications. We begin by training a visual speech encoder that predicts latent features extracted from the corresponding audio on a large unlabelled audio-visual corpus. We apply the trained visual encoder to improve performance on lip reading in real-world scenarios. Following this, we extend the idea of video learning from audio by training a model to synthesize raw speech directly from raw video, without the need for text transcriptions. Remarkably, we find that this framework is capable of reconstructing intelligible audio from videos of new, previously unseen speakers. We also experiment with a separate speech reconstruction framework, which leverages recent advances in sequence modeling and spectrogram inversion to improve the realism of the generated speech. We then apply our research in video-to-speech synthesis to advance the state-of-the-art in audio-visual speech enhancement, by proposing a new vocoder-based model that performs particularly well under extremely noisy scenarios. Lastly, we aim to fully realize the potential of paired audio-visual data by proposing two novel frameworks that leverage acoustic and visual speech to train two encoders that learn from each other simultaneously. We leverage these pre-trained encoders for deepfake detection, speech recognition, and lip reading, and find that they consistently yield improvements over training from scratch.
