354 research outputs found

    Enhanced Gradient-Based Local Feature Descriptors by Saliency Map for Egocentric Action Recognition

    Get PDF
    Egocentric video analysis is an important tool in healthcare that serves a variety of purposes, such as memory aid systems and physical rehabilitation, and feature extraction is an indispensable process for such analysis. Local feature descriptors have been widely applied due to their simple implementation and reasonable efficiency and performance in applications. This paper proposes an enhanced spatial and temporal local feature descriptor extraction method to boost the performance of action classification. The approach allows local feature descriptors to take advantage of saliency maps, which provide insights into visual attention. The effectiveness of the proposed method was validated and evaluated by a comparative study, whose results demonstrated an improved accuracy of around 2%

    An improved classification approach for echocardiograms embedding temporal information

    Get PDF
    Cardiovascular disease is an umbrella term for all diseases of the heart. At present, computer-aided echocardiogram diagnosis is becoming increasingly beneficial. For echocardiography, different cardiac views can be acquired depending on the location and angulations of the ultrasound transducer. Hence, the automatic echocardiogram view classification is the first step for echocardiogram diagnosis, especially for computer-aided system and even for automatic diagnosis in the future. In addition, heart views classification makes it possible to label images especially for large-scale echo videos, provide a facility for database management and collection. This thesis presents a framework for automatic cardiac viewpoints classification of echocardiogram video data. In this research, we aim to overcome the challenges facing this investigation while analyzing, recognizing and classifying echocardiogram videos from 3D (2D spatial and 1D temporal) space. Specifically, we extend 2D KAZE approach into 3D space for feature detection and propose a histogram of acceleration as feature descriptor. Subsequently, feature encoding follows before the application of SVM to classify echo videos. In addition, comparison with the state of the art methodologies also takes place, including 2D SIFT, 3D SIFT, and optical flow technique to extract temporal information sustained in the video images. As a result, the performance of 2D KAZE, 2D KAZE with Optical Flow, 3D KAZE, Optical Flow, 2D SIFT and 3D SIFT delivers accuracy rate of 89.4%, 84.3%, 87.9%, 79.4%, 83.8% and 73.8% respectively for the eight view classes of echo videos

    Irish Machine Vision and Image Processing Conference Proceedings 2017

    Get PDF

    Machine Learning for Multimedia Communications

    Get PDF
    Machine learning is revolutionizing the way multimedia information is processed and transmitted to users. After intensive and powerful training, some impressive efficiency/accuracy improvements have been made all over the transmission pipeline. For example, the high model capacity of the learning-based architectures enables us to accurately model the image and video behavior such that tremendous compression gains can be achieved. Similarly, error concealment, streaming strategy or even user perception modeling have widely benefited from the recent learningoriented developments. However, learning-based algorithms often imply drastic changes to the way data are represented or consumed, meaning that the overall pipeline can be affected even though a subpart of it is optimized. In this paper, we review the recent major advances that have been proposed all across the transmission chain, and we discuss their potential impact and the research challenges that they raise

    Video Quality Metrics

    Get PDF

    Reduced reference image and video quality assessments: review of methods

    Get PDF
    With the growing demand for image and video-based applications, the requirements of consistent quality assessment metrics of image and video have increased. Different approaches have been proposed in the literature to estimate the perceptual quality of images and videos. These approaches can be divided into three main categories; full reference (FR), reduced reference (RR) and no-reference (NR). In RR methods, instead of providing the original image or video as a reference, we need to provide certain features (i.e., texture, edges, etc.) of the original image or video for quality assessment. During the last decade, RR-based quality assessment has been a popular research area for a variety of applications such as social media, online games, and video streaming. In this paper, we present review and classification of the latest research work on RR-based image and video quality assessment. We have also summarized different databases used in the field of 2D and 3D image and video quality assessment. This paper would be helpful for specialists and researchers to stay well-informed about recent progress of RR-based image and video quality assessment. The review and classification presented in this paper will also be useful to gain understanding of multimedia quality assessment and state-of-the-art approaches used for the analysis. In addition, it will help the reader select appropriate quality assessment methods and parameters for their respective applications
    corecore