88 research outputs found

    Categorical colormap optimization with visualization case studies

    Mapping a set of categorical values to different colors is an elementary technique in data visualization. Users of visualization software routinely rely on the default colormaps provided by a system, or colormaps suggested by software such as ColorBrewer. In practice, users often have to select a set of colors in a semantically meaningful way (e.g., based on conventions, color metaphors, and logological associations), and consequently would like to ensure their perceptual differentiation is optimized. In this paper, we present an algorithmic approach for maximizing the perceptual distances among a set of given colors. We address two technical problems in the optimization, i.e., (i) the phenomenon of local maxima that halts the optimization too soon, and (ii) the arbitrary reassignment of colors that leads to the loss of the original semantic association. We paid particular attention to the different types of constraints that users may wish to impose during the optimization process. To demonstrate the effectiveness of this work, we tested the technique in two case studies. To reach out to a wider range of users, we also developed a web application called Colourmap Hospital.
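
    The core idea described above — nudging colors so that their minimum pairwise perceptual distance grows while each color stays close to its semantically assigned starting point — can be sketched as a simple local search. This is a rough illustration, not the authors' algorithm; the CIELAB distance, step size, and constraint radius below are illustrative assumptions.

```python
import numpy as np

def min_pairwise_distance(colors):
    """Smallest Euclidean distance between any two colors (CIELAB assumed)."""
    d = np.inf
    for i in range(len(colors)):
        for j in range(i + 1, len(colors)):
            d = min(d, np.linalg.norm(colors[i] - colors[j]))
    return d

def optimize_colormap(colors, max_shift=10.0, step=1.0, iters=2000, seed=0):
    """Greedy local search: perturb one color at a time and keep the move if it
    increases the minimum pairwise distance without drifting more than
    `max_shift` from the original, semantically chosen color."""
    rng = np.random.default_rng(seed)
    original = np.asarray(colors, dtype=float)
    current = original.copy()
    best = min_pairwise_distance(current)
    for _ in range(iters):
        k = rng.integers(len(current))                 # pick one color to nudge
        candidate = current.copy()
        candidate[k] = candidate[k] + rng.normal(0.0, step, size=3)
        if np.linalg.norm(candidate[k] - original[k]) > max_shift:
            continue                                   # constraint: stay near the given color
        score = min_pairwise_distance(candidate)
        if score > best:                               # accept only improving moves
            current, best = candidate, score
    return current, best

if __name__ == "__main__":
    # Three hypothetical CIELAB colors that start out perceptually close.
    lab = [(55.0, 20.0, 10.0), (57.0, 22.0, 12.0), (60.0, 18.0, 8.0)]
    optimized, separation = optimize_colormap(lab)
    print("minimum pairwise distance:", round(separation, 2))
```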

    Supervised classification of bradykinesia for Parkinson’s disease diagnosis from smartphone videos

    Slowness of movement, known as bradykinesia, is an important early symptom of Parkinson’s disease. This symptom is currently assessed subjectively by clinical experts. However, expert assessment has been shown to be subject to inter-rater variability. We propose a low-cost, contactless system using smartphone videos to automatically determine the presence of bradykinesia. Using 70 videos recorded in a pilot study, we predict the presence of bradykinesia with an estimated test accuracy of 0.79 and the presence of a Parkinson’s disease diagnosis with an estimated test accuracy of 0.63. Even on a small set of pilot data, this accuracy is comparable to that recorded by blinded human experts.

    A novel robust reversible watermarking scheme for protecting authenticity and integrity of medical images

    It is of great importance in telemedicine to protect the authenticity and integrity of medical images. These requirements are mainly addressed by two technologies: region of interest (ROI) lossless watermarking and reversible watermarking. However, the former causes biases on diagnosis by distorting the region of non-interest (RONI) and introduces security risks by segmenting the image spatially for watermark embedding. The latter fails to provide a reliable recovery function for tampered areas when protecting image integrity. To address these issues, a novel robust reversible watermarking scheme is proposed in this paper. In our scheme, a reversible watermarking method is designed based on recursive dither modulation (RDM) to avoid biases on diagnosis. In addition, RDM is combined with the Slantlet transform and singular value decomposition to provide a reliable solution for protecting image authenticity. Moreover, ROI and RONI are divided for watermark generation to design an effective recovery function under limited embedding capacity. Finally, watermarks are embedded into whole medical images to avoid the risks caused by segmenting the image spatially. Experimental results demonstrate that our proposed lossless scheme not only has remarkable imperceptibility and sufficient robustness, but also provides reliable authentication, tamper detection, localization, and recovery functions, outperforming existing schemes for protecting medical images.
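
    As a rough illustration of the dither-modulation principle that RDM builds on (not the paper's recursive scheme, and with an arbitrarily chosen quantization step), embedding and extracting a single watermark bit in a transform coefficient can be sketched as quantization index modulation:

```python
import numpy as np

def qim_embed(coefficient, bit, step=8.0):
    """Quantization index modulation: snap the coefficient onto one of two
    interleaved lattices, chosen by the watermark bit."""
    dither = 0.0 if bit == 0 else step / 2.0
    return step * np.round((coefficient - dither) / step) + dither

def qim_extract(coefficient, step=8.0):
    """Recover the bit by checking which lattice the coefficient is closer to."""
    err0 = abs(coefficient - qim_embed(coefficient, 0, step))
    err1 = abs(coefficient - qim_embed(coefficient, 1, step))
    return 0 if err0 <= err1 else 1

if __name__ == "__main__":
    original = 123.7
    marked = qim_embed(original, bit=1)
    print(marked, qim_extract(marked))   # the embedded bit is recovered from the marked value
```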

    Facial expression recognition in dynamic sequences: An integrated approach

    Automatic facial expression analysis aims to analyse human facial expressions and classify them into discrete categories. Existing methods rely on extracting information from video sequences and employ either some form of subjective thresholding of dynamic information or attempt to identify the particular individual frames in which the expected behaviour occurs. These methods are inefficient, as they require additional subjective information or tedious manual work, or fail to take advantage of the information contained in the dynamic signature of facial movements for the task of expression recognition. In this paper, a novel framework is proposed for automatic facial expression analysis which extracts salient information from video sequences but does not rely on any subjective preprocessing or additional user-supplied information to select frames with peak expressions. The experimental framework demonstrates that the proposed method outperforms static expression recognition systems in terms of recognition rate. The approach does not rely on action units (AUs) and therefore eliminates errors which are otherwise propagated to the final result by incorrect initial identification of AUs. The proposed framework explores a parametric space of over 300 dimensions and is tested with six state-of-the-art machine learning techniques. Such robust and extensive experimentation provides an important foundation for assessing the performance of future work. A further contribution of the paper is offered in the form of a user study. This was conducted to investigate the correlation between human cognitive systems and the proposed framework for the understanding of human emotion classification, and the reliability of public databases.

    On the limitations of visual-semantic embedding networks for image-to-text information retrieval

    Visual-semantic embedding (VSE) networks create joint image–text representations to map images and texts in a shared embedding space to enable various information retrieval-related tasks, such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are VSE++, SCAN, VSRN, and UNITER. This study evaluates the performance of those VSE networks for the task of image-to-text retrieval and identifies and analyses their strengths and limitations to guide future research on the topic. The experimental results on Flickr30K revealed that the pre-trained network, UNITER, achieved 61.5% on average Recall@5 for the task of retrieving all relevant descriptions. The traditional networks, VSRN, SCAN, and VSE++, achieved 50.3%, 47.1%, and 29.4% on average Recall@5, respectively, for the same task. An additional analysis was performed on image–text pairs from the top 25 worst-performing classes using a subset of the Flickr30K-based dataset to identify the limitations of the best-performing models, VSRN and UNITER. These limitations are discussed from the perspective of image scenes, image objects, image semantics, and basic functions of neural networks. This paper discusses the strengths and limitations of VSE networks to guide further research into the use of VSE networks for cross-modal information retrieval tasks.
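
    For context, Recall@5 in the image-to-text direction can be computed from an image-by-text similarity matrix roughly as below. This is a generic sketch with placeholder scores and a single relevant caption per image; the paper's protocol of retrieving all relevant descriptions on Flickr30K (five captions per image) differs in detail.

```python
import numpy as np

def recall_at_k(similarity, ground_truth, k=5):
    """similarity: (num_images, num_texts) score matrix, higher = more similar.
    ground_truth: for each image, the index of one relevant text.
    Returns the fraction of images whose relevant text appears in the top-k."""
    topk = np.argsort(-similarity, axis=1)[:, :k]
    hits = [gt in topk[i] for i, gt in enumerate(ground_truth)]
    return float(np.mean(hits))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scores = rng.random((100, 500))          # placeholder similarity scores
    truth = rng.integers(0, 500, size=100)   # one relevant caption per image (assumption)
    print("Recall@5:", recall_at_k(scores, truth, k=5))
```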

    A novel attention model across heterogeneous features for stuttering event detection

    Stuttering is a prevalent speech disorder affecting millions worldwide. To provide an automatic and objective stuttering assessment tool, Stuttering Event Detection (SED) is under extensive investigation for advanced speech research and applications. Despite significant progress achieved by various machine learning and deep learning models, SED directly from the speech signal is still challenging due to the heterogeneous and overlapped nature of stuttering speech. This paper presents a novel SED approach using multi-feature fusion and attention mechanisms. The model utilises multiple acoustic features, extracted from pitch, time-domain, frequency-domain, and automatic speech recognition representations, to detect stuttering core behaviours more accurately and reliably. In addition, we exploit both spatial and temporal attention mechanisms as well as Bidirectional Long Short-Term Memory (BI-LSTM) modules to learn better representations and improve SED performance. The experimental evaluation and analysis convincingly demonstrate that our proposed model surpasses the state-of-the-art models on two popular stuttering datasets, by 4% and 3% in overall F1 score, respectively. The superior results indicate the consistency of our proposed method, supported by both multi-feature fusion and attention mechanisms, across different stuttering event datasets.
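
    The abstract does not spell out the architecture, but the general pattern it describes — per-feature encoders, a bidirectional LSTM over time, and attention pooling over frames — could be sketched in PyTorch as below. All feature names, dimensions, and the single attention layer are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class AttentiveSED(nn.Module):
    """Toy stuttering-event detector: fuse heterogeneous per-frame features,
    model time with a BiLSTM, then pool frames with learned attention."""
    def __init__(self, feature_dims=(40, 13, 128), hidden=64, num_classes=5):
        super().__init__()
        # One small projection per feature stream (e.g. pitch, MFCC, ASR embedding).
        self.projections = nn.ModuleList([nn.Linear(d, hidden) for d in feature_dims])
        self.bilstm = nn.LSTM(hidden * len(feature_dims), hidden,
                              batch_first=True, bidirectional=True)
        self.attention = nn.Linear(2 * hidden, 1)    # temporal attention scores
        self.classifier = nn.Linear(2 * hidden, num_classes)

    def forward(self, streams):
        # streams: list of tensors, each (batch, time, feature_dim_i)
        fused = torch.cat([proj(x) for proj, x in zip(self.projections, streams)], dim=-1)
        states, _ = self.bilstm(fused)                   # (batch, time, 2*hidden)
        weights = torch.softmax(self.attention(states), dim=1)
        pooled = (weights * states).sum(dim=1)           # attention-weighted pooling over time
        return self.classifier(pooled)

if __name__ == "__main__":
    model = AttentiveSED()
    batch = [torch.randn(2, 100, d) for d in (40, 13, 128)]
    print(model(batch).shape)   # torch.Size([2, 5])
```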

    Stuttering detection using atrous convolutional neural networks

    Stuttering is a neurodevelopmental speech disorder that affects 70 million people worldwide, approximately 1% of the whole population. People who stutter (PWS) have common speech symptoms such as blocks, interjections, repetitions, and prolongations. Speech-language pathologists (SLPs) commonly observe these four groups of symptoms to evaluate stuttering severity. The evaluation process is tedious and time-consuming for both SLPs and PWS. Therefore, this paper proposes a new model for stuttering event detection that may help SLPs to evaluate stuttering severity. Our model is based on a log mel spectrogram and a 2D atrous convolutional network designed to learn spectral and temporal features. We rigorously evaluate the performance of our model on two stuttering datasets (UCLASS and FluencyBank) using common speech metrics, i.e., F1 score, recall, and the area under the curve (AUC). Our experimental results indicate that our model outperforms state-of-the-art methods on prolongation, with F1 scores of 52% and 44.5% on the UCLASS and FluencyBank datasets, respectively. We also gain margins of 5% and 3% on the UCLASS and FluencyBank datasets for the fluent class.
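
    A minimal sketch of the central ingredient, dilated (atrous) 2D convolutions over a log mel spectrogram, is shown below. The channel widths, dilation rates, and pooling are illustrative assumptions rather than the paper's exact network.

```python
import torch
import torch.nn as nn

class AtrousStutterNet(nn.Module):
    """Toy version of the idea: stack 2D convolutions with increasing dilation
    over a log mel spectrogram so the receptive field grows without pooling."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                  # collapse the time and frequency axes
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, log_mel):
        # log_mel: (batch, 1, n_mels, frames)
        return self.classifier(self.features(log_mel).flatten(1))

if __name__ == "__main__":
    model = AtrousStutterNet()
    dummy = torch.randn(4, 1, 64, 300)    # 64 mel bands, 300 frames
    print(model(dummy).shape)             # torch.Size([4, 2])
```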

    A multi-modal transformer approach for football event classification

    Video understanding has been enhanced by the use of multi-modal networks. However, recent multi-modal video analysis models have limited applicability to sports videos due to their specialised nature. This paper proposes a novel attention-based multi-modal neural network for sports event classification, featuring a multi-stage fusion training strategy. The proposed multi-modal neural network integrates three modalities, including an image sequence modality, an audio modality, and a newly proposed sports formation modality, to improve sports video classification performance. Empirical results show that the proposed model outperforms the state-of-the-art transformer-based video method by 4.43% in top-1 accuracy on the SoccerNet-V2 dataset.
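
    One plausible reading of the described design — projecting each modality into a shared token space and letting a transformer encoder attend across modalities — is sketched below. The modality names, token counts, dimensions, and class count are placeholders, and the paper's multi-stage fusion training strategy is not reproduced.

```python
import torch
import torch.nn as nn

class MultiModalEventClassifier(nn.Module):
    """Toy fusion model: project each modality into a shared space, prepend a
    learnable [CLS] token, and let a transformer encoder attend across modalities."""
    def __init__(self, dims=None, d_model=256, num_classes=17):
        super().__init__()
        dims = dims or {"frames": 512, "audio": 128, "formation": 64}   # placeholder dims
        self.projections = nn.ModuleDict({k: nn.Linear(d, d_model) for k, d in dims.items()})
        self.cls = nn.Parameter(torch.zeros(1, 1, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, modalities):
        # modalities: dict of (batch, tokens_i, dim_i) tensors, one per modality
        tokens = [self.projections[k](v) for k, v in modalities.items()]
        batch = tokens[0].shape[0]
        x = torch.cat([self.cls.expand(batch, -1, -1)] + tokens, dim=1)
        return self.head(self.encoder(x)[:, 0])      # classify from the [CLS] token

if __name__ == "__main__":
    model = MultiModalEventClassifier()
    inputs = {"frames": torch.randn(2, 16, 512),
              "audio": torch.randn(2, 10, 128),
              "formation": torch.randn(2, 5, 64)}
    print(model(inputs).shape)   # torch.Size([2, 17])
```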

    Spectrogram transformers for audio classification

    Audio classification is an important task in the machine learning field with a wide range of applications. Over the last decade, deep learning based methods have been widely used, and transformer-based models are becoming a new paradigm for audio classification. In this paper, we present Spectrogram Transformers, a group of transformer-based models for audio classification. Based on the fundamental semantics of the audio spectrogram, we design two mechanisms to extract temporal and frequency features from the audio spectrogram, named time-dimension sampling and frequency-dimension sampling. These discriminative representations are then enhanced by various combinations of attention block architectures, including Temporal Only (TO) attention, Temporal-Frequency Sequential (TFS) attention, Temporal-Frequency Parallel (TFP) attention, and Two-stream Temporal-Frequency (TSTF) attention, to extract the sound record signatures that serve the classification task. Our experiments demonstrate that these Transformer models outperform the state-of-the-art methods on the ESC-50 dataset without a pre-training stage. Furthermore, our method also shows great efficiency compared with other leading methods.
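
    A toy version of the parallel temporal/frequency idea (closest in spirit to the TFP variant) is sketched below: one stream treats time frames as tokens, the other treats frequency bins as tokens, and the pooled outputs are concatenated for classification. Dimensions and layer counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TimeFreqTransformer(nn.Module):
    """Toy two-stream sketch: encode time-axis tokens and frequency-axis tokens
    with separate transformer encoders, then fuse the pooled representations."""
    def __init__(self, n_mels=64, n_frames=200, d_model=128, num_classes=50):
        super().__init__()
        self.time_proj = nn.Linear(n_mels, d_model)    # tokens along the time axis
        self.freq_proj = nn.Linear(n_frames, d_model)  # tokens along the frequency axis
        self.time_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.freq_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(2 * d_model, num_classes)

    def forward(self, spec):
        # spec: (batch, n_mels, n_frames) log-mel spectrogram
        time_tokens = self.time_proj(spec.transpose(1, 2))   # (batch, n_frames, d_model)
        freq_tokens = self.freq_proj(spec)                    # (batch, n_mels, d_model)
        t = self.time_encoder(time_tokens).mean(dim=1)
        f = self.freq_encoder(freq_tokens).mean(dim=1)
        return self.head(torch.cat([t, f], dim=-1))

if __name__ == "__main__":
    model = TimeFreqTransformer()
    print(model(torch.randn(2, 64, 200)).shape)   # torch.Size([2, 50])
```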

    Current advances on deep learning-based human action recognition from videos: a survey

    Human action recognition (HAR) from RGB videos is essential and challenging in the computer vision field due to its wide range of real-world applications in fields such as human behaviour analysis, human-computer interaction, robotics, and surveillance. Since the breakthrough and fast development of deep learning technology, the performance of HAR based on deep neural networks has improved significantly in this decade. In this survey, we discuss the growing use of deep learning for HAR, such as representative two-stream and 3D CNNs, and particularly highlight the most recent successes achieved by using attention and transformers. We provide our perspective on the new trend of designing innovative deep learning methods. In addition, we also present popular HAR datasets developed in recent years and the benchmark accuracies achieved by current advances in deep learning. This draws research attention to the challenges of HAR by identifying performance gaps when applying deep learning methods to large HAR datasets. Further, this survey sheds light on the development of new methods and facilitates qualitative comparison with the state of the art.
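
    As a reference point for the architectures the survey covers, a minimal 3D CNN for clip-level action recognition might look like the sketch below; it is a toy example, not any specific model discussed in the survey.

```python
import torch
import torch.nn as nn

class Tiny3DCNN(nn.Module):
    """Minimal 3D CNN for clip-level action recognition: convolutions span
    (time, height, width) so motion and appearance are learned jointly."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, clip):
        # clip: (batch, channels, frames, height, width)
        return self.classifier(self.features(clip).flatten(1))

if __name__ == "__main__":
    model = Tiny3DCNN()
    clip = torch.randn(2, 3, 16, 112, 112)   # 16 RGB frames at 112x112
    print(model(clip).shape)                  # torch.Size([2, 10])
```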