Categorical colormap optimization with visualization case studies
Mapping a set of categorical values to different colors is an elementary technique in data visualization. Users of visualization software routinely rely on the default colormaps provided by a system, or colormaps suggested by software such as ColorBrewer.
In practice, users often have to select a set of colors in a semantically meaningful way (e.g., based on conventions, color metaphors,
and logological associations), and consequently would like to ensure their perceptual differentiation is optimized. In this paper, we
present an algorithmic approach for maximizing the perceptual distances among a set of given colors. We address two technical problems in optimization, i.e., (i) the phenomena of local maxima that halt the optimization too soon, and (ii) the arbitrary reassignment
of colors that leads to the loss of the original semantic association. We paid particular attention to different types of constraints that
users may wish to impose during the optimization process. To demonstrate the effectiveness of this work, we tested this technique in
two case studies. To reach out to a wider range of users, we also developed a web application called Colourmap Hospital.
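As a rough illustration of the optimization idea described above (not the paper's actual algorithm), the following Python sketch treats colours as points in CIELAB space, approximates perceptual distance with Euclidean distance as a stand-in for CIEDE2000, and uses a randomised hill climb that keeps each colour within a user-set radius of its original value so the semantic association is not lost; the function names and parameter values are hypothetical.

import numpy as np

# Assumed setup: colours are rows of an (n, 3) array in CIELAB space; the
# Euclidean distance stands in for a perceptual metric such as CIEDE2000.
def min_pairwise_distance(lab):
    d = np.linalg.norm(lab[:, None, :] - lab[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min()

def optimize_palette(lab0, max_shift=15.0, iters=5000, step=2.0, seed=0):
    """Randomised hill climb: nudge one colour at a time so the smallest
    pairwise distance grows, while keeping every colour within max_shift of
    its original position to preserve its semantic association."""
    rng = np.random.default_rng(seed)
    lab, best = lab0.copy(), min_pairwise_distance(lab0)
    for _ in range(iters):
        i = rng.integers(len(lab))
        candidate = lab.copy()
        candidate[i] += rng.normal(scale=step, size=3)
        if np.linalg.norm(candidate[i] - lab0[i]) > max_shift:
            continue  # constraint: stay close to the user's chosen colour
        score = min_pairwise_distance(candidate)
        if score > best:
            lab, best = candidate, score
    return lab, best

# Toy palette: five colours clustered too closely to tell apart.
palette = np.array([[60, 20, 30], [62, 22, 28], [58, 18, 35],
                    [61, 25, 25], [59, 21, 32]], dtype=float)
optimized, separation = optimize_palette(palette)
print(f"min distance: {min_pairwise_distance(palette):.1f} -> {separation:.1f}")

The max_shift radius is one simple way to express a user-imposed constraint of the kind the abstract mentions; the paper's constraint types are richer.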
Supervised classification of bradykinesia for Parkinson’s disease diagnosis from smartphone videos
Slowness of movement, known as bradykinesia,
is an important early symptom of Parkinson’s disease. This symptom is currently assessed subjectively by clinical experts. However, expert assessment has been shown to be subject to inter-rater variability. We propose a low-cost, contactless system using smartphone videos to automatically determine
the presence of bradykinesia. Using 70 videos recorded in a pilot study, we predict the presence of bradykinesia with an
estimated test accuracy of 0.79, and the presence of a Parkinson’s disease diagnosis with an estimated test accuracy of 0.63. Even on a small set of pilot data, this accuracy is comparable to that recorded by blinded human experts.
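The abstract does not detail the features or classifier used; purely as a hedged sketch of the supervised-classification setup, the snippet below assumes per-video movement features have already been extracted and estimates test accuracy by cross-validation with an off-the-shelf classifier (the feature names and the random-forest choice are assumptions, not the paper's method).

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for 70 videos: four hypothetical movement
# features per video (e.g. tap amplitude, tap frequency, amplitude decrement,
# interval variability) and a binary bradykinesia label.
rng = np.random.default_rng(0)
X = rng.normal(size=(70, 4))
y = rng.integers(0, 2, size=70)

# Any off-the-shelf classifier could sit here; a random forest is one example.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)  # cross-validated accuracy estimate
print(f"estimated accuracy: {scores.mean():.2f} +/- {scores.std():.2f}")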
A novel robust reversible watermarking scheme for protecting authenticity and integrity of medical images
It is of great importance in telemedicine to protect the authenticity and integrity of medical images. These requirements are mainly addressed by two technologies: region of interest (ROI) lossless watermarking and reversible watermarking. However, the former causes biases in diagnosis by distorting the region of non-interest (RONI) and introduces security risks by segmenting the image spatially for watermark embedding. The latter fails to provide a reliable recovery function for the tampered
areas when protecting image integrity. To address these issues, a novel robust
reversible watermarking scheme is proposed in this paper. In our scheme, a reversible
watermarking method is designed based on recursive dither modulation (RDM) to
avoid biases in diagnosis. In addition, RDM is combined with the Slantlet transform and
singular value decomposition to provide a reliable solution for protecting image
authenticity. Moreover, ROI and RONI are divided for watermark generation to
design an effective recovery function under limited embedding capacity. Finally,
watermarks are embedded into whole medical images to avoid the risks caused by
segmenting the image spatially. Experimental results demonstrate that our proposed
lossless scheme not only has remarkable imperceptibility and sufficient robustness,
but also provides reliable authentication, tamper detection, localization and recovery
functions, outperforming existing schemes for protecting medical images.
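The recursive dither modulation (RDM) step builds on ordinary dither modulation (quantization index modulation). As a minimal sketch of plain dither modulation only (the recursive variant and the Slantlet/SVD embedding used for authenticity protection are not reproduced here), a bit is embedded by quantising a transform coefficient onto one of two interleaved lattices and recovered by checking which lattice the received coefficient is closer to; the step size delta is an assumed parameter.

import numpy as np

def dm_embed(x, bit, delta=8.0):
    """Dither modulation (QIM): quantise coefficient x onto the lattice
    associated with the bit value; the two lattices are offset by delta/2."""
    dither = 0.0 if bit == 0 else delta / 2.0
    return delta * np.round((x - dither) / delta) + dither

def dm_extract(y, delta=8.0):
    """Decode by choosing the lattice closest to the received coefficient."""
    d0 = abs(y - dm_embed(y, 0, delta))
    d1 = abs(y - dm_embed(y, 1, delta))
    return 0 if d0 <= d1 else 1

coeff = 123.7                       # e.g. one transform-domain coefficient
marked = dm_embed(coeff, 1)         # embedding distorts x by at most delta/2
print(marked, dm_extract(marked))   # the embedded bit 1 is recovered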
Facial expression recognition in dynamic sequences: An integrated approach
Automatic facial expression analysis aims to analyse human facial expressions and classify them into discrete categories. Existing methods rely on extracting information from video sequences and either employ some form of subjective thresholding of dynamic information or attempt to identify the particular individual frames in which the expected behaviour occurs. These methods are inefficient: they require additional subjective information or tedious manual work, or they fail to take advantage of the information contained in the dynamic signature of facial movements for the task of expression recognition.
In this paper, a novel framework is proposed for automatic facial expression analysis which extracts salient information from video sequences
but does not rely on any subjective preprocessing or additional user-supplied
information to select frames with peak expressions. The experimental framework demonstrates that the proposed method outperforms static expression
recognition systems in terms of recognition rate. The approach does not rely on action units (AUs) and therefore eliminates errors that would otherwise be propagated to the final result due to incorrect initial identification of AUs.
The proposed framework explores a parametric space of over 300 dimensions
and is tested with six state-of-the-art machine learning techniques. Such
robust and extensive experimentation provides an important foundation for
the assessment of the performance of future work. A further contribution
of the paper is offered in the form of a user study. This was conducted in
order to investigate the correlation between human cognitive systems and the
proposed framework for the understanding of human emotion classification
and the reliability of public databases.
Rainy environment identification based on channel state information for autonomous vehicles
We introduce an innovative deep learning approach specifically designed for the environment identification of intelligent vehicles under rainy conditions in this paper. In the construction of wireless vehicular communication networks, an innovative approach is proposed that incorporates additional multipath components to simulate the impact of raindrop scattering on the vehicle-to-vehicle (V2V) channel, thereby emulating the channel characteristics of vehicular environments under rainy conditions; an equalization strategy in OFDM-based systems is also proposed at the receiver end to counteract channel distortion. Then, a rainy environment identification method for autonomous vehicles is proposed. The core of this method lies in utilizing the Channel State Information (CSI) shared within the vehicular network to accurately identify the diverse rainy environments in which the vehicle operates without relying on traditional sensors. The environment identification task is treated as a multi-class classification problem, and a dedicated Convolutional Neural Network (CNN) model is proposed. This CNN model uses the CSI estimated from Cooperative Awareness Messages (CAMs) exchanged in V2V communication as training features. Simulation results showed that our method achieved an accuracy rate of 95.7% in recognizing various rainy environments, significantly surpassing existing classical classification models. Moreover, it took only microseconds to predict with high accuracy, surpassing the performance limitations of traditional sensing systems under adverse weather conditions. This breakthrough ensures that intelligent vehicles can rapidly and accurately adjust driving parameters, even in complex weather conditions such as rain, to drive autonomously, safely and reliably.
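As a hedged sketch of how CSI can feed a multi-class CNN classifier (the layer sizes, input dimensions and class count below are assumptions, not the paper's architecture), a CSI snapshot is arranged as a two-channel real/imaginary grid over subcarriers and OFDM symbols:

import torch
import torch.nn as nn

class CSIRainCNN(nn.Module):
    """Small CNN over a CSI snapshot shaped (2, subcarriers, symbols),
    where the two channels hold the real and imaginary parts."""
    def __init__(self, num_classes=4, subcarriers=64, symbols=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * (subcarriers // 4) * (symbols // 4), num_classes),
        )

    def forward(self, csi):                 # csi: (batch, 2, subcarriers, symbols)
        return self.classifier(self.features(csi))

model = CSIRainCNN()
logits = model(torch.randn(8, 2, 64, 32))   # 8 CSI snapshots -> class scores
print(logits.shape)                         # torch.Size([8, 4])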
A multiscale framework for capturing oscillation dynamics of autonomous vehicles in data-driven car-following models
Recent advancements in machine learning-based car-following models have shown promise in leveraging vehicle trajectory data to accurately reproduce real-world driving behaviour in simulations. However, existing data-driven car-following models only explicitly consider individual vehicle trajectories for model training, overlooking broader traffic phenomena. This limitation hinders their ability to accurately capture the oscillation dynamics of vehicle platoons, which are critical for simulating and evaluating mesoscopic and macroscopic traffic phenomena such as congestion propagation, stop-and-go, string stability and hysteresis. To fill this gap, our study introduces a hybrid physical model-driven and data-driven framework, Multiscale Car-Following (MultiscaleCF), aimed at explicitly capturing mesoscopic oscillation dynamics within data-driven car-following models. MultiscaleCF offers two methodological advancements in the development of machine learning-based car-following models: the recursive simulation of a platoon of vehicles to reduce compound error and mesoscopic feature engineering using domain-specific attributes. Evaluated using the OpenACC database, the MultiscaleCF framework exhibited a simultaneous improvement in both microscopic and mesoscopic traffic simulation patterns. It outperforms the baseline model in microscopic trajectory prediction accuracy by up to 21%. For oscillation dynamics, it outperforms the baseline model by 42%, 32%, 29% and 42% in duration, amplitude, intensity, and hysteresis magnitude, respectively.
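To illustrate what recursive platoon simulation means in practice (one of the two advancements named above), the sketch below propagates each follower from the simulated, rather than ground-truth, trajectory of its predecessor, so errors compound down the platoon exactly where the framework aims to control them; a textbook Intelligent Driver Model stands in for the learned car-following model, and all parameter values are illustrative.

import numpy as np

def idm_accel(gap, v, v_lead, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0):
    """Intelligent Driver Model acceleration; in MultiscaleCF this role would
    be played by the learned car-following model."""
    s_star = s0 + v * T + v * (v - v_lead) / (2 * np.sqrt(a * b))
    return a * (1 - (v / v0) ** 4 - (s_star / max(gap, 0.1)) ** 2)

def simulate_platoon(lead_pos, lead_vel, n_followers=5, dt=0.1, init_gap=40.0):
    """Recursive simulation: follower i reacts to the simulated (not
    ground-truth) trajectory of follower i-1, so prediction errors and
    oscillations propagate down the platoon instead of being masked."""
    steps = len(lead_pos)
    pos = np.zeros((n_followers + 1, steps))
    vel = np.zeros_like(pos)
    pos[0], vel[0] = lead_pos, lead_vel
    for i in range(1, n_followers + 1):
        pos[i, 0], vel[i, 0] = pos[i - 1, 0] - init_gap, vel[i - 1, 0]
        for t in range(steps - 1):
            acc = idm_accel(pos[i - 1, t] - pos[i, t], vel[i, t], vel[i - 1, t])
            vel[i, t + 1] = max(vel[i, t] + acc * dt, 0.0)
            pos[i, t + 1] = pos[i, t] + vel[i, t] * dt
    return pos, vel

# Leader briefly decelerates and recovers, triggering a platoon oscillation.
t = np.arange(0, 60, 0.1)
lead_vel = 25.0 - 8.0 * np.exp(-((t - 20.0) ** 2) / 20.0)
lead_pos = np.cumsum(lead_vel) * 0.1
pos, vel = simulate_platoon(lead_pos, lead_vel)
print(vel[:, ::200].round(1))  # each vehicle's speed sampled every 20 s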
On the limitations of visual-semantic embedding networks for image-to-text information retrieval
Visual-semantic embedding (VSE) networks create joint image–text representations to map images and texts in a shared embedding space to enable various information retrieval-related tasks, such as image–text retrieval, image captioning, and visual question answering. The most recent state-of-the-art VSE-based networks are VSE++, SCAN, VSRN, and UNITER. This study evaluates the performance of those VSE networks for the task of image-to-text retrieval and identifies and analyses their strengths and limitations to guide future research on the topic. The experimental results on Flickr30K revealed that the pre-trained network, UNITER, achieved 61.5% on average Recall@5 for the task of retrieving all relevant descriptions. The traditional networks, VSRN, SCAN, and VSE++, achieved 50.3%, 47.1%, and 29.4% on average Recall@5, respectively, for the same task. An additional analysis was performed on image–text pairs from the top 25 worst-performing classes using a subset of the Flickr30K-based dataset to identify the limitations of the performance of the best-performing models, VSRN and UNITER. These limitations are discussed from the perspective of image scenes, image objects, image semantics, and basic functions of neural networks. This paper discusses the strengths and limitations of VSE networks to guide further research into the topic of using VSE networks for cross-modal information retrieval tasks.
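For context on the reported metric, Recall@5 for retrieving all relevant descriptions can be computed from an image-to-text similarity matrix as sketched below; this is one common definition (the fraction of an image's relevant captions that appear in its top five ranked texts, averaged over images), and the exact evaluation protocol in the study may differ.

import numpy as np

def recall_at_k(similarity, relevance, k=5):
    """For each image (row), the fraction of its relevant captions that appear
    among the top-k most similar texts, averaged over images."""
    recalls = []
    for i in range(similarity.shape[0]):
        topk = np.argsort(-similarity[i])[:k]
        relevant = np.flatnonzero(relevance[i])
        if relevant.size:
            recalls.append(np.isin(relevant, topk).mean())
    return float(np.mean(recalls))

rng = np.random.default_rng(0)
sim = rng.normal(size=(4, 20))   # similarity scores: 4 images x 20 captions
rel = np.zeros((4, 20), dtype=int)
for i in range(4):               # Flickr30K-style: 5 relevant captions per image
    rel[i, i * 5:(i + 1) * 5] = 1
print(f"Recall@5 = {recall_at_k(sim, rel):.2f}")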
A novel attention model across heterogeneous features for stuttering event detection
Stuttering is a prevalent speech disorder affecting millions worldwide. To provide an automatic and objective stuttering assessment tool, Stuttering Event Detection (SED) is under extensive investigation for advanced speech research and applications. Despite significant progress achieved by various machine learning and deep learning models, SED directly from the speech signal is still challenging due to the heterogeneous and overlapping nature of stuttering speech. This paper presents a novel SED approach using multi-feature fusion and attention mechanisms. The model utilises multiple acoustic features extracted from pitch, time-domain, frequency-domain, and automatic speech recognition features to detect stuttering core behaviours more accurately and reliably. In addition, we exploit both spatial and temporal attention mechanisms as well as Bidirectional Long Short-Term Memory (BI-LSTM) modules to learn better representations and improve SED performance. The experimental evaluation and analysis convincingly demonstrate that our proposed model surpasses the state-of-the-art models on two popular stuttering datasets, by 4% and 3% in overall F1 score, respectively. The superior results indicate the consistency of our proposed method, supported by both multi-feature fusion and attention mechanisms, across different stuttering event datasets.
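Purely as a hedged sketch of the kind of architecture described (multi-feature fusion, temporal attention and BiLSTM), the PyTorch snippet below concatenates several assumed per-frame feature streams, runs them through a bidirectional LSTM, and pools the sequence with a learned temporal attention layer; the dimensions, class count and the omission of the paper's spatial attention are all simplifications.

import torch
import torch.nn as nn

class AttentiveBiLSTM(nn.Module):
    """Concatenate per-frame feature streams, encode with a BiLSTM, pool the
    sequence with temporal attention, and classify the stuttering event."""
    def __init__(self, feat_dim, hidden=128, num_classes=5):
        super().__init__()
        self.bilstm = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)      # one attention score per frame
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, *streams):                  # each: (batch, time, dim)
        x = torch.cat(streams, dim=-1)            # simple multi-feature fusion
        h, _ = self.bilstm(x)                     # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)    # temporal attention weights
        return self.head((w * h).sum(dim=1))      # attention-weighted pooling

# Assumed feature streams (dimensions are illustrative only).
pitch = torch.randn(4, 100, 16)
spectral = torch.randn(4, 100, 40)
asr = torch.randn(4, 100, 32)
model = AttentiveBiLSTM(feat_dim=16 + 40 + 32)
print(model(pitch, spectral, asr).shape)          # torch.Size([4, 5])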
Stuttering detection using atrous convolutional neural networks
Stuttering is a neurodevelopmental speech disorder that affects 70 million people worldwide, approximately 1% of the whole population. People who stutter (PWS) have common speech symptoms such as block, interjection, repetition, and prolongation. Speech-language pathologists (SLPs) commonly observe these four groups of symptoms to evaluate stuttering severity. The evaluation process is tedious and time-consuming for both SLPs and PWS. Therefore, this paper proposes a new model for stuttering event detection that may help SLPs to evaluate stuttering severity. Our model is based on a log mel spectrogram and a 2D atrous convolutional network designed to learn spectral and temporal features. We rigorously evaluate the performance of our model on two stuttering datasets (UCLASS and FluencyBank) using common speech metrics, i.e. F1-score, recall, and the area under the curve (AUC). Our experimental results indicate that our model outperforms state-of-the-art methods in prolongation with an F1 of 52% and 44.5% on the UCLASS and FluencyBank datasets, respectively. We also gain 5% and 3% margins on the UCLASS and FluencyBank datasets for the fluent class.
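As a minimal sketch of a 2D atrous (dilated) convolutional classifier over a log mel spectrogram (layer sizes and class count are assumptions, not the paper's exact model), increasing the dilation factor widens the receptive field along time and frequency without adding parameters:

import torch
import torch.nn as nn

class AtrousStutterNet(nn.Module):
    """Stack of 2D atrous (dilated) convolutions over a log mel spectrogram;
    padding matches the dilation so the feature-map size is preserved."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=4, dilation=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, num_classes),
        )

    def forward(self, logmel):                # (batch, 1, mel_bins, frames)
        return self.net(logmel)

logmel = torch.randn(8, 1, 40, 300)           # 8 clips, 40 mel bins, 300 frames
print(AtrousStutterNet()(logmel).shape)       # torch.Size([8, 2])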
A multi-modal transformer approach for football event classification
Video understanding has been enhanced by the use of multi-modal networks. However, recent multi-modal video analysis models have limited applicability to sports videos due to their specialised nature. This paper proposes a novel attention-based multi-modal neural network for sports event classification featuring a multi-stage fusion training strategy. The proposed multi-modal neural network integrates three modalities, including an image sequence modality, an audio modality and a newly proposed sports formation modality, to improve sports video classification performance. Empirical results show that the proposed model outperforms the state-of-the-art transformer-based video method by 4.43% in top-1 accuracy on the SoccerNet-V2 dataset.
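As a hedged sketch of attention-based fusion across image-sequence, audio and formation inputs (the projection sizes, token counts, 17-class output and single-stage fusion below are assumptions, not the paper's multi-stage strategy), per-modality features are projected into a shared space and fused by a transformer encoder:

import torch
import torch.nn as nn

class MultiModalEventClassifier(nn.Module):
    """Project image-frame, audio and formation features into a shared space,
    fuse the concatenated tokens with a transformer encoder, and classify."""
    def __init__(self, d_model=128, num_classes=17):
        super().__init__()
        self.proj_img = nn.Linear(512, d_model)      # assumed frame-feature size
        self.proj_aud = nn.Linear(128, d_model)      # assumed audio-feature size
        self.proj_form = nn.Linear(44, d_model)      # assumed 22 players x (x, y)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, frames, audio, formation):
        tokens = torch.cat([self.proj_img(frames),
                            self.proj_aud(audio),
                            self.proj_form(formation)], dim=1)
        fused = self.fusion(tokens)                  # cross-modal self-attention
        return self.head(fused.mean(dim=1))          # pool tokens -> event logits

frames = torch.randn(2, 16, 512)      # 16 frame features per clip (illustrative)
audio = torch.randn(2, 16, 128)
formation = torch.randn(2, 16, 44)
print(MultiModalEventClassifier()(frames, audio, formation).shape)  # [2, 17]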