
    Influence of study design on digital pathology image quality evaluation: the need to define a clinical task

    Despite the current rapid advance in technologies for whole slide imaging, there is still no scientific consensus on the recommended methodology for image quality assessment of digital pathology slides. For medical images in general, it has been recommended to assess image quality in terms of doctors’ success rates in performing a specific clinical task while using the images (clinical image quality, cIQ). However, digital pathology is a new modality, and even identifying the appropriate task is difficult. In an alternative common approach, humans are asked to perform a simpler task such as rating overall image quality (perceived image quality, pIQ), but this risks clinically irrelevant findings, since the relationship between pIQ and cIQ is unknown. In this study, we explored three different experimental protocols: (1) conducting a clinical task (detecting inclusion bodies), (2) rating image similarity and preference, and (3) rating the overall image quality. Additionally, within protocol 1, overall quality ratings were also collected (task-aware pIQ). The experiments were done by diagnostic veterinary pathologists in the context of evaluating the quality of hematoxylin and eosin-stained digital pathology slides of animal tissue samples under several common image alterations: additive noise, blurring, change in gamma, change in color saturation, and JPEG compression. While our experiments were small, which prevents drawing strong conclusions, the results suggest the need to define a clinical task. Importantly, the pIQ data collected under protocols 2 and 3 did not always rank the image alterations the same as their cIQ from protocol 1, warning against using conventional pIQ to predict cIQ.
At the same time, there was a correlation between the cIQ and task-aware pIQ ratings from protocol 1, suggesting that the clinical experiment context (set by specifying the clinical task) may affect pathologists' visual attention and focus their criteria for image quality. Further research is needed to assess whether, and for which purposes (e.g., preclinical testing), task-aware pIQ ratings could substitute for cIQ for a given clinical task.
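The rank comparison at the heart of this finding can be sketched as a Spearman correlation between per-alteration scores from two protocols. Everything below is a toy illustration: the alteration names, scores, and the pure-Python rank correlation are assumptions for this sketch, not the study's actual data or analysis.

```python
# Hypothetical sketch: do two quality protocols rank image alterations the same way?
# All scores below are invented for illustration.

def rank(values):
    """Return 1-based ranks (1 = smallest), averaging ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for a tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Invented per-alteration scores: cIQ as task success rate, pIQ as a 1-5 rating.
ciq = {"noise": 0.62, "blur": 0.55, "gamma": 0.90, "saturation": 0.88, "jpeg": 0.70}
piq = {"noise": 3.1, "blur": 2.4, "gamma": 4.5, "saturation": 3.6, "jpeg": 4.0}
alts = list(ciq)
rho = spearman([ciq[a] for a in alts], [piq[a] for a in alts])
print(round(rho, 3))
```

With these invented numbers the two protocols swap the ranks of JPEG compression and saturation, so rho falls below 1; a value well under 1 is the kind of disagreement that warns against using pIQ as a stand-in for cIQ.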

    Vision-Based Assessment of Parkinsonism and Levodopa-Induced Dyskinesia with Deep Learning Pose Estimation

    Objective: To apply deep learning pose estimation algorithms for vision-based assessment of parkinsonism and levodopa-induced dyskinesia (LID). Methods: Nine participants with Parkinson's disease (PD) and LID completed a levodopa infusion protocol, where symptoms were assessed at regular intervals using the Unified Dyskinesia Rating Scale (UDysRS) and Unified Parkinson's Disease Rating Scale (UPDRS). A state-of-the-art deep learning pose estimation method was used to extract movement trajectories from videos of PD assessments. Features of the movement trajectories were used to detect and estimate the severity of parkinsonism and LID using random forest. Communication and drinking tasks were used to assess LID, while leg agility and toe tapping tasks were used to assess parkinsonism. Feature sets from tasks were also combined to predict total UDysRS and UPDRS Part III scores. Results: For LID, the communication task yielded the best results for dyskinesia (severity estimation: r = 0.661, detection: AUC = 0.930). For parkinsonism, leg agility had better results for severity estimation (r = 0.618), while toe tapping was better for detection (AUC = 0.773). UDysRS and UPDRS Part III scores were predicted with r = 0.741 and 0.530, respectively. Conclusion: This paper presents the first application of deep learning for vision-based assessment of parkinsonism and LID and demonstrates promising performance for the future translation of deep learning to PD clinical practice. Significance: The proposed system provides insight into the potential of computer vision and deep learning for clinical application in PD. Comment: 8 pages, 1 figure. Under review
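The intermediate step the abstract describes, turning pose-estimated trajectories into scalar features for a classifier, can be sketched as below. This is a minimal sketch under assumptions: the trajectory, frame rate, and choice of features are invented and are not the paper's actual feature set.

```python
# Hypothetical sketch of trajectory feature extraction: a 2D keypoint track
# (e.g., a wrist or toe located by a pose estimator in each video frame)
# is reduced to simple kinematic features. All values here are synthetic.
import math

def trajectory_features(points, fps=30.0):
    """Compute simple kinematic features from a list of (x, y) positions."""
    speeds = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        speeds.append(math.hypot(x1 - x0, y1 - y0) * fps)  # units/second
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return {
        "mean_speed": sum(speeds) / len(speeds),
        "peak_speed": max(speeds),
        "x_range": max(xs) - min(xs),  # amplitude proxy, horizontal
        "y_range": max(ys) - min(ys),  # amplitude proxy, vertical
    }

# A toy vertical oscillation standing in for a toe-tapping sequence (2 s at 30 fps).
traj = [(0.0, math.sin(i / 5.0)) for i in range(60)]
feats = trajectory_features(traj)
print(round(feats["y_range"], 2))
```

In the paper's pipeline, feature vectors like this one would feed a random forest that detects symptoms and estimates their severity.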

    Quantifying neurodegeneration from medical images with machine learning and graph theory

    Neurodegeneration (or brain atrophy) is part of the pathological cascade of Alzheimer’s disease (AD) and is strongly associated with cognitive decline. In clinics, atrophy is measured through visual assessments of specific brain regions on medical images according to established rating scales. In this thesis, we developed a model based on recurrent convolutional neural networks (AVRA: Automatic visual ratings of atrophy) that could predict scores from magnetic resonance images (MRI) according to commonly used clinical rating scales, namely: Scheltens’ scale for medial temporal atrophy (MTA), Pasquier’s frontal subscale of global cortical atrophy (GCA-F), and Koedam’s posterior atrophy (PA) scale. AVRA was trained on over 2000 images rated by a single neuroradiologist and demonstrated inter-rater agreement levels on all three scales similar to those reported between two human raters in previous studies. We further applied different versions of AVRA, trained systematically on data with different levels of heterogeneity, to external data from multiple European memory clinics. We observed a general performance drop on the out-of-distribution (OOD) data compared to test sets sampled from the same cohort as the training data. By training AVRA on data from multiple sources, we showed that performance in external cohorts generally increased. AVRA demonstrated notably low agreement in one memory clinic, despite good-quality images, which suggests that it may be challenging to assess how well a machine learning model generalizes to OOD data. For additional validation of our model, we compared AVRA’s MTA ratings to those of two external radiologists and to the volumes of the hippocampi and inferior lateral ventricles. The images came from a longitudinal cohort comprising individuals with subjective cognitive decline (SCD) and mild cognitive impairment (MCI) followed up over six years.
AVRA showed substantial agreement with one of the radiologists and lower agreement with the other. The two radiologists also showed low agreement with each other. All sets of ratings were strongly associated with the subcortical volumes, suggesting that all three raters were reliable. We further observed that individuals with SCD and (probably) underlying AD pathology had faster MTA progression than MCI patients with a non-AD biomarker profile. Finally, we evaluated a method to quantify patterns of atrophy through the use of graph theory. We compared structural gray matter networks between groups of healthy controls and AD patients, constructed from different subsamples and with different network construction methods. Our experiments suggested that structural gray matter networks may not be very stable. Our networks required more than 150 subjects per group to show convergence in the included network properties, which is a greater sample size than used in the majority of studies applying these methods. The different graph construction methods did not yield consistent differences between the control and AD networks, which may explain why findings have been inconsistent across previous studies. To conclude, we demonstrated that a machine learning model can successfully learn to mimic a radiologist’s assessment of atrophy without intra-rater variability. The challenge going forward is to ensure model consistency across clinics, scanners and image quality: nuisances that humans are better at ignoring than deep learning models.
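The network-construction step whose stability the thesis questions can be sketched as thresholding a regional correlation matrix into a binary graph and then computing a network property. This is a minimal sketch under assumptions: the correlation values, threshold, and the choice of density as the property are invented for illustration.

```python
# Hypothetical sketch: build a structural gray matter network by thresholding
# an inter-regional correlation matrix, then measure a simple graph property.
# The 4-region matrix and thresholds below are invented.

def correlation_to_graph(corr, threshold):
    """Binary adjacency: connect region pairs whose correlation exceeds threshold."""
    n = len(corr)
    return [[1 if i != j and corr[i][j] > threshold else 0 for j in range(n)]
            for i in range(n)]

def density(adj):
    """Fraction of possible undirected edges that are present."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(i + 1, n))
    return edges / (n * (n - 1) / 2)

corr = [
    [1.0, 0.8, 0.2, 0.5],
    [0.8, 1.0, 0.6, 0.1],
    [0.2, 0.6, 1.0, 0.7],
    [0.5, 0.1, 0.7, 1.0],
]
adj = correlation_to_graph(corr, threshold=0.4)
print(round(density(adj), 3))
```

Even in this toy case, raising the threshold from 0.4 to 0.75 drops the density from 4/6 to 1/6, which mirrors the thesis's observation that different construction choices can yield inconsistent group differences.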

    Describing Subjective Experiment Consistency by p-Value P-P Plot

    There are phenomena that cannot be measured without subjective testing. However, subjective testing is a complex issue with many influencing factors. These interplay to yield either precise or incorrect results. Researchers require a tool to classify the results of a subjective experiment as either consistent or inconsistent. This is necessary in order to decide whether to treat the gathered scores as quality ground truth data. Knowing whether subjective scores can be trusted is key to drawing valid conclusions and building functional tools based on those scores (e.g., algorithms assessing the perceived quality of multimedia materials). We provide a tool to classify a subjective experiment (and all its results) as either consistent or inconsistent. Additionally, the tool identifies stimuli with irregular score distributions. The approach is based on treating subjective scores as a random variable drawn from the discrete Generalized Score Distribution (GSD). The GSD, in combination with a bootstrapped G-test of goodness of fit, makes it possible to construct a p-value P-P plot that visualizes an experiment's consistency. The tool safeguards researchers from using inconsistent subjective data. In this way, it helps ensure that the conclusions they draw and the tools they build are more precise and trustworthy. The proposed approach works in line with expectations drawn solely from the experiment design descriptions of 21 real-life multimedia quality subjective experiments. Comment: 11 pages, 3 figures. Accepted to the 28th ACM International Conference on Multimedia (MM '20). For associated data sets, source codes and documentation, see https://github.com/Qub3k/subjective-exp-consistency-chec
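The intuition behind the p-value P-P plot can be sketched in a few lines: if an experiment is consistent, per-stimulus goodness-of-fit p-values should be roughly uniform, so their empirical CDF should track the diagonal. Everything below is an assumption-laden illustration: the p-values are invented, and the deviation score is a simple KS-like summary, not the paper's bootstrapped G-test on the GSD.

```python
# Hypothetical sketch of the p-value P-P plot idea. For uniform p-values the
# theoretical CDF at p is p itself, so a consistent experiment's P-P points
# should hug the diagonal. The two p-value sets below are invented.

def pp_points(p_values):
    """Return (theoretical, empirical) CDF pairs for a P-P plot."""
    ps = sorted(p_values)
    n = len(ps)
    return [(p, (i + 1) / n) for i, p in enumerate(ps)]

def max_diagonal_deviation(p_values):
    """Largest gap between the P-P curve and the diagonal (a KS-like score)."""
    return max(abs(emp - theo) for theo, emp in pp_points(p_values))

# Invented per-stimulus p-values: one roughly uniform set, one heavily skewed.
consistent = [0.08, 0.21, 0.35, 0.47, 0.55, 0.66, 0.78, 0.91]
inconsistent = [0.001, 0.002, 0.004, 0.01, 0.02, 0.03, 0.05, 0.9]

print(max_diagonal_deviation(consistent) < max_diagonal_deviation(inconsistent))
```

In the paper's workflow, a large deviation from the diagonal is the visual signal that the experiment's scores should not be trusted as ground truth.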

    Quality-aware Content Adaptation in Digital Video Streaming

    User-generated video has attracted a lot of attention due to the success of video sharing sites such as YouTube and online social networks. Recently, a shift towards live consumption of these videos has become observable. The content is captured and instantly shared over the Internet using smart mobile devices such as smartphones. Large-scale platforms such as YouTube.Live, YouNow or Facebook.Live have arisen which enable the smartphones of users to livestream to the public. These platforms achieve the distribution of tens of thousands of low-resolution videos to remote viewers in parallel. Nonetheless, the providers cannot guarantee an efficient collection and distribution of high-quality video streams. As a result, the user experience is often degraded, and the needed infrastructure investments are huge. Efficient methods are required to cope with the increasing demand for these video streams, and an understanding is needed of how to capture, process and distribute the videos to guarantee a high-quality experience for viewers. This thesis addresses the quality awareness of user-generated videos by leveraging the concept of content adaptation. Two types of content adaptation, adaptive video streaming and video composition, are discussed in this thesis. Then, a novel approach for the given scenario of a live upload from mobile devices, the processing of video streams and their distribution is presented. This thesis demonstrates that content adaptation applied to each step of this scenario, from upload to consumption, can significantly improve the quality for the viewer. At the same time, if content adaptation is planned wisely, data traffic can be reduced while keeping the quality for viewers high. The first contribution of this thesis is a better understanding of the perceived quality of user-generated video and its influencing factors.
Subjective studies are performed to understand what affects human perception, leading to first-of-their-kind quality models. These quality models are used for the second contribution of this work: novel quality assessment algorithms. A unique attribute of these algorithms is the use of multiple features from different sensors. Whereas classical video quality assessment algorithms focus on visual information, the proposed algorithms reduce the runtime by an order of magnitude by using data from other sensors in video-capturing devices. Still, the scalability of quality assessment is limited when the algorithms execute on a single server. This is solved with the proposed placement and selection component, which distributes quality assessment tasks to mobile devices and thus increases the scalability of existing approaches by up to 33.71% when using the resources of only 15 mobile devices. These three contributions are required to provide a real-time understanding of the perceived quality of the video streams produced on mobile devices. The upload of video streams is the fourth contribution of this work. It relies on content and mechanism adaptation. The thesis introduces the first prototypically evaluated adaptive video upload protocol (LiViU), which transcodes multiple video representations in real time and copes with changing network conditions. In addition, a mechanism adaptation is integrated into LiViU to react to changing application scenarios, such as streaming high-quality videos to remote viewers or distributing video with minimal delay to nearby recipients. A second type of content adaptation is discussed in the fifth contribution of this work: an automatic video composition application which enables live composition from multiple user-generated video streams.
The proposed application is the first of its kind, allowing the timely composition of high-quality video streams by inspecting the quality of the individual video streams, their recording locations and cinematographic rules. As a last contribution, the content-aware adaptive distribution of video streams to mobile devices is introduced via the Video Adaptation Service (VAS). The VAS analyzes the streamed video content to understand which adaptations are most beneficial for a viewer. It maximizes the perceived quality of each video stream individually and at the same time tries to produce as little data traffic as possible, achieving a data traffic reduction of more than 80%.
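The quality/traffic trade-off the VAS makes can be sketched as representation selection over a bitrate ladder: pick the best encoding that fits the bandwidth, but skip upgrades that cost much traffic for little perceived-quality gain. This is a hedged sketch, not the VAS algorithm: the ladder values, the `min_gain` heuristic, and the function name are all invented.

```python
# Hypothetical sketch of quality-aware representation selection.
# LADDER pairs an invented bitrate (kbit/s) with an invented predicted
# quality score; real systems would predict quality per stream and content.
LADDER = [
    (400, 2.1),
    (800, 3.2),
    (1500, 4.0),
    (3000, 4.4),
    (6000, 4.5),
]

def select_representation(available_kbps, min_gain=0.2):
    """Choose the best representation that fits the bandwidth, but skip an
    upgrade whose predicted quality gain over the current choice is below
    min_gain -- saving traffic at near-equal perceived quality."""
    feasible = [r for r in LADDER if r[0] <= available_kbps]
    if not feasible:
        return LADDER[0]  # below the ladder: fall back to the cheapest encoding
    choice = feasible[0]
    for rep in feasible[1:]:
        if rep[1] - choice[1] >= min_gain:
            choice = rep
    return choice

print(select_representation(7000))
```

Note the effect at high bandwidth: with 7000 kbit/s available, the 6000 kbit/s representation is skipped because its predicted gain over 3000 kbit/s is only 0.1, illustrating how traffic can drop sharply while perceived quality stays almost unchanged.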

    EEG-based classification of video quality perception using steady state visual evoked potentials (SSVEPs)

    Objective. Recent studies exploit the neural signal recorded via electroencephalography (EEG) to obtain a more objective measurement of perceived video quality. Most of these studies capitalize on the event-related potential component P3. We follow an alternative approach to the measurement problem, investigating steady-state visual evoked potentials (SSVEPs) as EEG correlates of quality changes. Unlike the P3, SSVEPs are directly linked to the sensory processing of the stimuli and do not require long experimental sessions to obtain a sufficient signal-to-noise ratio. Furthermore, we investigate the correlation of the EEG-based measures with the outcome of the standard behavioral assessment. Approach. As stimulus material, we used six gray-level natural images at six levels of degradation, created by coding the images with the HM10.0 test model of high efficiency video coding (H.265/MPEG-HEVC) using six different compression rates. The degraded images were presented in rapid alternation with the original images. In this setting, the presence of SSVEPs is a neural marker that objectively indicates the neural processing of the quality changes induced by the video coding. We tested two different machine learning methods to classify such potentials, based on the modulation of the brain rhythm and on time-locked components, respectively. Main results. The results show high accuracies in classifying the neural signal when the quality changes are above the perception threshold. The accuracies correlate significantly with the mean opinion scores given by the participants in the standardized degradation category rating quality assessment of the same group of images. Significance.
The results show that neural assessment of video quality based on SSVEPs is a viable complement to the behavioral one and a significantly faster alternative to methods based on the P3 component. BMBF, 01GQ0850, Bernstein Fokus Neurotechnologie - Nichtinvasive Neurotechnologie für Mensch-Maschine Interaktion
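The SSVEP marker the study relies on can be sketched as follows: alternating original and degraded images at a fixed rate should elicit EEG power at that stimulation frequency, so a simple indicator is the spectral power there compared with a neighboring frequency. This is a minimal sketch under assumptions: the sampling rate, stimulation frequency, and synthetic "EEG" signal are invented, and the study's actual classifiers are far richer than this single-bin check.

```python
# Hypothetical sketch of SSVEP detection via a single-frequency DFT bin.
# The signal below is synthetic: a 6 Hz "SSVEP" plus an unrelated 11 Hz rhythm.
import math

def power_at(signal, fs, freq):
    """Power of a single DFT bin at `freq` Hz for a signal sampled at `fs` Hz."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i / fs) for i, s in enumerate(signal))
    return (re * re + im * im) / n

fs, f_stim = 250.0, 6.0  # assumed sampling rate and stimulation frequency (Hz)
eeg = [0.8 * math.sin(2 * math.pi * f_stim * i / fs)
       + 0.1 * math.sin(2 * math.pi * 11.0 * i / fs)
       for i in range(500)]  # 2-second synthetic epoch

p_stim = power_at(eeg, fs, f_stim)  # power at the stimulation frequency
p_ref = power_at(eeg, fs, 9.0)      # a neighboring, unstimulated frequency
print(p_stim > 100 * p_ref)
```

A large power ratio at the stimulation frequency is the kind of objective, quickly obtainable marker that makes SSVEPs faster to measure than P3-based protocols.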