Artificial Intelligence for Suicide Assessment using Audiovisual Cues: A Review
Death by suicide is the seventh leading cause of death worldwide. Recent
advances in Artificial Intelligence (AI), specifically AI applications in
image and voice processing, have created a promising opportunity to
revolutionize suicide risk assessment. Consequently, we have witnessed a
fast-growing body of research that applies AI to extract audiovisual
non-verbal cues for mental illness assessment. However, the majority of
recent works focus on depression, despite the evident differences between
the symptoms and non-verbal cues of depression and those of suicidal
behavior. This paper reviews recent works that study suicide ideation and
suicidal behavior detection through audiovisual feature analysis, mainly
the analysis of suicidal voice/speech acoustic features and of suicidal
visual cues. Automatic suicide assessment is a promising research direction
that is still in its early stages. Accordingly, there is a lack of large
datasets that could be used to train the machine learning and deep learning
models that have proven effective in other, similar tasks.
Comment: Manuscript submitted to Artificial Intelligence Reviews (2022)
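As a rough illustration of the kind of acoustic feature extraction such
studies rely on, the sketch below computes MFCC, pitch, and energy summary
statistics per recording. The specific feature set and the librosa-based
pipeline are assumptions chosen for demonstration, not a method taken from
the review.

```python
# Illustrative sketch (not from the paper above): extracting a few acoustic
# features commonly used in speech-based mental-health studies, such as
# MFCCs, fundamental frequency (F0), and frame energy. Feature choices and
# parameter values are assumptions for demonstration only.
import numpy as np
import librosa

def extract_acoustic_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Return a fixed-length vector of summary statistics for one recording."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral shape
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # pitch contour
    rms = librosa.feature.rms(y=y)[0]                   # frame energy
    feats = [mfcc.mean(axis=1), mfcc.std(axis=1),
             [np.nanmean(f0), np.nanstd(f0)],
             [rms.mean(), rms.std()]]
    return np.concatenate([np.atleast_1d(f) for f in feats])
```

A per-recording vector like this would typically be fed to a downstream
classifier or regressor; the review's surveyed papers differ in which
features and models they use.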
The Verbal and Non Verbal Signals of Depression -- Combining Acoustics, Text and Visuals for Estimating Depression Level
Depression is a serious medical condition that affects a large number of
people around the world. It significantly affects the way one feels,
causing a persistent lowering of mood. In this paper, we propose a novel
attention-based deep neural network that facilitates the fusion of various
modalities, and we use this network to regress the depression level.
Acoustic, text and visual modalities have been used to train the proposed
network. Various experiments have been carried out on the benchmark
dataset, namely the Distress Analysis Interview Corpus - a Wizard of Oz
(DAIC-WOZ). From the results, we empirically show that the fusion of all
three modalities gives the most accurate estimation of depression level.
Our proposed approach outperforms the state of the art by 7.17% on root
mean squared error (RMSE) and 8.08% on mean absolute error (MAE).
Comment: 10 pages including references, 2 figures
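A minimal PyTorch sketch of attention-based fusion of three modality
embeddings for depression-level regression is given below. The layer sizes
and the exact attention mechanism are assumptions for illustration; the
paper's architecture may differ.

```python
# Hedged sketch: attention-weighted fusion of acoustic, text, and visual
# embeddings, followed by a regression head. Dimensions are assumed.
import torch
import torch.nn as nn

class AttentionFusionRegressor(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # scores each modality embedding
        self.head = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, acoustic, text, visual):
        m = torch.stack([acoustic, text, visual], dim=1)  # (batch, 3, dim)
        w = torch.softmax(self.score(m), dim=1)           # weights over modalities
        fused = (w * m).sum(dim=1)                        # weighted sum -> (batch, dim)
        return self.head(fused).squeeze(-1)               # predicted depression score

# usage with random embeddings standing in for per-modality encoders
model = AttentionFusionRegressor()
print(model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128)))
```

The softmax weighting lets the model learn, per sample, how much each
modality should contribute to the fused representation.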
MLGaze: Machine Learning-Based Analysis of Gaze Error Patterns in Consumer Eye Tracking Systems
Analyzing the gaze accuracy characteristics of an eye tracker is a critical
task, as its gaze data is frequently affected by non-ideal operating
conditions in various consumer eye tracking applications. In this study,
gaze error patterns produced by a commercial eye tracking device were
studied with the help of machine learning algorithms, such as classifiers
and regression models. Gaze data were collected from a group of
participants under multiple conditions that commonly affect eye trackers
operating on desktop and handheld platforms. These conditions (referred to
here as error sources) include user distance, head pose, and eye-tracker
pose variations, and the collected gaze data were used to train the
classifier and regression models. It was seen that while the impact of the
different error sources on gaze data characteristics was nearly impossible
to distinguish by visual inspection or from data statistics, machine
learning models were successful in identifying the impact of the different
error sources and predicting the variability in gaze error levels due to
these conditions. The objective of this study was to investigate the
efficacy of machine learning methods for the detection and prediction of
gaze error patterns, which would enable an in-depth understanding of the
data quality and reliability of eye trackers under unconstrained operating
conditions. Coding resources for all the machine learning methods adopted
in this study are included in an open repository named MLGaze to allow
researchers to replicate the principles presented here using data from
their own eye trackers.
Comment: https://github.com/anuradhakar49/MLGaze
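The sketch below illustrates the kind of analysis described above: training
a classifier to identify which error source produced a given gaze-error
sample. It is not code from the MLGaze repository; the feature names,
synthetic data, and random-forest choice are assumptions for demonstration.

```python
# Hedged sketch: classifying the error source behind gaze-error samples.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
# hypothetical per-sample features: error magnitude, x error, y error, distance
X = rng.normal(size=(600, 4))
y = rng.integers(0, 3, size=600)  # 0=user distance, 1=head pose, 2=tracker pose

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```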
Objective and automated assessment of surgical technical skills with IoT systems: A systematic literature review
The assessment of the surgical technical skills to be acquired by novice surgeons has traditionally been done by an expert surgeon and is therefore of a subjective nature. Nevertheless, recent advances in IoT, the possibility of incorporating sensors into objects and environments in order to collect large amounts of data, and the progress in machine learning are facilitating a more objective and automated assessment of surgical technical skills. This paper presents a systematic literature review of papers published after 2013 discussing the objective and automated assessment of surgical technical skills. 101 out of an initial list of 537 papers were analyzed to identify: 1) the sensors used; 2) the data collected by these sensors and the relationship between these data, surgical technical skills and surgeons' levels of expertise; 3) the statistical methods and algorithms used to process these data; and 4) the feedback provided based on the outputs of these statistical methods and algorithms. In particular: 1) mechanical and electromagnetic sensors are widely used for tool tracking, while inertial measurement units are widely used for body tracking; 2) path length, number of sub-movements, smoothness, fixation, saccade and total time are the main indicators obtained from raw data, and serve to assess surgical technical skills such as economy, efficiency, hand tremor or mind control, and to distinguish between two or three levels of expertise (novice/intermediate/advanced surgeons); 3) SVMs (Support Vector Machines) and neural networks are the preferred statistical methods and algorithms for processing the data collected, while new opportunities are opening up to combine various algorithms and use deep learning; and 4) feedback is provided by matching performance indicators with a lexicon of words and visualizations, although there is considerable room for research on feedback and visualizations, taking, for example, ideas from learning analytics.
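As a hedged sketch of the SVM-based expertise classification the review
reports as common practice, the snippet below maps the kinematic indicators
named above (path length, sub-movements, smoothness, total time) to
expertise levels. The synthetic data and all parameter choices are
assumptions for illustration, not taken from any reviewed paper.

```python
# Hedged sketch: SVM classification of surgical expertise from kinematic indicators.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))     # [path_length, n_submovements, smoothness, total_time]
y = rng.integers(0, 3, size=300)  # 0=novice, 1=intermediate, 2=advanced

model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0)).fit(X, y)
print(model.predict(X[:5]))       # predicted expertise levels
```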
Human behaviour-based automatic depression analysis using hand-crafted statistics and deep learned spectral features
Depression is a serious mental disorder that affects millions of people all over the world. Traditional clinical diagnosis methods are subjective, complicated, and require extensive participation of experts. Audio-visual automatic depression analysis systems predominantly base their predictions on very brief sequential segments, sometimes as little as one frame. Such data contains much redundant information, causes a high computational load, and negatively affects detection accuracy. Final decision making at the sequence level is then based on the fusion of frame- or segment-level predictions. However, this approach loses longer-term behavioural correlations, as the behaviours themselves are abstracted away by the frame-level predictions. We propose to use automatically detected human behaviour primitives, such as gaze directions and facial action units (AUs), as low-dimensional multi-channel time series data, from which two sequence descriptors are created. The first calculates sequence-level statistics of the behaviour primitives; the second casts the problem as a Convolutional Neural Network (CNN) problem operating on a spectral representation of the multi-channel behaviour signals. The results of depression detection (binary classification) and severity estimation (regression) experiments conducted on the AVEC 2016 DAIC-WOZ database show that both methods achieve a significant improvement over the previous state of the art in depression severity estimation.
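The sketch below illustrates the second descriptor described above: a
spectral map of multi-channel behaviour-primitive signals fed to a small
CNN. The FFT details and CNN layout are assumptions; they demonstrate the
idea, not the paper's exact architecture.

```python
# Hedged sketch: spectral representation of multi-channel behaviour signals + CNN.
import torch
import torch.nn as nn

def spectral_map(signals: torch.Tensor, n_freq: int = 64) -> torch.Tensor:
    """signals: (channels, time) -> (1, channels, n_freq) amplitude spectrum."""
    spec = torch.fft.rfft(signals, dim=-1).abs()[:, :n_freq]
    return spec.unsqueeze(0)  # treat the spectrum as a one-channel "image"

cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
    nn.Linear(8 * 4 * 4, 1),          # severity regression output
)

x = torch.randn(16, 200)              # 16 behaviour channels, 200 frames
print(cnn(spectral_map(x).unsqueeze(0)))  # one predicted severity score
```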
Proceedings of the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory
This book is a collection of 15 reviewed technical reports summarizing the presentations at the 2011 Joint Workshop of Fraunhofer IOSB and Institute for Anthropomatics, Vision and Fusion Laboratory. The covered topics include image processing, optical signal processing, visual inspection, pattern recognition and classification, human-machine interaction, world and situation modeling, autonomous system localization and mapping, information fusion, and trust propagation in sensor networks.
Classification of Drivers' Workload Using Physiological Signals in Conditional Automation
The use of automation in cars is increasing. In future vehicles, drivers will no longer be in charge of the main driving task and may be allowed to perform a secondary task. However, they might be requested to regain control of the car if a hazardous situation occurs (i.e., conditionally automated driving). Performing a secondary task might increase drivers' mental workload and consequently decrease takeover performance if the workload level exceeds a certain threshold. Knowledge about the driver's mental state might hence be useful for increasing safety in conditionally automated vehicles. Measuring drivers' workload continuously is essential to support the driver and hence limit the number of accidents in takeover situations. This goal can be achieved using machine learning techniques to evaluate and classify drivers' workload in real time. To evaluate the usefulness of physiological data as an indicator of workload in conditionally automated driving, three physiological signals from 90 subjects were collected during 25 min of automated driving in a fixed-base simulator. Half of the participants performed a verbal cognitive task to induce mental workload, while the other half only had to monitor the environment of the car. Three classifiers, sensor fusion, and levels of data segmentation were compared. Results show that the best model was able to successfully classify the condition of the driver with an accuracy of 95%. In some cases, the model benefited from sensor fusion. Increasing the segmentation level (i.e., the size of the time window used to compute physiological indicators) increased the performance of the model for windows smaller than 4 min, but decreased it for windows larger than 4 min. In conclusion, the study showed that a high level of drivers' mental workload can be accurately detected while driving in conditional automation based on 4-min recordings of respiration and skin conductance.
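A minimal sketch of the window-based workload classification described
above follows: physiological signals are segmented into fixed windows,
summary features are computed per window, and a classifier labels the
driver's state. The sampling rate, window size, features, and synthetic
data are all assumptions for illustration.

```python
# Hedged sketch: windowed physiological features -> binary workload classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def window_features(signal: np.ndarray, fs: int, win_s: int = 240) -> np.ndarray:
    """Split a 1-D signal into win_s-second windows; return mean/std per window."""
    n = fs * win_s
    windows = signal[: len(signal) // n * n].reshape(-1, n)
    return np.column_stack([windows.mean(axis=1), windows.std(axis=1)])

rng = np.random.default_rng(2)
fs = 10                                   # hypothetical 10 Hz sampling rate
resp = rng.normal(size=fs * 60 * 25)      # 25 min of respiration data
eda = rng.normal(size=fs * 60 * 25)       # 25 min of skin conductance
X = np.hstack([window_features(resp, fs), window_features(eda, fs)])
y = np.arange(len(X)) % 2                 # placeholder labels: 0=monitoring, 1=verbal task
clf = LogisticRegression().fit(X, y)
print(clf.predict(X))
```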
Multimodal Data Analysis of Dyadic Interactions for an Automated Feedback System Supporting Parent Implementation of Pivotal Response Treatment
Parents fulfill a pivotal role in the early childhood development of social and communication
skills. In children with autism, the development of these skills can be
delayed. Applied behavior analysis (ABA) techniques have been created to
aid in skill acquisition.
Among these, pivotal response treatment (PRT) has been empirically shown to foster
improvements. Research into PRT implementation has also shown that parents can be
trained to be effective interventionists for their children. The current
difficulty in PRT training lies in how to disseminate it to the parents
who need it, and how to support and motivate practitioners afterwards.
Evaluation of the parents’ fidelity to implementation is often undertaken using video
probes that depict the dyadic interaction occurring between the parent and the child during
PRT sessions. These videos are time-consuming for clinicians to process,
and often result in only minimal feedback for the parents. Current trends
in technology could be utilized to alleviate the manual cost of extracting
data from the videos, affording greater opportunities for providing
clinician-created feedback as well as automated assessments.
The naturalistic context of the video probes along with the dependence on ubiquitous
recording devices creates a difficult scenario for classification tasks. The domain of the
PRT video probes can be expected to have high levels of both aleatory and epistemic
uncertainty. Addressing these challenges requires examination of the multimodal data
along with implementation and evaluation of classification algorithms. This is explored
through the use of a new dataset of PRT videos.
The relationship between the parent and the clinician is important. The clinician can
provide support and help build self-efficacy in addition to providing knowledge and
modeling of treatment procedures. Facilitating this relationship along with automated
feedback not only provides the opportunity to present expert feedback to the parent, but
also allows the clinician to aid in personalizing the classification models. By utilizing a
human-in-the-loop framework, clinicians can aid in addressing the uncertainty in the
classification models by providing additional labeled samples. This will allow the system
to improve classification and provide a person-centered approach to
extracting multimodal data from PRT video probes.
Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201
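A minimal human-in-the-loop sketch of the idea described above follows (an
illustration, not the dissertation's system): the model flags its most
uncertain video-probe samples, a clinician labels them, and the classifier
is retrained. The data, model, and uncertainty rule are assumptions.

```python
# Hedged sketch: uncertainty-driven clinician labeling (active-learning style).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X_pool = rng.normal(size=(500, 8))        # multimodal features per video probe
y_true = (X_pool[:, 0] > 0).astype(int)   # hidden "fidelity" label, for simulation

labeled = list(range(20))                 # small initial labeled set
clf = LogisticRegression().fit(X_pool[labeled], y_true[labeled])

for _ in range(5):                        # five clinician feedback rounds
    proba = clf.predict_proba(X_pool)[:, 1]
    uncertainty = np.abs(proba - 0.5)     # closest to 0.5 = least certain
    candidates = [i for i in np.argsort(uncertainty) if i not in labeled]
    labeled.extend(candidates[:10])       # clinician labels 10 uncertain probes
    clf.fit(X_pool[labeled], y_true[labeled])
print(clf.score(X_pool, y_true))
```

Prioritizing uncertain samples concentrates the clinician's limited labeling
effort where the model is weakest, which is the core of the person-centered,
human-in-the-loop framework the abstract describes.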