VEMI Lab 2021
The Virtual Environments and Multimodal Interaction (VEMI) Lab embodies an inclusive, collaborative, and multidisciplinary approach to hands-on research and education. By bringing together students and faculty from more than a dozen majors and disciplines, VEMI is uniquely positioned to advance computing and STEM initiatives both at the university and in broader communities throughout Maine and nationwide.
Mining multimodal sequential patterns: a case study on affect detection
Temporal data from multimodal interaction, such as speech and bio-signals, cannot be easily analysed without a preprocessing phase through which some key characteristics of the signals are extracted. Typically, standard statistical signal features such as average values are calculated prior to the analysis and, subsequently, are presented either to a multimodal fusion mechanism or to a computational model of the interaction. This paper proposes a feature extraction methodology based on frequent sequence mining within and across multiple modalities of user input. The proposed method is applied to the fusion of physiological signals and gameplay information in a game survey dataset. The obtained sequences are analysed and used as predictors of user affect, resulting in computational models of equal or higher accuracy compared to models built on standard statistical features.
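The pipeline this abstract describes, discretising a signal into symbols and then counting frequent subsequences as model features, can be sketched minimally. This is an illustrative reconstruction, not the paper's implementation; the equal-width binning, sequence length, support threshold and toy signal are all assumptions:

```python
from collections import Counter

def discretize(signal, n_bins=3):
    """Map a numeric signal to symbolic bins via equal-width binning."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_bins or 1.0
    return [min(int((v - lo) / width), n_bins - 1) for v in signal]

def frequent_subsequences(symbols, length=2, min_support=2):
    """Count contiguous subsequences; keep those meeting a support threshold."""
    counts = Counter(tuple(symbols[i:i + length])
                     for i in range(len(symbols) - length + 1))
    return {seq: c for seq, c in counts.items() if c >= min_support}

# Toy physiological signal: the mined patterns become predictors of affect.
signal = [0.1, 0.2, 0.9, 0.8, 0.1, 0.2, 0.9, 0.8]
symbols = discretize(signal)
features = frequent_subsequences(symbols, length=2, min_support=2)
print(features)  # {(0, 0): 2, (0, 2): 2, (2, 2): 2}
```

In a full system the support counts of each frequent sequence, per modality and across modalities, would be the feature vector handed to the affect model.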
Deep multimodal fusion: combining discrete events and continuous signals
Multimodal datasets often feature a combination of continuous signals and a series of discrete events. For instance, when studying human behaviour it is common to annotate actions performed by the participant over several other modalities, such as video recordings of the face or physiological signals. These events are nominal, infrequent, and not sampled at a continuous rate, while signals are numeric and often sampled at short fixed intervals. This fundamentally different nature complicates the analysis of the relation among these modalities, which is therefore often studied only after each modality has been summarised or reduced.

This paper investigates a novel approach that models the relation between such modality types while bypassing the need to summarise each modality independently of the others. For that purpose, we introduce a deep learning model based on convolutional neural networks, adapted to process multiple modalities at different time resolutions, which we name deep multimodal fusion. Furthermore, we introduce and compare three alternative methods (convolution, training and pooling fusion) for integrating sequences of events with continuous signals within this model. We evaluate deep multimodal fusion on a game user dataset in which player physiological signals are recorded in parallel with game events. Results suggest that the proposed architecture can appropriately capture multimodal information, as it yields higher prediction accuracies than single-modality models. In addition, pooling fusion, based on a novel filter-pooling method, appears to provide the most effective fusion approach for the investigated types of data.
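One plausible reading of pooling-style fusion, sketched here as an illustration rather than the paper's actual architecture: pool the sparse event stream down to the signal's sampling rate so both modalities share one time axis before they are stacked. The data, window size and stacking step are all assumptions:

```python
import numpy as np

# Hypothetical data: a physiological signal sampled every 100 ms and a sparse
# binary event timeline (1 = a game event fired) at 10 ms resolution.
signal = np.array([0.5, 0.7, 0.6, 0.9])   # 4 samples over 400 ms
events = np.zeros(40)                     # 40 ticks over the same 400 ms
events[[3, 21, 22]] = 1.0

# Max-pool the event stream into windows aligned with the signal samples,
# then stack the two modalities into one (time, channels) array.
window = len(events) // len(signal)       # 10 event ticks per signal sample
pooled = events.reshape(len(signal), window).max(axis=1)
fused = np.stack([signal, pooled], axis=1)
print(fused.shape)  # (4, 2): one row per time step, one column per modality
```

In the paper's setting the fused array would then be consumed by convolutional layers; here the pooling step alone shows how the two time resolutions are reconciled.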
Towards player’s affective and behavioral visual cues as drives to game adaptation
Recent advances in emotion and affect recognition can play a crucial role in game technology. The move from typical game controls to controls generated from free gestures is already on the market. Higher-level controls, however, can also be driven by the player's own affective and cognitive behavior during gameplay. In this paper, we explore the player's behavior, as captured by computer vision techniques, together with details of the player's experience and profile. The objective of the current research is game adaptation aimed at maximizing player enjoyment. To this end, we explore the ability to infer player engagement and frustration, along with the degree of challenge imposed by the game. The estimated levels of these metrics can feed a game engine's artificial intelligence, allowing for game adaptation. This research was supported by the FP7 ICT project SIREN (project no. 258453).
Supervised contrastive learning for affect modelling
Affect modeling is traditionally viewed as the process of mapping measurable affect manifestations from multiple modalities of user input to affect labels. That mapping is usually inferred through end-to-end (manifestation-to-affect) machine learning processes. What if, instead, one trains general, subject-invariant representations that consider affect information and then uses such representations to model affect? In this paper we assume that affect labels form an integral part, and not just the training signal, of an affect representation, and we explore how the recent paradigm of contrastive learning can be employed to discover general high-level affect-infused representations for the purpose of modeling affect. We introduce three different supervised contrastive learning approaches for training representations that consider affect information. In this initial study we test the proposed methods for arousal prediction on the RECOLA dataset, based on user information from multiple modalities. Results demonstrate the representation capacity of contrastive learning and its efficiency in boosting the accuracy of affect models. Beyond their evidenced higher performance compared to end-to-end arousal classification, the resulting representations are general-purpose and subject-agnostic, as training is guided through general affect information available in any multimodal corpus.
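For readers unfamiliar with the paradigm, a minimal numeric sketch of one standard supervised contrastive objective (the SupCon loss of Khosla et al.) is given below. It is not necessarily one of the three losses the paper proposes; the embeddings, labels and temperature are toy values, not RECOLA data:

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over a batch of embeddings z.
    Samples sharing a label are pulled together; others are pushed apart.
    Every sample is assumed to have at least one same-label partner."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalise rows
    sim = z @ z.T / tau                               # scaled similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)                 # exclude i == a pairs
    loss = 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        log_den = np.log(np.exp(sim[i][not_self[i]]).sum())
        loss += -np.mean([sim[i, p] - log_den for p in positives])
    return loss / n

# Toy check: embeddings that cluster by label should give a low loss.
z = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(supcon_loss(z, [0, 0, 1, 1]))
```

Swapping the labels so that positives point at dissimilar embeddings (e.g. `[0, 1, 0, 1]`) raises the loss, which is exactly the pressure that shapes the learned representation.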
Classifying head movements in video-recorded conversations based on movement velocity, acceleration and jerk
This paper is about the automatic annotation of head movements in videos of face-to-face conversations. Manual annotation of gestures is resource consuming, and modelling gesture behaviours in different types of communicative settings requires many types of annotated data; developing methods for automatic annotation is therefore crucial. We present an approach in which an SVM classifier learns to classify head movements based on measurements of velocity, acceleration, and jerk (the third derivative of position with respect to time). Annotations of head movements are then added to new video data. The results of the automatic annotation are evaluated against manual annotations of the same data and show an accuracy of 73.47%. The results also show that using jerk improves accuracy.
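The derivative features this classifier relies on are straightforward to compute with successive finite differences. A minimal sketch with hypothetical head-position tracks follows; the frame rate, tracks and summary statistics are assumptions, and in practice these features would feed an SVM (e.g. `sklearn.svm.SVC`) rather than be inspected directly:

```python
import numpy as np

def kinematic_features(positions, dt=1 / 25):
    """Mean absolute velocity, acceleration and jerk from a head-position
    track (e.g. one coordinate at 25 fps), via successive finite differences."""
    vel = np.diff(positions) / dt
    acc = np.diff(vel) / dt
    jerk = np.diff(acc) / dt
    return [np.mean(np.abs(d)) for d in (vel, acc, jerk)]

# Hypothetical tracks: a nod-like oscillation versus a near-static head.
nod = 5.0 * np.sin(np.linspace(0, 4 * np.pi, 50))
still = np.full(50, 2.0)

f_nod, f_still = kinematic_features(nod), kinematic_features(still)
print(f_nod, f_still)  # every derivative magnitude is larger for the nod
```

Raw finite differences amplify tracking noise at each order, which is one reason the jerk feature is worth evaluating separately, as the paper does.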
Vision-Based Navigation of Autonomous Vehicle in Roadway Environments with Unexpected Hazards
Vision-based navigation of autonomous vehicles primarily depends on Deep Neural Network (DNN) based systems, in which the controller obtains input from sensors/detectors, such as cameras, and produces a vehicle control output, such as a steering wheel angle, to navigate the vehicle safely in a roadway traffic environment. Typically, these DNN-based systems of an autonomous vehicle are trained through supervised learning; however, recent studies show that a trained DNN-based system can be compromised by perturbation or adversarial inputs. Similarly, such perturbation can be introduced into the DNN-based systems of an autonomous vehicle by unexpected roadway hazards, such as debris and roadblocks. In this study, we first introduce a hazardous roadway environment (covering both intentional and unintentional roadway hazards) that can compromise the DNN-based navigational system of an autonomous vehicle and produce an incorrect steering wheel angle, which can cause crashes resulting in fatalities and injuries. Then, we develop a DNN-based autonomous vehicle driving system using object detection and semantic segmentation to mitigate the adverse effect of this type of hazardous environment, helping the autonomous vehicle navigate safely around such hazards. We find that our DNN-based autonomous vehicle driving system, including hazardous object detection and semantic segmentation, improves the navigational ability of an autonomous vehicle to avoid a potential hazard by 21% compared to the traditional DNN-based autonomous vehicle driving system. This research was supported by grant no. 69A3551747117.
Smart Compaction for Infrastructure Materials
Compaction is a process of rearranging material particles by various mechanical loadings to densify the materials and form a stable pavement structure. Current methods to assess compaction quality rely heavily on engineers' experience or on post-compaction measurements at selected spots. The experience-based method is prone to cause compaction problems and pavement distresses, particularly when new materials are implemented. Due to the complicated interactions between the compactors and the materials, the compaction mechanism of the particulate materials is still unclear. This gap hinders the improvement of compaction quality and the development of intelligent construction. This project was undertaken to investigate the compaction mechanism of infrastructure materials at the mesoscale (particle scale) and to develop an innovative compaction monitoring method that determines the compaction condition based on particle kinematics. With the development of sensing technologies, wireless particle-sized sensors have become available in research and industry for monitoring particle behaviors during compaction. One such wireless sensor, SmartRock, was applied in this project to collect mesoscale behaviors during compaction. Several lab and field compaction projects were carried out using asphalt mixtures and granular materials, various compaction machines, and pavement structures. It was found that internal particle kinematic behavior is closely correlated with material densification during compaction. Lab and field compaction can be reasonably connected through particle rotation, and similar three-stage compaction patterns were identified. Three machine learning models were built to predict the compaction condition and the density of the asphalt pavement, both in the lab and in the field. The reasonable predictions confirm that machine learning is appropriate for compaction prediction. The density results from the pavement cores further verify the applicability and robustness of the intelligent model for compaction prediction. Future studies are still needed to evaluate the model's robustness across more mixture varieties and field applications. This research was supported by grant no. 69A3551847103.
Smart Mobile Platform for Model Updating and Life Cycle Assessment of Bridges
Mobile sensing is an alternative paradigm that offers numerous advantages over conventional stationary sensor networks. Mobile sensors have low setup costs, collect spatial information efficiently, and require no sensors dedicated to any particular structure. Most importantly, they can capture comprehensive spatial information using few sensors. The advantages of mobile sensing, combined with the ubiquity of smartphones with internet of things (IoT) connectivity, have motivated researchers to think of cars equipped with smartphones as large-scale sensor networks that can contribute to the health assessment of structures. Working with mobile sensors has several challenges. The signals collected within a vehicle's cabin are contaminated by the vehicle suspension dynamics; therefore, extracting bridge vibration from signals collected within a vehicle is not an easy task. Additionally, mobile sensors simultaneously measure vibration data in time while scanning over a large set of points in space, which creates a different data structure compared with fixed sensors. Since the collected data are mixed in time and space, they contain spatial discontinuities. When these challenges are addressed, mobile sensing is a promising data resource enabling crowdsourcing and an opportunity to extract information about infrastructure conditions at an unprecedented rate and resolution. In this regard, deep learning-based frameworks have been developed in this project to (a) resolve the dynamic behavior of a vehicle by estimating the input forces to which it is subjected from responses acquired within the vehicle and (b) learn the partial differential equations governing the underlying dynamics of a system from recorded data. This research was supported by grant no. 69A3551847103.