Multimodal Speech Emotion Recognition Using Audio and Text
Speech emotion recognition is a challenging task, and extensive reliance has
been placed on models that use audio features in building well-performing
classifiers. In this paper, we propose a novel deep dual recurrent encoder
model that utilizes text data and audio signals simultaneously to obtain a
better understanding of speech data. As emotional dialogue is composed of sound
and spoken content, our model encodes the information from audio and text
sequences using dual recurrent neural networks (RNNs) and then combines the
information from these sources to predict the emotion class. This architecture
analyzes speech data from the signal level to the language level, and it thus
utilizes the information within the data more comprehensively than models that
focus on audio features. Extensive experiments are conducted to investigate the
efficacy and properties of the proposed model. Our proposed model outperforms
previous state-of-the-art methods in assigning data to one of four emotion
categories (i.e., angry, happy, sad and neutral) when the model is applied to
the IEMOCAP dataset, as reflected by accuracies ranging from 68.8% to 71.8%.
Comment: 7 pages, Accepted as a conference paper at IEEE SLT 201
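The dual-encoder idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain tanh RNN cells, fusion by concatenating the two final hidden states, random untrained weights, and hypothetical feature dimensions (`AUDIO_DIM`, `TEXT_DIM`, `HIDDEN`). Only the four emotion classes come from the abstract.

```python
import math
import random

# The four classes named in the abstract.
EMOTIONS = ["angry", "happy", "sad", "neutral"]

def rand_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_encode(sequence, w_in, w_rec):
    """Run a plain tanh RNN over the sequence and return the final hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in sequence:
        from_input = matvec(w_in, x)
        from_hidden = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_hidden)]
    return hidden

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(audio_seq, text_seq, params):
    """Encode each modality separately, then fuse and classify."""
    h_audio = rnn_encode(audio_seq, params["a_in"], params["a_rec"])
    h_text = rnn_encode(text_seq, params["t_in"], params["t_rec"])
    fused = h_audio + h_text  # assumption: fusion by concatenation
    probs = softmax(matvec(params["out"], fused))
    return dict(zip(EMOTIONS, probs))

# Hypothetical sizes; real audio and text features would be much larger.
AUDIO_DIM, TEXT_DIM, HIDDEN = 5, 8, 4
rng = random.Random(0)
params = {
    "a_in": rand_matrix(HIDDEN, AUDIO_DIM, rng),
    "a_rec": rand_matrix(HIDDEN, HIDDEN, rng),
    "t_in": rand_matrix(HIDDEN, TEXT_DIM, rng),
    "t_rec": rand_matrix(HIDDEN, HIDDEN, rng),
    "out": rand_matrix(len(EMOTIONS), 2 * HIDDEN, rng),
}

# Toy inputs: 6 audio frames and 3 word vectors of random values.
audio_seq = [[rng.random() for _ in range(AUDIO_DIM)] for _ in range(6)]
text_seq = [[rng.random() for _ in range(TEXT_DIM)] for _ in range(3)]
probs = predict(audio_seq, text_seq, params)
print(probs)
```

Because the weights are untrained, the printed probabilities are close to uniform; the sketch only shows how two sequences of different length and dimensionality can feed one classifier.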
A Simple Web Platform Solution for M-Learning
Nowadays the role of educational platforms is more than obvious, thanks to websites and modern platforms such as Microsoft SharePoint designed for e-learning. We believe that the next generation of learning platforms will be m-learning platforms. These platforms offer, first of all, mobility for users of PDAs, pocket PCs, smartphones and other modern mobile devices developed in recent years. One of the most important aspects of this form of e-learning is the display mode. Classic systems such as personal computers have larger screens, whereas modern portable devices have screens of only a few inches; the problem is to adapt the structure of websites and platforms to pocket PC screens while providing the same experience and usefulness to all users.
Keywords: Platform, M-learning, Discussion Forum, Search Engine, JavaScript, IIS, Port Forwarding
Wireless Handheld Computers in the Preclinical Undergraduate Curriculum
This report presents the results of a pilot project using wireless PDAs as teaching tools in an undergraduate medical curriculum. This technology was used to foster a transition from a passive to an interactive learning environment in the classroom and provided a solution for the implementation of computer-based exams for a large class. Wayne State Medical School recently provided model e570 Toshiba PocketPCs® (personal digital assistants or PDAs), network interface cards, and application software developed by CampusMobility® to 20 sophomore medical students. The pilot group of preclinical students used the PDAs to access web-based course content, communicate, manage schedules, participate in interactive teaching sessions, and complete course evaluations. Another part of this pilot has been to use the PDAs for computer-based exams in a wireless environment. Server authentication that restricted access during the exams and a proctoring console to monitor and record the PDA screens will be described in this report. Results of a student satisfaction survey will be present
Multi-Scale Attention with Dense Encoder for Handwritten Mathematical Expression Recognition
Handwritten mathematical expression recognition is a challenging problem due
to the complicated two-dimensional structures, ambiguous handwriting input and
varying scales of handwritten math symbols. To address this problem, we utilize
an attention-based encoder-decoder model that converts mathematical
expression images from two-dimensional layouts into one-dimensional LaTeX
strings. We improve the encoder by employing densely connected convolutional
networks, as they can strengthen feature extraction and facilitate gradient
propagation, especially on a small training set. We also present a novel
multi-scale attention model that handles the recognition of
math symbols at different scales and preserves fine-grained details that would
otherwise be dropped by pooling operations. Validated on the CROHME competition task, the
proposed method significantly outperforms the state-of-the-art methods with an
expression recognition accuracy of 52.8% on CROHME 2014 and 50.1% on CROHME
2016, by only using the official training dataset
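The multi-scale attention idea can be sketched roughly as follows. This is a simplified illustration, not the paper's model: it swaps in plain dot-product attention, treats each feature map as a flat list of position vectors, and combines the two scales by concatenating their context vectors. The names `attend` and `multi_scale_context` and the toy values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, features):
    """Dot-product attention over a list of feature vectors.

    Returns the attention weights and the weighted-sum context vector.
    """
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]
    return weights, context

def multi_scale_context(query, low_res, high_res):
    """Attend to each scale separately and concatenate the two contexts."""
    w_low, c_low = attend(query, low_res)
    w_high, c_high = attend(query, high_res)
    return c_low + c_high, w_low, w_high

# Toy example: a decoder state and two feature maps flattened to position lists.
# The high-resolution map has more positions, so it keeps finer detail.
query = [1.0, 0.0]
low_res = [[1.0, 0.0], [0.0, 1.0]]
high_res = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0], [0.0, 0.0]]

context, w_low, w_high = multi_scale_context(query, low_res, high_res)
print(context)
```

The decoder would consume `context` at each output step; the high-resolution branch is what lets small symbols contribute detail that a pooled, low-resolution map would lose.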