Expressive Speech Synthesis for Critical Situations
The presence of appropriate acoustic cues of affective features in synthesized speech can be a prerequisite for the proper evaluation of the semantic content by the message recipient. In recent work, the authors have focused on research into expressive speech synthesis capable of generating natural-sounding synthetic speech at various levels of arousal. Automatic information and warning systems can be used to inform, warn, instruct and navigate people in dangerous, critical situations, and to increase the effectiveness of crisis management and rescue operations. One of the activities within the EU SF project CRISIS was called "Extremely expressive (hyper-expressive) speech synthesis for urgent warning messages generation". It was aimed at the research and development of speech synthesizers with high naturalness and intelligibility, capable of generating messages with various expressive loads. The synthesizers will be applicable to generating public alert and warning messages in case of fires, floods, state security threats, etc. Early warning in such situations is made possible by fire and flood spread forecasting; the modeling thereof is covered by other activities of the CRISIS project. The most important component needed for building the synthesizer is the expressive speech database. An original method is proposed to create such a database. The current version of the expressive speech database is introduced, and first experiments with expressive synthesizers developed with this database are presented and discussed.
Prediction of Stress Level from Speech – from Database to Regressor
The term stress can designate a number of situations and affective reactions. This work focuses on the immediate stress reaction caused by, for example, threat, danger, fear, or great concern. Could measuring stress from speech be a viable, fast and non-invasive method? The article describes the development of a system predicting stress from voice – from the creation of the database and the preparation of the training data to the design and testing of the regressor. StressDat, an acted database of speech under stress in Slovak, was designed. After the methodology was published during its development in [1], this work describes the final form, annotation, and basic acoustic analyses of the data. The utterances presenting various stress-inducing scenarios were acted at three intended stress levels. The annotators used a "stress thermometer" to rate the perceived stress in each utterance on a scale from 0 to 100. Thus, data with a resolution suitable for training the regressor was obtained. Several regressors were trained, tested and compared. On the test set, the stress estimation works well (R-squared = 0.72, Concordance Correlation Coefficient = 0.83), but practical application will require much larger volumes of specific training data. StressDat was made publicly available.
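The two agreement measures quoted in the abstract above can be illustrated with a minimal sketch. It assumes the standard textbook definitions of R-squared and Lin's Concordance Correlation Coefficient; the function names and the sample ratings are hypothetical and are not taken from StressDat or from the authors' code.

```python
from statistics import fmean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def concordance_cc(y_true, y_pred):
    """Lin's Concordance Correlation Coefficient (population variances)."""
    n = len(y_true)
    mean_t, mean_p = fmean(y_true), fmean(y_pred)
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    # Unlike plain correlation, CCC is penalized by a shift in the means.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


# Hypothetical ratings on the 0-100 "stress thermometer" scale.
rated = [0.0, 50.0, 100.0]
predicted = [10.0, 60.0, 110.0]  # perfectly correlated, but offset by 10

print(f"R-squared = {r_squared(rated, predicted):.3f}")
print(f"CCC       = {concordance_cc(rated, predicted):.3f}")
```

In this toy case the predictions are perfectly correlated with the ratings, yet both measures fall below 1 because of the constant offset, which is why such measures suit a regressor that is expected to reproduce the absolute rating scale.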
Methodology of VFR night flying
Night is defined, for aviation purposes, as the period of darkness from the end of evening civil twilight to the beginning of morning civil twilight. Night flying is riskier than flying during daylight, so it is essential to seek training with a flight instructor specifically for night flying. This article looks briefly at some underlying principles and practices, including illusions, planning considerations, and handling emergencies.
RHOA and mDia1 promotes apoptosis of breast cancer cells via a high dose of doxorubicin treatment
Transforming RhoA proteins (RHOA) and their downstream Diaphanous homolog 1 proteins (DIAPH1), or mDia1, participate in the regulation of the actin cytoskeleton, which plays a critical role in cells, i.e., in morphologic changes and apoptosis.
StressDat – DATABASE OF SPEECH UNDER STRESS IN SLOVAK
The paper describes the methodology for creating a Slovak database of speech
under stress and presents pilot observations. While the relationship between stress and speech
characteristics can be utilized in a wide domain of speech technology applications, its
research suffers from the lack of suitable databases, particularly in conversational speech. We
propose a novel procedure to record acted speech in the actors' homes using their own
smartphones. We describe both the collection of speech material under three levels of stress
and the subsequent annotation of stress levels in this material. First observations suggest
a reasonable inter-annotator agreement, as well as interesting avenues for studying the relationship
between the intended stress levels and those perceived in speech.
Research of the Photo-Optical Method Application for Measuring Selected Data on the Movement of a Parachute for Type M-282
Testing in the field of parachute technology provides space for the application of new and innovative methods of measuring operating and functional parameters. The main aim of the paper is to present the results of research verifying the photo-optical method of measuring the vertical speed of M-282 parachutes, and its use in testing, collecting, and investigating motion data in parachuting. As part of this research, twelve jumps were performed. It was verified that the experiment was completed for the M-282 parachute according to SAE AS 8015B “Minimum Performance Standard Parachute Assemblies and Components”. An analysis of the influencing factors and a quantification of their influence on the uncertainty of the measurement results were also performed. The results obtained with the photo-optical method were compared with measurements from the FLYTEC 4030 electronic variometer. The vertical speed of the M-282 parachute determined by the photo-optical method (4.655 m·s−1) is very close to the vertical speed determined by the FLYTEC 4030 (4.662 m·s−1). We can state that the process of identifying the vertical speed of the parachute by the photo-optical method was correct. This is a suitable method of evaluating motion data in the operation of M-282 type parachutes. In follow-up research to generalize the methodology, we plan to perform more than 60 experimental jumps using different types of parachutes, digital sensors (cameras), and the photo-optical method to examine motion data and formulate recommendations for testing, investigative applications, individualized training programs, and aspects of parachuting injury prevention.
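As a quick check of how close the two vertical-speed values reported above are, the short sketch below computes their absolute and relative difference. Only the two speeds come from the abstract; the function name and the choice of the variometer reading as the reference value are assumptions made for illustration.

```python
def relative_difference_percent(value, reference):
    """Absolute difference between two readings, as a percentage of the reference."""
    return 100.0 * abs(value - reference) / reference


photo_optical = 4.655  # m/s, photo-optical method (from the abstract)
variometer = 4.662     # m/s, FLYTEC 4030 variometer (from the abstract)

abs_diff = abs(variometer - photo_optical)
rel_diff = relative_difference_percent(photo_optical, variometer)

print(f"absolute difference: {abs_diff:.3f} m/s")
print(f"relative difference: {rel_diff:.2f} %")
```

The difference is 0.007 m/s, or about 0.15 % of the variometer reading. Whether that gap is within the measurement uncertainty is a question for the uncertainty analysis described in the paper, not something this arithmetic can settle.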
Mapping Discrete Emotions in the Dimensional Space: An Acoustic Approach
A frequently used procedure to examine the relationship between categorical and dimensional descriptions of emotions is to ask subjects to place verbal expressions representing emotions
in a continuous multidimensional emotional space. This work chooses a different approach. It
aims at creating a system predicting the values of Activation and Valence (AV) directly from the
sound of emotional speech utterances without the use of its semantic content or any other additional
information. The system uses X-vectors to represent sound characteristics of the utterance and
a Support Vector Regressor for the estimation of the AV values. The system is trained on a pool of three
publicly available databases with dimensional annotation of emotions. The quality of regression is
evaluated on the test sets of the same databases. Mapping of categorical emotions to the dimensional
space is tested on another pool of eight categorically annotated databases. The aim of the work
was to test whether in each unseen database the predicted values of Valence and Activation will
place emotion-tagged utterances in the AV space in accordance with expectations based on Russell’s
circumplex model of affective space. Due to the great variability of speech data, clusters of emotions
create overlapping clouds. Their average location can be represented by centroids. A hypothesis
on the position of these centroids is formulated and evaluated. The system’s ability to separate the
emotions is evaluated by measuring the distance of the centroids. It can be concluded that the system
works as expected and the positions of the clusters follow the hypothesized rules. Although the
variance in individual measurements is still very high and the overlap of emotion clusters is large, it
can be stated that the AV coordinates predicted by the system lead to an observable separation of
the emotions in accordance with the hypothesis. Knowledge from training databases can therefore
be used to predict AV coordinates of unseen data of various origins. This could be used to detect
high levels of stress or depression. With the appearance of more dimensionally annotated training
data, the systems predicting emotional dimensions from speech sound will become more robust
and usable in practical applications in call-centers, avatars, robots, information-providing systems,
security applications, and the like.
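The centroid-based evaluation described in the abstract above can be sketched as follows. This is a minimal illustration that assumes each utterance already has a predicted (Activation, Valence) pair and that separation is measured with Euclidean distance; the function names, the emotion labels, and the sample coordinates are hypothetical and do not come from the paper.

```python
from itertools import combinations
from math import dist
from statistics import fmean


def centroid(points):
    """Mean (activation, valence) of a cluster of predicted points."""
    return (fmean([a for a, _ in points]), fmean([v for _, v in points]))


def centroid_distances(predictions):
    """Return per-emotion centroids and pairwise Euclidean distances between them."""
    centroids = {label: centroid(pts) for label, pts in predictions.items()}
    distances = {
        (a, b): dist(centroids[a], centroids[b])
        for a, b in combinations(sorted(centroids), 2)
    }
    return centroids, distances


# Hypothetical predicted (activation, valence) values per emotion tag.
predictions = {
    "anger": [(0.8, -0.6), (0.6, -0.4)],
    "sadness": [(-0.6, -0.5), (-0.4, -0.7)],
}

centroids, distances = centroid_distances(predictions)
for pair, d in distances.items():
    print(pair, f"{d:.3f}")
```

A larger distance between two centroids indicates better average separation of those emotions, although, as the abstract notes, the individual clusters may still overlap heavily.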