
    Semi-Supervised Speech Emotion Recognition with Ladder Networks

    Speech emotion recognition (SER) systems find applications in various fields such as healthcare, education, and security and defense. A major drawback of these systems is their lack of generalization across different conditions. This problem can be solved by training models on large amounts of labeled data from the target domain, which is expensive and time-consuming. Another approach is to increase the generalization of the models. An effective way to achieve this goal is to regularize the models through multitask learning (MTL), where auxiliary tasks are learned along with the primary task. These methods often require labeled data for the auxiliary tasks (gender, speaker identity, age, or other emotional descriptors), which is expensive to collect for emotion recognition. This study proposes the use of ladder networks for emotion recognition, which use an unsupervised auxiliary task. The primary task is a regression problem to predict emotional attributes. The auxiliary task is the reconstruction of intermediate feature representations using a denoising autoencoder. This auxiliary task does not require labels, so it is possible to train the framework in a semi-supervised fashion with abundant unlabeled data from the target domain. This study shows that the proposed approach creates a powerful framework for SER, outperforming fully supervised single-task learning (STL) and MTL baselines. The approach is implemented with several acoustic features, showing that ladder networks generalize significantly better in cross-corpus settings. Compared to the STL baselines, the proposed approach achieves relative gains in concordance correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus evaluations, and between 16.1% and 74.1% for cross-corpus evaluations, highlighting the power of the architecture.
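The abstract above reports results as relative gains in concordance correlation coefficient (CCC). As a minimal sketch of how such numbers are usually computed, the block below implements the standard CCC definition with population statistics and a relative-gain helper. The function names and the sample values are illustrative assumptions, not taken from the paper.

```python
from math import fsum


def ccc(pred, gold):
    """Concordance correlation coefficient of two equal-length sequences.

    CCC = 2*cov(p, g) / (var(p) + var(g) + (mean(p) - mean(g))**2),
    using population (divide-by-n) statistics.
    """
    if len(pred) != len(gold) or not pred:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(pred)
    mean_p = fsum(pred) / n
    mean_g = fsum(gold) / n
    var_p = fsum((p - mean_p) ** 2 for p in pred) / n
    var_g = fsum((g - mean_g) ** 2 for g in gold) / n
    cov = fsum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov / denom


def relative_gain(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, in percent."""
    return 100.0 * (proposed - baseline) / abs(baseline)


if __name__ == "__main__":
    # Made-up attribute scores, only to exercise the functions.
    gold = [1.0, 2.0, 3.0]
    print(ccc([1.0, 2.0, 3.0], gold))  # perfect agreement
    print(ccc([2.0, 3.0, 4.0], gold))  # same shape, shifted by a constant bias
    print(relative_gain(0.50, 0.60))   # hypothetical baseline vs. proposed CCC
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels, which is why the shifted sequence scores below 1 even though it is perfectly correlated with the reference.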

    Speech-driven Animation with Meaningful Behaviors

    Conversational agents (CAs) play an important role in human-computer interaction. Creating believable movements for CAs is challenging, since the movements have to be meaningful and natural, reflecting the coupling between gestures and speech. Past studies have mainly relied on rule-based or data-driven approaches. Rule-based methods focus on creating meaningful behaviors conveying the underlying message, but the gestures cannot be easily synchronized with speech. Data-driven approaches, especially speech-driven models, can capture the relationship between speech and gestures. However, they create behaviors disregarding the meaning of the message. This study proposes to bridge the gap between these two approaches, overcoming their limitations. The approach builds a dynamic Bayesian network (DBN), where a discrete variable is added to constrain the behaviors on the underlying constraint. The study implements and evaluates the approach with two constraints: discourse functions and prototypical behaviors. By constraining on the discourse functions (e.g., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data. By constraining on prototypical behaviors (e.g., head nods), the approach can be embedded in a rule-based system as a behavior realizer, creating trajectories that are synchronized in time with speech. The study proposes a DBN structure and a training approach that (1) models the cause-effect relationship between the constraint and the gestures, (2) initializes the state configuration models, increasing the range of the generated behaviors, and (3) captures the differences in the behaviors across constraints by enforcing sparse transitions between shared and exclusive states per constraint. Objective and subjective evaluations demonstrate the benefits of the proposed approach over an unconstrained model.
    Comment: 13 pages, 12 figures, 5 tables

    End-to-end Audiovisual Speech Activity Detection with Bimodal Recurrent Neural Models

    Speech activity detection (SAD) plays an important role in current speech processing systems, including automatic speech recognition (ASR). SAD is particularly difficult in environments with acoustic noise. A practical solution is to incorporate visual information, increasing the robustness of the SAD approach. An audiovisual system has the advantage of being robust to different speech modes (e.g., whisper speech) or background noise. Recent advances in audiovisual speech processing using deep learning have opened opportunities to capture in a principled way the temporal relationships between acoustic and visual features. This study explores this idea, proposing a bimodal recurrent neural network (BRNN) framework for SAD. The approach models the temporal dynamics of the sequential audiovisual data, improving the accuracy and robustness of the proposed SAD system. Instead of estimating hand-crafted features, the study investigates an end-to-end training approach, where acoustic and visual features are directly learned from the raw data during training. The experimental evaluation considers a large audiovisual corpus with over 60.8 hours of recordings, collected from 105 speakers. The results demonstrate that the proposed framework leads to absolute improvements of up to 1.2% under practical scenarios over an audio-only VAD baseline implemented with a deep neural network (DNN). The proposed approach achieves a 92.7% F1-score when evaluated using the sensors of a portable tablet in a noisy acoustic environment, which is only 1.0% lower than the performance obtained under ideal conditions (e.g., clean speech obtained with a high-definition camera and a close-talking microphone).
    Comment: Submitted to Speech Communication
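The abstract above reports performance as an F1-score. As a minimal sketch, assuming the score is computed over frame-level speech/non-speech decisions (the abstract does not state the scoring unit), the block below derives F1 from precision and recall. The function name and the example labels are hypothetical.

```python
def f1_score(reference, predicted):
    """F1-score for binary speech (1) / non-speech (0) decisions."""
    if len(reference) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, predicted))
    true_pos = sum(1 for r, p in pairs if r and p)
    false_pos = sum(1 for r, p in pairs if not r and p)
    false_neg = sum(1 for r, p in pairs if r and not p)
    if true_pos == 0:
        return 0.0  # no correct speech detections, so F1 is zero
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up frame labels, only to exercise the function.
    reference = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(f1_score(reference, predicted))
```

Because F1 ignores true negatives, it stays informative when non-speech frames dominate a recording.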

    Seed weight variation of Wyoming sagebrush in Northern Nevada

    Seed size is a crucial plant trait that may affect not only immediate seedling success but also the subsequent generation. We examined variation in seed weight of Wyoming sagebrush (Artemisia tridentata ssp. wyomingensis Beetle and Young), an excellent candidate species for rangeland restoration. The working hypothesis was that a major fraction of the spatial and temporal variability in seed size (weight) of Wyoming sagebrush could be explained by variations in mean monthly temperature and precipitation. Seeds were collected at the Battle Mountain and Eden Valley sites in northern Nevada, USA, during November of 2002 and 2003. Frequency distributions of seed weight varied from leptokurtic to platykurtic, and from symmetric to right-skewed, for both sites and years. Mean seed weight varied by a factor of 1.4 between locations and years. Mean seed weight was greater (P0.05) in all study situations. Simple linear regression showed that monthly precipitation (March to November) explained 85% of the total variation in mean seed weight (P=0.079). The relationship between mean monthly temperature (June to November) and mean seed weight was not significant (r²=0.00, P=0.431), which emphasizes the role of precipitation as an important determinant of mean seed weight. Our results suggest that the precipitation regime to which the mother plant is exposed can have a significant effect on the size of the seeds produced. Hence, seasonal changes in water availability would tend to alter the size distributions of the offspring produced.
    Affiliation: Busso, Carlos Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Perryman, Barry L. University of Nevada; United States

    Factors Affecting Recovery from Defoliation during Drought in Two Aridland Tussock Grasses

    The importance of several factors in limiting recovery from defoliation was investigated in field-grown plants of Agropyron desertorum and Agropyron spicatum exposed to drought, natural, or irrigated conditions. Leaf extension rate, components of leaf area production, number of metabolically active axillary buds, and carbohydrate availability were examined on the same plants immediately after defoliation and/or in the following spring from 1984 until 1986. The diurnal course of leaf growth did not relate to turgor pressure in the expanded portion of leaf laminae. Rather, growth was apparently associated with air temperature. Leaf extension rate was lower under drought than under better moisture levels during 1984 to 1986. For both species, reduced growth rates and shorter growth periods resulted in reduced tiller height, leaf number, and leaf size under drought compared with natural or irrigated conditions in 1985 and 1986, but not in 1984. As a result, leaf area and/or yields were also lower under drought in 1985 and 1986, and lowest under drought plus defoliation in 1986. Production of daughter tillers immediately after defoliation was also lowest under drought. Regrowth capacity of both species was not limited by axillary bud number, size, or viability immediately after defoliation under any water level in 1986. In early spring, however, tiller number and growth were lower on clipped than on unclipped plants of both species under drought and irrigated conditions in 1986, and under all water levels in 1987; this resulted in considerably reduced photosynthetic canopies on clipped plants. Crown and root total nonstructural carbohydrate (TNC) pools were higher under drought than under better moisture levels in A. desertorum and A. spicatum in early spring 1986. These high pools of TNC apparently enhanced the production of etiolated regrowth in both species when meristematic limitations did not exist in early spring.
    The productive potential of both Agropyron species will probably not be affected following a single late and severe defoliation under drought. However, vegetative growth and/or productivity, and probably the persistence of these species in the community, will be reduced after two or more years of late and heavy defoliations under drought.

    Estimation of Driver's Gaze Region from Head Position and Orientation using Probabilistic Confidence Regions

    A smart vehicle should be able to understand human behavior and predict the driver's actions to avoid hazardous situations. Specific traits in human behavior can be automatically predicted, which can help the vehicle make decisions, increasing safety. One of the most important aspects of the driving task is the driver's visual attention. Predicting the driver's visual attention can help a vehicle understand the awareness state of the driver, providing important contextual information. While estimating the exact gaze direction is difficult in the car environment, a coarse estimate of the visual attention can be obtained by tracking the position and orientation of the head. Since the relation between head pose and gaze direction is not one-to-one, this paper proposes a formulation based on probabilistic models to create salient regions describing the visual attention of the driver. The area of the predicted region is small when the model has high confidence in the prediction, which is directly learned from the data. We use Gaussian process regression (GPR) to implement the framework, comparing its performance with other regression formulations such as linear regression and neural-network-based methods. We evaluate these frameworks by studying the tradeoff between spatial resolution and accuracy of the probability map using naturalistic recordings collected with the UTDrive platform. We observe that the GPR method produces the best result, creating accurate predictions with localized salient regions. For example, the 95% confidence region is defined by an area that covers 3.77% of a sphere surrounding the driver.
    Comment: 13 pages, 12 figures, 2 tables
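To give a feel for what "3.77% of a sphere" means as an angular extent, the sketch below converts between the fraction of a sphere's surface and the half-angle of a circular cap covering that fraction, using the spherical-cap area formula. Treating the confidence region as a circular cap is purely an illustrative assumption; the regions in the paper are learned from data and need not have that shape. The function names are hypothetical.

```python
import math


def cap_fraction(half_angle_deg):
    """Fraction of a sphere's surface covered by a cap of the given half-angle.

    Cap area is 2*pi*r^2*(1 - cos(theta)); the sphere area is 4*pi*r^2.
    """
    theta = math.radians(half_angle_deg)
    return (1.0 - math.cos(theta)) / 2.0


def half_angle_for_fraction(fraction):
    """Half-angle (degrees) of the circular cap covering `fraction` of a sphere."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    return math.degrees(math.acos(1.0 - 2.0 * fraction))


if __name__ == "__main__":
    # A hemisphere is a cap with a 90-degree half-angle.
    print(cap_fraction(90.0))
    # Equivalent cap half-angle for the 3.77% figure quoted in the abstract.
    print(half_angle_for_fraction(0.0377))
```

Under this cap assumption, the quoted area corresponds to a cone with a half-angle between 22 and 23 degrees around the predicted direction.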

    Ladder Networks for Emotion Recognition: Using Unsupervised Auxiliary Tasks to Improve Predictions of Emotional Attributes

    Recognizing emotions using a few attribute dimensions such as arousal, valence, and dominance provides the flexibility to effectively represent a complex range of emotional behaviors. Conventional methods to learn these emotional descriptors primarily focus on separate models to recognize each of these attributes. Recent work has shown that learning these attributes together regularizes the models, leading to better feature representations. This study explores new forms of regularization by adding unsupervised auxiliary tasks to reconstruct hidden layer representations. This auxiliary task requires the denoising of hidden representations at every layer of an autoencoder. The framework relies on ladder networks that use skip connections between encoder and decoder layers to learn powerful representations of emotional dimensions. The results show that ladder networks improve the performance of the system compared to baselines that learn each attribute individually, and to conventional denoising autoencoders. Furthermore, the unsupervised auxiliary tasks have promising potential to be used in a semi-supervised setting, where few labeled sentences are available.
    Comment: Submitted to Interspeech 201
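The combination of a supervised primary task with layer-wise reconstruction tasks can be sketched as a single training objective. The block below is a minimal illustration, assuming mean squared error for both the attribute regression and the per-layer denoising terms; the abstract does not state the loss functions, and the names `ladder_cost` and `layer_weights` are hypothetical. It computes the cost only and is not the authors' implementation.

```python
def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def ladder_cost(clean_layers, denoised_layers, layer_weights,
                y_pred=None, y_true=None):
    """Supervised cost plus weighted layer-wise reconstruction cost.

    clean_layers:    hidden representations from the clean encoder pass
    denoised_layers: decoder reconstructions of the noisy encoder pass
    layer_weights:   one weight per layer for the reconstruction term
    y_pred, y_true:  attribute predictions and labels; omit both for
                     unlabeled samples, which contribute only the
                     unsupervised reconstruction term
    """
    if not (len(clean_layers) == len(denoised_layers) == len(layer_weights)):
        raise ValueError("expected one reconstruction and one weight per layer")
    reconstruction = sum(
        w * mse(z, z_hat)
        for w, z, z_hat in zip(layer_weights, clean_layers, denoised_layers)
    )
    if y_true is None:
        return reconstruction
    return mse(y_pred, y_true) + reconstruction


if __name__ == "__main__":
    # Made-up activations for a single hidden layer.
    clean = [[0.0, 0.0]]
    denoised = [[1.0, 1.0]]
    print(ladder_cost(clean, denoised, [0.5]))                # unlabeled sample
    print(ladder_cost(clean, denoised, [0.5], [2.0], [0.0]))  # labeled sample
```

Because the reconstruction term needs no labels, unlabeled recordings can still shape the hidden representations, which is what makes the semi-supervised setting possible.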

    Competitive ability and defoliation tolerance in Stipa clarazii, Stipa tenuis and Stipa ambigua

    Higher growth rates, root proliferation capacity, root length density, and nutrient uptake capacity have been associated with increased nutrient acquisition in perennial grasses, and would therefore contribute to their competitive ability and defoliation tolerance (Bedunah and Sosbee, 1995).
    Affiliation: Saint Pierre, Carolina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina. Universidad Nacional del Sur. Departamento de Agronomía; Argentina
    Affiliation: Busso, Carlos Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina. Universidad Nacional del Sur. Departamento de Agronomía; Argentina

    Germination of grasses and shrubs under various water stress and temperature conditions

    The effects of various combinations of water potential and temperature on the germination of Atriplex lampa Gill. ex Moquin, Larrea divaricata Cav., Leymus erianthus (Phil.) Dubcovsky, Stipa neaei Nees ex Steudel and Poa ligularis Nees ap. Steudel were determined under controlled conditions. The hypothesis tested was that seed germination increases with increasing temperatures and water potentials in A. lampa, L. divaricata, L. erianthus, S. neaei and P. ligularis, and that the time to reach 50% of total germination is longer at lower than at higher water potentials. PEG 2000 was used to impose water stress conditions. In general, the results led to acceptance of the proposed hypothesis.
    Affiliation: Bonvissuto, Griselda Luz. Instituto Nacional de Tecnología Agropecuaria. Centro Regional Patagonia Norte. Estación Experimental Agropecuaria San Carlos de Bariloche; Argentina
    Affiliation: Busso, Carlos Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina

    Perennial grasses of different successional stages under various soil water inputs: do they differ in root length density?

    Information about root length density (RLD) in perennial grasses of different successional stages exposed to various soil water inputs is limited. The effects of different soil water inputs on RLD were evaluated in the late-seral Stipa clarazii Ball, the comparatively earlier-seral S. tenuis Phil., and the early-seral S. gynerioides Phil. Field studies were conducted in 1996 and early 1997, although treatments had been imposed since 1995. S. clarazii and S. tenuis are two important palatable perennial tussock grasses in temperate, semiarid rangelands of central Argentina, where S. gynerioides is one of the most abundant unpalatable perennial grass species. It was hypothesized that 1) S. clarazii and S. tenuis have a lower RLD under irrigated than under rainfed or water stress conditions, 2) S. clarazii has a greater RLD than S. gynerioides and S. tenuis under all water inputs and sampling dates, and 3) the RLD of the three species varies with sampling date, within each species and soil water level. The results led to the rejection of hypothesis 1 and the acceptance of hypotheses 2 and 3. Maintenance of root growth under all water inputs would allow these species greater soil exploration and resource acquisition to sustain regrowth in their native, semiarid environments. The study also showed that late-seral perennial grasses (S. clarazii) should have greater competitive ability than earlier-seral grasses (S. tenuis and S. gynerioides) because of, at least in part, their greater average RLD under water stress, rainfed, and irrigated conditions.
    EEA Bordenave
    Affiliation: Busso, Carlos Alberto. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Busso, Carlos Alberto. Universidad Nacional del Sur. Centro de Recursos Naturales Renovables de la Zona Semiárida; Argentina
    Affiliation: Busso, Carlos Alberto. Universidad Nacional del Sur. Departamento de Agronomía; Argentina
    Affiliation: Bolletta, Andrea Ivana. Instituto Nacional de Tecnología Agropecuaria (INTA). Estación Experimental Agropecuaria Bordenave; Argentina