Analysis of deep stress field using well log and wellbore breakout data: a case study in Cretaceous oil reservoir, southwest Iran
To identify wellbore instability in the Bangestan oil reservoir in southwestern Iran, the directions and magnitudes of the in-situ stresses were determined using two different methods. Results of an injection test and analysis of wellbore breakouts were used to verify the accuracy of the stress profiles. The Barton method, which uses the breakout angle and the strength of the rock, was applied.
In addition, an artificial neural network was used to estimate the elastic parameters of the rock and the stress field; its output showed high accuracy in estimating the desired parameters. The Mohr-Coulomb failure criterion was also used to verify the stress profiles. The estimated stresses show reasonable agreement with the results of the injection test and the Barton method. The minimum mud pressure required to prevent shear failure was calculated using the Mohr-Coulomb failure criterion and the estimated stress profiles. The results agreed well with the failures identified in the caliper and image logs. However, discrepancies are observed at some depths; these are attributable to concentrations of fractures, collisions between the drill string and the wellbore wall, and swab and surge pressures. Based on the estimated stress profiles, the stress regime is normal, and strike-slip at some depths. According to the direction of the breakouts, which is clearly visible in the caliper and image logs, the minimum and maximum horizontal stress directions are NW-SE and NE-SW, respectively. These directions are consistent with the direction of regional stresses in the Zagros belt.
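The minimum-mud-pressure calculation described above can be sketched as follows. This is a textbook combination of the Kirsch solution for the tangential stress at a vertical wellbore wall with the Mohr-Coulomb criterion, not the study's actual workflow, and all numeric values are illustrative, not data from the paper:

```python
import math

def min_mud_pressure(sH, sh, pp, ucs, phi_deg):
    """Minimum mud pressure (MPa) to avoid shear failure (breakout) at the
    wall of a vertical wellbore.

    sH, sh  : maximum / minimum horizontal stresses (MPa)
    pp      : pore pressure (MPa)
    ucs     : unconfined compressive strength of the rock (MPa)
    phi_deg : internal friction angle (degrees)

    Kirsch: tangential stress at the point of maximum concentration is
    3*sH - sh - pw; the radial stress equals the mud pressure pw.
    Mohr-Coulomb at failure: sigma1' = UCS + q * sigma3', with
    q = (1 + sin(phi)) / (1 - sin(phi)).
    Solving 3*sH - sh - pw - pp = UCS + q*(pw - pp) for pw gives:
    """
    q = (1 + math.sin(math.radians(phi_deg))) / (1 - math.sin(math.radians(phi_deg)))
    return (3 * sH - sh - ucs + (q - 1) * pp) / (1 + q)

# Illustrative (hypothetical) values:
pw = min_mud_pressure(sH=60.0, sh=45.0, pp=25.0, ucs=50.0, phi_deg=30.0)
# With phi = 30 degrees, q = 3, so pw = (180 - 45 - 50 + 50) / 4 = 33.75 MPa
```

Drilling with a mud pressure below this value would allow the Mohr-Coulomb envelope to be exceeded at the wall, producing the breakouts observed in the caliper and image logs.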
Cross-domain Voice Activity Detection with Self-Supervised Representations
Voice Activity Detection (VAD) aims at detecting speech segments on an audio
signal, which is a necessary first step for many of today's speech-based
applications. Current state-of-the-art methods focus on training a neural
network exploiting features directly contained in the acoustics, such as Mel
Filter Banks (MFBs). Such methods therefore require an extra normalisation step
to adapt to a new domain where the acoustics are affected, which can simply be
due to a change of speaker, microphone, or environment. In addition, this
normalisation step is usually a rather rudimentary method that has certain
limitations, such as being highly susceptible to the amount of data available
for the new domain. Here, we exploited the crowd-sourced Common Voice (CV)
corpus to show that representations based on Self-Supervised Learning (SSL) can
adapt well to different domains, because they are computed with contextualised
representations of speech across multiple domains. SSL representations also
achieve better results than systems based on hand-crafted representations
(MFBs), and off-the-shelf VADs, with significant improvement in cross-domain
settings.
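The domain-adaptation problem the abstract describes for hand-crafted features can be illustrated with a toy energy-based detector: features computed directly from the acoustics shift with the recording conditions, so a per-recording normalisation step is needed before any fixed decision rule applies. This is a minimal sketch with log-energy standing in for Mel filter banks, not the paper's system:

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Frame-level log-energy, a stand-in for hand-crafted acoustic
    features such as Mel filter banks (MFBs)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def cmvn(feats):
    """Per-recording mean-variance normalisation: the extra adaptation
    step that hand-crafted features require in a new domain."""
    return (feats - feats.mean()) / (feats.std() + 1e-10)

def energy_vad(feats, threshold=0.0):
    """Toy frame-level VAD decision on normalised features."""
    return feats > threshold

rng = np.random.default_rng(0)
silence = rng.normal(0, 0.05, 1600)  # quiet segment
speech = rng.normal(0, 1.0, 1600)    # loud segment
decisions = energy_vad(cmvn(frame_log_energy(np.concatenate([silence, speech]))))
# Early frames (silence) are flagged inactive, late frames (speech) active.
```

Because the raw energies depend on microphone gain and environment, the threshold is only meaningful after normalisation; SSL representations sidestep this by being trained across many domains, so no such recording-specific calibration step is needed.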
AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition
The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind,
Detecting Depression with AI, and Cross-cultural Affect Recognition" is the
ninth competition event aimed at the comparison of multimedia processing and
machine learning methods for automatic audiovisual health and emotion analysis,
with all participants competing strictly under the same conditions. The goal of
the Challenge is to provide a common benchmark test set for multimodal
information processing and to bring together the health and emotion recognition
communities, as well as the audiovisual processing communities, to compare the
relative merits of various approaches to health and emotion recognition from
real-life data. This paper presents the major novelties introduced this year,
the challenge guidelines, the data used, and the performance of the baseline
systems on the three proposed tasks: state-of-mind recognition, depression
assessment with AI, and cross-cultural affect sensing.
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Self-Supervised Learning (SSL) using huge unlabeled data has been
successfully explored for image and natural language processing. Recent works
also investigated SSL from speech. They were notably successful to improve
performance on downstream tasks such as automatic speech recognition (ASR).
While these works suggest it is possible to reduce dependence on labeled data
for building efficient speech systems, their evaluation was mostly made on ASR
and using multiple and heterogeneous experimental settings (most of them for
English). This questions the objective comparison of SSL approaches and the
evaluation of their impact on building speech systems. In this paper, we
propose LeBenchmark: a reproducible framework for assessing SSL from speech. It
not only includes ASR (high and low resource) tasks but also spoken language
understanding, speech translation and emotion recognition. We also focus on
speech technologies in a language different than English: French. SSL models of
different sizes are trained from carefully sourced and documented datasets.
Experiments show that SSL is beneficial for most but not all tasks which
confirms the need for exhaustive and reliable benchmarks to evaluate its real
impact. LeBenchmark is shared with the scientific community for reproducible
research in SSL from speech.
Comment: Will be presented at Interspeech 202
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements
in many different domains including computer vision and natural language
processing. Speech processing drastically benefitted from SSL as most of the
current domain-related tasks are now being approached with pre-trained models.
This work introduces LeBenchmark 2.0, an open-source framework for assessing and
building SSL-equipped French speech technologies. It includes documented,
large-scale corpora with up to 14,000 hours of heterogeneous
speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to
one billion learnable parameters shared with the community, and an evaluation
protocol made of six downstream tasks to complement existing benchmarks.
LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for
speech with the investigation of frozen versus fine-tuned downstream models,
task-agnostic versus task-specific pre-trained models as well as a discussion
on the carbon footprint of large-scale model training.
Comment: Under submission at Computer Science and Language. Preprint allowed
On the evolution of speech representations for affective computing: A brief history and critical overview
Recent advances in the field of machine learning have shown great potential for the automatic recognition of apparent human emotions. In the era of Internet of Things and big-data processing, where voice-based systems are well established, opportunities to leverage cutting-edge technologies to develop personalized and human-centered services are genuinely real, with a growing demand in many areas such as education, health, well-being, and entertainment. Automatic emotion recognition from speech, which is a key element for developing personalized and human-centered services, has reached a degree of maturity that makes it of broad commercial interest today. However, there are still major limiting factors that prevent a broad applicability of emotion recognition technology. For example, one open challenge is the poor generalization capabilities of currently used feature extraction techniques to interpret expressions of affect across different persons, contexts, cultures, and languages
EP004198978A1
The invention relates to a computer-implemented method for real-time emotion recognition from a real-time audio signal. The method includes transcribing, into text, an audio speech signal contained in the audio signal by an automatic speech recognition model, and computing, by a speech representation model, a joint representation vector corresponding to a joint representation of the speech as a function of the speech signal and the text. The method also includes computing, by an emotion prediction model, an emotion embedding vector as a function of the joint representation vector, and mapping the emotion in at least one emotional frame, according to the emotion embedding vector, by an emotion mapping model. The invention further relates to a computer program and a device implementing such a method.
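The claimed pipeline (ASR transcription, joint speech/text representation, emotion embedding, mapping into an emotional frame) can be sketched as a chain of stand-in models. Every function body, dimension, and emotion label below is illustrative, not from the patent; only the four-stage structure follows the claim:

```python
import numpy as np

rng = np.random.default_rng(0)
W_joint = rng.normal(size=(128 + 64, 256))  # stand-in joint-representation model
W_emo = rng.normal(size=(256, 32))          # stand-in emotion-prediction model
ANCHORS = {"neutral": np.zeros(32), "happy": np.ones(32)}  # toy emotional frame

def asr_transcribe(audio):
    """Stand-in for the automatic speech recognition model: returns a
    fixed-size text embedding instead of a real transcript."""
    return np.tanh(audio[:64])

def joint_representation(audio, text_emb):
    """Joint representation computed as a function of both the speech
    signal and the text, per the claim."""
    return np.tanh(np.concatenate([audio[:128], text_emb]) @ W_joint)

def emotion_embedding(joint):
    """Emotion embedding vector as a function of the joint representation."""
    return np.tanh(joint @ W_emo)

def map_emotion(embedding):
    """Map the embedding into a categorical emotional frame by nearest anchor."""
    return min(ANCHORS, key=lambda k: np.linalg.norm(embedding - ANCHORS[k]))

audio = rng.normal(size=16000)
label = map_emotion(emotion_embedding(joint_representation(audio, asr_transcribe(audio))))
```

In a real system each stage would be a trained neural model; the point of the sketch is only the data flow from raw audio through text and joint representation to a mapped emotion.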
Multi-corpus affect recognition with emotion embeddings and self-supervised speech representations
Speech emotion recognition systems use data-driven machine learning techniques that rely on annotated corpora. To achieve usable performance in real life, we need to exploit multiple different datasets, since each one can shed light on some specific expression of affect. However, different corpora use subjectively defined annotation schemes, which makes it challenging to train a model that can sense similar emotions across corpora. Here, we propose a method that can relate similar emotions across corpora without being explicitly trained for it. Our method relies on self-supervised representations, which provide highly contextualised speech representations, and on multi-task learning paradigms. This allows training on different corpora without changing their labelling schemes. The results show that by fine-tuning self-supervised representations on each corpus separately, we can significantly improve the state-of-the-art within-corpus performance. We further demonstrate that by using multiple corpora during the training of the same model, we can improve cross-corpus performance, and show that our emotion embeddings can effectively recognise the same emotions across different corpora.
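The multi-task idea described above, training on several corpora without merging their incompatible label sets, is commonly realised with one shared encoder and one output head per corpus. This is a minimal sketch of that pattern; the corpus names, label counts, and dimensions are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(768, 64))  # stands in for the shared SSL-based encoder
HEADS = {
    "corpusA": rng.normal(size=(64, 4)),  # corpus A has 4 categorical labels
    "corpusB": rng.normal(size=(64, 7)),  # corpus B uses a different 7-label scheme
}

def shared_embedding(ssl_features):
    """Shared emotion embedding computed from SSL speech features;
    this space is common to all corpora."""
    return np.tanh(ssl_features @ W_shared)

def predict(ssl_features, corpus):
    """Route the shared embedding through the head of the given corpus,
    so each corpus keeps its own labelling scheme during training."""
    logits = shared_embedding(ssl_features) @ HEADS[corpus]
    return int(np.argmax(logits))

feats = rng.normal(size=768)
label_a = predict(feats, "corpusA")  # index into corpus A's label set
label_b = predict(feats, "corpusB")  # index into corpus B's label set
```

Because only the heads are corpus-specific, similar emotions from different corpora end up close together in the shared embedding space, which is what enables the cross-corpus recognition the abstract reports.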
LeBenchmark, an evaluation benchmark for spoken French
Self-supervised learning has brought remarkable improvements in many domains such as computer vision or language and speech processing, by exploiting large quantities of unlabelled data. In the specific context of speech, however, and despite promising results, there is a clear lack of standardisation in the evaluation processes that would allow precise comparisons of these models, in particular for languages other than English. We present here to the French-speaking community LeBenchmark, an open-source and reproducible reference framework for evaluating self-supervised models on French speech corpora. It is composed of four tasks: automatic speech recognition, spoken language understanding, automatic speech translation, and automatic emotion recognition. We encourage the French-speaking community to use this benchmark in its future experiments, in particular for the evaluation of self-supervised models.