Analysis of deep stress field using well log and wellbore breakout data: a case study in Cretaceous oil reservoir, southwest Iran
To identify wellbore instability in the Bangestan oil reservoir in southwestern Iran, the directions and magnitudes of the in-situ stresses were determined using two different methods. Results of an injection test and analysis of wellbore breakouts were used to verify the accuracy of the stress profiles. The Barton method, which uses the breakout angle and the strength of the rock, was applied.
In addition, an artificial neural network was used to estimate the elastic parameters of the rock and the stress field; its output showed high accuracy in estimating the desired parameters. The Mohr-Coulomb failure criterion was also used to verify the stress profiles. The estimated stresses show reasonable agreement with the results of the injection test and the Barton method. The minimum mud pressure required to prevent shear failure was calculated using the Mohr-Coulomb failure criterion and the estimated stress profiles. The results agreed well with the failures identified in the caliper and image logs. However, discrepancies are observed at some depths; these are attributable to concentrations of fractures, collisions between the drill string and the wellbore wall, and swab and surge pressures. Based on the estimated stress profiles, the stress regime is normal, and strike-slip at some depths. According to the direction of the breakouts, which is clearly visible in the caliper and image logs, the minimum and maximum horizontal stress directions are NW-SE and NE-SW, respectively. These directions are consistent with the direction of regional stresses in the Zagros belt.
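The minimum-mud-pressure calculation described above can be sketched as follows. This is a textbook combination of the Kirsch solution for the tangential stress at a vertical wellbore wall with the Mohr-Coulomb criterion, not the study's actual workflow, and all numeric values are illustrative, not data from the paper:

```python
import math

def min_mud_pressure(sH, sh, pp, ucs, phi_deg):
    """Minimum mud pressure (MPa) to avoid shear failure (breakout) at the
    wall of a vertical wellbore.

    sH, sh  : maximum / minimum horizontal stresses (MPa)
    pp      : pore pressure (MPa)
    ucs     : unconfined compressive strength of the rock (MPa)
    phi_deg : internal friction angle (degrees)

    Kirsch: tangential stress at the point of maximum concentration is
    3*sH - sh - pw; the radial stress equals the mud pressure pw.
    Mohr-Coulomb at failure: sigma1' = UCS + q * sigma3', with
    q = (1 + sin(phi)) / (1 - sin(phi)).
    Solving 3*sH - sh - pw - pp = UCS + q*(pw - pp) for pw gives:
    """
    q = (1 + math.sin(math.radians(phi_deg))) / (1 - math.sin(math.radians(phi_deg)))
    return (3 * sH - sh - ucs + (q - 1) * pp) / (1 + q)

# Illustrative (hypothetical) values:
pw = min_mud_pressure(sH=60.0, sh=45.0, pp=25.0, ucs=50.0, phi_deg=30.0)
# With phi = 30 degrees, q = 3, so pw = (180 - 45 - 50 + 50) / 4 = 33.75 MPa
```

Drilling with a mud pressure below this value would allow the Mohr-Coulomb envelope to be exceeded at the wall, producing the breakouts observed in the caliper and image logs.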
Cross-domain Voice Activity Detection with Self-Supervised Representations
Voice Activity Detection (VAD) aims at detecting speech segments on an audio
signal, which is a necessary first step for many of today's speech-based
applications. Current state-of-the-art methods focus on training a neural
network exploiting features directly contained in the acoustics, such as Mel
Filter Banks (MFBs). Such methods therefore require an extra normalisation step
to adapt to a new domain where the acoustics are affected, which can simply be
due to a change of speaker, microphone, or environment. In addition, this
normalisation step is usually a rather rudimentary method that has certain
limitations, such as being highly susceptible to the amount of data available
for the new domain. Here, we exploited the crowd-sourced Common Voice (CV)
corpus to show that representations based on Self-Supervised Learning (SSL) can
adapt well to different domains, because they are computed with contextualised
representations of speech across multiple domains. SSL representations also
achieve better results than systems based on hand-crafted representations
(MFBs), and off-the-shelf VADs, with significant improvement in cross-domain
settings.
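The domain-adaptation problem the abstract describes for hand-crafted features can be illustrated with a toy energy-based detector: features computed directly from the acoustics shift with the recording conditions, so a per-recording normalisation step is needed before any fixed decision rule applies. This is a minimal sketch with log-energy standing in for Mel filter banks, not the paper's system:

```python
import numpy as np

def frame_log_energy(signal, frame_len=400, hop=160):
    """Frame-level log-energy, a stand-in for hand-crafted acoustic
    features such as Mel filter banks (MFBs)."""
    n = 1 + max(0, (len(signal) - frame_len) // hop)
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def cmvn(feats):
    """Per-recording mean-variance normalisation: the extra adaptation
    step that hand-crafted features require in a new domain."""
    return (feats - feats.mean()) / (feats.std() + 1e-10)

def energy_vad(feats, threshold=0.0):
    """Toy frame-level VAD decision on normalised features."""
    return feats > threshold

rng = np.random.default_rng(0)
silence = rng.normal(0, 0.05, 1600)  # quiet segment
speech = rng.normal(0, 1.0, 1600)    # loud segment
decisions = energy_vad(cmvn(frame_log_energy(np.concatenate([silence, speech]))))
# Early frames (silence) are flagged inactive, late frames (speech) active.
```

Because the raw energies depend on microphone gain and environment, the threshold is only meaningful after normalisation; SSL representations sidestep this by being trained across many domains, so no such recording-specific calibration step is needed.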
AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition
The Audio/Visual Emotion Challenge and Workshop (AVEC 2019) "State-of-Mind,
Detecting Depression with AI, and Cross-cultural Affect Recognition" is the
ninth competition event aimed at the comparison of multimedia processing and
machine learning methods for automatic audiovisual health and emotion analysis,
with all participants competing strictly under the same conditions. The goal of
the Challenge is to provide a common benchmark test set for multimodal
information processing and to bring together the health and emotion recognition
communities, as well as the audiovisual processing communities, to compare the
relative merits of various approaches to health and emotion recognition from
real-life data. This paper presents the major novelties introduced this year,
the challenge guidelines, the data used, and the performance of the baseline
systems on the three proposed tasks: state-of-mind recognition, depression
assessment with AI, and cross-cultural affect sensing.
LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Self-Supervised Learning (SSL) using huge unlabeled data has been
successfully explored for image and natural language processing. Recent works
also investigated SSL from speech. They were notably successful to improve
performance on downstream tasks such as automatic speech recognition (ASR).
While these works suggest it is possible to reduce dependence on labeled data
for building efficient speech systems, their evaluation was mostly made on ASR
and using multiple and heterogeneous experimental settings (most of them for
English). This questions the objective comparison of SSL approaches and the
evaluation of their impact on building speech systems. In this paper, we
propose LeBenchmark: a reproducible framework for assessing SSL from speech. It
not only includes ASR (high and low resource) tasks but also spoken language
understanding, speech translation and emotion recognition. We also focus on
speech technologies in a language different than English: French. SSL models of
different sizes are trained from carefully sourced and documented datasets.
Experiments show that SSL is beneficial for most but not all tasks which
confirms the need for exhaustive and reliable benchmarks to evaluate its real
impact. LeBenchmark is shared with the scientific community for reproducible
research in SSL from speech.
Comment: Will be presented at Interspeech 202
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements
in many different domains including computer vision and natural language
processing. Speech processing drastically benefitted from SSL as most of the
current domain-related tasks are now being approached with pre-trained models.
This work introduces LeBenchmark 2.0, an open-source framework for assessing and
building SSL-equipped French speech technologies. It includes documented,
large-scale corpora with up to 14,000 hours of heterogeneous
speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to
one billion learnable parameters shared with the community, and an evaluation
protocol made of six downstream tasks to complement existing benchmarks.
LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for
speech with the investigation of frozen versus fine-tuned downstream models,
task-agnostic versus task-specific pre-trained models as well as a discussion
on the carbon footprint of large-scale model training.
Comment: Under submission at Computer Science and Language. Preprint allowed
On the evolution of speech representations for affective computing: A brief history and critical overview
Recent advances in the field of machine learning have shown great potential for the automatic recognition of apparent human emotions. In the era of Internet of Things and big-data processing, where voice-based systems are well established, opportunities to leverage cutting-edge technologies to develop personalized and human-centered services are genuinely real, with a growing demand in many areas such as education, health, well-being, and entertainment. Automatic emotion recognition from speech, which is a key element for developing personalized and human-centered services, has reached a degree of maturity that makes it of broad commercial interest today. However, there are still major limiting factors that prevent a broad applicability of emotion recognition technology. For example, one open challenge is the poor generalization capabilities of currently used feature extraction techniques to interpret expressions of affect across different persons, contexts, cultures, and languages
EP004198978A1
The invention relates to a computer-implemented method for real-time emotion recognition from a real-time audio signal. The method includes transcribing, into text, an audio speech signal contained in the audio signal by an automatic speech recognition model, and computing, by a speech representation model, a joint representation vector corresponding to a joint representation of the speech as a function of the speech signal and the text. The method also includes computing, by an emotion prediction model, an emotion embedding vector as a function of the joint representation vector, and mapping the emotion in at least one emotional frame, according to the emotion embedding vector, by an emotion mapping model. The invention further relates to a computer program and a device implementing such a method.
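The claimed pipeline (ASR transcription, joint speech/text representation, emotion embedding, mapping into an emotional frame) can be sketched as a chain of stand-in models. Every function body, dimension, and emotion label below is illustrative, not from the patent; only the four-stage structure follows the claim:

```python
import numpy as np

rng = np.random.default_rng(0)
W_joint = rng.normal(size=(128 + 64, 256))  # stand-in joint-representation model
W_emo = rng.normal(size=(256, 32))          # stand-in emotion-prediction model
ANCHORS = {"neutral": np.zeros(32), "happy": np.ones(32)}  # toy emotional frame

def asr_transcribe(audio):
    """Stand-in for the automatic speech recognition model: returns a
    fixed-size text embedding instead of a real transcript."""
    return np.tanh(audio[:64])

def joint_representation(audio, text_emb):
    """Joint representation computed as a function of both the speech
    signal and the text, per the claim."""
    return np.tanh(np.concatenate([audio[:128], text_emb]) @ W_joint)

def emotion_embedding(joint):
    """Emotion embedding vector as a function of the joint representation."""
    return np.tanh(joint @ W_emo)

def map_emotion(embedding):
    """Map the embedding into a categorical emotional frame by nearest anchor."""
    return min(ANCHORS, key=lambda k: np.linalg.norm(embedding - ANCHORS[k]))

audio = rng.normal(size=16000)
label = map_emotion(emotion_embedding(joint_representation(audio, asr_transcribe(audio))))
```

In a real system each stage would be a trained neural model; the point of the sketch is only the data flow from raw audio through text and joint representation to a mapped emotion.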
Multi-corpus affect recognition with emotion embeddings and self-supervised speech representations
Speech emotion recognition systems use data-driven machine learning techniques that rely on annotated corpora. To achieve usable performance in real life, we need to exploit multiple different datasets, since each one can shed light on some specific expression of affect. However, different corpora use subjectively defined annotation schemes, which makes it challenging to train a model that can sense similar emotions across corpora. Here, we propose a method that can relate similar emotions across corpora without being explicitly trained for it. Our method relies on self-supervised representations, which provide highly contextualised speech representations, and on multi-task learning paradigms. This allows training on different corpora without changing their labelling schemes. The results show that by fine-tuning self-supervised representations on each corpus separately, we can significantly improve the state-of-the-art within-corpus performance. We further demonstrate that by using multiple corpora during the training of the same model, we can improve cross-corpus performance, and show that our emotion embeddings can effectively recognise the same emotions across different corpora.
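The multi-task idea described above, training on several corpora without merging their incompatible label sets, is commonly realised with one shared encoder and one output head per corpus. This is a minimal sketch of that pattern; the corpus names, label counts, and dimensions are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
W_shared = rng.normal(size=(768, 64))  # stands in for the shared SSL-based encoder
HEADS = {
    "corpusA": rng.normal(size=(64, 4)),  # corpus A has 4 categorical labels
    "corpusB": rng.normal(size=(64, 7)),  # corpus B uses a different 7-label scheme
}

def shared_embedding(ssl_features):
    """Shared emotion embedding computed from SSL speech features;
    this space is common to all corpora."""
    return np.tanh(ssl_features @ W_shared)

def predict(ssl_features, corpus):
    """Route the shared embedding through the head of the given corpus,
    so each corpus keeps its own labelling scheme during training."""
    logits = shared_embedding(ssl_features) @ HEADS[corpus]
    return int(np.argmax(logits))

feats = rng.normal(size=768)
label_a = predict(feats, "corpusA")  # index into corpus A's label set
label_b = predict(feats, "corpusB")  # index into corpus B's label set
```

Because only the heads are corpus-specific, similar emotions from different corpora end up close together in the shared embedding space, which is what enables the cross-corpus recognition the abstract reports.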
LeBenchmark, an evaluation benchmark for spoken French
Self-supervised learning has brought remarkable improvements in many domains such as computer vision or language and speech processing, by exploiting large quantities of unlabelled data. In the specific context of speech, however, and despite promising results, there is a clear lack of standardisation in the evaluation processes that would allow precise comparisons of these models, in particular for languages other than English. We present here to the French-speaking community LeBenchmark, an open-source and reproducible reference framework for evaluating self-supervised models on French speech corpora. It is composed of four tasks: automatic speech recognition, spoken language understanding, automatic speech translation, and automatic emotion recognition. We encourage the French-speaking community to use this benchmark in its future experiments, in particular for the evaluation of self-supervised models.