
    Speaker segmentation and clustering

    This survey focuses on two challenging speech processing topics: speaker segmentation and speaker clustering. Speaker segmentation aims at finding speaker change points in an audio stream, whereas speaker clustering aims at grouping speech segments by speaker characteristics. Model-based, metric-based, and hybrid speaker segmentation algorithms are reviewed. For speaker clustering, deterministic and probabilistic algorithms are examined. A comparative assessment of the reviewed algorithms is undertaken, their advantages and disadvantages are indicated, insight into the algorithms is offered, and deductions as well as recommendations are given. Rich transcription and movie analysis are candidate applications that benefit from combined speaker segmentation and clustering. © 2007 Elsevier B.V. All rights reserved.
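Metric-based segmentation of the kind this survey reviews is commonly illustrated with the Bayesian Information Criterion (BIC) distance between adjacent analysis windows. The sketch below is a minimal illustration of that family of methods, not an implementation from the survey; the penalty weight and the choice of full-covariance Gaussians are assumptions.

```python
import numpy as np

def delta_bic(X, Y, penalty_weight=1.0):
    """Metric-based change detection via the Bayesian Information
    Criterion: compare modelling two feature segments X and Y
    (frames x dims, e.g. MFCCs) with one Gaussian versus two.
    A positive value suggests a speaker change between them."""
    Z = np.vstack([X, Y])
    n_x, n_y, n_z = len(X), len(Y), len(Z)
    d = Z.shape[1]
    # log-determinant of the full sample covariance of a segment
    ld = lambda M: np.linalg.slogdet(np.cov(M, rowvar=False))[1]
    # complexity penalty for the extra mean vector and covariance matrix
    penalty = 0.5 * penalty_weight * (d + 0.5 * d * (d + 1)) * np.log(n_z)
    return 0.5 * (n_z * ld(Z) - n_x * ld(X) - n_y * ld(Y)) - penalty
```

In practice a sliding window evaluates this statistic at each candidate point and marks local maxima above zero as speaker changes.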

    Public service, the information society, and the Internet (Service public, société de l'information et Internet)


    Investigating the Effects of Training Set Synthesis for Audio Segmentation of Radio Broadcast

    Special Issue "Machine Learning Applied to Music/Audio Signal Processing". Music and speech detection provides valuable information about the nature of content in broadcast audio: it identifies acoustic regions that contain speech, voice over music, music only, or silence. In recent years, machine learning algorithms have been developed for this task. However, broadcast audio is generally well mixed and copyrighted, which makes it difficult to share across research groups. In this study, we address the challenges of automatically synthesising data that resembles a radio broadcast. Firstly, we compare state-of-the-art neural network architectures such as CNN, GRU, LSTM, TCN, and CRNN. Secondly, we investigate how audio ducking of background music affects the precision and recall of the machine learning algorithm. Thirdly, we examine how the quantity of synthetic training data affects the results. Finally, we evaluate the effectiveness of synthesised, real-world, and combined training sets, to understand whether the synthetic data adds any value. Among the network architectures, CRNN performed best. The results also show that the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. After testing our model on in-house and public datasets, we observe that the proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative.
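Audio ducking, as investigated here, attenuates background music before it is summed with speech. The sketch below shows one minimal way such a synthetic training mix might be produced; the function and parameter names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def mix_with_ducking(speech, music, duck_db=-12.0):
    """Synthesise a radio-style training example: attenuate ('duck')
    the background music by `duck_db` decibels, then sum it with the
    speech signal. `duck_db` is a hypothetical parameter name."""
    gain = 10.0 ** (duck_db / 20.0)           # dB -> linear amplitude
    n = min(len(speech), len(music))
    mix = speech[:n] + gain * music[:n]
    peak = np.max(np.abs(mix))
    return mix / peak if peak > 1.0 else mix  # avoid clipping
```

Sweeping `duck_db` over a training corpus is one way to study how the ducking level affects a detector's precision and recall.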

    Investigating speaker roles and their interactions for the structuring of audiovisual documents

    We present a system for the automatic structuring of audiovisual documents, based on speaker role recognition and the detection of speech interaction zones. The first stage of our system detects and characterises speech interaction zones: temporal sequences of a document that are likely to contain conversations between speakers. The second stage recognises speaker roles: anchor, journalist, and other. Our contribution to this domain rests on the hypothesis that cues to speaker roles are available through low-level features extracted from the temporal organisation of turn-taking, from the acoustic environments in which speakers appear, and from prosodic features (speech rate and pitch). In the final stage, we combine speaker roles with speech interaction zones to produce two descriptive layers of the document content. The first layer segments recordings into four types of zone: news, interviews, transitions, and interludes. The second layer classifies speech interaction zones into four categories: debate, interview, chronicle, and relay. Each stage of the system was validated through a large number of experiments on the EPAC project corpus and the ESTER evaluation campaign corpus.
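The low-level turn-taking cues described above can be sketched as simple descriptors computed from a diarised timeline. The feature names below are illustrative, not the system's exact feature set.

```python
from collections import defaultdict

def turn_taking_features(turns):
    """Per-speaker turn-taking descriptors from a diarised timeline
    given as [(speaker, start_s, end_s), ...]: number of turns,
    mean turn duration, and share of total speaking time."""
    stats = defaultdict(lambda: {"n_turns": 0, "total_s": 0.0})
    for spk, start, end in turns:
        stats[spk]["n_turns"] += 1
        stats[spk]["total_s"] += end - start
    total = sum(s["total_s"] for s in stats.values()) or 1.0
    return {spk: {"n_turns": s["n_turns"],
                  "mean_turn_s": s["total_s"] / s["n_turns"],
                  "speaking_share": s["total_s"] / total}
            for spk, s in stats.items()}
```

Descriptors of this kind, together with prosodic measurements, could then feed any standard classifier to separate anchors from journalists and other speakers.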

    SPARC 2016 Salford postgraduate annual research conference book of abstracts


    Perspectives for promoting the RTS sound and audiovisual archive collections to young audiences on social networks

    This Bachelor's thesis proposes solutions for promoting the audiovisual and sound archive collections of the RTS on social networks. The objective is to reach a target audience aged 18 to 25, that is, members of generations Y and Z. The Data and Archives (D+A) service of the RTS has been promoting its archive collections for several years, but still struggles to reach this audience, even via social media and networks. The project is organised in three main parts. The first presents a review of the archive promotion practices of the D+A service and a comparison with the promotion methods of other institutions with similar missions. The second part examines how the target audience, young people aged 18 to 25, use social media and networks. Finally, in the last part, drawing on the information gathered earlier, we turn to the promotion of audiovisual and sound archives and propose relevant promotion solutions for each type of archive. Our research determined that, to promote archives optimally, whether audiovisual or sound, it is necessary to choose the right content to publish and to know the target audience. While some archives manage to reach several generations, others are clearly aimed at a particular age bracket. For the 18-25 group, we identified a number of themes likely to appeal to them, as well as a time period, the 1990s to the 2000s, during which the archives to be promoted should have been produced; this period corresponds to our target audience's childhood.
    Finally, we focused more specifically on sound archives on the one hand and audiovisual archives on the other, in order to propose promotion solutions suited to each format. We determined that sound archives need to be illustrated so that they fit better into the dynamics of social networks, while audiovisual archives should be presented as short clips combining one or more archive excerpts dating from after 1990.

    Deep Learning for Audio Segmentation and Intelligent Remixing

    Audio segmentation divides an audio signal into homogeneous sections such as music and speech. It is useful as a preprocessing step to index, store, and modify audio recordings, radio broadcasts, and TV programmes. Machine learning models for audio segmentation are generally trained on copyrighted material, which cannot be shared across research groups. Furthermore, annotating these datasets is a time-consuming and expensive task. In this thesis, we present a novel approach that artificially synthesises data resembling radio signals. We replicate the workflow of a radio DJ in mixing audio and investigate parameters such as fade curves and audio ducking. Using this approach, we obtained state-of-the-art performance for music-speech detection on in-house and public datasets. After demonstrating the efficacy of training set synthesis, we investigate how audio ducking of background music affects the precision and recall of the machine learning algorithm. Interestingly, the minimum level of audio ducking preferred by the machine learning algorithm was similar to that preferred by human listeners. Furthermore, we observe that our proposed synthesis technique outperforms real-world data in some cases and serves as a promising alternative. This project also proposes a novel deep learning system called You Only Hear Once (YOHO), inspired by the YOLO algorithm widely adopted in computer vision. We convert the detection of acoustic boundaries into a regression problem instead of frame-based classification. The relative improvement in F-measure of YOHO over the state-of-the-art Convolutional Recurrent Neural Network ranged from 1% to 6% across multiple datasets. As YOHO predicts acoustic boundaries directly, inference and post-processing are six times faster than frame-based classification. Furthermore, we investigate domain generalisation methods such as transfer learning and adversarial training, and demonstrate that these methods helped our algorithm perform better in unseen domains. In addition to audio segmentation, another objective of this project is to explore real-time radio remixing. This is a step towards building a customised radio experience that integrates with the listener's schedule. The system would remix music from the user's personal playlist and play snippets of diary reminders at appropriate transition points. The intelligent remixing is governed by the underlying audio segmentation and other deep learning methods. We also explore how individuals can communicate with intelligent mixing systems through non-technical language, and demonstrate that word embeddings help in understanding representations of semantic descriptors.
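YOHO's reformulation of boundary detection as regression can be sketched as a target-encoding step: each time cell predicts, per class, a presence flag plus normalised start and end offsets. The grid size and encoding below are illustrative assumptions loosely following the YOLO-style formulation, not the paper's exact specification.

```python
def events_to_targets(events, duration, n_cells=9, classes=("speech", "music")):
    """Encode labelled events [(class, start_s, end_s), ...] as a
    YOLO-style regression grid: for each time cell and class, store
    [presence, start, end] with times normalised within the cell."""
    cell = duration / n_cells
    grid = [[[0.0, 0.0, 0.0] for _ in classes] for _ in range(n_cells)]
    for label, start, end in events:
        c = classes.index(label)
        first = int(start // cell)
        last = min(int(end // cell), n_cells - 1)
        for i in range(first, last + 1):
            s = max(start, i * cell) - i * cell        # offset within cell
            e = min(end, (i + 1) * cell) - i * cell
            grid[i][c] = [1.0, s / cell, e / cell]     # normalised to [0, 1]
    return grid
```

Because the network regresses boundaries directly in this representation, decoding a prediction is a single pass over the cells rather than the smoothing and merging that frame-wise classification requires, which is consistent with the speedup reported above.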

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we inventory the impact and legal consequences of these technical advances and point out future directions of research.
