1,911 research outputs found

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed

    Feature Trajectory Dynamic Time Warping for Clustering of Speech Segments

    Get PDF
    Dynamic time warping (DTW) can be used to compute the similarity between two sequences of generally differing length. We propose a modification to DTW that performs individual and independent pairwise alignment of feature trajectories. The modified technique, termed feature trajectory dynamic time warping (FTDTW), is applied as a similarity measure in the agglomerative hierarchical clustering of speech segments. Experiments using MFCC and PLP parametrisations extracted from TIMIT and from the Spoken Arabic Digit Dataset (SADD) show consistent and statistically significant improvements in the quality of the resulting clusters in terms of F-measure and normalised mutual information (NMI).Comment: 10 page

    Sign language video retrieval with free-form textual queries

    Get PDF
    Systems that can efficiently search collections of sign language videos have been highlighted as a useful application of sign language technology. However, the problem of searching videos beyond individual keywords has received limited attention in the literature. To address this gap, in this work we introduce the task of sign language retrieval with textual queries: given a written query (e.g. a sentence) and a large collection of sign language videos, the objective is to find the signing video that best matches the written query. We propose to tackle this task by learning cross-modal embeddings on the recently introduced large-scale How2Sign dataset of American Sign Language (ASL). We identify that a key bottleneck in the performance of the system is the quality of the sign video embedding which suffers from a scarcity of labelled training data. We, therefore, propose SPOT-ALIGN, a framework for interleaving iterative rounds of sign spotting and feature alignment to expand the scope and scale of available training data. We validate the effectiveness of SPOT-ALIGN for learning a robust sign video embedding through improvements in both sign recognition and the proposed video retrieval task.This work was supported by the project PID2020-117142GB-I00, funded by MCIN/ AEI /10.13039/501100011033, ANR project CorVis ANR-21-CE23-0003- 01, and gifts from Google and Adobe. AD received support from la Caixa Foundation (ID 100010434), fellowship code LCF/BQ/IN18/11660029.Peer ReviewedObjectius de Desenvolupament Sostenible::10 - Reducció de les DesigualtatsObjectius de Desenvolupament Sostenible::10 - Reducció de les Desigualtats::10.2 - Per a 2030, potenciar i promoure la inclusió social, econòmica i política de totes les persones, independentment de l’edat, sexe, discapacitat, raça, ètnia, origen, religió, situació econòmica o altra condicióPostprint (author's final draft

    Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Get PDF
    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. We design a new challenging database for sketch comprised of 3D chairs where each of the components (arms, legs, seat, back) are independently colored. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We base the sketch on the state-of-the-art for 3D Sketch Retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities
    corecore