
    Combining Multiple Views for Visual Speech Recognition

    Visual speech recognition is a challenging research problem with a particularly important practical application: aiding audio speech recognition in noisy scenarios. Multiple-camera setups can benefit visual speech recognition systems by improving their performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The analysis covers the fusion of all possible view-angle combinations at both the feature level and the decision level. The visual speech recognition system employed in this study extracts features through a PCA-based convolutional neural network followed by an LSTM network. These features are then processed in a tandem system, being fed into a GMM-HMM scheme. Decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, sentence correctness on the test set increases from 76% for the highest-performing single view (30°) to up to 83% when this view is combined with the frontal and 60° view angles.
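Decision-level fusion of Viterbi path log-likelihoods, as described in the abstract above, can be pictured with a minimal sketch. It assumes that each view's GMM-HMM recogniser has already produced a log-likelihood for the same set of candidate sentence hypotheses, and that the views are combined by a weighted sum with equal weights by default; the function name `fuse_decisions`, the weighting scheme and the toy scores are hypothetical illustrations, not the paper's actual implementation or data.

```python
from typing import Dict, Optional


def fuse_decisions(
    view_scores: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the hypothesis with the highest weighted sum of per-view log-likelihoods.

    view_scores maps a view name (e.g. "0", "30", "60") to a dict that maps each
    candidate hypothesis to its Viterbi path log-likelihood in that view.
    """
    views = list(view_scores)
    if weights is None:
        # Assumption: every view gets the same weight unless told otherwise.
        weights = {v: 1.0 / len(views) for v in views}
    # Only hypotheses scored in every view can be fused.
    common = set.intersection(*(set(view_scores[v]) for v in views))

    def fused_score(hyp: str) -> float:
        return sum(weights[v] * view_scores[v][hyp] for v in views)

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(common), key=fused_score)


# Toy log-likelihoods (illustrative only, not results from the paper).
scores = {
    "0": {"A": -120.0, "B": -118.0},   # the frontal view alone prefers B
    "30": {"A": -100.0, "B": -105.0},
    "60": {"A": -110.0, "B": -112.0},
}
print(fuse_decisions(scores))               # A
print(fuse_decisions({"0": scores["0"]}))   # B
```

In this toy case the frontal view on its own picks hypothesis B, but the other two views outweigh it once the log-likelihoods are summed, which is the kind of complementary information the abstract refers to.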

    Band-pass filtering of the time sequences of spectral parameters for robust wireless speech recognition

    In this paper we address the problem of automatic speech recognition (ASR) when wireless speech communication systems are involved. In this context, three main sources of distortion should be considered: the acoustic environment, speech coding, and transmission errors. Whilst the first has already received a lot of attention, the last two, in our opinion, deserve further investigation. We have found that band-pass filtering of the recognition features improves ASR performance when distortions due to these particular communication systems are present. Furthermore, we have evaluated two alternative configurations at different bit error rates (BER) typical of these channels: band-pass filtering the LP-MFCC parameters, and a modification of RASTA-PLP that uses a sharper low-pass section. These perform consistently better than LP-MFCC and RASTA-PLP, respectively.
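Band-pass filtering the time sequence of each spectral parameter, as described in the abstract above, can be sketched with the classic RASTA filter, H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), applied independently to every feature trajectory. This is a minimal illustration under that assumption: the paper's modified RASTA-PLP uses a sharper low-pass section whose coefficients are not given here, so the filter below is the standard RASTA design rather than the authors' variant, and the function name `rasta_filter` and the synthetic data are hypothetical.

```python
import numpy as np


def rasta_filter(features: np.ndarray, pole: float = 0.98) -> np.ndarray:
    """Band-pass filter each feature trajectory along the time axis (axis 0).

    Implements H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - pole * z^-1),
    the classic RASTA filter in causal form (the original definition adds a
    z^4 advance, which only shifts the output in time).
    """
    x = np.asarray(features, dtype=float)
    # FIR part: a smoothed derivative whose taps sum to 0, so DC is removed.
    b = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    # FIR stage: y_fir[n] = sum_k b[k] * x[n - k]
    y_fir = np.zeros_like(x)
    for k, bk in enumerate(b):
        if k >= len(x):
            break
        y_fir[k:] += bk * x[: len(x) - k]
    # IIR stage (leaky integrator): y[n] = y_fir[n] + pole * y[n - 1]
    y = np.empty_like(y_fir)
    prev = np.zeros(x.shape[1:])
    for n in range(len(x)):
        prev = y_fir[n] + pole * prev
        y[n] = prev
    return y


# Illustrative data: 500 frames of 13 cepstral coefficients.
rng = np.random.default_rng(0)
T, D = 500, 13
clean = rng.standard_normal((T, D))
offset = 5.0 * np.ones(D)  # a stationary channel bias in the cepstral domain
diff = rasta_filter(clean + offset) - rasta_filter(clean)
print(np.abs(diff[-1]).max())  # small: the constant bias has decayed away
```

Because the numerator taps sum to zero, a constant offset in a trajectory (for example a fixed channel distortion in the cepstral domain) is suppressed once the filter's transient has died out, while the pole near 1 keeps the slower modulations typical of speech.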

    CHORUS Deliverable 4.3: Report from CHORUS workshops on national initiatives and metadata

    Minutes of the following workshops: • National Initiatives on Multimedia Content Description and Retrieval, Geneva, October 10th, 2007. • Metadata in Audio-Visual/Multimedia Production and Archiving, Munich, IRT, 21st–22nd November 2007.
    Workshop in Geneva, 10/10/2007: This highly successful workshop was organised in cooperation with the European Commission. The event brought together the technical, administrative and financial representatives of the various national initiatives that have recently been established in some European countries to support research and technical development in audio-visual content processing, indexing and searching for the next-generation Internet using semantic technologies, and which may lead to an Internet-based knowledge infrastructure. The objective of this workshop was to provide a platform for mutual information and exchange between these initiatives, the European Commission and the participants. Leading speakers from each of the national initiatives were present, and there was time for discussion with the audience and amongst the European national initiatives. The challenges, commonalities, difficulties, targeted/expected impact, success criteria, etc. were addressed. The workshop also considered how these national initiatives could work together and benefit from each other.
    Workshop in Munich, 21–22/11/2007: Numerous EU and national research projects are working on the automatic or semi-automatic generation of descriptive and functional metadata derived from analysing audio-visual content. The owners of AV archives and production facilities are eagerly awaiting such methods, which would help them to better exploit their assets. Hand in hand with the digitisation of analogue archives and the archiving of digital AV material, metadata should be generated at as high a semantic level as possible, preferably fully automatically. All users of metadata rely on a certain metadata model, and all AV/multimedia search engines, whether already developed or currently under development, have to be compatible or compliant with the metadata models in use. The purpose of this workshop was to draw attention to the specific problem of metadata models in the context of (semi-)automatic multimedia search.

    Expert Finding by Capturing Organisational Knowledge from Legacy Documents

    Organisations capitalise on their best knowledge by improving shared expertise, which leads to higher levels of productivity and competency. The recognised need to foster the sharing of expertise has led to the development of expert finder systems, which hold pointers to experts who possess specific knowledge within an organisation. This paper discusses an approach to locating an expert by applying information retrieval and analysis processes to an organisation's existing information resources, with specific reference to the engineering design domain. The approach was realised through an expert finder system framework. The framework allows the relationships between heterogeneous information sources and experts to be factored into the modelling of individuals' expertise. These valuable relationships are typically ignored by existing expert finder systems, which focus only on how documents relate to their content. The framework also provides an architecture that can easily be adapted to different organisational environments. In addition, it allows users to access the expertise recognition logic, giving them greater trust in the systems implemented using this framework. The framework was applied to a real-world application and evaluated within a major engineering company.
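The idea in the abstract above of factoring person-document relationships into expertise modelling can be illustrated with a minimal sketch. It assumes a simple scheme in which each person's score for a query is the sum, over linked documents, of the document's relevance multiplied by a weight for the kind of relationship (author, reviewer, mentioned). The relationship types, their weights, the term-overlap relevance measure and the function names `term_overlap` and `rank_experts` are all hypothetical choices for illustration and are not taken from the paper's framework.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical weights: how strongly each kind of person-document relationship
# counts as evidence of expertise. Not taken from the paper.
RELATION_WEIGHTS = {"author": 1.0, "reviewer": 0.5, "mentioned": 0.2}


def term_overlap(query: str, text: str) -> float:
    """Fraction of query terms that occur in the text (a crude relevance score)."""
    q_terms = set(query.lower().split())
    d_terms = set(text.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def rank_experts(
    query: str,
    documents: Dict[str, str],
    links: List[Tuple[str, str, str]],
) -> List[Tuple[str, float]]:
    """Score each person by summing relationship-weighted relevance of linked documents."""
    scores: Dict[str, float] = defaultdict(float)
    for person, doc_id, relation in links:
        relevance = term_overlap(query, documents[doc_id])
        scores[person] += RELATION_WEIGHTS.get(relation, 0.0) * relevance
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Illustrative toy data (not from the paper).
documents = {
    "d1": "Turbine blade fatigue analysis",
    "d2": "Gearbox lubrication test report",
}
links = [
    ("alice", "d1", "author"),
    ("bob", "d1", "mentioned"),
    ("bob", "d2", "author"),
]
print(rank_experts("blade fatigue", documents, links))
# [('alice', 1.0), ('bob', 0.2)]
```

A content-only system would treat everyone linked to `d1` equally; weighting by relationship type is one simple way to let the kind of link, not just the document text, shape the ranking.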
