86,230 research outputs found

    Language Identification Using Visual Features

    Get PDF
    Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to iden- tify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments), or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for visual language identification (VLID). They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discrimi- nating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error-rate of < 10% in discriminating between Arabic and English on 19 speakers and using about 30s of visual speech

    The DRIVE-SAFE project: signal processing and advanced information technologies for improving driving prudence and accidents

    Get PDF
    In this paper, we will talk about the Drivesafe project whose aim is creating conditions for prudent driving on highways and roadways with the purposes of reducing accidents caused by driver behavior. To achieve these primary goals, critical data is being collected from multimodal sensors (such as cameras, microphones, and other sensors) to build a unique databank on driver behavior. We are developing system and technologies for analyzing the data and automatically determining potentially dangerous situations (such as driver fatigue, distraction, etc.). Based on the findings from these studies, we will propose systems for warning the drivers and taking other precautionary measures to avoid accidents once a dangerous situation is detected. In order to address these issues a national consortium has been formed including Automotive Research Center (OTAM), Koç University, Istanbul Technical University, Sabancı University, Ford A.S., Renault A.S., and Fiat A. Ş

    Towards Multi-Modal Interactions in Virtual Environments: A Case Study

    Get PDF
    We present research on visualization and interaction in a realistic model of an existing theatre. This existing ‘Muziek¬centrum’ offers its visitors information about performances by means of a yearly brochure. In addition, it is possible to get information at an information desk in the theatre (during office hours), to get information by phone (by talking to a human or by using IVR). The database of the theater holds the information that is available at the beginning of the ‘theatre season’. Our aim is to make this information more accessible by using multi-modal accessible multi-media web pages. A more general aim is to do research in the area of web-based services, in particu¬lar interactions in virtual environments

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

    Authentication of Students and Students’ Work in E-Learning : Report for the Development Bid of Academic Year 2010/11

    Get PDF
    Global e-learning market is projected to reach $107.3 billion by 2015 according to a new report by The Global Industry Analyst (Analyst 2010). The popularity and growth of the online programmes within the School of Computer Science obviously is in line with this projection. However, also on the rise are students’ dishonesty and cheating in the open and virtual environment of e-learning courses (Shepherd 2008). Institutions offering e-learning programmes are facing the challenges of deterring and detecting these misbehaviours by introducing security mechanisms to the current e-learning platforms. In particular, authenticating that a registered student indeed takes an online assessment, e.g., an exam or a coursework, is essential for the institutions to give the credit to the correct candidate. Authenticating a student is to ensure that a student is indeed who he says he is. Authenticating a student’s work goes one step further to ensure that an authenticated student indeed does the submitted work himself. This report is to investigate and compare current possible techniques and solutions for authenticating distance learning student and/or their work remotely for the elearning programmes. The report also aims to recommend some solutions that fit with UH StudyNet platform.Submitted Versio

    VoxCeleb2: Deep Speaker Recognition

    Full text link
    The objective of this paper is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual speaker recognition dataset collected from open-source media. Using a fully automated pipeline, we curate VoxCeleb2 which contains over a million utterances from over 6,000 speakers. This is several times larger than any publicly available speaker recognition dataset. Second, we develop and compare Convolutional Neural Network (CNN) models and training strategies that can effectively recognise identities from voice under various conditions. The models trained on the VoxCeleb2 dataset surpass the performance of previous works on a benchmark dataset by a significant margin.Comment: To appear in Interspeech 2018. The audio-visual dataset can be downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 . 1806.05622v2: minor fixes; 5 page
    corecore