86,230 research outputs found
Language Identification Using Visual Features
Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to iden- tify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments), or impossible (no audio signal is available). Research in this field is also beneficial in the related field of automatic lip-reading. This paper introduces several methods for visual language identification (VLID). They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discrimi- nating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artefacts, either visual (e.g. skin-tone) or audio (e.g. rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error-rate of < 10% in discriminating between Arabic and English on 19 speakers and using about 30s of visual speech
The DRIVE-SAFE project: signal processing and advanced information technologies for improving driving prudence and accidents
In this paper, we will talk about the Drivesafe project whose aim is creating conditions for prudent driving on highways and roadways with the purposes of reducing accidents caused by driver behavior. To achieve these primary goals, critical data is being collected from multimodal sensors (such as cameras, microphones, and other sensors) to build a unique databank on driver behavior. We are developing system and technologies for analyzing the data and automatically determining potentially dangerous situations (such as driver fatigue, distraction, etc.). Based on the findings from these studies, we will propose systems for warning the drivers and taking other precautionary measures to avoid accidents once a dangerous situation is detected. In order to address these issues a national consortium has been formed including Automotive Research Center (OTAM), Koç University, Istanbul Technical University, Sabancı University, Ford A.S., Renault A.S., and Fiat A. Ş
Towards Multi-Modal Interactions in Virtual Environments: A Case Study
We present research on visualization and interaction in a realistic model of an existing theatre. This existing ‘Muziek¬centrum’ offers its visitors information about performances by means of a yearly brochure. In addition, it is possible to get information at an information desk in the theatre (during office hours), to get information by phone (by talking to a human or by using IVR). The database of the theater holds the information that is available at the beginning of the ‘theatre season’. Our aim is to make this information more accessible by using multi-modal accessible multi-media web pages. A more general aim is to do research in the area of web-based services, in particu¬lar interactions in virtual environments
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective.
The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
Authentication of Students and Students’ Work in E-Learning : Report for the Development Bid of Academic Year 2010/11
Global e-learning market is projected to reach $107.3 billion by 2015 according to a new report by The Global Industry Analyst (Analyst 2010). The popularity and growth of the online programmes within the School of Computer Science obviously is in line with this projection. However, also on the rise are students’ dishonesty and cheating in the open and virtual environment of e-learning courses (Shepherd 2008). Institutions offering e-learning programmes are facing the challenges of deterring and detecting these misbehaviours by introducing security mechanisms to the current e-learning platforms. In particular, authenticating that a registered student indeed takes an online assessment, e.g., an exam or a coursework, is essential for the institutions to give the credit to the correct candidate. Authenticating a student is to ensure that a student is indeed who he says he is. Authenticating a student’s work goes one step further to ensure that an authenticated student indeed does the submitted work himself. This report is to investigate and compare current possible techniques and solutions for authenticating distance learning student and/or their work remotely for the elearning programmes. The report also aims to recommend some solutions that fit with UH StudyNet platform.Submitted Versio
VoxCeleb2: Deep Speaker Recognition
The objective of this paper is speaker recognition under noisy and
unconstrained conditions.
We make two key contributions. First, we introduce a very large-scale
audio-visual speaker recognition dataset collected from open-source media.
Using a fully automated pipeline, we curate VoxCeleb2 which contains over a
million utterances from over 6,000 speakers. This is several times larger than
any publicly available speaker recognition dataset.
Second, we develop and compare Convolutional Neural Network (CNN) models and
training strategies that can effectively recognise identities from voice under
various conditions. The models trained on the VoxCeleb2 dataset surpass the
performance of previous works on a benchmark dataset by a significant margin.Comment: To appear in Interspeech 2018. The audio-visual dataset can be
downloaded from http://www.robots.ox.ac.uk/~vgg/data/voxceleb2 .
1806.05622v2: minor fixes; 5 page
- …