
    Meetings and Meeting Modeling in Smart Environments

    In this paper we survey our research on smart meeting rooms and its relevance for augmented reality meeting support and virtual reality generation of meetings in real time or off-line. The research reported here forms part of the European 5th and 6th framework programme projects Multi-Modal Meeting Manager (M4) and Augmented Multi-party Interaction (AMI). Both projects aim at building a smart meeting environment that is able to collect multimodal captures of the activities and discussions in a meeting room, with the aim of using this information as input to tools that allow real-time support, browsing, retrieval and summarization of meetings. Our aim is to research (semantic) representations of what takes place during meetings in order to allow the generation, e.g. in virtual reality, of meeting activities (discussions, presentations, voting, etc.). Being able to do so also allows us to look at tools that provide support during a meeting and at tools that allow those not able to be physically present during a meeting to take part in a virtual way. This may lead to situations where the differences between real meeting participants, human-controlled virtual participants and (semi-)autonomous virtual participants disappear.

    MUSIC TO OUR EYES: ASSESSING THE ROLE OF EXPERIENCE FOR MULTISENSORY INTEGRATION IN MUSIC PERCEPTION

    Based on research on the “McGurk Effect” (McGurk & MacDonald, 1976) in speech perception, some researchers (e.g. Liberman & Mattingly, 1985) have argued that humans uniquely interpret auditory and visual (motor) speech signals as a single intended audiovisual articulatory gesture, and that such multisensory integration is innate and specific to language. Our goal for the present study was to determine whether a McGurk-like effect holds true for music perception as well, a domain in which innateness and experience can be disentangled more easily than in language. We sought to investigate the effects of visual musical information on auditory music perception and judgment, the impact of music experience on such audiovisual integration, and the possible role of eye gaze patterns as a mediator between music experience and the extent of visual influence on auditory judgments. 108 participants (ages 18-40) completed a questionnaire and melody/rhythm perception tasks to determine music experience and abilities, and then completed speech and musical McGurk tasks. Stimuli were recorded from five sounds produced by a speaker or musician (cellist and trombonist) that ranged incrementally along a continuum from one type to another (e.g. non-vibrato to strong vibrato). In the audiovisual condition, these sounds were paired with videos of the speaker/performer producing one type of sound or the other (representing either end of the continuum) such that the audio and video matched or mismatched to varying degrees. Participants indicated, on a 100-point scale, the extent to which the auditory presentation represented one end of the continuum or the other. Auditory judgments for each sound were then compared based on their visual pairings to determine the impact of visual cues on auditory judgments. Additionally, several types of music experience were evaluated as potential predictors of the degree of influence visual stimuli had on auditory judgments. Finally, eye gaze patterns were measured in a different sample of 15 participants to assess relationships between music experience and eye gaze patterns, and between eye gaze patterns and the extent of visual influence on auditory judgments. Results indicated a reliable “musical McGurk Effect” in the context of cello vibrato sounds, but weaker overall effects for trombone vibrato sounds and cello pluck and bow sounds. Limited evidence was found to suggest that music experience impacts the extent to which individuals are influenced by visual stimuli when making auditory judgments. The support that was obtained, however, indicated the possibility of diminished visual influence on auditory judgments based on variables associated with music “production” experience. Potential relationships between music experience and eye-gaze patterns were identified. Implications for audiovisual integration in the context of speech and music perception are discussed, and future directions are suggested.
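
    The comparison described above (auditory judgments contrasted across visual pairings) can be illustrated with a small analysis sketch: for each step of the auditory continuum, compute how far the mean 100-point rating shifts depending on which end of the continuum the paired video represents. This is an illustrative sketch only; the column names, data layout, and toy numbers below are assumptions, not materials from the study.

```python
# Hypothetical sketch of the comparison described in the abstract: per audio
# step, compare mean 100-point ratings when the audio is paired with video
# from one end of the continuum ("A") versus the other ("B").
# Column names and data layout are assumptions, not taken from the study.

import pandas as pd

def visual_influence(df):
    """Mean rating shift attributable to the visual pairing, per audio step.

    Expects columns: 'audio_step' (1-5), 'video_end' ('A' or 'B'),
    'rating' (0-100).
    """
    means = df.groupby(["audio_step", "video_end"])["rating"].mean().unstack()
    # Positive values: video from end "B" pulls ratings toward end "B".
    return (means["B"] - means["A"]).rename("visual_shift")

# Toy data: two ratings per cell for three steps of a five-step continuum.
toy = pd.DataFrame({
    "audio_step": [1, 1, 1, 1, 3, 3, 3, 3, 5, 5, 5, 5],
    "video_end":  ["A", "A", "B", "B"] * 3,
    "rating":     [10, 14, 22, 25, 45, 50, 62, 58, 80, 84, 92, 95],
})
print(visual_influence(toy))
```

    A larger positive shift at a given audio step would indicate a stronger pull of the visual pairing on auditory judgments, i.e. a stronger McGurk-like effect for that stimulus.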

    THE EMIBO CORPUS: A resource for investigating lecture discourse across disciplines and lecture modes in an EMI context

    The aim of this paper is to introduce and describe the EmiBO corpus and present some initial data. EmiBO is a corpus of transcribed Master’s degree university lectures in English given by Italian lecturers, featuring different disciplines and lecture modes. The corpus is constantly being expanded as new recordings are acquired and their transcriptions added. At present it includes 21 complete lecture events by 14 different lecturers in Engineering and Economics subjects, corresponding to 36 lecture hours and just over 200,000 words. Lecturer and student participant turns are annotated. One part of the corpus includes transcripts of audio and video recordings of face-to-face (F2F) lectures, while the other features transcripts of online lectures, including written elements in the chat. The inclusion of audio and video recordings of different lecture modes makes it possible to focus on the interplay between spoken and written input, image and body language, while variations in communicative practices may be tracked as new lectures by the same speaker are added. The different modes brought together in a single corpus constitute a unique opportunity to investigate and compare language and non-verbal elements across EMI lecture contexts. Insights are given into the hitherto under-investigated features of Online Distance Learning in EMI, making the corpus of interest to others besides EMI scholars. Also of note is that non-native English-speaking lecturers’ discourse practices may be compared cross-sectionally across different modes from a truly ELF-oriented perspective. The paper presents and comments on quantitative data resulting from corpus analysis, as well as outlining some initial qualitative explorations, with suggestions for further development.

    Investigating Automatic Dominance Estimation in Groups From Visual Attention and Speaking Activity

    We study the automation of the visual dominance ratio (VDR), a classic measure of displayed dominance in the social psychology literature that combines both gaze and speaking activity cues. The VDR is modified to estimate dominance in multi-party group discussions where natural verbal exchanges occur and other visual targets such as a table and slide screen are present. Our findings suggest that fully automated versions of these measures can effectively estimate the most dominant person in a meeting and can approximate the dominance estimation performance obtained when manual labels of visual attention are used.
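
    For reference, the classic VDR is defined as the proportion of time a person spends gazing at others while speaking divided by the proportion of time spent gazing at others while listening. The sketch below computes this classic form from frame-level binary annotations; the function name, input format, and toy data are illustrative assumptions and do not reproduce the modified multi-party measure studied in the paper.

```python
# Minimal sketch of the classic visual dominance ratio, computed from
# frame-level binary annotations for one participant. The input format and
# toy data are assumptions for illustration, not the paper's pipeline.

import numpy as np

def visual_dominance_ratio(speaking, gazing_at_others):
    """Proportion of speaking time spent gazing at others, divided by the
    proportion of listening time spent gazing at others."""
    speaking = np.asarray(speaking, dtype=bool)
    gazing = np.asarray(gazing_at_others, dtype=bool)

    look_while_speaking = gazing[speaking].mean() if speaking.any() else 0.0
    look_while_listening = gazing[~speaking].mean() if (~speaking).any() else 0.0

    if look_while_listening == 0.0:
        # Degenerate case: the participant never gazes at others while listening.
        return float("inf") if look_while_speaking > 0 else 0.0
    return float(look_while_speaking / look_while_listening)

# Toy example: 10 frames of annotations (1 = active, 0 = inactive).
speaking = [1, 1, 1, 0, 0, 0, 1, 1, 0, 0]
gazing   = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
print(visual_dominance_ratio(speaking, gazing))  # 4.0; values above 1 suggest higher displayed dominance
```

    In a fully automated pipeline, the speaking and gaze streams would presumably come from automatic speech activity detection and estimated visual focus of attention rather than manual labels, which is the substitution the paper evaluates.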

    Intersensory Redundancy and Infant Selective Attention to Audiovisual Speech

    The current study utilized eye-tracking to investigate the effects of intersensory redundancy on infant visual attention and discrimination of a change in prosody in native and non-native audiovisual speech. The Intersensory Redundancy Hypothesis states that synchronous and redundant presentation of bimodal stimuli selectively recruits infant attention to and facilitates processing of amodal stimulus properties (Bahrick & Lickliter, 2000). Twelve-month-old monolingual English-learning infants viewed either synchronous (redundant) or asynchronous (non-redundant) video clips of a woman speaking in English (native speech) or Spanish (non-native speech). Halfway through each trial, the speaker changed prosody from adult-directed speech (ADS) to infant-directed speech (IDS) or vice versa. Participants completed four 1-min trials, counterbalanced for order. I hypothesized that intersensory redundancy would direct infant attention to amodal properties of speech and facilitate discrimination of a change in prosody. Specifically, I predicted infants in the synchronous condition would demonstrate differential scanning of the face based on changes in prosody on both English and Spanish trials. I predicted infants in the asynchronous condition would only demonstrate differential scanning patterns based on a change in prosody on English trials. The analyses revealed a main effect of prosody. Infants focused their visual attention more on the mouth of the speaker on IDS trials in comparison to ADS trials, regardless of language or redundancy. There was also an interaction of prosody and language on infants' selective attention. Infants focused more on the nose during English ADS speech in comparison to English IDS speech. These results indicate that IDS directs infant attention to the mouth of speakers. In the analysis of detection of a change in prosody, infants in the synchronous condition showed significant differences in looking during the second block of trials depicting English ADS changing to English IDS. This effect may have been due to an interaction of the greater salience of IDS, the infants' extensive experience with their native language, and the facilitating effects of intersensory redundancy for detecting changes in prosody. Overall, these findings exemplify the complexity of development and indicate that multiple factors interact to affect infants' visual attention and their ability to discriminate changes in prosody in audiovisual speech.

    A survey on security analysis of Amazon Echo devices

    Since its launch in 2014, the Amazon Echo family of devices has seen a considerable increase in adoption in consumer homes and offices. With a market worth millions of dollars, Echo is used for diverse tasks such as accessing online information, making phone calls, purchasing items, and controlling the smart home. Echo offers user-friendly voice interaction to automate everyday tasks, making it a massive success. Though many people view Amazon Echo as a helpful assistant at home or in the office, few know its underlying security and privacy implications. In this paper, we present the findings of our research on Amazon Echo’s security and privacy concerns. The findings are divided into categories by vulnerability or attack. The proposed mitigation(s) to the vulnerabilities are also presented in the paper. We conclude that though numerous privacy concerns and security vulnerabilities associated with the device have been mitigated, many vulnerabilities still need to be addressed.

    How visual cues to speech rate influence speech perception

    Spoken words are highly variable and therefore listeners interpret speech sounds relative to the surrounding acoustic context, such as the speech rate of a preceding sentence. For instance, a vowel midway between short /ɑ/ and long /a:/ in Dutch is perceived as short /ɑ/ in the context of preceding slow speech, but as long /a:/ if preceded by a fast context. Despite the well-established influence of visual articulatory cues on speech comprehension, it remains unclear whether visual cues to speech rate also influence subsequent spoken word recognition. In two ‘Go Fish’-like experiments, participants were presented with audio-only (auditory speech + fixation cross), visual-only (muted videos of a talking head), and audiovisual (speech + videos) context sentences, followed by ambiguous target words containing vowels midway between short /ɑ/ and long /a:/. In Experiment 1, target words were always presented auditorily, without visual articulatory cues. Although the audio-only and audiovisual contexts induced a rate effect (i.e., more long /a:/ responses after fast contexts), the visual-only condition did not. When, in Experiment 2, target words were presented audiovisually, rate effects were observed in all three conditions, including visual-only. This suggests that visual cues to speech rate in a context sentence influence the perception of the following visual target cues (e.g., duration of lip aperture), which at an audiovisual integration stage bias participants’ target categorization responses. These findings contribute to a better understanding of how what we see influences what we hear.
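
    The rate effect reported above can be summarized, per modality condition, as the proportion of long /a:/ responses after fast contexts minus the proportion after slow contexts. The sketch below is a minimal illustration of that summary under assumed condition labels and data layout; it is not the paper's analysis code.

```python
# Illustrative summary of a rate effect: per modality, the proportion of
# long /a:/ categorization responses after fast contexts minus the
# proportion after slow contexts. Labels and data layout are assumptions.

from collections import defaultdict

def rate_effects(trials):
    """trials: list of (modality, context_rate, chose_long) tuples,
    e.g. ('visual-only', 'fast', True)."""
    trials = list(trials)
    counts = defaultdict(lambda: {"long": 0, "total": 0})
    for modality, rate, chose_long in trials:
        cell = counts[(modality, rate)]
        cell["total"] += 1
        cell["long"] += int(chose_long)

    def proportion(modality, rate):
        cell = counts[(modality, rate)]
        return cell["long"] / cell["total"] if cell["total"] else 0.0

    modalities = {modality for modality, _, _ in trials}
    return {m: proportion(m, "fast") - proportion(m, "slow") for m in modalities}

# Toy trials: a positive fast-minus-slow difference indicates a rate effect.
trials = [
    ("audio-only", "fast", True), ("audio-only", "fast", True),
    ("audio-only", "slow", False), ("audio-only", "slow", True),
    ("visual-only", "fast", True), ("visual-only", "fast", False),
    ("visual-only", "slow", False), ("visual-only", "slow", False),
]
print(rate_effects(trials))  # e.g. {'audio-only': 0.5, 'visual-only': 0.5}
```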

    Can you please turn your cameras on? Communication Apprehension and Teleconferencing

    With the onset of the COVID-19 pandemic, a multitude of normalities in individuals' lives had to change to continue moving forward. The world began to embrace new technologies that allowed individuals to be connected while physically apart. One of the most embraced technologies was teleconferencing. Teleconferencing is not a new technology; the first primitive form was created in 1968. However, it was not until the world had to embrace teleconferencing during the COVID-19 pandemic that the technology became a common part of everyday life. The term Zoom is now synonymous with video chatting and conferencing, becoming a part of society's lexicon similar to the terms Xerox and Band-Aid. Zoom has begun to reshape how individuals communicate. Teleconferencing has created a new mode of communication to be explored, adding to the extensive list of emerging technologies that have expanded virtual communication. With emerging technologies, it is critical to explore communication apprehension's effect in these new terrains. Communication apprehension (CA) is the extent to which individuals feel fear or anxiety while communicating or prior to communicating. Teleconferencing environments have changed the way individuals experience communication apprehension. This study seeks to explore the impact of teleconferencing technologies and how communication apprehension manifests itself in online scenarios. This study will use qualitative research methods since there has been little research on video conferencing and communication apprehension. By understanding how communication apprehension occurs in teleconferencing, society can better understand ways to reduce this apprehension and refine communication skills.