
    Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

    The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal synchrony. We build on earlier work to train embeddings that are more discriminative for uni-modal downstream tasks. To this end, we propose a novel training strategy that not only optimises metrics across modalities, but also enforces intra-class feature separation within each of the modalities. The effectiveness of the method is demonstrated on two downstream tasks: lip reading using the features trained on audio-visual synchronisation, and speaker recognition using the features trained for cross-modal biometric matching. The proposed method outperforms state-of-the-art self-supervised baselines by a significant margin. Comment: Under submission as a conference paper

    Automatic Prediction Of Small Group Performance In Information Sharing Tasks

    In this paper, we describe a novel approach, based on Markov jump processes, to model small group conversational dynamics and to predict small group performance. More precisely, we estimate conversational events such as turn-taking, backchannels, and turn transitions at the micro-level (1-minute windows), and then we bridge the micro-level behavior and the macro-level performance. We tested our approach with a cooperative task, the Information Sharing task, and we verified the relevance of micro-level interaction dynamics in determining a good group performance (e.g. a higher rate of speaking turns and more balanced participation among group members). Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991)

    Towards a comprehensive 3D dynamic facial expression database

    Human faces play an important role in everyday life, including the expression of person identity, emotion and intentionality, along with a range of biological functions. The human face has also become the subject of considerable research effort, and there has been a shift towards understanding it using stimuli in increasingly realistic formats. In the current work, we outline progress made in the production of a database of facial expressions in arguably the most realistic format, 3D dynamic. A suitable architecture for capturing such 3D dynamic image sequences is described and then used to record seven expressions (fear, disgust, anger, happiness, surprise, sadness and pain) by 10 actors at 3 levels of intensity (mild, normal and extreme). We also present details of a psychological experiment that was used to formally evaluate the accuracy of the expressions in a 2D dynamic format. The result is an initial, validated database for researchers and practitioners. The goal is to scale up the work with more actors and expression types.

    Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters

    Data privacy is crucial when dealing with biometric data. Accounting for the latest European data privacy regulation and payment service directive, biometric template protection is essential for any commercial application. By ensuring unlinkability across biometric service operators, irreversibility of leaked encrypted templates, and renewability of, e.g., voice models following the i-vector paradigm, biometric voice-based systems are prepared for the latest EU data privacy legislation. When Paillier cryptosystems are employed, Euclidean and cosine comparators are known to meet data privacy demands without loss of discrimination or calibration performance. Bridging gaps from template protection to speaker recognition, two architectures are proposed for the two-covariance comparator, serving as a generative model in this study. The first architecture preserves privacy of biometric data capture subjects. In the second architecture, model parameters of the comparator are encrypted as well, such that biometric service providers can supply the same comparison modules employing different key pairs to multiple biometric service operators. An experimental proof-of-concept and a complexity analysis are carried out on the data from the 2013-2014 NIST i-vector machine learning challenge.
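
The abstract above mentions Euclidean comparators evaluated under Paillier encryption. As a rough illustration of why that is possible, the sketch below uses a toy Paillier scheme to compute a squared Euclidean distance between an encrypted integer template and a plaintext probe, relying only on the additive homomorphism of Paillier ciphertexts. It is not the paper's protocol: the function names, the tiny demonstration primes, and the integer feature vectors are all assumptions made for this example, and the key size is far too small for any real use.

```python
import math
import random


def keygen(p=10007, q=10009):
    """Toy Paillier key generation (demo-sized primes, NOT secure)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (taken mod n) with generator g = n + 1."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    # (n + 1)^m mod n^2 equals 1 + m*n mod n^2
    return (1 + (m % n) * n) % n2 * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover the plaintext from ciphertext c."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def encrypted_sq_distance(n, enc_x, enc_x_sq_norm, y):
    """Return Enc(||x - y||^2) from Enc(x_i), Enc(sum x_i^2) and plaintext y.

    Uses ||x - y||^2 = sum x_i^2 - 2 * sum x_i*y_i + sum y_i^2.
    Multiplying ciphertexts adds plaintexts; raising a ciphertext to k
    multiplies its plaintext by k (mod n).
    """
    n2 = n * n
    acc = enc_x_sq_norm
    for c, yi in zip(enc_x, y):
        acc = acc * pow(c, (-2 * yi) % n, n2) % n2
    return acc * encrypt(n, sum(v * v for v in y)) % n2


if __name__ == "__main__":
    n, priv = keygen()
    x = [3, 1, 4]  # hypothetical enrolled template (integers)
    y = [1, 5, 9]  # hypothetical probe held in plaintext
    enc_x = [encrypt(n, v) for v in x]
    enc_norm = encrypt(n, sum(v * v for v in x))
    enc_d = encrypted_sq_distance(n, enc_x, enc_norm, y)
    print(decrypt(n, priv, enc_d))
```

In this sketch only the holder of the private key can read the distance, while the party computing it never sees the template in the clear. Real-valued features would first need to be quantised to integers, which is a further assumption of the example.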

    Methodological issues in developing a multi-dimensional coding procedure for small group chat communication

    In CSCL research, collaboration through chat has primarily been studied in dyadic settings. This article discusses three issues that emerged during the development of a multi-dimensional coding procedure for small group chat communication: a) the unit of analysis and unit fragmentation, b) the reconstruction of the response structure and c) determining reliability without overestimation. Threading, i.e. connections between analysis units, proved essential for handling unit fragmentation, reconstructing the response structure and ensuring the reliability of coding. In addition, a risk of reliability overestimation was illustrated. Implications for analysis methodology in CSCL are discussed.

    A Review of the Fingerprint, Speaker Recognition, Face Recognition and Iris Recognition Based Biometric Identification Technologies

    This paper reviews four biometric identification technologies (fingerprint, speaker recognition, face recognition and iris recognition). It discusses the mode of operation of each technology and highlights its advantages and disadvantages.

    An investigation into the “beautification” of security ceremonies

    “Beautiful Security” is a paradigm that requires security ceremonies to contribute to the ‘beauty’ of a user experience. The underlying assumption is that people are likely to be willing to engage with more beautiful security ceremonies. It is hoped that such ceremonies will minimise human deviations from the prescribed interaction, and that security will be improved as a consequence. In this paper, we explain how we went about deriving beautification principles, and how we tested the efficacy of these by applying them to specific security ceremonies. As a first step, we deployed a crowd-sourced platform, using both explicit and metaphorical questions, to extract general aspects associated with the perception of the beauty of real-world security mechanisms. This resulted in the identification of four beautification design guidelines. We used these to beautify the following existing security ceremonies: Italian voting, user-to-laptop authentication, password setup and EU premises access. To test the efficacy of our guidelines, we again leveraged crowd-sourcing to determine whether our “beautified” ceremonies were indeed perceived to be more beautiful than the original ones. This initial foray into the beautification of security ceremonies delivered promising results, but they must be interpreted carefully.

    Machine Understanding of Human Behavior

    A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next generation computing, which we will call human computing, should be about anticipatory user interfaces that should be human-centered, built for humans based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, how far we are from enabling computers to understand human behavior.

    Analysis and Detection of Information Types of Open Source Software Issue Discussions

    Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs is vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders. Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE2019)