4,823 research outputs found

    On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification

    Get PDF
    Usable speech is a novel concept of processing co-channel speech data. It is proposed to extract minimally corrupted speech that is considered useful for various speech processing systems. In this paper, we are interested for co-channel speaker identification (SID). We employ a new proposed usable speech extraction method based on the pitch information obtained from linear multi-scale decomposition by discrete wavelet transform. The idea is to retain the speech segments that have only one pitch detected and remove the others. Detected Usable speech was used as input for speaker identification system. The system is evaluated on co-channel speech and results show a significant improvement across various Target to Interferer Ratio (TIR) for speaker identification system

    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    Get PDF
    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction, and motivation towards fluid human-robot communication, ten desiderata are proposed, which provide an organizational axis both of recent as well as of future research on human-robot communication. Then, the ten desiderata are examined in detail, culminating to a unifying discussion, and a forward-looking conclusion

    Detection and handling of overlapping speech for speaker diarization

    Get PDF
    For the last several years, speaker diarization has been attracting substantial research attention as one of the spoken language technologies applied for the improvement, or enrichment, of recording transcriptions. Recordings of meetings, compared to other domains, exhibit an increased complexity due to the spontaneity of speech, reverberation effects, and also due to the presence of overlapping speech. Overlapping speech refers to situations when two or more speakers are speaking simultaneously. In meeting data, a substantial portion of errors of the conventional speaker diarization systems can be ascribed to speaker overlaps, since usually only one speaker label is assigned per segment. Furthermore, simultaneous speech included in training data can eventually lead to corrupt single-speaker models and thus to a worse segmentation. This thesis concerns the detection of overlapping speech segments and its further application for the improvement of speaker diarization performance. We propose the use of three spatial cross-correlationbased parameters for overlap detection on distant microphone channel data. Spatial features from different microphone pairs are fused by means of principal component analysis, linear discriminant analysis, or by a multi-layer perceptron. In addition, we also investigate the possibility of employing longterm prosodic information. The most suitable subset from a set of candidate prosodic features is determined in two steps. Firstly, a ranking according to mRMR criterion is obtained, and then, a standard hill-climbing wrapper approach is applied in order to determine the optimal number of features. The novel spatial as well as prosodic parameters are used in combination with spectral-based features suggested previously in the literature. In experiments conducted on AMI meeting data, we show that the newly proposed features do contribute to the detection of overlapping speech, especially on data originating from a single recording site. In speaker diarization, for segments including detected speaker overlap, a second speaker label is picked, and such segments are also discarded from the model training. The proposed overlap labeling technique is integrated in Viterbi decoding, a part of the diarization algorithm. During the system development it was discovered that it is favorable to do an independent optimization of overlap exclusion and labeling with respect to the overlap detection system. We report improvements over the baseline diarization system on both single- and multi-site AMI data. Preliminary experiments with NIST RT data show DER improvement on the RT Âż09 meeting recordings as well. The addition of beamforming and TDOA feature stream into the baseline diarization system, which was aimed at improving the clustering process, results in a bit higher effectiveness of the overlap labeling algorithm. A more detailed analysis on the overlap exclusion behavior reveals big improvement contrasts between individual meeting recordings as well as between various settings of the overlap detection operation point. However, a high performance variability across different recordings is also typical of the baseline diarization system, without any overlap handling

    Authentication of Students and Students’ Work in E-Learning : Report for the Development Bid of Academic Year 2010/11

    Get PDF
    Global e-learning market is projected to reach $107.3 billion by 2015 according to a new report by The Global Industry Analyst (Analyst 2010). The popularity and growth of the online programmes within the School of Computer Science obviously is in line with this projection. However, also on the rise are students’ dishonesty and cheating in the open and virtual environment of e-learning courses (Shepherd 2008). Institutions offering e-learning programmes are facing the challenges of deterring and detecting these misbehaviours by introducing security mechanisms to the current e-learning platforms. In particular, authenticating that a registered student indeed takes an online assessment, e.g., an exam or a coursework, is essential for the institutions to give the credit to the correct candidate. Authenticating a student is to ensure that a student is indeed who he says he is. Authenticating a student’s work goes one step further to ensure that an authenticated student indeed does the submitted work himself. This report is to investigate and compare current possible techniques and solutions for authenticating distance learning student and/or their work remotely for the elearning programmes. The report also aims to recommend some solutions that fit with UH StudyNet platform.Submitted Versio

    Survey and Systematization of Secure Device Pairing

    Full text link
    Secure Device Pairing (SDP) schemes have been developed to facilitate secure communications among smart devices, both personal mobile devices and Internet of Things (IoT) devices. Comparison and assessment of SDP schemes is troublesome, because each scheme makes different assumptions about out-of-band channels and adversary models, and are driven by their particular use-cases. A conceptual model that facilitates meaningful comparison among SDP schemes is missing. We provide such a model. In this article, we survey and analyze a wide range of SDP schemes that are described in the literature, including a number that have been adopted as standards. A system model and consistent terminology for SDP schemes are built on the foundation of this survey, which are then used to classify existing SDP schemes into a taxonomy that, for the first time, enables their meaningful comparison and analysis.The existing SDP schemes are analyzed using this model, revealing common systemic security weaknesses among the surveyed SDP schemes that should become priority areas for future SDP research, such as improving the integration of privacy requirements into the design of SDP schemes. Our results allow SDP scheme designers to create schemes that are more easily comparable with one another, and to assist the prevention of persisting the weaknesses common to the current generation of SDP schemes.Comment: 34 pages, 5 figures, 3 tables, accepted at IEEE Communications Surveys & Tutorials 2017 (Volume: PP, Issue: 99

    A Formal Framework for Linguistic Annotation

    Get PDF
    `Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, `named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.Comment: 49 page

    SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS

    Get PDF
    Automatic segregation of overlapping speech signals from single-channel recordings is a challenging problem in speech processing. Similarly, the problem of extracting speech signals from noisy speech is a problem that has attracted a variety of research for several years but is still unsolved. Speech extraction from noisy speech mixtures where the background interference could be either speech or noise is especially difficult when the task is to preserve perceptually salient properties of the recovered acoustic signals for use in human communication. In this work, we propose a speech segregation algorithm that can simultaneously deal with both background noise as well as interfering speech. We propose a feature-based, bottom-up algorithm which makes no assumptions about the nature of the interference or does not rely on any prior trained source models for speech extraction. As such, the algorithm should be applicable for a wide variety of problems, and also be useful for human communication since an aim of the system is to recover the target speech signals in the acoustic domain. The proposed algorithm can be compartmentalized into (1) a multi-pitch detection stage which extracts the pitch of the participating speakers, (2) a segregation stage which teases apart the harmonics of the participating sources, (3) a reliability and add-back stage which scales the estimates based on their reliability and adds back appropriate amounts of aperiodic energy for the unvoiced regions of speech and (4) a speaker assignment stage which assigns the extracted speech signals to their appropriate respective sources. The pitch of two overlapping speakers is extracted using a novel feature, the 2-D Average Magnitude Difference Function, which is also capable of giving a single pitch estimate when the input contains only one speaker. The segregation algorithm is based on a least squares framework relying on the estimated pitch values to give estimates of each speaker's contributions to the mixture. The reliability block is based on a non-linear function of the energy of the estimates, this non-linear function having been learnt from a variety of speech and noise data but being very generic in nature and applicability to different databases. With both single- and multiple- pitch extraction and segregation capabilities, the proposed algorithm is amenable to both speech-in-speech and speech-in-noise conditions. The algorithm is evaluated on several objective and subjective tests using both speech and noise interference from different databases. The proposed speech segregation system demonstrates performance comparable to or better than the state-of-the-art on most of the objective tasks. Subjective tests on the speech signals reconstructed by the algorithm, on normal hearing as well as users of hearing aids, indicate a significant improvement in the perceptual quality of the speech signal after being processed by our proposed algorithm, and suggest that the proposed segregation algorithm can be used as a pre-processing block within the signal processing of communication devices. The utility of the algorithm for both perceptual and automatic tasks, based on a single-channel solution, makes it a unique speech extraction tool and a first of its kind in contemporary technology

    Assessment of cockpit interface concepts for data link retrofit

    Get PDF
    The problem is examined of retrofitting older generation aircraft with data link capability. The approach taken analyzes requirements for the cockpit interface, based on review of prior research and opinions obtained from subject matter experts. With this background, essential functions and constraints for a retrofit installation are defined. After an assessment of the technology available to meet the functions and constraints, candidate design concepts are developed. The most promising design concept is described in detail. Finally, needs for further research and development are identified
    • …
    corecore