1,088 research outputs found

    Assessing the quality of audio and video components in desktop multimedia conferencing

    This thesis seeks to address the HCI (Human-Computer Interaction) research problem of how to establish the level of audio and video quality that end users require to successfully perform tasks via networked desktop videoconferencing. There are currently no established HCI methods of assessing the perceived quality of audio and video delivered in desktop videoconferencing. The transport of real-time speech and video information across new digital networks causes novel and different degradations, problems and issues to those common in the traditional telecommunications areas (telephone and television). Traditional assessment methods involve the use of very short test samples, are traditionally conducted outside a task-based environment, and focus on whether a degradation is noticed or not. But these methods cannot help establish what audio-visual quality is required by users to perform tasks successfully with the minimum of user cost, in interactive conferencing environments. This thesis addresses this research gap by investigating and developing a battery of assessment methods for networked videoconferencing, suitable for use in both field trials and laboratory-based studies. The development and use of these new methods helps identify the most critical variables (and levels of these variables) that affect perceived quality, and means by which network designers and HCI practitioners can address these problems are suggested. The output of the thesis therefore contributes both methodological (i.e. new rating scales and data-gathering methods) and substantive (i.e. explicit knowledge about quality requirements for certain tasks) knowledge to the HCI and networking research communities on the subjective quality requirements of real-time interaction in networked videoconferencing environments. 
Exploratory research is carried out through an interleaved series of field trials and controlled studies, advancing substantive and methodological knowledge in an incremental fashion. Initial studies use the ITU-recommended assessment methods, but these are found to be unsuitable for assessing networked speech and video quality for a number of reasons. Therefore later studies investigate and establish a novel polar rating scale, which can be used both as a static rating scale and as a dynamic continuous slider. These and further developments of the methods in future lab-based and real conferencing environments will enable subjective quality requirements and guidelines for different videoconferencing tasks to be established.

    The Influence of Training Method on Tone Colour Discrimination

    This research addresses the question of whether one of two training methods, identification by continuous adjustment (ICA) or identification by successive approximation (ISA), is more effective in training students using a technical ear training program (TETP). No known empirical studies have examined the effectiveness of either training method within frequency spectrum-based student-targeted TETPs. Preliminary work involved the development of appropriate tests of students’ tone colour discrimination ability in isolation, on tasks sufficiently different from those encountered in TETPs. The tests were then deployed in a pilot study within a pre/post-training scenario using two groups of audio engineering students, one of which undertook an ICA and the other an ISA version of a TETP. These preliminary results indicated the suitability of a test that featured pairwise comparisons of synthetic percussive timbres to show differences in performance between the two training groups. This test was subsequently administered repeatedly in a full-scale study at regular intervals throughout a web-based TETP, in addition to before and after training. Results of the full-scale study showed the individual differences scaling (INDSCAL)-derived stimulus spaces for both groups were similar prior to undertaking the TETP. The ISA group’s post-training results were almost identical to their pre-training results, whereas the ICA group’s post-training results showed minor but non-significant differences. Although the full-scale study found no significant differences in performance between training groups, the preliminary results suggest that the deployment of a pre/post-training test is an effective measure of the training method’s influence on students if the test features a task that is significantly different from those trained on in the TETP.

    Naturalistic Emotional Speech Corpora with Large Scale Emotional Dimension Ratings

    The investigation of the emotional dimensions of speech is dependent on large sets of reliable data. Existing work has been carried out on the creation of emotional speech corpora and the acoustic analysis of emotional speech and this research seeks to build upon this work while suggesting new methods and areas of potential. A review of the literature determined that a two-dimensional emotional model of activation and evaluation was the ideal method for representing the emotional states expressed in speech. Two case studies were carried out to investigate methods of obtaining natural underlying emotional speech in a high quality audio environment, the results of which were used to design a final experimental procedure to elicit natural underlying emotional speech. The speech obtained in this experiment was used in the creation of a speech corpus that was underpinned by a persistent backend database that incorporated a three-tiered annotation methodology. This methodology was used to comprehensively annotate the metadata, acoustic data and emotional data of the recorded speech. Structuring the three levels of annotation and the assets in a persistent backend database allowed interactive web-based tools to be developed; a web-based listening tool was developed to obtain a large amount of ratings for the assets that were then written back to the database for analysis. Once a large amount of ratings had been obtained, statistical analysis was used to determine the dimensional rating for each asset. Acoustic analysis of the underlying emotional speech was then carried out and determined that certain acoustic parameters were correlated with the activation dimension of the dimensional model. This substantiated some of the findings in the literature review and further determined that spectral energy was strongly correlated with the activation dimension in relation to underlying emotional speech.
The lack of a correlation for certain acoustic parameters in relation to the evaluation dimension was also determined, again substantiating some of the findings in the literature. The work contained in this thesis makes a number of contributions to the field: the development of an experimental design to elicit natural underlying emotional speech in a high quality audio environment; the development and implementation of a comprehensive three-tiered corpus annotation methodology; the development and implementation of large scale web based listening tests to rate the emotional dimensions of emotional speech; the determination that certain acoustic parameters are correlated with the activation dimension of a dimensional emotional model in relation to natural underlying emotional speech; and the determination that certain acoustic parameters are not correlated with the evaluation dimension of a two-dimensional emotional model in relation to natural underlying emotional speech.

    Towards Human-centered Explainable AI: A Survey of User Studies for Model Explanations

    Explainable AI (XAI) is widely viewed as a sine qua non for ever-expanding AI research. A better understanding of the needs of XAI users, as well as human-centered evaluations of explainable models are both a necessity and a challenge. In this paper, we explore how HCI and AI researchers conduct user studies in XAI applications based on a systematic literature review. After identifying and thoroughly analyzing 97 core papers with human-based XAI evaluations over the past five years, we categorize them along the measured characteristics of explanatory methods, namely trust, understanding, usability, and human-AI collaboration performance. Our research shows that XAI is spreading more rapidly in certain application domains, such as recommender systems, than in others, but that user evaluations are still rather sparse and incorporate hardly any insights from cognitive or social sciences. Based on a comprehensive discussion of best practices, i.e., common models, design choices, and measures in user studies, we propose practical guidelines on designing and conducting user studies for XAI researchers and practitioners. Lastly, this survey also highlights several open research directions, particularly linking psychological science and human-centered XAI.

    Design and evaluation of mobile computer-assisted pronunciation training tools for second language learning

    The quality of speech technology (automatic speech recognition, ASR, and text-to-speech, TTS) has considerably improved and, consequently, an increasing number of computer-assisted pronunciation training (CAPT) tools have incorporated it. However, pronunciation is one area of teaching that has not been developed enough, since there is scarce empirical evidence assessing the effectiveness of tools and games that include speech technology in the field of pronunciation training and teaching. This PhD thesis addresses the design and validation of an innovative CAPT system for smart devices for training second language (L2) pronunciation. In particular, it aims to improve learners’ L2 pronunciation at the segmental level with a specific set of methodological choices, such as the connection between the learner’s first and second languages (L1–L2), minimal pairs, a training cycle of exposure–perception–production, individualistic and social approaches, and the inclusion of ASR and TTS technology. The experimental research conducted applying these methodological choices with real users validates the efficiency of the CAPT prototypes developed for the four main experiments of this dissertation. Data is automatically gathered by the CAPT systems to give immediate, specific feedback to users and to analyze all results. The protocols, metrics, algorithms, and methods necessary to statistically analyze and discuss the results are also detailed. The two main L2s tested during the experimental procedure are American English and Spanish. The different CAPT prototypes designed and validated in this thesis, and the methodological choices that they implement, allow accurate measurement of the relative pronunciation improvement of the individuals who trained with them. Raters’ subjective scores and the CAPT systems’ objective scores are strongly correlated, which should make it possible in the future to assess large amounts of data while reducing human costs.
Results also show intensive practice, supported by the large number of activities carried out. In the case of the controlled experiments, students who worked with the CAPT tool achieved better pronunciation improvement values than their peers in the traditional in-classroom instruction group. In the case of the challenge-based CAPT learning game proposed, the most active players in the competition kept on playing until the end and achieved significant pronunciation improvement results. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Doctorado en Informática.

    Evaluating the translational potential of relative fundamental frequency

    Relative fundamental frequency (RFF) is an acoustic measure that quantifies short-term changes in fundamental frequency during voicing transitions surrounding a voiceless consonant. RFF is hypothesized to be decreased by increased laryngeal tension during voice production and has been considered a potential objective measure of vocal hyperfunction. Previous studies have supported claims that decreased RFF values may indicate the severity of vocal hyperfunction and have attempted to improve the methods to obtain RFF. In order to make progress towards developing RFF into a clinical measure, this dissertation aimed to investigate further the validity and reliability of RFF. Specifically, we examined the underlying physiological mechanisms, the auditory-perceptual relationship with strained voice quality, and test-retest reliability. The first study evaluated one of the previously hypothesized physiological mechanisms for RFF, vocal fold abduction. Vocal fold kinematics and RFF were obtained from both younger and older typical speakers producing RFF stimuli with voiceless fricatives and stops during high-speed videoendoscopy. We did not find any statistical differences between younger and older speakers, but we found that vocal folds were less adducted and RFF was lower at voicing onset after the voiceless stop compared to the fricative. This finding is in accordance with the hypothesized positive association between vocal fold contact area during voicing transitions and RFF. The second study examined the relationship between RFF and strain, a major auditory-perceptual feature of vocal hyperfunction. RFF values were synthetically modified by exchanging the RFF contours between voice samples that were produced with a comfortable voice and with maximum vocal effort, while other acoustic features remained constant. 
We observed that comfortable voice samples with the RFF values of maximum vocal effort samples had increased strain ratings, whereas maximum vocal effort samples with the RFF values of comfortable voice samples had decreased strain ratings. These findings support the contribution of RFF to perceived strain. The third study compared the test-retest reliability of RFF with that of conventional voice measures. We recorded individuals with healthy voices on five consecutive days and obtained acoustic, aerodynamic, and auditory-perceptual measures from the recordings. RFF was as reliable as the acoustic and aerodynamic measures and more reliable than the auditory-perceptual measures. This dissertation supports the translational potential of RFF by providing empirical evidence of the physiological mechanisms of RFF, the relationship between RFF and perceived strain, and test-retest reliability of RFF. Clinical applications of RFF are expected to improve objective diagnosis and assessment of vocal hyperfunction, and thus to lead to better voice care for individuals with vocal hyperfunction. 2021-09-25

    Professional competency of modern specialist: means of formation, development and improvement

    This work analyzes modern scientific and methodological approaches to the study of professional competence, in line with state requirements for reforming education and the trend towards introducing a competence approach as one of the key factors of today's vocational education. The emphasis is placed on the fact that implementation of the competence approach should include the use of real professional tasks in professional training, orienting future professionals towards analyzing the results of their own professional activities and decisions. The basic principles of professional training of future managers of economic security are determined. It has been established that the professional training of future managers of economic security should be carried out on a modular basis.

    Elective English in secondary schools : descriptive evaluation in macrocosm and microcosm

    This study traced the historical and philosophical evolution of short-course elective English programs in American secondary schools, emphasizing the development and effects of a selected example involving alternative course designs. The problem has been that many proponents of functional and/or content-oriented English curricula have considered the elective model incompatible with their professional commitments and have sought its demise. The short-course elective program, originally a manifestation of the experimental stance, has since demonstrated potential as an administrative accommodation for multidesign in future curricula

    An exploration of work dimensions in the Western Australian public service: A factor analysis of job skills and their contexts

    The dimensions underlying the structure of work in the Western Australian Public sector were analysed and compared with the structure of work as ascertained by Functional Job Analysis and the Position Analysis Questionnaire. A questionnaire was developed by the Skills Resource Management Unit to determine the importance attached to work skills in a variety of public sector occupations. One hundred and ninety four subjects of mixed gender were randomly selected from public sector agencies and were surveyed through workshops. Results were subjected to exploratory factor analyses. Confirmatory factor analysis then investigated the fit of the data to the following contradictory hypotheses as to the structure of work in the public sector. The dimensionality of work resembles three dimensions: Working with People, Working With Information, and Using Machines and Equipment as based on Sydney Fine's (1971) factors, Data, People and Things. The dimensionality of work resembles six dimensions: Information Input, Mental Processes, Work Output, Relationships with Other People, Job Context, and Other Job Characteristics as based on an information processing model by McCormick, Jeanneret, & Mecham (1972). Results indicated that the structure of work fitted neither model well. However it approximated Fine's (1971) model more closely than the PAQ model. Implications of ascertaining a structure of work in the public sector and future research prospects were suggested.