285 research outputs found

    Interactive Pattern Recognition applied to Natural Language Processing

    This thesis is about Pattern Recognition. In the last decades, huge efforts have been made to develop automatic systems able to rival human capabilities in this field. Although these systems achieve high productivity rates, they are not precise enough in most situations. Humans, on the contrary, are very accurate but comparatively much slower. This poses an interesting question: the possibility of benefiting from both worlds by constructing cooperative systems. This thesis presents diverse contributions to this kind of collaborative approach. The point is to improve Pattern Recognition systems by properly introducing a human operator into the system. We call this Interactive Pattern Recognition (IPR). Firstly, a general proposal for IPR will be stated. The aim is to develop a framework to easily derive new applications in this area. Some interesting IPR issues are also introduced. Multi-modality and adaptive learning are examples of extensions that can naturally fit into IPR. In the second place, we will focus on a specific application: a novel method to obtain high-quality speech transcriptions (CAST, Computer Assisted Speech Transcription). We will start by proposing a CAST formalization and, next, we will deal with different implementation alternatives. Practical issues, such as the system response time, will also be taken into account, in order to allow for a practical implementation of CAST. Word graphs and probabilistic error-correcting parsing are tools that will be used to reach an alternative formulation that allows for the use of CAST in a real scenario. Afterwards, a special application within the general IPR framework will be discussed. This is intended to test the IPR capabilities in an extreme environment, where no input pattern is available and the system only has access to the user actions to produce a hypothesis. Specifically, we will focus here on providing assistance in the problem of text generation.
Rodríguez Ruiz, L. (2010). Interactive Pattern Recognition applied to Natural Language Processing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8479 Palanci

    A comparison of features for large population speaker identification

    Bibliography: leaves 95-104. Speech recognition systems all have one property in common: they perform better in a controlled environment using clean speech. Though performance can be excellent, even exceeding human capabilities for clean speech, systems fail when presented with speech data from more realistic environments such as telephone channels. The performance difference between clean and noisy environments is extreme, and this is one of the major obstacles to producing commercial recognition systems for use in normal environments. It is the lack of performance of speaker recognition systems with telephone channels that this work addresses. The human auditory system is a speech recognizer with excellent performance, especially in noisy environments. Since humans are better at ignoring noise than any machine, auditory-based methods are promising approaches, since they attempt to model the workings of the human auditory system. These methods have been shown to outperform more conventional signal processing schemes for speech recognition, speech coding, word-recognition and phone classification tasks. Since speaker identification has received a lot of attention in speech processing because of the real-world applications awaiting it, it is attractive to evaluate the performance using auditory models as features. Firstly, this study aims at improving the results for speaker identification. The improvements were made through the use of parameterized feature-sets together with the application of cepstral mean removal for channel equalization. The study is further extended to compare an auditory-based model, the Ensemble Interval Histogram (EIH), with mel-scale features, which was shown to perform almost error-free in clean speech. The previous studies showing EIH to be more robust to noise were conducted on speaker-dependent, small-population, isolated-word tasks and are now extended to speaker-independent, larger-population, continuous speech.
This study investigates whether the EIH representation is more resistant to telephone noise than mel-cepstrum, as was shown in the previous studies, now that it is applied for the first time to a speaker identification task using a state-of-the-art Gaussian mixture model system
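
The cepstral mean removal mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual formulation, in which the per-coefficient mean over all frames of an utterance is subtracted so that a constant channel offset cancels out; the function name and the toy data are hypothetical and are not taken from the thesis.

```python
from typing import List


def cepstral_mean_removal(frames: List[List[float]]) -> List[List[float]]:
    """Subtract the per-coefficient mean computed over all frames of one utterance."""
    if not frames:
        return []
    n_frames = len(frames)
    n_coeffs = len(frames[0])
    # Mean of each cepstral coefficient across the utterance
    means = [sum(f[k] for f in frames) / n_frames for k in range(n_coeffs)]
    return [[f[k] - means[k] for k in range(n_coeffs)] for f in frames]


# Toy data (hypothetical): a stationary channel is modelled as a constant
# additive offset on every cepstral coefficient.
clean = [[1.0, 2.0], [3.0, 0.0], [2.0, 4.0]]
channel = [0.5, -1.5]
noisy = [[c + h for c, h in zip(frame, channel)] for frame in clean]

normalized_clean = cepstral_mean_removal(clean)
normalized_noisy = cepstral_mean_removal(noisy)
```

Under this assumption the normalized clean and channel-distorted features coincide, which is why the technique is used for channel equalization. Real channels are only approximately stationary, so the cancellation is approximate in practice.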

    Towards a clinical assessment of acquired speech dyspraxia.

    No standardised assessment exists for the recognition and quantification of acquired speech dyspraxia (also called apraxia of speech, AS). This thesis aims to work towards development of such an assessment based on perceptual features. Review of previous features claimed to characterise AS and differentiate it from other acquired pronunciation problems (dysarthrias; phonemic paraphasia - PP) has proved negative. Reasons for this have been explored. A reconceptualisation of AS is attempted based on physical studies of AS, PP and the dysarthrias; their position and relationship within coalitional models of speech production; by comparison with normal action control and other dyspraxias. Contrary to the view of many it is concluded that AS and PP are dyspraxias (albeit different types). However, due to the interactive nature of speech-language production and behaviour of the vocal tract as a functional whole AS is unlikely to be distinguishable in an absolute fashion based on single speech characteristics. Rather it is predicted that pronunciation disordered groups will differ relatively on total error profiles and susceptibility to associated effects (variability; propositionality; struggle; length-complexity; latency-utterance times). Using a prototype battery and refined error transcription and analysis procedures a series of studies test predictions on three groups: spastic dysarthrics (n = 6) AS and PP without (n = 12) and with (n = 12) dysphasia. The main conclusions do not support the error profile hypotheses in any straightforward manner. Length-complexity effects and latency-utterance times fail to consistently separate groups. Variability, propositionality and struggle proved the most reliable indicators. Error profiles remain the closest indicators of speakers' intelligibility and therapeutic goals. The thesis argues for a single case approach to differential diagnosis and alternative statistical analyses to capture individual and group differences. 
Suggestions for changes to the prototype clinical battery and data management to effect optimal speaker differentiation conclude the work

    Framework for Human Computer Interaction for Learning Dialogue Strategies using Controlled Natural Language in Information Systems

    Spoken language systems are going to have a tremendous impact on real-world applications, be it healthcare enquiry, public transportation or airline booking systems, while maintaining the language ethnicity for interaction among users across the globe. These systems are capable of interacting with the user in the different languages that the system supports. Normally, when a person interacts with another person, there are many non-verbal cues which guide the dialogue, and all the utterances have a contextual relationship, which manages the dialogue as it is mixed by the two speakers. Human Computer Interaction has a wide impact on the design of applications and has become one of the emerging areas of interest for researchers. All of us are witness to an explosive electronic revolution in which we are surrounded by gadgets that have advanced not only in power, design and applications but also in ease of access: user-friendly interfaces are designed so that we can easily use and control all the functionality of the devices. Speech is one of the most intuitive forms of interaction that humans use. It provides potential benefits such as hands-free access to machines, ergonomics and greater efficiency of interaction. Yet the design of speech-based interfaces has long been an expert job. A lot of research has been done on building real spoken dialogue systems which can interact with humans by voice and help in performing various tasks as humans do. The last two decades have seen advanced research in automatic speech recognition, dialogue management, text-to-speech synthesis and Natural Language Processing for various applications, which has shown positive results. This dissertation proposes to apply machine learning (ML) techniques to the problem of optimizing the dialogue management strategy selection in the Spoken Dialogue system prototype design.
Although automatic speech recognition and system-initiated dialogues where the system expects an answer in the form of 'yes' or 'no' have already been applied to Spoken Dialogue Systems (SDS), no real attempt to use those techniques in order to design a new system from scratch has been made. In this dissertation, we propose some novel ideas in order to ease the design of Spoken Dialogue Systems and allow novices to have access to voice technologies. A framework for simulating and evaluating dialogues and learning optimal dialogue strategies in a controlled Natural Language is proposed. The simulation process is based on a probabilistic description of a dialogue and on the stochastic modelling of both the artificial NLP modules composing an SDS and the user. This probabilistic model is based on a set of parameters that can be tuned from prior knowledge of the discourse or learned from data. The evaluation is part of the simulation process and is based on objective measures provided by each module. Finally, the simulation environment is connected to a learning agent using the supplied evaluation metrics as an objective function in order to generate an optimal behaviour for the SDS
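
The idea of connecting a dialogue simulator to a learning agent can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the learning algorithm, so tabular Q-learning is used here as a stand-in, and the two-slot task, the action names, the reward values and the 0.8 understanding probability are all hypothetical.

```python
import random

ACTIONS = ("ask", "close")
N_SLOTS = 2  # hypothetical task: fill two slots, then close the dialogue


def simulate_turn(state, action, rng, p_understood=0.8):
    """Hypothetical stochastic user/recognizer model: returns (next_state, reward, done)."""
    if action == "close":
        # Closing is rewarded only when every slot has been filled
        return state, (10.0 if state == N_SLOTS else -10.0), True
    # "ask": each turn costs 1; the next slot is filled only if the answer is understood
    if state < N_SLOTS and rng.random() < p_understood:
        state += 1
    return state, -1.0, False


def learn_strategy(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over simulated dialogues."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_SLOTS + 1) for a in ACTIONS}
    for _ in range(episodes):
        state, done, turns = 0, False, 0
        while not done and turns < 50:  # cap the dialogue length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = simulate_turn(state, action, rng)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, turns = nxt, turns + 1
    # Greedy strategy: best action for each number of filled slots
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_SLOTS + 1)}


policy = learn_strategy()
```

Here the reward plays the role of the objective function built from the evaluation metrics, and the learned policy maps each dialogue state (number of filled slots) to the action the agent prefers in that state.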

    The Pedagogy Of The Operating Theatre

    This thesis outlines the findings of a large body of research work undertaken during 3 years of full-time study. The findings have already provided the author with helpful anchors for structuring formative feedback to surgical trainees within a simulation program, as well as helpful insights into her own learning. This thesis explores the operating theatre as a teaching and learning environment for postgraduate surgical trainees. The work crosses paradigms and uses contrasting methodologies to provide rich insights into surgical pedagogic practice. The first chapter is an introduction to the subject material, outlining the thesis aims and research questions, making clear why the research is important. The perspectives of the researcher are explained, in the first person, to make explicit her background and epistemological stance. The next chapter presents a narrative review of the literature, providing a background to the subject and a theoretical framework. Chapters three to six constitute empirical work. The third and fourth chapters use a grounded theory method to explore surgeons’ perceptions of the content and process of learning in the operating theatre. Chapter five uses case study methodology to illustrate teaching and learning in the operating theatre with concrete examples of pedagogic practice. The sixth chapter is a quasi-experimental study of learning which makes comparison between different pedagogic styles. The final chapter of the thesis draws together the findings from the empirical investigations. The personal development of the researcher is discussed in the first person and the body of research work is critically examined in view of its contribution to the field and its implications for future educational innovation.

    Algerian intonational proficiency in English: An empirical study

    Rather than a thorough analysis, the present work should be regarded as a contribution to the study of intonation. More particularly, it concentrates on the intonational proficiency of a sample of Algerian speakers of English (ASE). The investigation consisted mainly of two experiments. The first one was a Production Test and aimed at gathering a speech sample of ASE, as well as a sample of native speech to be used as a control. So a test was designed and submitted to twenty ASE (ten males and ten females) and five native speakers (two males and three females). The test consisted of ten units. The first four were highly controllable and also analysed instrumentally; whereas the remaining ones were increasingly less controllable. The second experiment was a Perception Test and aimed at evaluating the data by an audience of 160 native English listeners. Here, due to the large number of utterances, only the first four units were considered. From the outset, a number of questions were raised, the most important of which are as follows. How successful do ASE manage to be in manipulating intonation so as to convey specific meanings? What are the major errors and how can they be categorized? How do ASE make Halliday's three dimensional decisions (i.e. 'tonality', 'tonicity' and 'tone')? Despite numerous errors, most Algerian utterances were correctly understood. ASE tend to divide their speech into far more intonation groups than natives do. 'Tonicity' and 'tone' errors did also occur. While attempting to answer those questions, further observations were made. The speech rhythm of the ASE tends to be syllable-timed. Rhythmic errors took place, e.g. failure to use weak forms. Short vowels tend to be lengthened. Another peculiar finding is the existence of falling and rising 'gestures' independent of nuclei. Equally peculiar is the existence of fall-level and rise-fall-level tones.
Finally, the error causing the most serious communication breakdown is wrong placement of stress