Recognizing Voice Over IP: A Robust Front-End for Speech Recognition on the World Wide Web
The Internet Protocol (IP) environment poses two relevant sources of distortion to the speech recognition problem: lossy speech coding and packet loss. In this paper, we propose a new front-end for speech recognition over IP networks. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bit stream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant benefits. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion due to the encoding-decoding process. Second, when packet loss occurs, our front-end becomes more effective since it is not constrained to the error handling mechanism of the codec. We have considered the ITU G.723.1 standard codec, which is one of the most prevalent coding algorithms in voice over IP (VoIP), and compared the proposed front-end with the conventional approach in two automatic speech recognition (ASR) tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated packet loss rates. Furthermore, the improvement is higher as network conditions worsen.
Artificial Sequences and Complexity Measures
In this paper we exploit concepts of information theory to address the
fundamental problem of identifying and defining the most suitable tools to
extract, in an automatic and agnostic way, information from a generic string of
characters. We introduce in particular a class of methods which use in a
crucial way data compression techniques in order to define a measure of
remoteness and distance between pairs of sequences of characters (e.g. texts)
based on their relative information content. We also discuss in detail how
specific features of data compression techniques could be used to introduce the
notion of dictionary of a given sequence and of Artificial Text and we show how
these new tools can be used for information extraction purposes. We point out
the versatility and generality of our method that applies to any kind of
corpora of character strings independently of the type of coding behind them.
We consider as a case study linguistically motivated problems and we present
results for automatic language recognition, authorship attribution and
self-consistent classification.
Comment: Revised version, with major changes, of previous "Data Compression
approach to Information Extraction and Classification" by A. Baronchelli and
V. Loreto. 15 pages; 5 figures
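The abstract above describes measuring the distance between two character sequences through data compression. A minimal sketch of that general idea follows. It uses the normalized compression distance with Python's `zlib` as a stand-in, which is an assumption made for illustration and not necessarily the measure defined in the paper; the function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (maximum compression level)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumption: this is a generic compression-based distance used here
    for illustration, not the specific measure from the paper.
    Values near 0 suggest the sequences share most of their information;
    larger values suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    english = b"the quick brown fox jumps over the lazy dog " * 20
    latin = b"lorem ipsum dolor sit amet consectetur adipiscing elit " * 20
    print("same text:     ", ncd(english, english))
    print("different text:", ncd(english, latin))
```

A sequence compared with itself should score lower than a sequence compared with unrelated text, because the compressor can reuse patterns it has already seen. Tasks such as language recognition or authorship attribution could then assign an unknown text to the reference corpus with the smallest distance.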
Study to determine potential flight applications and human factors design guidelines for voice recognition and synthesis systems
A study was conducted to determine potential commercial aircraft flight deck applications and implementation guidelines for voice recognition and synthesis. At first, a survey of voice recognition and synthesis technology was undertaken to develop a working knowledge base. Then, numerous potential aircraft and simulator flight deck voice applications were identified and each proposed application was rated on a number of criteria in order to achieve an overall payoff rating. The potential voice recognition applications fell into five general categories: programming, interrogation, data entry, switch and mode selection, and continuous/time-critical action control. The ratings of the first three categories showed the most promise of being beneficial to flight deck operations. Possible applications of voice synthesis systems were categorized as automatic or pilot selectable and many were rated as being potentially beneficial. In addition, voice system implementation guidelines and pertinent performance criteria are proposed. Finally, the findings of this study are compared with those made in a recent NASA study of a 1995 transport concept
Aerospace medicine and biology: A continuing bibliography with indexes (supplement 323)
This bibliography lists 125 reports, articles and other documents introduced into the NASA Scientific and Technical Information System during April 1989. Subject coverage includes: aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.
The unexplained nature of reading.
The effects of properties of words on their reading aloud response times (RTs) are 1 major source of evidence about the reading process. The precision with which such RTs could potentially be predicted by word properties is critical to evaluate our understanding of reading but is often underestimated due to contamination from individual differences. We estimated this precision without such contamination individually for 4 people who each read 2,820 words 50 times each. These estimates were compared to the precision achieved by a 31-variable regression model that outperforms current cognitive models on variance-explained criteria. Most (around 2/3) of the meaningful (non-first-phoneme, non-noise) word-level variance remained unexplained by this model. Considerable empirical and theoretical-computational effort has been expended on this area of psychology, but the high level of systematic variance remaining unexplained suggests doubts regarding contemporary accounts of the details of the mechanisms of reading at the level of the word. Future assessment of models can take advantage of the availability of our precise participant-level database
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Identification
Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. Such a transformation enables speech to be understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)
A qualitative approach to HCI research
Whilst science has a strong reliance on quantitative and experimental methods, there are many complex, socially based phenomena in HCI that cannot be easily quantified or experimentally manipulated or, for that matter, ethically researched with experiments. For example, the role of privacy in HCI is not obviously reduced to numbers and it would not be appropriate to limit a person's privacy in the name of research. In addition, technology is rapidly changing – just think of developments in mobile devices, tangible interfaces and so on – making it harder to abstract technology from the context of use if we are to study it effectively. Developments such as mediated social networking and the dispersal of technologies in ubiquitous computing also loosen the connection between technologies and work tasks that were the traditional cornerstone of HCI. Instead, complex interactions between technologies and ways of life are coming to the fore. Consequently, we frequently find that we do not know what the real HCI issues are before we start our research. This makes it hard, if not actually impossible, to define the variables necessary to do quantitative research (see Chapter 2).
Within HCI, there is also the recognition that the focus on tasks is not enough to design and implement an effective system. There is also a growing need to understand how usability issues are subjectively and collectively experienced and perceived by different user groups (Pace, 2004; Razavim and Iverson, 2006). This means identifying the users' emotional and social drives and perspectives; their motivations, expectations, trust, identity, social norms and so on. It also means relating these concepts to work practices, communities and organisational social structures as well as organisational, economic and political drivers. These issues are increasingly needed in the design, development and implementation of systems to be understood both in isolation and as a part of the whole.
HCI researchers are therefore turning to more qualitative methods in order to deliver the research results that HCI needs. With qualitative research, the emphasis is not on measuring and producing numbers but instead on understanding the qualities of a particular technology and how people use it in their lives, how they think about it and how they feel about it. There are many varied approaches to qualitative research within the social sciences depending on what is being studied, how it can be studied and what the goals of the research are. Within HCI, though, grounded theory has been found to provide good insights that address well the issues raised above (Pace, 2004; Adams, Blandford and Lunt, 2005; Razavim and Iverson, 2006).
The purpose of this chapter is to give an overview of how grounded theory works as a method. Quantitative research methods adopt measuring instruments and experimental manipulations that can be repeated by any researcher (at least in principle) and every effort is made to reduce the influence of the researcher on the researched, which is regarded as a source of bias or error. In contrast, in qualitative research, where the goal is understanding rather than measuring and manipulating, the subjectivity of the researcher is an essential part of the production of an interpretation. The chapter therefore discusses how the influence of the researcher can be ameliorated through the grounded theory methodology whilst also acknowledging the subjective input of the researcher through reflexivity. The chapter also presents a case study of how grounded theory was used in practice to study people's use and understanding of computer passwords and related security