12 research outputs found

    A Flexible Question Answering System for Mobile Devices

    Spoken conversational search: speech-only interactive information retrieval

    This research investigates a new interface paradigm for interactive information retrieval (IIR) which forces us to shift away from the classic "ten blue links" search engine results page. Instead, we investigate how to present search results through a conversation over a speech-only communication channel where no screen is available. Accessing information via speech is becoming increasingly pervasive and is already important for people with a visual impairment. However, presenting search results over a speech-only communication channel is challenging due to cognitive limitations and the transient nature of audio. Studies have indicated that speech recognizers and screen readers must be carefully designed into a system and cannot simply be added to an existing one. Therefore, the aim of this research is to develop a new interaction framework for effective and efficient IIR over a speech-only channel: a Spoken Conversational Search System (SCSS) which provides a conversational approach to defining user information needs, presenting results and enabling search reformulations. To contribute to a more efficient and effective search experience when using an SCSS, we aim for a tighter integration between document search and conversational processes.

    A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

    In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and by comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
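    The abstract above does not spell out the model, so the following is only a minimal sketch of the general idea behind HMM-style (query-likelihood) retrieval, not the article's actual method. It assumes a unigram-only mixture in which each query token is emitted either by the document model or by a corpus-wide background model. The function names and the fixed `doc_weight` of 0.7 are hypothetical; in a trained system such a weight would be estimated (for example by EM) rather than set by hand. Tokens may be words or syllables.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def query_log_likelihood(
    query: Sequence[str],
    doc: Sequence[str],
    corpus_counts: Counter,
    corpus_len: int,
    doc_weight: float = 0.7,  # assumed value; not taken from the article
) -> float:
    """Log P(query | doc) under a two-component unigram mixture."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_corpus = corpus_counts[term] / corpus_len if corpus_len else 0.0
        p = doc_weight * p_doc + (1.0 - doc_weight) * p_corpus
        if p == 0.0:
            # Term unseen in the whole corpus: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(
    query: Sequence[str],
    docs: Dict[str, List[str]],
    doc_weight: float = 0.7,
) -> List[str]:
    """Return document ids sorted from most to least likely to generate the query."""
    corpus_counts = Counter(t for tokens in docs.values() for t in tokens)
    corpus_len = sum(corpus_counts.values())
    return sorted(
        docs,
        key=lambda d: query_log_likelihood(
            query, docs[d], corpus_counts, corpus_len, doc_weight
        ),
        reverse=True,
    )


if __name__ == "__main__":
    toy_docs = {
        "a": ["weather", "taipei", "rain"],
        "b": ["stock", "market", "taipei"],
    }
    print(rank(["rain", "taipei"], toy_docs))
```

    The background component keeps a document from scoring zero just because it lacks one query token, which matters when the tokens come from error-prone speech recognition output.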

    A system for spoken query information retrieval on mobile devices

    With the proliferation of handheld devices, information access on mobile devices is a topic of growing relevance. This paper presents a system that allows the user to search for information on mobile devices using spoken natural-language queries. We explore several issues related to the creation of this system, which combines state-of-the-art speech-recognition and information-retrieval technologies. To our knowledge, this is the first work to evaluate spoken-query-based information retrieval on a commonly available and well-researched text database, the Chinese news corpus used in the National Institute of Standards and Technology (NIST)’s TREC-5 and TREC-6 benchmarks. To compare spoken-query retrieval performance for different relevant scenarios and recognition accuracies, the benchmark queries, read verbatim by 20 speakers, were recorded simultaneously through three channels: headset microphone, PDA microphone, and cellular phone. Our results show that for mobile devices with high-quality microphones, spoken-query retrieval based on existing technologies yields a retrieval precision that comes close to that for perfect text input (mean average precision 0.459 and 0.489, respectively, on TREC-6).
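    For readers unfamiliar with the metric quoted above, here is a minimal sketch of how mean average precision is conventionally computed from ranked result lists and relevance judgments. The function names and the toy document ids are illustrative assumptions; this is not the evaluation code used in the paper.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[str], relevant_ids: Set[str]) -> float:
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            # Precision at the rank where this relevant document appears.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Mean of per-query average precision over (ranking, relevant set) pairs."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    toy_runs = [
        (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
        (["d2", "d1"], {"d1"}),                    # AP = 1/2
    ]
    print(round(mean_average_precision(toy_runs), 4))
```

    Because the score averages over queries, two systems can be compared on the same query set with a single number, as the abstract does for spoken versus text input.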

    Chinese Text Entry with Mobile Devices

    Efficient methods of text input are essential for using computers and modern mobile phones. About one fifth of the world's population, or over one billion people, speak some variety of Chinese as their native language. Chinese has unique characteristics as a logosyllabic language. For example, many Chinese characters are complex in structure and are homophonic with others. With keyboards and other key-based input devices, the normal approach is so-called pinyin input, where each Chinese character is entered using its pinyin code, which consists of several letters of the Roman alphabet. Because of homophony, this technique requires choosing the correct Chinese character from a list of possible choices, which makes the input process more complicated than for languages written in the Roman alphabet. Moreover, the many varieties of the language spoken in different parts of China have to be taken into account as well. All of these factors bring new challenges to the design and evaluation of Chinese text entry methods in computing systems. The overall objective of this dissertation is to improve the user experience of Chinese text entry on mobile devices.
To achieve this goal, the author explores new interaction solutions and patterns of user behavior in the Chinese text entry process, using approaches that include empirical studies and performance modeling. The work covers four means of Chinese text entry on mobile devices: Chinese handwriting recognition, Chinese indirect text entry with a rotator, Mandarin dictation, and Chinese pinyin input methods with a 12-key keypad. New design solutions for Chinese handwriting recognition and for pinyin methods utilizing a rotator are proposed and shown in empirical studies to be well accepted by users. A Mandarin short message dictation application for mobile phones is also presented, with two associated studies on human factors. Two studies were also carried out on Chinese pinyin input methods that are based on the 12-key keypad. The first, a comparative study of five phrasal pinyin input methods, led to design guidelines for the advanced feature of phrasal input. The second produced a predictive model of users' error-free speeds. Based on the conclusions from the studies in this thesis, several additional research questions were identified for the future. For example, improvements are necessary to promote user performance in the target selection process in Chinese text entry on mobile devices. Moreover, design work and studies on stroke methods and Chinese-specific soft keyboards are also required.

    Flexible photo retrieval (FlexPhoReS) : a prototype for multimodel personal digital photo retrieval

    Digital photo technology is developing rapidly and is motivating more people to keep large personal collections of digital photos. However, effective and fast retrieval of digital photos is not always easy, especially when the collections grow into the thousands. The World Wide Web (WWW) is one of the platforms that allow digital photo users to publish a collection of photos in a centralised and organised way. Users typically find their photos by searching or browsing using a keyboard and mouse. Also in development at the moment are alternative user interfaces, such as graphical user interfaces with speech (S/GUI) and other multimodal user interfaces, which offer more flexibility to users. The aim of this research was to design and evaluate a flexible user interface for a web-based personal digital photo retrieval system. A model of a flexible photo retrieval system (FlexPhoReS) was developed based on a review of the literature and a small-scale user study. A prototype, based on the model, was built using MATLAB and WWW technology. FlexPhoReS is a web-based personal digital photo retrieval prototype that enables digital photo users to accomplish photo retrieval tasks (browsing, keyword and visual example searching (CBI)) using either mouse and keyboard input modalities or mouse and speech input modalities. An evaluation with 20 digital photo users was conducted using usability testing methods. The results showed a significant difference in search performance between mouse and keyboard input modalities and mouse and speech input modalities. On average, the reduction in search performance time due to using mouse and speech input modalities was 37.31%. Participants were also significantly more satisfied with mouse and speech input modalities than with mouse and keyboard input modalities, although they felt that both were complementary.
This research demonstrated that the prototype was successful in providing a flexible model of the photo retrieval process by offering alternative input modalities through a multimodal user interface in the World Wide Web environment. EThOS - Electronic Theses Online Service, United Kingdom.

    Linguistically-motivated sub-word modeling with applications to speech recognition

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009. Includes bibliographical references (p. 173-185). Despite the proliferation of speech-enabled applications and devices, speech-driven human-machine interaction still faces several challenges. One of these issues is the new word or out-of-vocabulary (OOV) problem, which occurs when the underlying automatic speech recognizer (ASR) encounters a word it does not "know". With ASR being deployed in constantly evolving domains such as restaurant ratings or music querying, as well as on handheld devices, the new word problem continues to arise. This thesis is concerned with the OOV problem, and in particular with the process of modeling and learning the lexical properties of an OOV word through a linguistically-motivated sub-syllabic model. The linguistic model is designed using a context-free grammar which describes the sub-syllabic structure of English words, and encapsulates phonotactic and phonological constraints. The context-free grammar is supported by a probability model, which captures the statistics of the parses generated by the grammar and encodes spatio-temporal context. The two main outcomes of the grammar design are: (1) sub-word units, which encode pronunciation information, and can be viewed as clusters of phonemes; and (2) a high-quality alignment between graphemic and sub-word units, which results in hybrid entities denoted as spellnemes. The spellneme units are used in the design of a statistical bi-directional letter-to-sound (L2S) model, which plays a significant role in automatically learning the spelling and pronunciation of a new word. The sub-word units and the L2S model are assessed on the task of automatic lexicon generation. In a first set of experiments, knowledge of the spelling of the lexicon is assumed.
It is shown that the phonemic pronunciations associated with the lexicon can be successfully learned using the L2S model as well as a sub-word recognizer. In a second set of experiments, the assumption of perfect spelling knowledge is relaxed, and an iterative and unsupervised algorithm, denoted as Turbo-style, makes use of spoken instances of both spellings and words to learn the lexical entries in a dictionary. Sub-word speech recognition is also embedded in a parallel fashion as a backoff mechanism for a word recognizer. The resulting hybrid model is evaluated in a lexical access application, whereby a word recognizer first attempts to recognize an isolated word. Upon failure of the word recognizer, the sub-word recognizer is manually triggered. Preliminary results show that such a hybrid set-up outperforms a large-vocabulary recognizer. Finally, the sub-word units are embedded in a flat hybrid OOV model for continuous ASR. The hybrid ASR is deployed as a front-end to a song retrieval application, which is queried via spoken lyrics. Vocabulary compression and open-ended query recognition are achieved by designing a hybrid ASR. The performance of the front-end recognition system is reported in terms of sentence, word, and sub-word error rates. The hybrid ASR is shown to outperform a word-only system over a range of out-of-vocabulary rates (1%-50%). The retrieval performance is thoroughly assessed as a function of ASR N-best size, language model order, and the index size. Moreover, it is shown that the sub-words outperform alternative linguistically-motivated sub-lexical units such as phonemes. Finally, it is observed that a dramatic vocabulary compression, by more than a factor of 10, is accompanied by a minor loss in song retrieval performance. By Ghinwa F. Choueiter. Ph.D.