14,570 research outputs found

    On the voice-activated question answering

    Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines capable of extracting concise, precise fragments of text that contain an answer to a question posed by the user. Incorporating voice interfaces into QA systems adds a more natural and very appealing perspective. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, the scenarios that will emerge from the introduction of speech recognition in QA are discussed. © 2006 IEEE. This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik.
    Rosso, P.; Hurtado Oliver, LF.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews. 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620

    Factoid question answering for spoken documents

    In this dissertation, we present a factoid question answering (QA) system specifically tailored to spoken documents. This work explores, for the first time, which techniques can be robustly adapted from conventional QA on written documents to the more difficult spoken-document scenario. More specifically, we study new information retrieval (IR) techniques designed for speech and use several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and coreference resolution. Our approach is largely based on supervised machine learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other domains and languages. As part of the work behind this thesis, we initiated and coordinated the creation of an evaluation framework for QA on spoken documents. The framework, named QAst (Question Answering on Speech Transcripts), provides multilingual corpora, evaluation questions, and answer keys. These corpora were used in the QAst evaluations held at the CLEF workshop in 2007, 2008 and 2009, helping the development of state-of-the-art techniques for this topic. The presented QA system and all its modules are extensively evaluated on the English European Parliament Plenary Sessions (EPPS) corpus, composed of manual transcripts and automatic transcripts obtained by three different automatic speech recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts. The main results confirm that syntactic information is very useful for learning to rank answer candidates, improving results on both manual and automatic transcripts unless the ASR quality is very low. Overall, the performance of our system is comparable to or better than the state of the art on this corpus, confirming the validity of our approach.

    Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities

    Popular conversational agent frameworks such as Alexa Skills Kit (ASK) and Google Actions (gActions) offer unprecedented opportunities for facilitating the development and deployment of voice-enabled AI solutions in various verticals. Nevertheless, understanding user utterances with high accuracy remains a challenging task with these frameworks, particularly when building chatbots with a large volume of domain-specific entities. In this paper, we describe the challenges and lessons learned from building a large-scale virtual assistant for understanding and responding to equipment-related complaints. In the process, we describe an alternative scalable framework for: 1) extracting knowledge about equipment components and their associated problem entities from short texts, and 2) learning to identify such entities in user utterances. We show through evaluation on a real dataset that the proposed framework, compared to popular off-the-shelf ones, scales better with a large volume of entities, being up to 30% more accurate, and is more effective in understanding user utterances with domain-specific entities.

    The effect of component recognition on flexibility and speech recognition performance in a spoken question answering system

    A spoken question answering system that recognizes questions as full sentences performs well when users ask one of the predefined questions. A system that recognizes component words and finds an equivalent predefined question might be more flexible, but is likely to have decreased speech recognition performance, leading to a loss in overall system success. The research described in this document compares the advantage in flexibility to the loss in recognition performance when using component recognition. Questions posed by participants were processed by a system of each type. As expected, the component system made frequent recognition errors while detecting words (word error rate of 31%). In comparison, the full system made fewer errors while detecting full sentences (sentence error rate of 10%). Nevertheless, the component system succeeded in providing proper responses to 76% of the queries posed, while the full system responded properly to only 46%. Four variations of the traditional tf-idf weighting method were compared as applied to the matching of short text strings (fewer than 10 words). It was found that the general approach was successful in finding matches, and that all four variations compensated for the loss in speech recognition performance to a similar degree. No significant difference due to the variations in weighting was detected in the results.
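
    The matching step described in this abstract, mapping a noisy string of recognized words onto the closest predefined question, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify the four weighting variations it compared, so the sketch uses one common smoothed tf-idf formula with cosine similarity, and every function name and example question is hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for short strings.
    return text.lower().split()


def build_idf(questions):
    # Smoothed inverse document frequency over the predefined questions.
    # This particular formula is an assumption, not the paper's weighting.
    n = len(questions)
    df = Counter()
    for q in questions:
        df.update(set(tokenize(q)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def tfidf_vector(text, idf):
    # Words never seen in the predefined questions are ignored.
    tf = Counter(tokenize(text))
    return {w: count * idf[w] for w, count in tf.items() if w in idf}


def cosine(a, b):
    dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(recognized, questions):
    # Returns (similarity, question) for the closest predefined question.
    idf = build_idf(questions)
    query = tfidf_vector(recognized, idf)
    return max((cosine(query, tfidf_vector(q, idf)), q) for q in questions)


if __name__ == "__main__":
    # Hypothetical predefined questions and a recognizer output with dropped words.
    defined = [
        "where is the library",
        "what time does the cafeteria open",
        "who is the dean of engineering",
    ]
    print(best_match("what time cafeteria open", defined))
```

    Because the weights depend only on which words survive recognition, a query with several dropped or misrecognized words can still land on the right question, which is the kind of compensation the abstract reports.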

    A Survey on Conversational Search and Applications in Biomedicine

    This paper aims to provide a thorough rundown of Conversational Search (ConvSearch), an approach that enhances information retrieval by letting users engage in a dialogue for information-seeking tasks. In this survey, we focus predominantly on the human-interactive characteristics of ConvSearch systems, highlighting the operations of the action modules, namely the retrieval system, question answering, and the recommender system. We label various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorize the ConvSearch framework and direct its application toward the biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by discussing the challenges and issues of ConvSearch, particularly in biomedicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefit the information-seeking process in healthcare systems.

    Experientially grounded language production: Advancing our understanding of semantic processing during lexical selection

    The process of lexical selection, i.e. producing the right words to get an intended message across, is not well understood. Specifically, meaning aspects grounded in sensorimotor experiences and their role during lexical selection have not been investigated widely. Here, we investigated the role of experientially grounded meaning aspects with two studies in which participants had to produce a noun to complete sentences that described sceneries. In Study 1, the visual appearance of sentence fragments was manipulated so that they appeared to move upwards or downwards on screen. In Study 2, participants moved their head upwards or downwards while listening to sentence fragments. We investigated whether the spatial properties of the freely chosen nouns are influenced by the spatial manipulations as well as by the spatial properties of the sentences. The vertical visual manipulation used in Study 1 did not influence the spatial properties of the produced words. However, the body movements in Study 2 influenced participants' lexical choices, i.e. after upward movements the referents of the produced words were higher up than after downward movements (and vice versa). Furthermore, the effect of movement on the spatial properties of the produced nouns increased with higher levels of participants' interoceptive sensibility. Additionally, the spatial properties of the stimulus sentences influenced the spatial properties of the produced words in both studies. Thus, experientially grounded meaning aspects that are either embedded in text or reactivated via bodily manipulations may influence which words we choose when speaking, and interindividual differences may moderate these effects. The findings are related to current theories of semantics. Furthermore, this dissertation enhances the methodological repertoire of language production researchers by showing how language production studies with overt articulation in picture naming tasks can be run online (Study 3).

    Neurocognitive Informatics Manifesto.

    Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.

    Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System

    Service-oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time-consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an efficient and useful system. In this thesis, we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human-computer interaction component. We present three systems (KIA, John and KIA2) and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind, and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service-oriented chatbot systems, and that users preferred it over the other systems.