14,570 research outputs found
On the voice-activated question answering
[EN] Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines capable of extracting concise, precise fragments of text that contain an answer to a question posed by the user. The incorporation of voice interfaces into QA systems adds a more natural and very appealing perspective for these systems. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, the scenarios that will emerge from the introduction of speech recognition in QA are discussed. © 2006 IEEE. This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik. Rosso, P.; Hurtado Oliver, L. F.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620
Factoid question answering for spoken documents
In this dissertation, we present a factoid question answering system, specifically tailored for Question Answering (QA) on spoken documents.
This work explores, for the first time, which techniques can be robustly adapted from the usual QA on written documents to the more difficult spoken documents scenario. More specifically, we study new information retrieval (IR) techniques designed for speech, and utilize several levels of linguistic information for the speech-based QA task. These include named-entity detection with phonetic information, syntactic parsing applied to speech transcripts, and the use of coreference resolution.
Our approach is largely based on supervised machine learning techniques, with special focus on the answer extraction step, and makes little use of handcrafted knowledge. Consequently, it should be easily adaptable to other domains and languages.
As part of the work leading to this thesis, we promoted and coordinated the creation of an evaluation framework for the task of QA on spoken documents. The framework, named QAst, provides multilingual corpora, evaluation questions, and answer keys. These corpora were used in the QAst evaluations held at the CLEF workshops in 2007, 2008, and 2009, thus helping the development of state-of-the-art techniques for this particular topic.
The presented QA system and all its modules are extensively evaluated on the European Parliament Plenary Sessions English corpus, composed of manual transcripts and automatic transcripts obtained by three different Automatic Speech Recognition (ASR) systems that exhibit significantly different word error rates. This data belongs to the CLEF 2009 track for QA on speech transcripts.
The main results confirm that syntactic information is very useful for learning to rank answer candidates, improving results on both manual and automatic transcripts unless the ASR quality is very low. Overall, the performance of our system is comparable to or better than the state of the art on this corpus, confirming the validity of our approach.
Building chatbots from large scale domain-specific knowledge bases: challenges and opportunities
Popular conversational agent frameworks such as the Alexa Skills Kit (ASK) and Google Actions (gActions) offer unprecedented opportunities for facilitating the development and deployment of voice-enabled AI solutions in various verticals. Nevertheless, understanding user utterances with high accuracy remains a challenging task with these frameworks, particularly when building chatbots with a large volume of domain-specific entities. In this paper, we describe the challenges and lessons learned from building a large-scale virtual assistant for understanding and responding to equipment-related complaints. In the process, we describe an alternative scalable framework for: 1) extracting the knowledge about equipment components and their associated problem entities from short texts, and 2) learning to identify such entities in user utterances. We show through evaluation on a real dataset that the proposed framework, compared to popular off-the-shelf ones, scales better with a large volume of entities, being up to 30% more accurate, and is more effective in understanding user utterances with domain-specific entities.
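The second step this abstract describes, identifying known domain entities inside a user utterance, can be sketched as a simple longest-match lookup over an entity dictionary. This is only an illustrative baseline, not the paper's actual framework, and the equipment entities below are hypothetical examples:

```python
# Minimal sketch of dictionary-based entity spotting in an utterance,
# assuming a flat list of known domain-entity names (longest match wins).
def find_entities(utterance, entities):
    """Return (entity, start, end) token spans found in the utterance."""
    tokens = utterance.lower().split()
    # Try longer entity names first so "hydraulic pump" beats "pump".
    names = sorted((e.lower().split() for e in entities), key=len, reverse=True)
    found, i = [], 0
    while i < len(tokens):
        for name in names:
            if tokens[i:i + len(name)] == name:
                found.append((" ".join(name), i, i + len(name)))
                i += len(name)
                break
        else:
            i += 1
    return found

# Hypothetical equipment-domain entities, for illustration only.
entities = ["hydraulic pump", "pump", "pressure valve"]
print(find_entities("the hydraulic pump is leaking", entities))
# → [('hydraulic pump', 1, 3)]
```

A real system at the paper's scale would replace the linear scan with an indexed structure (e.g. a trie keyed on the first token) and add fuzzy matching for recognition errors.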
The effect of component recognition on flexibility and speech recognition performance in a spoken question answering system
A spoken question answering system that recognizes questions as full sentences performs well when users ask one of the questions defined. A system that recognizes component words and finds an equivalent defined question might be more flexible, but is likely to have decreased speech recognition performance, leading to a loss in overall system success. The research described in this document compares the advantage in flexibility to the loss in recognition performance when using component recognition.
Questions posed by participants were processed by a system of each type. As expected, the component system made frequent recognition errors while detecting words (word error rate of 31%). In comparison, the full system made fewer errors while detecting full sentences (sentence error rate of 10%). Nevertheless, the component system succeeded in providing proper responses to 76% of the queries posed, while the full system responded properly to only 46%.
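The word error rate quoted above is conventionally computed as the token-level edit distance (substitutions, insertions, deletions) between the reference and hypothesis transcripts, normalized by the reference length. A minimal sketch of that standard computation (not code from the study):

```python
# Word error rate (WER): Levenshtein distance over word tokens,
# divided by the number of words in the reference transcript.
def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("what time is it", "what time it"))  # → 0.25 (1 deletion / 4 words)
```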
Four variations of the traditional tf-idf weighting method were compared as applied to the matching of short text strings (fewer than 10 words). It was found that the general approach was successful in finding matches, and that all four variations compensated for the loss in speech recognition performance to a similar degree. No significant difference due to the variations in weighting was detected in the results.
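Matching a recognized word string against the set of defined questions by tf-idf cosine similarity can be sketched as follows. This shows one common weighting variant under stated assumptions, not necessarily any of the four evaluated in the study, and the example questions are hypothetical:

```python
import math
from collections import Counter

# Sketch: match a short recognized query to the closest defined question
# using tf-idf weights and cosine similarity.
def tfidf_match(query, questions):
    docs = [q.lower().split() for q in questions]
    n = len(docs)
    # Document frequency and smoothed idf: rarer words weigh more.
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}

    def vec(tokens):
        tf = Counter(tokens)
        return {w: tf[w] * idf.get(w, 0.0) for w in tf}

    def cos(a, b):
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    qv = vec(query.lower().split())
    scores = [cos(qv, vec(d)) for d in docs]
    return questions[max(range(n), key=scores.__getitem__)]

# Hypothetical defined questions; the query simulates noisy ASR word output.
questions = ["where is the library", "when does the library open",
             "where is the cafeteria"]
print(tfidf_match("library where", questions))  # → "where is the library"
```

Because the similarity is computed over bags of words, the match tolerates the word reorderings and dropped function words typical of component-level speech recognition, which is the mechanism the study credits for recovering proper responses despite a high word error rate.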
A Survey on Conversational Search and Applications in Biomedicine
This paper aims to provide a thorough overview of Conversational Search (ConvSearch), an approach that enhances information retrieval by letting users engage in a dialogue for information-seeking tasks. In this survey, we predominantly focus on the human-interactive characteristics of ConvSearch systems, highlighting the operation of the action modules, namely the retrieval, question-answering, and recommender systems. We label various ConvSearch research problems in knowledge bases, natural language processing, and dialogue management systems along with the action modules. We further categorize the ConvSearch framework and direct its application toward the biomedical and healthcare fields for the utilization of clinical social technology. Finally, we conclude by discussing the challenges and issues of ConvSearch, particularly in biomedicine. Our main aim is to provide an integrated and unified vision of the ConvSearch components from different fields, which benefits the information-seeking process in healthcare systems.
Experientially grounded language production: Advancing our understanding of semantic processing during lexical selection
The process of lexical selection, i.e. producing the right words to get an intended
message across, is not well understood. Specifically, meaning aspects grounded in sensorimotor experiences and their role during lexical selection have not been investigated widely. Here, we investigated the role of experientially grounded meaning aspects with two studies in which participants had to produce a noun to complete sentences which described sceneries.
In Study 1, the visual appearance of sentence fragments was manipulated and they seemed to move upwards or downwards on screen.
In Study 2, participants moved their head up- or downwards while listening to sentence fragments.
We investigated whether the spatial properties of the freely chosen nouns are influenced
by the spatial manipulations as well as by the spatial properties of the sentences. The vertical
visual manipulation used in Study 1 did not influence the spatial properties of the produced
words. However, the body movements in Study 2 influenced participants’ lexical choices, i.e.
after up-movements the referents of the produced words were higher up compared to after downward movements (and vice versa). Furthermore, there was an increased effect of movement on
the spatial properties of the produced nouns with higher levels of participants’ interoceptive sensibility. Additionally, the spatial properties of the stimulus sentences influenced the spatial properties of the produced words in both studies.
Thus, experientially grounded meaning aspects which are either embedded in text or reactivated via bodily manipulations may influence which words we choose when speaking, and interindividual differences may moderate these effects. The findings are related to current theories of semantics.
Furthermore, this dissertation enhances the methodological repertoire of language production
researchers by showing how language production studies with overt articulation in picture naming tasks can be run online (Study 3).
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation, and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper, examples of neurocognitive inspirations and promising directions in this area are given.
Designing Service-Oriented Chatbot Systems Using a Construction Grammar-Driven Natural Language Generation System
Service-oriented chatbot systems are used to inform users in a conversational manner about a particular service or product on a website. Our research shows that current systems are time-consuming to build and not very accurate or satisfying to users. We find that natural language understanding and natural language generation methods are central to creating an efficient and useful system. In this thesis we investigate current and past methods in this research area and place particular emphasis on Construction Grammar and its computational implementation. Our research shows that users have strong emotive reactions to how these systems behave, so we also investigate the human-computer interaction component. We present three systems (KIA, John, and KIA2) and carry out extensive user tests on all of them, as well as comparative tests. KIA is built using existing methods, John is built with the user in mind, and KIA2 is built using the construction grammar method. We found that the construction grammar approach performs well in service-oriented chatbot systems, and that users preferred it over other systems.