566 research outputs found

    On the voice-activated question answering

    [EN] Question answering (QA) is probably one of the most challenging tasks in the field of natural language processing. It requires search engines capable of extracting concise, precise fragments of text that contain an answer to a question posed by the user. Incorporating voice interfaces into QA systems adds a more natural and very appealing mode of interaction. This paper provides a comprehensive description of current state-of-the-art voice-activated QA systems. Finally, the scenarios that will emerge from the introduction of speech recognition in QA are discussed. © 2006 IEEE.
    This work was supported in part by Research Projects TIN2009-13391-C04-03 and TIN2008-06856-C05-02. This paper was recommended by Associate Editor V. Marik.
    Rosso, P.; Hurtado Oliver, L.F.; Segarra Soriano, E.; Sanchís Arnal, E. (2012). On the voice-activated question answering. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(1):75-85. https://doi.org/10.1109/TSMCC.2010.2089620
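    As a point of reference, below is a minimal sketch of the pipeline such systems share: an ASR front end feeding passage retrieval and answer extraction. The toy corpus, scoring, and function names are illustrative assumptions, not components of any system from the paper.

```python
# A toy, self-contained sketch of a voice-activated QA pipeline: a (mocked)
# ASR step transcribes the spoken question, retrieval ranks passages, and
# extraction returns a concise answer fragment. Corpus and scoring are toys.
import string

CORPUS = [
    "The Ebro river flows through Zaragoza.",
    "Madrid is the capital of Spain.",
    "CLEF has organized question answering evaluations since 2003.",
]

def tokenize(text):
    # Lowercase and strip punctuation so question and passage words match.
    return text.lower().translate(str.maketrans("", "", string.punctuation)).split()

def transcribe(audio):
    # Stand-in for a real ASR component; we assume a perfect transcript here.
    return audio

def retrieve_passages(question, k=2):
    # Rank passages by word overlap with the question (toy retrieval engine).
    q = set(tokenize(question))
    return sorted(CORPUS, key=lambda p: -len(q & set(tokenize(p))))[:k]

def extract_answer(question, passages):
    # Return the top-ranked passage as the concise "answer" fragment.
    return passages[0] if passages else None

def voice_qa(audio):
    question = transcribe(audio)
    return extract_answer(question, retrieve_passages(question))

print(voice_qa("What is the capital of Spain?"))
# -> Madrid is the capital of Spain.
```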

    Language modelization and categorization for voice-activated QA

    The interest in incorporating voice interfaces into Question Answering (QA) systems has increased in recent years. In this work, we present an approach to the Automatic Speech Recognition component of a Voice-Activated Question Answering system, focusing on building a language model able to include as many relevant words from the document repository as possible, while also representing the general syntactic structure of typical questions. We have applied this technique to the recognition of questions from the CLEF QA 2003-2006 contests.
    Work partially supported by the Spanish MICINN under contract TIN2008-06856-C05-02, and by the Vicerrectorat d’Investigació, Desenvolupament i Innovació of the Universitat Politècnica de València under contract 20100982.
    Pastor Pellicer, J.; Hurtado Oliver, L.F.; Segarra Soriano, E.; Sanchís Arnal, E. (2011). Language modelization and categorization for voice-activated QA. In: Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. Springer Verlag (Germany), vol. 7042, pp. 475-482. https://doi.org/10.1007/978-3-642-25085-9_56
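    To make the modeling idea concrete, here is a hedged sketch of a class-based n-gram language model in the spirit the abstract describes: questions are generalized by replacing content words with category tags, and the categories are filled from the repository vocabulary. The category inventory, training questions, and additive smoothing below are assumptions for illustration, not the paper's actual setup.

```python
# Sketch: learn the syntactic shape of questions from category-tagged
# templates, while covering repository vocabulary through the categories.
from collections import Counter, defaultdict

# Training questions with content words replaced by category tags.
TEMPLATES = [
    "what is the capital of <LOCATION>",
    "who wrote <TITLE>",
    "when was <PERSON> born",
]

# Category members harvested from the document repository (toy lists).
CATEGORIES = {
    "<LOCATION>": ["spain", "france"],
    "<TITLE>": ["don quixote"],
    "<PERSON>": ["cervantes"],
}

def bigram_counts(templates):
    counts = defaultdict(Counter)
    for t in templates:
        tokens = ["<s>"] + t.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts

COUNTS = bigram_counts(TEMPLATES)

def bigram_prob(prev, cur, alpha=0.1, vocab_size=50):
    # Additive smoothing; a real system would use a proper smoothing scheme.
    c = COUNTS[prev]
    return (c[cur] + alpha) / (sum(c.values()) + alpha * vocab_size)

def expand(template):
    # At recognition time a category tag expands to its member words.
    for tag, members in CATEGORIES.items():
        template = template.replace(tag, members[0])
    return template

print(bigram_prob("capital", "of"))
print(expand("what is the capital of <LOCATION>"))  # -> what is the capital of spain
```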

    Simulated role-playing from crowdsourced data

    Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (p. 173-178).
    Collective Artificial Intelligence (CAI) simulates human intelligence from data contributed by many humans, mined for inter-related patterns. This thesis applies CAI to social role-playing, introducing an end-to-end process for compositing recorded performances from thousands of humans and simulating open-ended interaction from this data. The CAI process combines crowdsourcing, pattern discovery, and case-based planning. Content creation is crowdsourced by recording role-players online. Browser-based tools allow non-experts to annotate data, organizing content into a hierarchical narrative structure. Patterns discovered from data power a novel system combining plan recognition with case-based planning. The combination of this process and structure produces a new medium, which exploits a massive corpus to realize characters who interact and converse with humans. This medium enables new experiences in videogames, and new classes of training simulations, therapeutic applications, and social robots. While advances in graphics support incredible freedom to interact physically in simulations, current approaches to development restrict simulated social interaction to hand-crafted branches that do not scale to the thousands of possible patterns of actions and utterances observed in actual human interaction. This tension between freedom and system comprehension stems from two bottlenecks that make open-ended social interaction a challenge. The first is the authorial effort required to cover all possible inputs. The second is that, like other cognitive processes, imagination is a bounded resource; any individual author only has so much imagination. The convergence of advances in connectivity, storage, and processing power is bringing people together in ways never before possible, amplifying the imagination of individuals by harnessing the creativity and productivity of the crowd, and revolutionizing how we create media and what media we can create. By embracing data-driven approaches and capitalizing on the creativity of the crowd, these authoring bottlenecks can be overcome, taking a step toward a medium that robustly supports player choice. Doing so requires rethinking both technology and the division of labor in media production. As a proof of concept, a CAI system has been evaluated by recording over 10,000 performances in The Restaurant Game and automating an AI-controlled waitress who interacts in the world and converses with a human via text or speech. Quantitative results demonstrate how CAI supports significantly more open-ended interaction with humans, while focus groups reveal factors for improving engagement.
    by Jeffrey David Orkin. Ph.D.
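    A rough sketch of the data-driven behavior selection this process enables: match the recent interaction history against recorded performances and reuse what human role-players did next. The toy corpus and exact-match voting below are simplifying assumptions, not the thesis's actual plan-recognition machinery.

```python
# Sketch: choose the AI waitress's next action by voting over what recorded
# human role-players did after the same recent context.
from collections import Counter

# Each recorded performance is a sequence of (actor, action) events.
RECORDED_GAMES = [
    [("customer", "sit"), ("waitress", "greet"), ("customer", "order steak"),
     ("waitress", "serve steak")],
    [("customer", "sit"), ("waitress", "greet"), ("customer", "order salad"),
     ("waitress", "serve salad")],
]

def next_waitress_action(history, games=RECORDED_GAMES, context=2):
    """Vote over what the waitress did after matching contexts in the corpus."""
    recent = history[-context:]
    votes = Counter()
    for game in games:
        for i in range(len(game) - context):
            if game[i:i + context] == recent and game[i + context][0] == "waitress":
                votes[game[i + context][1]] += 1
    return votes.most_common(1)[0][0] if votes else None

print(next_waitress_action([("waitress", "greet"), ("customer", "order steak")]))
# -> serve steak
```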

    Preference Learning for Machine Translation

    Automatic translation of natural language is still (as of 2017) a long-standing but unmet promise. While advancing at a fast rate, the underlying methods are still far from reliably capturing the syntax or semantics of arbitrary natural-language utterances, let alone carrying the encoded meaning over into a second language. However, it is possible to build useful translating machines when the target domain is well known and the machine is able to learn and adapt efficiently and promptly from new inputs. This is made possible by efficient and effective machine learning methods that can be applied to automatic translation. In this work we present and evaluate methods for three distinct scenarios: a) we develop algorithms that can learn from very large amounts of data by exploiting pairwise preferences defined over competing translations, which can be used to make a machine translation system robust to arbitrary texts from varied sources, but also enable it to adapt effectively to new domains of data; b) we describe a method that efficiently learns external models which adhere to fine-grained preferences extracted from a restricted selection of translated material, e.g. for adapting to users or groups of users in a computer-aided translation scenario; c) we develop methods for two machine translation paradigms, neural and traditional statistical machine translation, to directly adapt to user-defined preferences in an interactive post-editing scenario, learning precisely adapted machine translation systems. In all of these settings, we show that machine translation can be made significantly more useful by careful optimization via preference learning.
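    As an illustration of scenario a), here is a minimal sketch of learning from pairwise preferences over competing translations with a margin-based perceptron on feature vectors. The two toy features and the update rule are assumptions for illustration, not the thesis's exact algorithm.

```python
# Sketch: given pairs (better, worse) of competing translations represented
# as feature vectors, update a linear model so the preferred one scores higher.
import numpy as np

def score(w, features):
    return float(w @ features)

def train_pairwise(pairs, dim, epochs=10, lr=0.1, margin=1.0):
    """pairs: list of (better_features, worse_features) arrays."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for better, worse in pairs:
            # Update only when the preference is violated within a margin.
            if score(w, better) - score(w, worse) < margin:
                w += lr * (better - worse)
    return w

# Toy features, e.g. [language-model score, translation-model score].
pairs = [
    (np.array([0.9, 0.8]), np.array([0.2, 0.7])),
    (np.array([0.7, 0.9]), np.array([0.6, 0.1])),
]
w = train_pairwise(pairs, dim=2)
print(w, score(w, pairs[0][0]) > score(w, pairs[0][1]))  # -> ... True
```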

    The evolution of language: Proceedings of the Joint Conference on Language Evolution (JCoLE)


    Adversarial content manipulation for analyzing and improving model robustness

    The recent rapid progress in machine learning systems has opened up many real-world applications --- from recommendation engines on web platforms to safety-critical systems like autonomous vehicles. A model deployed in the real world will often encounter inputs far from its training distribution. For example, a self-driving car might come across a black stop sign in the wild. To ensure safe operation, it is vital to quantify the robustness of machine learning models to such out-of-distribution data before releasing them into the real world. However, the standard paradigm of benchmarking machine learning models with fixed-size test sets drawn from the same distribution as the training data is insufficient to identify these corner cases efficiently. In principle, if we could generate all valid variations of an input and measure the model response, we could quantify and guarantee model robustness locally. Yet doing this with real-world data is not scalable.
    In this thesis, we propose an alternative: using generative models to create synthetic data variations at scale and testing the robustness of target models to these variations. We explore methods to generate semantic data variations in a controlled fashion across visual and text modalities. We build generative models capable of performing controlled manipulation of data, such as changing the visual context, editing the appearance of an object in an image, or changing the writing style of text. Leveraging these generative models, we propose tools to study the robustness of computer vision systems to input variations and to systematically identify failure modes. In the text domain, we deploy these generative models to improve the diversity of image captioning systems and to perform writing-style manipulation to obfuscate private attributes of the user.
    Our studies quantifying model robustness explore two kinds of input manipulations, model-agnostic and model-targeted. Model-agnostic manipulations leverage human knowledge to choose the kinds of changes without considering the target model being tested. This includes automatically editing images to remove objects not directly relevant to the task and creating variations in visual context. Alternatively, in the model-targeted approach the input variations are directly adversarially guided by the target model. For example, we adversarially manipulate the appearance of an object in the image to fool an object detector, guided by the gradients of the detector. Using these methods, we measure and improve the robustness of various computer vision systems -- specifically image classification, segmentation, object detection, and visual question answering systems -- to semantic input variations.
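    To illustrate the model-targeted setting, here is a toy sketch of gradient-guided manipulation restricted to a masked region standing in for the object's appearance. The linear "detector" and the FGSM-style sign step are stand-ins for the real object detector and optimizer, not the thesis's implementation.

```python
# Sketch: adversarially perturb only a masked part of the input, guided by
# the gradient of the target model's detection score.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)            # toy linear "detector" over 16 input features
x = rng.normal(size=16)            # input (e.g., a flattened image patch)
mask = np.zeros(16); mask[:8] = 1  # only the object's appearance may change

def detector_score(x):
    return float(w @ x)            # > 0 means "object detected"

# For a linear model the gradient of the score w.r.t. x is simply w; step
# against its sign inside the mask to suppress the detection (FGSM-style).
eps = 0.5
x_adv = x - eps * np.sign(w) * mask

print(detector_score(x), "->", detector_score(x_adv))  # score is pushed down
```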