77 research outputs found
Learning to Map Natural Language to Executable Programs Over Databases
Natural language is a fundamental form of information and communication and is becoming the next frontier in computer interfaces. As the amount of data available online has increased exponentially, so has the need for Natural Language Interfaces (NLIs; in this thesis the abbreviation does not refer to natural language inference) that connect the data and the user through natural language, significantly improving the possibility and efficiency of information access for many users besides data experts. All consumer-facing software will one day have a dialogue interface, and this is the next vital leap in the evolution of search engines. Such intelligent dialogue systems should understand the meaning of language grounded in various contexts and generate effective language responses in different forms for information requests and human-computer communication. Developing these intelligent systems is challenging due to (1) limited benchmarks to drive advancements, (2) alignment mismatches between natural language and formal programs, (3) lack of trustworthiness and interpretability, (4) context dependencies in both human conversational interactions and the target programs, and (5) joint language understanding between dialog questions and NLI environments (e.g., databases and knowledge graphs). This dissertation presents several datasets, neural algorithms, and language models to address these challenges for developing deep learning technologies for conversational natural language interfaces (more specifically, NLIs to Databases or NLIDB). First, to drive advancements towards neural-based conversational NLIs, we design and propose several complex and cross-domain NLI benchmarks, along with several datasets. These datasets enable training large, deep learning models. The evaluation is done on unseen databases (e.g., about course arrangement). Systems must generalize well to not only new SQL queries but also to unseen database schemas to perform well on these tasks.
Furthermore, in real-world applications, users often access information in a multi-turn interaction with the system by asking a sequence of related questions. The users may explicitly refer to or omit previously mentioned entities and constraints and may introduce refinements, additions, or substitutions to what has already been said. Therefore, some of these datasets require systems to model dialog dynamics and generate natural language explanations for user verification. The full dialogue interaction with the system's responses is also important, as this supports clarifying ambiguous questions, verifying returned results, and notifying users of unanswerable or unrelated questions. A robust dialogue-based NLI system that can engage with users by forming its responses has thus become an increasingly necessary component for the query process. Moreover, this thesis presents the development of scalable algorithms designed to parse complex and sequential questions to formal programs (e.g., mapping questions to SQL queries that can execute against databases). We propose a novel neural model that utilizes type information from knowledge graphs to better understand rare entities and numbers in natural language questions. We also introduce a neural model based on syntax tree neural networks, which was the first methodology proposed for generating complex programs from language. Finally, language modeling creates contextualized vector representations of words, which are the basis of deep learning for NLP, by training a model to predict the next word given context words. Recently, pre-trained language models such as BERT and RoBERTa have achieved tremendous success in many natural language processing tasks such as text understanding and reading comprehension. However, most language models are pre-trained only on free text such as Wikipedia articles and books.
Given that language in semantic parsing is usually related to some formal representations such as logic forms and SQL queries and has to be grounded in structural environments (e.g., databases), we propose better language models for NLIs by enforcing such compositional interpolation in them. To show they could better jointly understand dialog questions and NLI environments (e.g., databases and knowledge graphs), we show that these language models achieve new state-of-the-art results for seven representative tasks on semantic parsing, dialogue state tracking, and question answering. Also, our proposed pre-training method is much more effective than other prior work.
The design of speech-based automated mobile phone services using interface metaphors
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Interface metaphor is a widely used design technique for interactive computer systems. The advantages of using interface metaphors derive from their ability to promote active learning, which enables a user to transfer knowledge from a familiar real-world domain to an unfamiliar computing domain. Interface metaphor is not currently used for the design of automated phone services, and it was the aim of this thesis to examine whether interface metaphor could improve the usability of speech-activated automated mobile phone services. A human-centred design methodology was followed to generate, select, and develop potential metaphors, which were used to implement metaphor-based phone services. An experimental methodology was then used to compare the usability of the metaphor-based services with the usability of currently available number-based phone services. The first experiment examined the effect of three different interface metaphors on the usability of a mobile city guide service. Usability was measured as a range of performance and attitude measures, and was supplemented by telephone interview data. After three consecutive days of usage, participants both preferred, and performed better with, the service that was based on an office filing system metaphor. Experiment two was conducted over a six-week period, and investigated the effect of users' individual differences, and the context of use, on the usability of both the office filing system metaphor-based service and a non-metaphor service. The results showed that performance with the metaphor-based service was significantly better than performance with the non-metaphor service.
The usability of the metaphor-based service was not significantly affected by users' individual characteristics and aptitudes, whereas the number-based service was, suggesting that metaphor-based services may be more usable for a wider range of potential users. Usability levels for both services were found to be consistent across both private and public locations of use, suggesting that speech-activated mobile phone services provide a flexible means of information access. Experiment three investigated the strategies used by participants when interacting with mobile phone services, specifically the visualisation strategy that was used by two thirds of the metaphor-based service participants in experiment two. In addition to the attitude and performance measures used for experiments one and two, face-to-face interviews were conducted with participants. The results indicated that significantly more participants visualised the metaphor-based services relative to a non-metaphor service, and that visualisation of the service structure led to significant performance improvements. This thesis has demonstrated the usability benefits of interface metaphor as a design technique for speech-based mobile phone services. These benefits of metaphor appear to derive from their ability to provide a mental model of the phone service that can be visualised, and their ability to accommodate the individual differences of users.
Students' language in computer-assisted tutoring of mathematical proofs
Truth and proof are central to mathematics. Proving (or disproving) seemingly simple statements often turns out to be one of the hardest mathematical tasks. Yet, doing proofs is rarely taught in the classroom. Studies on cognitive difficulties in learning to do proofs have shown that pupils and students not only often do not understand or cannot apply basic formal reasoning techniques and do not know how to use formal mathematical language, but, at a far more fundamental level, they also do not understand what it means to prove a statement or even do not see the purpose of proof at all. Since insight into the importance of proof and doing proofs as such cannot be learnt other than by practice, learning support through individualised tutoring is in demand.
This volume presents a part of an interdisciplinary project, set at the intersection of pedagogical science, artificial intelligence, and (computational) linguistics, which investigated issues involved in provisioning computer-based tutoring of mathematical proofs through dialogue in natural language. The ultimate goal in this context, addressing the above-mentioned need for learning support, is to build intelligent automated tutoring systems for mathematical proofs. The research presented here has focused on the language that students use while interacting with such a system: its linguistic properties and computational modelling. Contribution is made at three levels: first, an analysis of language phenomena found in students' input to a (simulated) proof tutoring system is conducted and the variety of students' verbalisations is quantitatively assessed; second, a general computational processing strategy for informal mathematical language and methods of modelling prominent language phenomena are proposed; and third, the prospects for natural language as an input modality for proof tutoring systems are evaluated based on collected corpora.
Recurrent Neural Network Language Generation for Dialogue Systems
Language is the principal medium for ideas, while dialogue is the most natural and effective way for humans to interact with and access information from machines. Natural language generation (NLG) is a critical component of spoken dialogue and it has a significant impact on usability and perceived quality. Many commonly used NLG systems employ rules and heuristics, which tend to generate inflexible and stylised responses without the natural variation of human language. However, the frequent repetition of identical output forms can quickly make dialogue become tedious for most real-world users. Additionally, these rules and heuristics are not scalable and hence not trivially extensible to other domains or languages. A statistical approach to language generation can learn language decisions directly from data without relying on hand-coded rules or heuristics, which brings scalability and flexibility to NLG. Statistical models also provide an opportunity to learn in-domain human colloquialisms and cross-domain model adaptations.
A robust and quasi-supervised NLG model is proposed in this thesis. The model leverages a Recurrent Neural Network (RNN)-based surface realiser and a gating mechanism applied to input semantics. The model is motivated by the Long Short-Term Memory (LSTM) network. The RNN-based surface realiser and gating mechanism use a neural network to learn end-to-end language generation decisions from input dialogue act and sentence pairs; the model also integrates sentence planning and surface realisation into a single optimisation problem. The single optimisation not only bypasses the costly intermediate linguistic annotations but also generates more natural and human-like responses. Furthermore, a domain adaptation study shows that the proposed model can be readily adapted and extended to new dialogue domains via a proposed recipe.
Continuing the success of end-to-end learning, the second part of the thesis speculates on building an end-to-end dialogue system by framing it as a conditional generation problem. The proposed model encapsulates a belief tracker with a minimal state representation and a generator that takes the dialogue context to produce responses. These features suggest comprehension and fast learning. The proposed model is capable of understanding requests and accomplishing tasks after training on only a few hundred human-human dialogues. A complementary Wizard-of-Oz data collection method is also introduced to facilitate the collection of human-human conversations from online workers. The results demonstrate that the proposed model can talk to human judges naturally, without any difficulty, for a sample application domain. In addition, the results also suggest that the introduction of a stochastic latent variable can help the system model intrinsic variation in communicative intention much better. Tsung-Hsien Wen's Ph.D. is supported by Toshiba Research Europe Ltd, Cambridge Research Laboratory.
Spoken dialogue systems: architectures and applications
171 p. Technology and technological devices have become habitual and omnipresent. Humans need to learn to communicate with all kinds of devices. Until recently, humans needed to learn how the devices express themselves to communicate with them. But in recent times the tendency has been to make communication with these devices more intuitive. The ideal way to communicate with devices would be the natural way of communication between humans: speech. Humans have long been investigating and designing systems that use this type of communication, giving rise to the so-called Spoken Dialogue Systems. In this context, the primary goal of the thesis is to show how these systems can be implemented. Additionally, the thesis serves as a review of the state-of-the-art regarding architectures and toolkits. Finally, the thesis is intended to serve future system developers as a guide for their construction. For that
Analysis and automatic identification of spontaneous emotions in speech from human-human and human-machine communication
383 p. This research mainly focuses on improving our understanding of human-human and human-machine interactions by analysing participants' emotional status. For this purpose, we have developed and enhanced Speech Emotion Recognition (SER) systems for both interactions in real-life scenarios, explicitly emphasising the Spanish language. In this framework, we have conducted an in-depth analysis of how humans express emotions using speech when communicating with other persons or machines in actual situations. Thus, we have analysed and studied the way in which emotional information is expressed in a variety of true-to-life environments, which is a crucial aspect for the development of SER systems. This study aimed to comprehensively understand the challenge we wanted to address: identifying emotional information in speech using machine learning technologies. Neural networks have been demonstrated to be adequate tools for identifying events in speech and language. Most of them aimed to make local comparisons between some specific aspects; thus, the experimental conditions were tailored to each particular analysis. The experiments across different articles (from P1 to P19) are hardly comparable due to our continuous learning in dealing with the difficult task of identifying emotions in speech. In order to make a fair comparison, additional unpublished results are presented in the Appendix. These experiments were carried out under identical and rigorous conditions. This general comparison offers an overview of the advantages and disadvantages of the different methodologies for the automatic recognition of emotions in speech.