112 research outputs found

    Building Machines That Learn and Think Like People

    Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
    Comment: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar

    The direction of technical change in AI and the trajectory effects of government funding

    Government funding of innovation can have a significant impact not only on the rate of technical change, but also on its direction. In this paper, we examine the role that government grants and government departments played in the development of artificial intelligence (AI), an emergent general purpose technology with the potential to revolutionize many aspects of the economy and society. We analyze all AI patents filed at the US Patent and Trademark Office and develop network measures that capture each patent’s influence on all possible sequences of follow-on innovation. By identifying the effect of patents on technological trajectories, we are able to account for the long-term cumulative impact of new knowledge that is not captured by standard patent citation measures. We show that patents funded by government grants, but above all patents filed by federal agencies and state departments, profoundly influenced the development of AI. These long-term effects were especially significant in early phases, and weakened over time as private incentives took over. These results are robust to alternative specifications and to controlling for endogeneity.

    Towards learning sentence representation with self-supervision

    In recent years, there has been growing interest in deep learning for natural language processing. Over the last decade, several important milestones have been reached on various problems, such as question answering, text summarization, and sentiment analysis. Pre-training language models in a self-supervised manner is an important part of these achievements. This thesis explores a set of self-supervised methods for learning sentence representations from a large amount of unlabeled data. We also introduce a new memory-augmented model for learning representations based on a tree structure. We evaluate and analyze these representations on different tasks.
    In chapter 1, we introduce the basics of feedforward neural networks and recurrent neural networks. The chapter continues with a discussion of the backpropagation algorithm for training feedforward neural networks, and the backpropagation through time algorithm for training recurrent neural networks. We also discuss three different approaches to learning representations, namely supervised learning, unsupervised learning, and a relatively new approach called self-supervised learning.
    In chapter 2, we discuss the fundamentals of deep natural language processing. Specifically, we cover word representations, sentence representations, and language modelling. We focus on the evaluation and current state of the literature for these concepts. We close the chapter by discussing large-scale pre-training and transfer learning in language.
    In chapter 3, we investigate a set of self-supervised tasks that take advantage of noise contrastive estimation in order to learn sentence representations using unlabeled data. We train our model on a large corpus and evaluate our learned sentence representations on a set of downstream natural language tasks from the SentEval framework. Our model trained on the proposed tasks outperforms unsupervised methods on a subset of tasks from SentEval.
    In chapter 4, we introduce a memory-augmented model called Ordered Memory with several improvements over traditional stack-augmented recurrent neural networks. We introduce a new Stick-breaking attention mechanism inspired by Ordered Neurons [Shen et al., 2019] to write to and erase from the memory. A new Gated Recursive Cell is also introduced to compose low-level representations into higher-level ones. We show that this model performs well on the logical inference task and the ListOps task, and that it also shows strong generalization properties in these tasks. Finally, we evaluate our model on the SST (Stanford Sentiment Treebank) tasks (binary and fine-grained) and report results that are comparable with the state of the art on these tasks.
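The abstract names a stick-breaking attention mechanism without defining it. As a rough illustration only, the generic stick-breaking construction can be sketched as follows: each position takes a fraction of whatever probability mass is still unassigned, so the weights are p_i = beta_i * prod_{j<i} (1 - beta_j). The function name `stick_breaking` and the direction of the scan are assumptions made for this sketch; the thesis's actual Ordered Memory mechanism may differ.

```python
def stick_breaking(betas):
    """Generic stick-breaking weights (illustrative, not the thesis's exact method).

    betas: break fractions in [0, 1], one per position.
    Returns (weights, remaining), where weights[i] is the share of the unit
    "stick" taken at position i and remaining is the mass left unassigned.
    """
    weights = []
    remaining = 1.0
    for beta in betas:
        if not 0.0 <= beta <= 1.0:
            raise ValueError("each break fraction must lie in [0, 1]")
        weights.append(remaining * beta)  # take a fraction of what is left
        remaining *= 1.0 - beta           # shrink the leftover stick
    return weights, remaining


# Each position takes half of what is left; the last position takes everything.
weights, remaining = stick_breaking([0.5, 0.5, 1.0])
print(weights, remaining)  # [0.5, 0.25, 0.25] 0.0
```

The weights and the leftover mass always sum to one, which is what lets such a construction act as an attention distribution over positions.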

    Evolutionary design of deep neural networks

    International Mention in the doctoral degree.
    For three decades, neuroevolution has applied evolutionary computation to the optimization of the topology of artificial neural networks, with most works focusing on very simple architectures. However, times have changed, and nowadays convolutional neural networks are the industry and academia standard for solving a variety of problems, many of which remained unsolved before the discovery of this kind of network. Convolutional neural networks involve complex topologies, and the manual design of these topologies for solving a problem at hand is expensive and inefficient. In this thesis, our aim is to use neuroevolution to evolve the architecture of convolutional neural networks. To do so, we have tried two different techniques: genetic algorithms and grammatical evolution. We have implemented a niching scheme for preserving genetic diversity, in order to ease the construction of ensembles of neural networks. These techniques have been validated against the MNIST database for handwritten digit recognition, achieving a test error rate of 0.28%, and the OPPORTUNITY data set for human activity recognition, attaining an F1 score of 0.9275. Both results have proven very competitive when compared with the state of the art. Also, in all cases, ensembles have proven to perform better than individual models. Later, the topologies learned for MNIST were tested on EMNIST, a database introduced in 2017, which includes more samples and a set of letters for character recognition. Results have shown that the topologies optimized for MNIST perform well on EMNIST, indicating that architectures can be reused across domains with similar characteristics. In summary, neuroevolution is an effective approach for automatically designing topologies for convolutional neural networks. However, it remains an unexplored field due to hardware limitations. Current advances, however, should constitute the fuel that empowers the emergence of this field, and further research should start as of today.
    This Ph.D. dissertation has been partially supported by the Spanish Ministry of Education, Culture and Sports under FPU fellowship with identifier FPU13/03917. This research stay has been partially co-funded by the Spanish Ministry of Education, Culture and Sports under FPU short stay grant with identifier EST15/00260.
    Official Doctoral Programme in Computer Science and Technology. President: María Araceli Sanchís de Miguel.- Secretary: Francisco Javier Segovia Pérez.- Member: Simon Luca

    Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach

    Document Layout Analysis, applied to handwritten documents, aims to automatically obtain the intrinsic structure of a document. Its development as a research field spans from the character segmentation systems developed in the early 1960s to the complex systems designed nowadays, where the goal is to analyze high-level structures (lines of text, paragraphs, tables, etc.) and the relationships between them. This thesis first defines the goal of Document Layout Analysis from a probabilistic perspective. Then, the complexity of the problem is reduced by decomposing it into a set of well-known complementary subproblems that modern computing resources can handle. More precisely, three of the main subproblems of Document Layout Analysis are addressed following a probabilistic formulation, namely Baseline Detection, Region Segmentation and Reading Order Determination. One of the main contributions of this thesis is the formalization of the Baseline Detection and Region Segmentation problems under a probabilistic framework, where both problems can be handled separately or in an integrated way by the proposed models. The latter approach has proven very useful for handling large document collections under restricted computing resources. Later, the Reading Order Determination subproblem is addressed. It is one of the most important, yet underestimated, subproblems of Document Layout Analysis, since it is the bridge that allows us to convert the data extracted from Automatic Text Recognition systems into useful information. Therefore, Reading Order Determination is addressed and formalized as a pairwise probabilistic sorting problem. Moreover, we propose two different decoding algorithms that reduce the computational complexity of the problem. Furthermore, different statistical models are used to represent the probability distribution over the structure of the documents. These models, based on Artificial Neural Networks (from a simple Multilayer Perceptron to complex Convolutional and Region Proposal Networks), are estimated from training data using supervised Machine Learning algorithms. Finally, all the contributions are experimentally evaluated, not only on standard academic benchmarks but also on collections of thousands of images. We consider handwritten text documents and handwritten musical documents, as together they represent the majority of documents in libraries and archives. The results show that the proposed methods are very accurate and versatile across a wide range of handwritten documents.
    Quirós Díaz, L. (2022). Layout Analysis for Handwritten Documents. A Probabilistic Machine Learning Approach [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/18148
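The abstract frames Reading Order Determination as pairwise probabilistic sorting but does not describe its two decoding algorithms. To make the framing concrete, here is a minimal sketch of one simple decoder, under the assumption that a model has already produced a matrix `precedes` where `precedes[i][j]` is the estimated probability that element i is read before element j. The name `decode_reading_order` and the score-and-sort strategy are illustrative choices, not the thesis's algorithms.

```python
def decode_reading_order(precedes):
    """Order layout elements from pairwise 'i is read before j' probabilities.

    precedes: square list of lists; precedes[i][j] is a hypothetical model's
    probability that element i precedes element j (diagonal is ignored).
    Returns element indices sorted from first-read to last-read.
    """
    n = len(precedes)
    # Score each element by how strongly it is believed to precede the others.
    scores = [
        sum(precedes[i][j] for j in range(n) if j != i)
        for i in range(n)
    ]
    # Higher score means earlier in the reading order; ties keep index order.
    return sorted(range(n), key=lambda i: -scores[i])


# Three text lines: line 1 most likely comes first, then line 0, then line 2.
pairwise = [
    [0.0, 0.2, 0.9],
    [0.8, 0.0, 0.7],
    [0.1, 0.3, 0.0],
]
print(decode_reading_order(pairwise))  # [1, 0, 2]
```

Scoring needs all n(n-1) pairwise probabilities, which is the quadratic cost that cheaper decoders would aim to avoid.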

    Coordinating knowledge to improve optical music recognition

    Optical Music Recognition (OMR) is the process of automatically processing and understanding an image of a music score. This process involves various distinct phases to transform the image into primitive shapes, musical objects, and ultimately into a syntactic model representing the music's semantics. In general, OMR systems have performed these tasks in a linear sequence, so that the output of one component is the input to the next. However, this means that processing errors that occur in one of the tasks propagate through the system, and often, by the time the error is eventually detected, it is too late to reconsider the decisions leading to the incorrect classification or information. This thesis describes how OMR can be improved by modifying the recognition process from a sequence of linear tasks to a collection of modules that coordinate the information extracted from the data. Methods for data representation and for controlling the system's flow of execution are investigated, and a practical implementation of such a system is described. This system has a message-passing design for providing contextual information from one module to another, such as suggesting possible classifications for an object. These messages are used to aid decision-making and to correct faulty decisions. This helps the system to adapt to a particular score while processing the image, increasing accuracy. The system is designed to aid in the research and evaluation of algorithms to achieve the above aims; therefore it is straightforward to modify various aspects of the system's behaviour, such as adding support for different music symbols. Examining the implemented system's behaviour clearly shows that this coordinated approach can correct many errors and can even identify some objects using only syntactic information, based on the surrounding objects.

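    A Study on Mastering Complex Reasoning Ability with Transformers: Applications to Visual, Conversational, and Mathematical Reasoning (트랜스포머를 통한 복잡한 추론 능력 정복을 위한 연구: 시각적, 대화적, 수학적 추론에의 적용)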

    Doctoral thesis (Ph.D.), Seoul National University Graduate School, College of Engineering, Department of Industrial Engineering, February 2021. 조성준.
    As deep learning models have advanced, research has increasingly focused on sophisticated tasks that require complex reasoning, rather than simple classification tasks. These complex tasks require multiple reasoning steps that resemble human intelligence. Architecture-wise, recurrent neural networks and convolutional neural networks have long been the mainstream models for deep learning. However, both models suffer from shortcomings inherent in their architecture. Nowadays, the attention-based Transformer is replacing them due to its superior architecture and performance. In particular, the encoder of the Transformer has been extensively studied in the field of natural language processing. However, for the Transformer to be effective on data with distinct structures and characteristics, appropriate adjustments to its structure are required. In this dissertation, we propose novel architectures based on the Transformer encoder for various supervised learning tasks with different data types and characteristics. The tasks that we consider are visual IQ tests, dialogue state tracking and mathematical question answering. For the visual IQ test, the input is in a visual format with hierarchy. To deal with this, we propose using a hierarchical Transformer encoder with structured representation that employs a novel neural network architecture to improve both perception and reasoning. The hierarchical structure of the Transformer encoders and the architecture of each individual Transformer encoder both fit the characteristics of the visual IQ test data. For dialogue state tracking, value prediction for multiple domain-slot pairs is required. To address this issue, we propose a dialogue state tracking model that uses a pre-trained language model, which is a pre-trained Transformer encoder, for domain-slot relationship modeling. We introduce special tokens for each domain-slot pair, which enable effective dependency modeling among domain-slot pairs through the pre-trained language encoder. Finally, for mathematical question answering, we propose a method to pre-train a Transformer encoder on a mathematical question answering dataset for improved performance. Our pre-training method, Question-Answer Masked Language Modeling, utilizes both the question and answer text, which is suitable for the mathematical question answering dataset. Through experiments, we show that each of our proposed methods is effective for its corresponding task and data type.
    Contents:
    Abstract
    Contents
    List of Tables
    List of Figures
    Chapter 1 Introduction
    Chapter 2 Literature Review
        2.1 Related Works on Transformer
        2.2 Related Works on Visual IQ Tests
            2.2.1 RPM-related studies
            2.2.2 Object Detection related studies
        2.3 Related works on Dialogue State Tracking
        2.4 Related Works on Mathematical Question Answering
            2.4.1 Pre-training of Neural Networks
            2.4.2 Language Model Pre-training
            2.4.3 Mathematical Reasoning with Neural Networks
    Chapter 3 Hierarchical end-to-end architecture of Transformer encoders for solving visual IQ tests
        3.1 Background
            3.1.1 Perception
            3.1.2 Reasoning
        3.2 Proposed Model
            3.2.1 Perception Module: Object Detection Model
            3.2.2 Reasoning Module: Hierarchical Transformer Encoder
            3.2.3 Contrasting Module and Loss function
        3.3 Experimental results
            3.3.1 Dataset
            3.3.2 Experimental Setup
            3.3.3 Results for Perception Module
            3.3.4 Results for Reasoning Module
            3.3.5 Ablation studies
        3.4 Chapter Summary
    Chapter 4 Domain-slot relationship modeling using Transformers for dialogue state tracking
        4.1 Background
        4.2 Proposed Method
            4.2.1 Domain-Slot-Context Encoder
            4.2.2 Slot-gate classifier
            4.2.3 Slot-value classifier
            4.2.4 Total objective function
        4.3 Experimental Results
            4.3.1 Dataset
            4.3.2 Experimental Setup
            4.3.3 Results for the MultiWOZ-2.1 dataset
            4.3.4 Ablation Studies
        4.4 Chapter Summary
    Chapter 5 Pre-training of Transformers with Question-Answer Masked Language Modeling for Mathematical Question Answering
        5.1 Background
        5.2 Proposed Method
            5.2.1 Pre-training: Question-Answer Masked Language Modeling
            5.2.2 Fine-tuning: Mathematical Question Answering
        5.3 Experimental Results
            5.3.1 Dataset
            5.3.2 Experimental Setup
            5.3.3 Experimental Results on the Mathematics dataset
        5.4 Chapter Summary
    Chapter 6 Conclusion
        6.1 Contributions
        6.2 Future Work
    Bibliography
    Abstract in Korean
    Acknowledgements
    Docto
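The abstract above says only that Question-Answer Masked Language Modeling uses both the question and the answer text. A minimal sketch of what masking over a joined question-answer sequence could look like is given below. The token strings `[MASK]` and `[SEP]`, the function name `mask_question_answer`, and the uniform per-token masking rule are all assumptions for illustration; the thesis's actual masking scheme is not specified in this listing.

```python
import random

MASK = "[MASK]"  # hypothetical mask token
SEP = "[SEP]"    # hypothetical separator between question and answer


def mask_question_answer(question, answer, mask_prob, rng):
    """Mask tokens across a joined question + answer sequence.

    question, answer: lists of string tokens.
    mask_prob: chance of masking each non-separator token.
    rng: a random.Random instance, passed in for reproducibility.
    Returns (masked_tokens, labels); labels[i] holds the original token where
    a mask was applied and None elsewhere, as a prediction target.
    """
    tokens = question + [SEP] + answer
    masked, labels = [], []
    for tok in tokens:
        if tok != SEP and rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(tok)   # the model would be trained to recover this
        else:
            masked.append(tok)
            labels.append(None)  # no loss at unmasked positions
    return masked, labels


question = ["what", "is", "2", "+", "3", "?"]
answer = ["5"]
masked, labels = mask_question_answer(question, answer, 0.15, random.Random(0))
print(masked)
print(labels)
```

Because the answer tokens sit in the same sequence as the question, a masked answer token has to be predicted from the question, and a masked question token can draw on the answer.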