204 research outputs found

    Catch-Up Distillation: You Only Need to Train Once for Accelerating Sampling

    Diffusion Probability Models (DPMs) have made impressive advancements in various machine learning domains. However, achieving high-quality synthetic samples typically requires a large number of sampling steps, which precludes real-time sample synthesis. Traditional accelerated sampling algorithms based on knowledge distillation rely on pre-trained model weights and discrete time-step scenarios, necessitating additional training sessions to achieve their goals. To address these issues, we propose Catch-Up Distillation (CUD), which encourages the current-moment output of the velocity estimation model to "catch up" with its previous-moment output. Specifically, CUD adjusts the original Ordinary Differential Equation (ODE) training objective to align the current-moment output with both the ground-truth label and the previous-moment output, utilizing Runge-Kutta-based multi-step alignment distillation for precise ODE estimation while preventing asynchronous updates. Furthermore, we investigate the design space of CUD under continuous time-step scenarios and analyze how to determine suitable strategies. To demonstrate CUD's effectiveness, we conduct thorough ablation and comparison experiments on CIFAR-10, MNIST, and ImageNet-64. On CIFAR-10, we obtain an FID of 2.80 by sampling in 15 steps under one-session training, and a new state-of-the-art FID of 3.37 by sampling in one step with additional training. The latter result required only 620k iterations with a batch size of 128, in contrast to Consistency Distillation, which demanded 2100k iterations with a larger batch size of 256. Our code is released at https://anonymous.4open.science/r/Catch-Up-Distillation-E31F
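As a rough illustration of the two-term objective the abstract describes (match the ground-truth label and simultaneously "catch up" with the previous-moment output), here is a minimal NumPy sketch. The linear "velocity model" and all numbers are invented stand-ins, not the paper's architecture or its Runge-Kutta alignment scheme:

```python
import numpy as np

def toy_velocity_model(weights, x, t):
    """Hypothetical linear 'velocity estimator' standing in for a neural net."""
    return weights * x + t

def catch_up_loss(weights, prev_weights, x, t, v_true):
    """Sketch of the combined CUD-style objective: align the current-moment
    output with the ground-truth velocity AND with the previous-moment output
    (the previous model acts as a frozen target; no gradient would flow
    through it in a real implementation)."""
    v_now = toy_velocity_model(weights, x, t)
    v_prev = toy_velocity_model(prev_weights, x, t)  # frozen earlier copy
    ode_term = np.mean((v_now - v_true) ** 2)        # original ODE objective
    catch_up_term = np.mean((v_now - v_prev) ** 2)   # "catch up" with previous moment
    return ode_term + catch_up_term

x = np.array([0.5, -1.0])
v_true = np.array([0.2, 0.1])
loss = catch_up_loss(weights=1.0, prev_weights=0.9, x=x, t=0.1, v_true=v_true)
print(loss)
```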

    Narrative Information Extraction with Non-Linear Natural Language Processing Pipelines

    Computational narrative focuses on methods to algorithmically analyze, model, and generate narratives. Most current work in story generation, drama management, or even literature analysis relies on manually authored domain knowledge in some specific formal representation language, which is expensive to produce. In this dissertation we explore how to automatically extract narrative information from unannotated natural language text, how to evaluate the extraction process, how to improve the extraction process, and how to use the extracted information in story generation applications. As our application domain, we use Vladimir Propp's narrative theory and the corresponding Russian and Slavic folktales as our corpus. Our hypothesis is that incorporating narrative-level domain knowledge (i.e., Proppian theory) into core natural language processing (NLP) and information extraction can improve both the performance of tasks such as coreference resolution and the quality of the extracted narrative information. We devised a non-linear information extraction pipeline framework, which we implemented in Voz, our narrative information extraction system. Finally, we studied how to map the output of Voz to an intermediate computational narrative model and use it as input for an existing story generation system, thus further connecting existing work in NLP and computational narrative. As far as we know, it is the first end-to-end computational narrative system that can automatically process a corpus of unannotated natural language stories, extract explicit domain knowledge from them, and use it to generate new stories. Our user study results show that specific errors introduced during the information extraction process can be mitigated downstream and have virtually no effect on the perceived quality of the generated stories compared to generating stories using handcrafted domain knowledge.
    Ph.D., Computer Science -- Drexel University, 201
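The "non-linear pipeline" idea, as opposed to a classic one-way NLP pipeline, is that downstream results can flow back so earlier stages can revise their output. A toy sketch under that assumption follows; the module names and logic are invented for illustration and are not the actual Voz stages:

```python
from typing import Callable, Dict, List

def run_nonlinear_pipeline(text: str,
                           modules: Dict[str, Callable[[dict], dict]],
                           order: List[str],
                           feedback_passes: int = 2) -> dict:
    """Run the modules repeatedly over a shared state dict, so that
    information produced late in one pass (e.g. narrative roles) is
    available to early modules (e.g. coreference) on the next pass."""
    state = {"text": text}
    for _ in range(feedback_passes):
        for name in order:
            state = modules[name](state)
    return state

# Toy stand-in modules (not real NLP components).
def tokenize(state):
    state["tokens"] = state["text"].split()
    return state

def coref(state):
    # A real module could consult state.get("roles") on a second pass.
    state["mentions"] = [t for t in state["tokens"] if t.istitle()]
    return state

def roles(state):
    state["roles"] = {m: "character" for m in state.get("mentions", [])}
    return state

out = run_nonlinear_pipeline("Ivan rescued the Princess",
                             {"tokenize": tokenize, "coref": coref, "roles": roles},
                             ["tokenize", "coref", "roles"])
```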

    Tools and Algorithms for the Construction and Analysis of Systems

    This open access book constitutes the proceedings of the 28th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS 2022, which was held during April 2-7, 2022, in Munich, Germany, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2022. The 46 full papers and 4 short papers presented in this volume were carefully reviewed and selected from 159 submissions. The proceedings also contain 16 tool papers of the affiliated competition SV-COMP and 1 paper consisting of the competition report. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building computer-controlled systems.

    Efficient Sampling and Counting of Graph Structures related to Chordal Graphs

    Counting problems aim to count the number of solutions for a given input, for example, counting the number of variable assignments that satisfy a Boolean formula. Sampling problems aim to produce a random object from a desired distribution, for example, producing a variable assignment drawn uniformly at random from all assignments that satisfy a Boolean formula. The problems of counting and sampling graph structures on different types of graphs have been studied for decades for their great importance in areas like complexity theory and statistical physics. For many graph structures such as independent sets and acyclic orientations, it is widely believed that no exact or approximate (with an arbitrarily small error) polynomial-time algorithms on general graphs exist. Therefore, the research community studies various types of graphs, aiming either to design a polynomial-time counting or sampling algorithm for such inputs, or to prove a corresponding inapproximability result. Chordal graphs have been studied widely in both AI and theoretical computer science, but their study from the counting perspective has been relatively limited. Previous works showed that some graph structures can be counted in polynomial time on chordal graphs, while counting them on general graphs is provably computationally hard. The main objective of this thesis is to design and analyze counting and sampling algorithms for several well-known graph structures, including independent sets and different types of graph orientations, on chordal graphs. Our contributions can be described from two perspectives: evaluating the performance of some well-known sampling techniques, such as Markov chain Monte Carlo, on chordal graphs; and showing that chordality does make those counting problems polynomial-time solvable.
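One of the well-known Markov chain Monte Carlo techniques alluded to above is single-site Glauber dynamics for sampling independent sets. A minimal generic sketch, assuming the uniform (fugacity-1) distribution and a small made-up chordal graph; this is not the thesis's specific algorithm or analysis:

```python
import random

def glauber_independent_set(adj, steps, seed=0):
    """Single-site Glauber dynamics for the hardcore model with fugacity 1:
    repeatedly pick a vertex; with probability 1/2 try to occupy it (allowed
    only if no neighbour is occupied), otherwise unoccupy it.
    adj: dict mapping each vertex to the set of its neighbours."""
    rng = random.Random(seed)
    in_set = {v: False for v in adj}
    for _ in range(steps):
        v = rng.choice(list(adj))
        if rng.random() < 0.5 and not any(in_set[u] for u in adj[v]):
            in_set[v] = True
        else:
            in_set[v] = False
    return {v for v, occupied in in_set.items() if occupied}

# A path on 4 vertices: trees are a simple special case of chordal graphs.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
sample = glauber_independent_set(adj, steps=1000)
# The chain's invariant: the returned set is always independent.
assert all(u not in adj[v] for u in sample for v in sample)
```

How many steps are needed for the chain to mix is exactly the kind of question such a thesis analyzes; 1000 here is arbitrary.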

    An empirical analysis of terminological representation systems

    The family of terminological representation systems has its roots in the representation system KL-ONE. Since the development of this system, more than a dozen similar representation systems have been developed by various research groups. These systems vary along a number of dimensions. In this paper, we present the results of an empirical analysis of six such systems. Surprisingly, the systems turned out to be quite diverse, leading to problems when transporting knowledge bases from one system to another. Additionally, the runtime performance across different systems and knowledge bases varied more than we expected. Finally, our empirical runtime results give an idea of what performance to expect from such representation systems. These findings complement previously reported analytical results about the computational complexity of reasoning in such systems.


    Dokumentverifikation mit Temporaler Beschreibungslogik (Document Verification with Temporal Description Logic)

    The thesis proposes a new formal framework for checking the content of web documents along individual reading paths. It is vital for the readability of web documents that their content is consistent and coherent along the possible browsing paths through the document. Manually ensuring the coherence of content along the possibly huge number of different browsing paths in a web document is time-consuming and error-prone, and existing methods for document validation and verification are not sufficiently expressive and efficient. The innovative core idea of this thesis is to combine the temporal logic CTL and the description logic ALC for the representation of consistency criteria. The resulting new temporal description logic ALCCTL can, in contrast to existing specification formalisms, compactly represent coherence criteria on documents. Verification of web documents is modelled as an ALCCTL model checking problem. The decidability and polynomial complexity of the ALCCTL model checking problem are proven, and a sound, complete, and optimal model checking algorithm is presented. Case studies on real and realistic web documents demonstrate the performance and adequacy of the proposed methods. Existing methods such as symbolic model checking or XML-based document validation are outperformed in both expressiveness and speed.
    The dissertation presents a new formal framework for the automatic checking of content- and structure-related consistency criteria on web documents. Much information today is made accessible in the form of web documents. Complex documents such as learning materials or technical documentation must satisfy a wide range of quality criteria: the information content of the document must be current, complete, and internally consistent, and the presentation structure must serve different target groups with different information needs. Ensuring basic consistency properties of documents is non-trivial given the multitude of requirements and usage contexts of an electronic document. In this work, model checking techniques known from hardware and software verification are combined with methods for representing ontologies, so that both the structure of a document and its content-level relationships can be taken into account when checking consistency criteria. The new temporal description logic ALCCTL is proposed as a specification language for consistency criteria, and fundamental properties such as decidability, expressiveness, and complexity are investigated. The adequacy and practical applicability of the approach are evaluated in case studies with eLearning documents. The results surpass known approaches such as symbolic model checking and methods for validating XML documents in performance, in the expressiveness of the checkable criteria, and in flexibility with respect to document type and format.
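Checking criteria along browsing paths boils down to CTL-style model checking over the document's link graph. A minimal sketch of the least-fixpoint computation in plain CTL (not ALCCTL); the toy pages, links, and the 'goal' proposition below are invented for illustration:

```python
def check_EX(states, trans, phi_states):
    """States satisfying EX phi: at least one successor satisfies phi."""
    return {s for s in states if trans.get(s, set()) & phi_states}

def check_EF(states, trans, phi_states):
    """States satisfying EF phi, computed as the least fixpoint of
    EF phi = phi OR EX(EF phi): grow the set until it stabilizes."""
    result = set(phi_states)
    while True:
        grown = result | check_EX(states, trans, result)
        if grown == result:
            return result
        result = grown

# Toy document as a Kripke structure: pages 0-3 with links between them;
# the proposition 'goal' holds only on page 3.
states = {0, 1, 2, 3}
trans = {0: {1, 2}, 1: {3}, 2: {2}, 3: {3}}
reachable = check_EF(states, trans, {3})
print(reachable)  # pages from which some browsing path reaches the goal
```

Page 2 only links to itself, so it is the one page from which the goal page is unreachable.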

    Modeling Inter-Sentence Relationships Using Deep Neural Network-Based Sentence Encoders

    Thesis (Ph.D.) -- Seoul National University Graduate School, Dept. of Computer Science and Engineering, College of Engineering, February 2020. Advisor: Sang-goo Lee.
    Sentence matching is the task of predicting the degree of match between two sentences. Since a high level of natural language understanding is needed for a model to identify the relationship between two sentences, sentence matching is an important component of various natural language processing applications.
    In this dissertation, we seek to improve the sentence matching module through three ingredients: the sentence encoder, the matching function, and semi-supervised learning. To enhance the sentence encoder network, which is responsible for extracting useful features from a sentence, we propose two new sentence encoder architectures: Gumbel Tree-LSTM and Cell-aware Stacked LSTM (CAS-LSTM). Gumbel Tree-LSTM is based on a recursive neural network (RvNN) architecture, but unlike typical RvNN architectures it does not need a structured input; instead, it learns from data a parsing strategy that is optimized for a specific task. The latter, CAS-LSTM, extends the stacked long short-term memory (LSTM) architecture by introducing an additional forget gate for better handling of vertical information flow. As a new matching function, we present the element-wise bilinear sentence matching (ElBiS) function. It aims to automatically find an aggregation scheme that fuses two sentence representations into a single one suitable for a specific task. From the fact that a sentence encoder is shared across inputs, we hypothesize and empirically show that considering only the element-wise bilinear interaction is sufficient for comparing two sentence vectors. By restricting the interaction, we can largely reduce the number of required parameters compared with full bilinear pooling methods, without losing the advantage of automatically discovering useful aggregation schemes. Finally, to facilitate semi-supervised training, i.e., to make use of both labeled and unlabeled data, we propose the cross-sentence latent variable model (CS-LVM). Its generative model assumes that a target sentence is generated from the latent representation of a source sentence together with a variable indicating the relationship between the source and target sentences.
    As it considers the two sentences of a pair together in a single model, the training objectives are defined more naturally than in prior approaches based on the variational auto-encoder (VAE). We also define semantic constraints that push the generator toward semantically more plausible sentences. We believe that the improvements proposed in this dissertation will advance the effectiveness of various natural language processing applications that involve modeling sentence pairs.

    Table of contents:
    Chapter 1 Introduction
      1.1 Sentence Matching
      1.2 Deep Neural Networks for Sentence Matching
      1.3 Scope of the Dissertation
    Chapter 2 Background and Related Work
      2.1 Sentence Encoders
      2.2 Matching Functions
      2.3 Semi-Supervised Training
    Chapter 3 Sentence Encoder: Gumbel Tree-LSTM
      3.1 Motivation
      3.2 Preliminaries
        3.2.1 Recursive Neural Networks
        3.2.2 Training RvNNs without Tree Information
      3.3 Model Description
        3.3.1 Tree-LSTM
        3.3.2 Gumbel-Softmax
        3.3.3 Gumbel Tree-LSTM
      3.4 Implementation Details
      3.5 Experiments
        3.5.1 Natural Language Inference
        3.5.2 Sentiment Analysis
        3.5.3 Qualitative Analysis
      3.6 Summary
    Chapter 4 Sentence Encoder: Cell-aware Stacked LSTM
      4.1 Motivation
      4.2 Related Work
      4.3 Model Description
        4.3.1 Stacked LSTMs
        4.3.2 Cell-aware Stacked LSTMs
        4.3.3 Sentence Encoders
      4.4 Experiments
        4.4.1 Natural Language Inference
        4.4.2 Paraphrase Identification
        4.4.3 Sentiment Classification
        4.4.4 Machine Translation
        4.4.5 Forget Gate Analysis
        4.4.6 Model Variations
      4.5 Summary
    Chapter 5 Matching Function: Element-wise Bilinear Sentence Matching
      5.1 Motivation
      5.2 Proposed Method: ElBiS
      5.3 Experiments
        5.3.1 Natural Language Inference
        5.3.2 Paraphrase Identification
      5.4 Summary and Discussion
    Chapter 6 Semi-Supervised Training: Cross-Sentence Latent Variable Model
      6.1 Motivation
      6.2 Preliminaries
        6.2.1 Variational Auto-Encoders
        6.2.2 von Mises-Fisher Distribution
      6.3 Proposed Framework: CS-LVM
        6.3.1 Cross-Sentence Latent Variable Model
        6.3.2 Architecture
        6.3.3 Optimization
      6.4 Experiments
        6.4.1 Natural Language Inference
        6.4.2 Paraphrase Identification
        6.4.3 Ablation Study
        6.4.4 Generated Sentences
        6.4.5 Implementation Details
      6.5 Summary and Discussion
    Chapter 7 Conclusion
    Appendix A Appendix
      A.1 Sentences Generated from CS-LVM
    Doctor
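The element-wise bilinear idea behind ElBiS, one tiny bilinear form per vector dimension instead of one full bilinear form over the whole vectors, can be sketched as follows. The exact parameterization in the thesis may differ; the shapes and names here are illustrative only:

```python
import numpy as np

def elementwise_bilinear_match(u, v, W):
    """For every dimension i, combine the pair (u_i, v_i) through its own
    2x2 bilinear form W_i, so the parameter count grows linearly in d
    rather than cubically as in full bilinear pooling.
    u, v: sentence vectors of shape (d,); W: per-dimension forms, shape (d, 2, 2)."""
    pairs = np.stack([u, v], axis=-1)            # shape (d, 2)
    # out_i = [u_i, v_i] @ W_i @ [u_i, v_i]^T
    return np.einsum('di,dij,dj->d', pairs, W, pairs)

rng = np.random.default_rng(0)
d = 4
u, v = rng.normal(size=d), rng.normal(size=d)
W = rng.normal(size=(d, 2, 2))
fused = elementwise_bilinear_match(u, v, W)
assert fused.shape == (d,)  # one fused value per dimension, as in the abstract
```

With 4 parameters per dimension this costs O(d) parameters, versus O(d^3) for a full bilinear pooling layer mapping two d-vectors to a d-vector.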