
    Parsing Inside-Out

    The inside-outside probabilities are typically used for reestimating Probabilistic Context-Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including improving parser accuracy by matching parsing algorithms to evaluation criteria; speeding up DOP parsing by 500 times; and 30 times faster PCFG thresholding at a given accuracy level. I also give an elegant, state-of-the-art grammar formalism, which can be used to compute inside-outside probabilities, and a parser description formalism, which makes it easy to derive inside-outside formulas and many others.
    Comment: Ph.D. Thesis, 257 pages, 40 postscript figures
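    As a rough illustration of the central quantity (a hedged sketch, not the thesis's code; the grammar encoding and names below are assumptions), the inside probabilities of a PCFG in Chomsky normal form can be computed with a CKY-style dynamic program:

        from collections import defaultdict

        def inside_probabilities(words, lexical, binary):
            # lexical: (A, word) -> P(A -> word); binary: (A, B, C) -> P(A -> B C).
            # Returns beta with beta[i, j, A] = P(A derives words[i:j]).
            n = len(words)
            beta = defaultdict(float)
            for i, w in enumerate(words):                 # spans of length 1
                for (A, word), p in lexical.items():
                    if word == w:
                        beta[i, i + 1, A] += p
            for span in range(2, n + 1):                  # longer spans, bottom-up
                for i in range(n - span + 1):
                    j = i + span
                    for k in range(i + 1, j):             # split point
                        for (A, B, C), p in binary.items():
                            beta[i, j, A] += p * beta[i, k, B] * beta[k, j, C]
            return beta                                   # beta[0, n, S] = P(sentence)

    Outside probabilities come from a symmetric top-down pass; products of inside and outside probabilities yield the expected rule counts used in reestimation.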

    Parsing Schemata

    Parsing schemata provide a general framework for the specification, analysis and comparison of (sequential and/or parallel) parsing algorithms. A grammar specifies implicitly what the valid parses of a sentence are; a parsing algorithm specifies explicitly how to compute these. Parsing schemata form a well-defined level of abstraction in between grammars and parsing algorithms. A parsing schema specifies the types of intermediate results that can be computed by a parser, and the rules that allow a given set of such results to be expanded with new results. A parsing schema does not specify the data structures, control structures, and (in the case of parallel processing) communication structures that are to be used by a parser.
    Part I, Exposition, gives a general introduction to the ideas that are worked out in the following parts.
    Part II, Foundation, unfolds a mathematical theory of parsing schemata. Different kinds of relations between parsing schemata are formally introduced and illustrated with examples drawn from the parsing literature.
    Part III, Application, discusses a series of applications of parsing schemata.
    - Feature percolation in unification grammar parsing can be described in an elegant, legible notation.
    - Because of the absence of algorithmic detail, parsing schemata can be used to get a formal grip on highly complicated algorithms. We give substance to this claim by means of a thorough analysis of Left-Corner and Head-Corner chart parsing.
    - As an example of structural similarity of parsers, despite differences in form and appearance, we show that the underlying parsing schemata of Earley's algorithm and Tomita's algorithm are virtually identical. Using this structural correspondence we can obtain a novel parallel parser by cross-fertilizing a parallel Earley parser with Tomita's graph-structured stack.
    - Parsing schemata can be implemented straightforwardly by boolean circuits. This means that, in principle, parsing schemata can be coded directly into hardware.
    Part IV, Perspective, discusses the prospects for natural language parsing applications and draws some conclusions. An important observation is that the theoretical and practical parts of the book reinforce each other. The proposed framework is abstract enough to allow a thorough mathematical treatment and practical enough to allow rewriting a variety of real parsing algorithms (i.e. seriously proposed in the literature, not toy examples) in a clear and coherent way.
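    To make the item-and-rule view concrete, here is a hedged sketch (names assumed, not the book's code) of a generic deductive engine plus a CYK-like schema. The schema contributes only item types and deduction steps; the chart and agenda below are exactly the kind of control detail a schema leaves open:

        def deduce(axioms, steps):
            # Bottom-up closure of a set of items under deduction steps.
            chart, agenda = set(), list(axioms)
            while agenda:
                item = agenda.pop()
                if item in chart:
                    continue
                chart.add(item)
                for step in steps:
                    for new in step(item, chart):
                        if new not in chart:
                            agenda.append(new)
            return chart

        # CYK-like schema: an item (A, i, j) means "A derives words i..j".
        words = ["she", "eats"]
        lexical = {("NP", "she"), ("VP", "eats")}
        binary = {("S", "NP", "VP")}

        axioms = {(A, i, i + 1) for i, w in enumerate(words) for (A, x) in lexical if x == w}

        def complete(item, chart):
            # Combine the new item with adjacent chart items via a binary rule.
            A1, i, k = item
            for (B, l, r) in chart:
                for (A, X, Y) in binary:
                    if l == k and (X, Y) == (A1, B):
                        yield (A, i, r)
                    if r == i and (X, Y) == (B, A1):
                        yield (A, l, k)

        print(("S", 0, len(words)) in deduce(axioms, [complete]))  # True: accepted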

    Modelling Incremental Self-Repair Processing in Dialogue.

    Self-repairs, where speakers repeat themselves, reformulate or restart what they are saying, are pervasive in human dialogue. These phenomena provide a window into real-time human language processing. For explanatory adequacy, a model of dialogue must include mechanisms that account for them. Artificial dialogue agents also need this capability for more natural interaction with human users. This thesis investigates the structure of self-repair and its function in the incremental construction of meaning in interaction. A corpus study shows how the range of self-repairs seen in dialogue cannot be accounted for by looking at surface form alone. More particularly, it analyses a string-alignment approach, shows how it is insufficient, and provides requirements for a suitable model of incremental context and an ontology of self-repair function. An information-theoretic model is developed which addresses these issues, along with a system that automatically detects self-repairs and edit terms on transcripts incrementally with minimal latency, achieving state-of-the-art results. Additionally, it is shown to have practical use in the psychiatric domain. The thesis goes on to present a dialogue model to interpret and generate repaired utterances incrementally. When processing repaired rather than fluent utterances, it achieves the same degree of incremental interpretation and incremental representation. Practical implementation methods are presented for an existing dialogue system. Finally, a more pragmatically oriented approach is presented to model self-repairs in a psycholinguistically plausible way. This is achieved by extending the dialogue model to include a probabilistic semantic framework that performs incremental inference in a reference resolution domain. The thesis concludes that a model of context at least as fine-grained as word-by-word is required for realistic models of self-repair, and that context must include linguistic action sequences and information update effects. The way dialogue participants process self-repairs to make inferences in real time, rather than filtering out their disfluency effects, has been modelled formally and in practical systems.
    Funded by an Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Account (DTA) scholarship from the School of Electronic Engineering and Computer Science at Queen Mary University of London.
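    For a concrete sense of why surface form alone falls short, here is a deliberately naive sketch (all names assumed) of a string-alignment-style baseline of the kind the thesis shows to be insufficient: it catches verbatim repeats, but misses reformulations and restarts with no lexical overlap, and double-counts overlapping matches:

        def detect_repeat_repairs(tokens, window=4):
            # Flag a candidate self-repair whenever the current word repeats
            # a word within the last `window` words, taking the earlier
            # occurrence as the reparandum onset. Pure string matching.
            repairs = []
            for i, tok in enumerate(tokens):
                history = tokens[max(0, i - window):i]
                if tok in history:
                    start = max(0, i - window) + history.index(tok)
                    repairs.append((start, i))  # (reparandum onset, repair onset)
            return repairs

        print(detect_repeat_repairs("I go to the uh to the shop".split()))
        # [(2, 5), (3, 6)] -- finds the repeated "to the", but fires twice
        # for one repair, and would stay silent on a repair with new words.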

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    LIPIcs, Volume 261, ICALP 2023, Complete Volume

    Foundations of Software Science and Computation Structures

    This open access book constitutes the proceedings of the 24th International Conference on Foundations of Software Science and Computation Structures, FOSSACS 2021, which was held during March 27 to April 1, 2021, as part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2021. The conference was planned to take place in Luxembourg but changed to an online format due to the COVID-19 pandemic. The 28 regular papers presented in this volume were carefully reviewed and selected from 88 submissions. They deal with research on theories and methods to support the analysis, integration, synthesis, transformation, and verification of programs and software systems.

    Automated Deduction – CADE 28

    This open access book constitutes the proceedings of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized under the following topics: logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions.

    From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

    Multi-modal Large Language Models (MLLMs) have shown impressive abilities in generating reasonable responses to multi-modal content. However, there is still a wide gap between the performance of recent MLLM-based applications and the expectations of the broad public, even though the most powerful models, OpenAI's GPT-4 and Google's Gemini, have been deployed. This paper strives to enhance understanding of that gap through a qualitative study of the generalizability, trustworthiness, and causal reasoning capabilities of recent proprietary and open-source MLLMs across four modalities, i.e., text, code, image, and video, ultimately aiming to improve the transparency of MLLMs. We believe these properties are representative factors that define the reliability of MLLMs in supporting various downstream applications. Specifically, we evaluate the closed-source GPT-4 and Gemini and 6 open-source LLMs and MLLMs. Overall, we evaluate 230 manually designed cases, whose qualitative results are then summarized into 12 scores (i.e., 4 modalities × 3 properties). In total, we uncover 14 empirical findings that are useful for understanding the capabilities and limitations of both proprietary and open-source MLLMs, towards more reliable downstream multi-modal applications.
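    The 12 summary scores arise as a 4 × 3 grid of modalities and properties; a minimal sketch of that bookkeeping (field names assumed, not the paper's code):

        from collections import defaultdict

        MODALITIES = ["text", "code", "image", "video"]
        PROPERTIES = ["generalizability", "trustworthiness", "causality"]

        def aggregate(cases):
            # Each case is (modality, property, passed); the 4 x 3 grid of
            # pass rates gives the 12 per-cell scores described above.
            totals, passed = defaultdict(int), defaultdict(int)
            for modality, prop, ok in cases:
                assert modality in MODALITIES and prop in PROPERTIES
                totals[modality, prop] += 1
                passed[modality, prop] += bool(ok)
            return {cell: passed[cell] / totals[cell] for cell in totals}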