109 research outputs found

Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!

Author: Jayaram Rajesh
Saha Barna
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 44th International Colloquium on Automata, Languages, and Programming (ICALP 2017)
Publication date: 01/01/2017

In 1975, a breakthrough result of L. Valiant showed that parsing context free grammars can be reduced to Boolean matrix multiplication, resulting in a running time of O(n^omega) for parsing, where omega <= 2.373 is the exponent of fast matrix multiplication and n is the string length. Recently, Abboud, Backurs and V. Williams (FOCS 2015) demonstrated that this is likely optimal; moreover, a combinatorial o(n^3) algorithm is unlikely to exist for the general parsing problem. The language edit distance problem is a significant generalization of the parsing problem, which computes the minimum edit distance of a given string (using insertions, deletions, and substitutions) to any valid string in the language, and has received significant attention both in theory and practice since the seminal work of Aho and Peterson in 1972. Clearly, the lower bound for parsing rules out any algorithm running in o(n^omega) time that can return a nontrivial multiplicative approximation of the language edit distance problem. Furthermore, combinatorial algorithms with cubic running time or algorithms that use fast matrix multiplication are often not desirable in practice. To break this n^omega hardness barrier, in this paper we study additive approximation algorithms for language edit distance. We provide two explicit combinatorial algorithms to obtain a string with minimum edit distance, with performance dependencies on either the number of non-linear productions, k^*, or the number of nested non-linear productions, k, used in the optimal derivation. Explicitly, we give an additive O(k^* gamma) approximation in time O(|G|(n^2 + (n/gamma)^3)) and an additive O(k gamma) approximation in time O(|G|(n^2 + (n^3/gamma^2))), where |G| is the grammar size and n is the string length. In particular, we obtain tight approximations for an important subclass of context free grammars known as ultralinear grammars, for which k and k^* are naturally bounded. Interestingly, we show that the same conditional lower bound for parsing context free grammars holds for the class of ultralinear grammars as well, clearly marking the boundary where parsing becomes hard.
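
For orientation, the cubic combinatorial baseline that this abstract contrasts against can be sketched with the classic CYK recognizer for a grammar in Chomsky normal form. This is not the paper's algorithm and it does not compute edit distance; the function name, the dictionary encoding of the grammar, and the balanced-parentheses grammar are all illustrative assumptions.

```python
from itertools import product


def cyk_recognize(word, unary, binary, start="S"):
    """Return True if `word` is derivable from `start` (CYK, O(n^3) time).

    unary:  dict mapping a terminal to the set of nonterminals A with A -> terminal
    binary: dict mapping a pair (B, C) to the set of nonterminals A with A -> B C
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][j] holds the nonterminals that derive word[i..j] (inclusive).
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        table[i][i] = set(unary.get(ch, ()))
    for length in range(2, n + 1):          # substring length
        for i in range(n - length + 1):     # start index
            j = i + length - 1              # end index
            for k in range(i, j):           # split point
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical CNF grammar for nonempty balanced parentheses:
#   S -> L R | L X | S S,   X -> S R,   L -> "(",   R -> ")"
UNARY = {"(": {"L"}, ")": {"R"}}
BINARY = {
    ("L", "R"): {"S"},
    ("L", "X"): {"S"},
    ("S", "S"): {"S"},
    ("S", "R"): {"X"},
}

balanced = cyk_recognize("(())()", UNARY, BINARY)
unbalanced = cyk_recognize("(()", UNARY, BINARY)
```

The three nested loops over substring length, start index, and split point are what give the n^3 factor; the grammar size enters through the lookups in the inner loop.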

Dagstuhl Research Online Publication Server

Structural translation with synchronous tree adjoining grammars in VERBMOBIL

Author: Harbusch Karin
Poller Peter
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1996

The VERBMOBIL project is developing a translation system that can assist a face-to-face dialogue between two non-native English speakers. Instead of having to speak English continuously, the dialogue partners have the option to switch to their respective mother tongues (currently German or Japanese) in cases where they can't find the required word, phrase or sentence. In such situations, the users activate VERBMOBIL to translate their utterances into English. A very important requirement for such a system is realtime processing, which is essential if the system is to be smoothly integrated into an ongoing communication. This can be achieved by the use of anytime processing, which always provides a result. The quality of the result, however, depends on the computation time given to the system. Early interruptions can only produce shallow results. Aiming at such a processing mode, methods for fast but preliminary translation must be integrated into the system, assisted by others that refine these results. For this purpose we suggest structural translation with Synchronous Tree Adjoining Grammars (S-TAGs), which can serve as a fast and shallow realisation of all steps necessary during translation, i.e. analysis, transfer and generation, in a system capable of running anytime methods. This mode is especially adequate for standardized speech acts and simple sentences. Furthermore, it provides a result for early interruptions of the translation process. By building an explicit linguistic structure, methods for refining the result can rearrange the structure in order to increase the quality of the translation given extended execution time. This paper describes the formalism of S-TAGs and the parsing algorithm implemented in VERBMOBIL. Furthermore, the language covered by the German grammar is described. Finally, we list examples together with the execution time required for their processing.

Universaar

Natural Languages Analysis in Machine Translation (MT) Based on the STCG (String-Tree Correspondence Grammar)

Author: Kong Tang Enya
Yusoff Zaharin
Publication venue: City University of Hong Kong
Publication date: 01/01/1995

Waseda University Repository

Parsing User Queries using Context Free Grammars

Author: Hirche Christian F.
van Noortwijk Kees
Publication date: 02/04/2023

In legal information retrieval, query cooking can significantly improve recall and precision. Context free grammars can be used to effectively parse user queries, even if the number of items to recognize is high and recognition patterns are complicated.
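
As a rough illustration of parsing a user query with a context-free grammar, the sketch below uses a small recursive-descent parser. The grammar (Boolean operators AND, OR, NOT with parentheses), the tuple-based tree format, and the function names are assumptions made for this example; the listing does not describe the grammar the authors actually use.

```python
import re

# Hypothetical grammar for this sketch:
#   expr   := term ("OR" term)*
#   term   := factor ("AND" factor)*
#   factor := "NOT" factor | "(" expr ")" | WORD
TOKEN = re.compile(r"\s*(\(|\)|AND\b|OR\b|NOT\b|[^\s()]+)")


def tokenize(query):
    """Split a query into parentheses, operator keywords, and plain words."""
    query = query.strip()
    pos, tokens = 0, []
    while pos < len(query):
        match = TOKEN.match(query, pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(query):
    """Parse a query into a nested tuple tree; raise ValueError if malformed."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() == "OR":
            take("OR")
            node = ("OR", node, term())
        return node

    def term():
        node = factor()
        while peek() == "AND":
            take("AND")
            node = ("AND", node, factor())
        return node

    def factor():
        tok = take()
        if tok == "NOT":
            return ("NOT", factor())
        if tok == "(":
            node = expr()
            take(")")
            return node
        if tok in (")", "AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return ("WORD", tok)

    tree = expr()
    if peek() is not None:
        raise ValueError(f"unexpected trailing token {peek()!r}")
    return tree


tree = parse("contract AND (breach OR NOT damages)")
```

Each grammar rule maps to one nested function, so AND binds tighter than OR simply because `term` sits below `expr` in the rule hierarchy.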

EUR Research Repository

A Polynomial-Time Algorithm for the Lambek Calculus with Brackets of Bounded Order

Author: Kanovich Max
Kuznetsov Stepan
Morrill Glyn
Scedrov Andre
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 2nd International Conference on Formal Structures for Computation and Deduction (FSCD 2017)
Publication date: 01/01/2017

Lambek calculus is a logical foundation of categorial grammar, a linguistic paradigm of grammar as logic and parsing as deduction. Pentus (2010) gave a polynomial-time algorithm for determining provability of bounded depth formulas in L*, the Lambek calculus with empty antecedents allowed. Pentus' algorithm is based on tabularisation of proof nets. Lambek calculus with brackets is a conservative extension of Lambek calculus with bracket modalities, suitable for the modeling of syntactical domains. In this paper we give an algorithm for provability in Lb*, the Lambek calculus with brackets allowing empty antecedents. Our algorithm runs in polynomial time when both the formula depth and the bracket nesting depth are bounded. It combines a Pentus-style tabularisation of proof nets with an automata-theoretic treatment of bracketing.

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Dagstuhl Research Online Publication Server

Transition-based dependency parsing as latent-variable constituent parsing

Author: Nederhof Mark Jan
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2016

We provide a theoretical argument that a common form of projective transition-based dependency parsing is less powerful than constituent parsing using latent variables. The argument is a proof that, under reasonable assumptions, a transition-based dependency parser can be converted to a latent-variable context-free grammar producing equivalent structures.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

Grammar Generation and Optimization from Multiple Inputs

Author: Pankaj B. Devre, Prof. Madhuri A. Bhalekar, Dr. Madan U. Kharat
Publication venue: Auricle Technologies, Pvt., Ltd.
Publication date: 31/01/2015

Human beings use multiple modes, such as speech, text, facial expressions, hand gestures, and showing pictures, to communicate with each other. Using these modes makes human communication simpler and faster. In recent years, several techniques have been used to bring human-computer interaction closer to this. Developing and maintaining a multimodal grammar for integrating and understanding input in multimodal interfaces, i.e. interfaces that use multiple input modes, is costly. This motivates the investigation of more robust algorithms. The proposed system generates a grammar from multiple inputs, called a multimodal grammar, and evaluates the grammar description length. Furthermore, to optimize the multimodal grammar, the proposed system uses learning operators which improve the grammar description.
DOI: 10.17762/ijritcc2321-8169.15016

International Journal on Recent and Innovation Trends in Computing and Communication

FEAT-REP : representing features in CAD/CAM

Author: Bernardi Ansgar
Klauck Christoph
Legleitner Ralf
Publication venue: Sonstige Einrichtungen. DFKI Deutsches Forschungszentrum für Künstliche Intelligenz
Publication date: 01/01/1991

When CAD/CAM experts view a workpiece, they perceive it in terms of their own expertise. These terms, called features, which are built upon a syntax (geometry) and a semantics (e.g. skeletal plans in manufacturing or functional relations in design), provide an abstraction mechanism to facilitate the creation, manufacturing and analysis of workpieces. Our goal is to enable experts to represent their own feature-language via a feature-grammar in the computer in order to build feature-based systems, e.g. CAPP systems. The application of formal language terminology to the feature definitions facilitates the use of well-known formal language methods in conjunction with our flexible knowledge representation formalism FEAT-REP, which will be presented in this paper.

Universaar

A syntax definition formalism

Author: Heering J. (Jan)
Klint P. (Paul)
Publication venue: CWI
Publication date: 01/01/1986

CWI's Institutional Repository