ETRANS: An English-Thai translator
ETRANS is an experimental English-Thai machine translation (MT) system that translates simple English sentences into grammatically correct Thai sentences. The entire system is written in C-Prolog and runs on UNIX systems. ETRANS takes an interlingual approach to MT, pairing a parser for English with a generator for Thai. The parser builds a semantic representation of the meaning of the English sentence, and the generator then renders that representation in Thai. ETRANS uses frames to represent knowledge and an augmented transition network (ATN) as the linguistic framework for both analyzing and generating sentences.
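For readers unfamiliar with ATNs, the following is a minimal sketch of ATN-style parsing into a frame-like structure, written in Python rather than C-Prolog. It is not ETRANS's grammar: the lexicon, the `S` and `NP` networks, and the register names (`agent`, `action`, `object`) are hypothetical, and Thai generation is omitted. It shows the three arc types that make an ATN more than a finite-state machine: CAT arcs that consume a word of a given category, PUSH arcs that recursively call another network, and registers that collect the pieces of the analysis.

```python
# Toy lexicon (hypothetical): word -> syntactic category.
LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "N", "dog": "N", "fish": "N",
    "eats": "V", "sees": "V",
}

# Each network maps a state name to its outgoing arcs:
#   ("CAT", category, register, next_state)   consume one word of that category
#   ("PUSH", subnetwork, register, next_state) recursively run another network
#   ("POP",)                                   accept, returning the registers filled so far
NETWORKS = {
    "S": {
        "S0": [("PUSH", "NP", "agent", "S1")],
        "S1": [("CAT", "V", "action", "S2")],
        "S2": [("PUSH", "NP", "object", "S3")],
        "S3": [("POP",)],
    },
    "NP": {
        "NP0": [("CAT", "DET", "det", "NP1"), ("CAT", "N", "head", "NP2")],
        "NP1": [("CAT", "N", "head", "NP2")],
        "NP2": [("POP",)],
    },
}


def traverse(net, state, words, pos, regs):
    """Yield (registers, position) for every path from `state` to a POP arc."""
    for arc in NETWORKS[net][state]:
        if arc[0] == "POP":
            yield regs, pos
        elif arc[0] == "CAT":
            _, cat, reg, nxt = arc
            if pos < len(words) and LEXICON.get(words[pos]) == cat:
                yield from traverse(net, nxt, words, pos + 1, {**regs, reg: words[pos]})
        elif arc[0] == "PUSH":
            _, sub, reg, nxt = arc
            # Run the subnetwork from its start state; its registers become one filler.
            for sub_regs, sub_pos in traverse(sub, sub + "0", words, pos, {}):
                yield from traverse(net, nxt, words, sub_pos, {**regs, reg: sub_regs})


def parse(sentence):
    """Return a frame-like dict for the first complete parse, or None."""
    words = sentence.lower().split()
    for regs, pos in traverse("S", "S0", words, 0, {}):
        if pos == len(words):  # the whole sentence must be consumed
            return regs
    return None
```

For example, `parse("The cat eats the fish")` returns `{'agent': {'det': 'the', 'head': 'cat'}, 'action': 'eats', 'object': {'det': 'the', 'head': 'fish'}}`. In an interlingual design, a structure like this is what a target-language generator would consume; the generators of Python make backtracking over alternative arcs implicit.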
One Parser to Rule Them All
Despite the long history of research in parsing, constructing parsers for real programming languages remains a difficult and painful task. Over the last decades, various parser generators have emerged that allow parsers to be constructed from a BNF-like specification. Even today, however, many parsers are handwritten or only partly generated, and include various hacks to deal with the peculiarities of different programming languages. The main problem is that current declarative syntax definition techniques are based on pure context-free grammars, while many constructs found in programming languages require context information.
In this paper we propose a parsing framework that embraces context information at its core. Our framework is based on data-dependent grammars, which extend context-free grammars with arbitrary computation, variable binding, and constraints. We present an implementation of our framework on top of the Generalized LL (GLL) parsing algorithm, and show how common idioms in the syntax of programming languages, such as (1) lexical disambiguation filters, (2) operator precedence, (3) indentation-sensitive rules, and (4) conditional preprocessor directives, can be mapped to data-dependent grammars. We report initial experience with our framework by parsing more than 20,000 Java, C#, Haskell, and OCaml source files.
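To make "variable binding and constraints" concrete for idiom (3), here is a minimal sketch of an indentation-sensitive rule in the spirit of a parameterized nonterminal, roughly `Block(c) ::= Stmt(c)+`, where a statement ending in `:` binds a new column `c2` from the input and requires `c2 > c` for its nested block. It is a plain recursive-descent parser in Python, not GLL and not the paper's framework; the toy statement syntax, the `Node` type, and the function names are hypothetical, and indentation is assumed to use spaces only.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    text: str
    children: list = field(default_factory=list)


def indent_of(line: str) -> int:
    """Column of the first non-space character (spaces only, by assumption)."""
    return len(line) - len(line.lstrip(" "))


def parse_block(lines, i, col):
    """Block(col): one or more statements whose indentation equals col.
    Returns (nodes, index of the first unconsumed line)."""
    nodes = []
    while i < len(lines) and indent_of(lines[i]) == col:
        text = lines[i].strip()
        i += 1
        node = Node(text)
        if text.endswith(":"):
            # Bind c2 from the data, then enforce the constraint c2 > col.
            if i >= len(lines) or indent_of(lines[i]) <= col:
                raise SyntaxError(f"expected indented block after {text!r}")
            child_col = indent_of(lines[i])
            node.children, i = parse_block(lines, i, child_col)
        nodes.append(node)
    if not nodes:
        raise SyntaxError(f"expected a statement at column {col}")
    return nodes, i


def parse_program(source):
    """Parse a whole program as Block(0); every non-blank line must be consumed."""
    lines = [ln for ln in source.splitlines() if ln.strip()]
    nodes, i = parse_block(lines, 0, 0)
    if i != len(lines):
        raise SyntaxError(f"unexpected indentation: {lines[i]!r}")
    return nodes
```

For instance, `parse_program("if x:\n    a\n    b\nc\n")` yields an `if x:` node with children `a` and `b`, followed by `c`, while a dedent to a column that matches no enclosing block is rejected. The point of the data-dependent formulation is that the column is an ordinary grammar parameter rather than a lexer hack.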
A Contrastive Study of Functional Unification Grammar for Surface Language Generation: A Case Study in Choice of Connectives
Language generation systems have used a variety of grammatical formalisms for producing syntactic structure, yet there has been little research evaluating these formalisms for the specific demands of the generation task. In our work at Columbia we have primarily used a unification-based formalism, Functional Unification Grammar (FUG) [Kay 79], and have found it well suited to many of the generation tasks we have addressed. Over the past five years we have also explored various off-the-shelf parsing formalisms, including an Augmented Transition Network (ATN) [Woods 70], a Bottom-Up Chart Parser (BUP) [Finin 84], and a Definite Clause Grammar (DCG) [Pereira and Warren 80]. In contrast to FUG, we have found that these parsing formalisms do not offer the same benefits for the generation task.
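The core operation behind a unification-based formalism like FUG is unification of feature structures. The following is a minimal sketch in Python, assuming feature structures are nested dicts with atomic values; it omits structure sharing (path equations) and FUG's special values, so it is an illustration rather than a FUG implementation. `unify_alt` mimics FUG's alternation by trying grammar alternatives in order, and the `CONNECTIVES` fragment, with its `relation` and `connective` features, is hypothetical, chosen only to echo the connective-choice case study.

```python
FAIL = object()  # sentinel: unification failed (distinct from any feature value)


def unify(a, b):
    """Unify two feature structures: nested dicts or atomic values."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is FAIL:
                    return FAIL
                result[feature] = merged
            else:
                result[feature] = value  # a feature present on only one side is added
        return result
    # Atoms (or an atom against a dict) unify only if they are equal.
    return a if a == b else FAIL


def unify_alt(fd, alternatives):
    """Try each grammar alternative in order; return the first successful unification."""
    for alt in alternatives:
        result = unify(fd, alt)
        if result is not FAIL:
            return result
    return FAIL


# Hypothetical grammar fragment: choose a connective from the relation between clauses.
CONNECTIVES = [
    {"relation": "contrast", "connective": "but"},
    {"relation": "cause", "connective": "because"},
]
```

Unifying the input `{"cat": "complex-sentence", "relation": "cause"}` with `CONNECTIVES` fails on the first alternative and succeeds on the second, adding `"connective": "because"`. Because generation only adds information that is consistent with the input, the same mechanism serves both constraint checking and lexical choice.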
An overview of computer-based natural language processing
Computer-based natural language processing (NLP) is the key to enabling humans, and the computer-based systems they create, to interact with machines in natural language (such as English, Japanese, or German, as opposed to formal computer languages). The possibilities such an achievement could open up have made NLP a major research area in Artificial Intelligence and Computational Linguistics. Commercial natural language interfaces to computers have recently entered the market, and the future looks bright for other applications as well. This report reviews the basic approaches to such systems, the techniques used, applications, the state of the art, open issues and research requirements, the major participants, and finally, future trends and expectations. It is anticipated that this report will prove useful to engineering and research managers, potential users, and others who will be affected by this field as it unfolds.
Comparison of Surface Language Generators: A Case Study in Choice of Connectives
Language generation systems have used a variety of grammatical formalisms for producing syntactic structure, yet there has been little research evaluating these formalisms for the specific demands of the generation task. In our work at Columbia we have primarily used a unification-based formalism, Functional Unification Grammar (FUG) [Kay 79], and have found it well suited to many of the generation tasks we have addressed. Over the past five years we have also explored various off-the-shelf parsing formalisms, including an Augmented Transition Network (ATN) [Woods 70], a Bottom-Up Chart Parser (BUP) [Finin 84], and a Definite Clause Grammar (DCG) [Pereira & Warren 80]. In this paper, we identify the characteristics of FUG that we find useful for generation and contrast them with the characteristics of the parsing formalisms and of other formalisms typically used for generation.
UGURU: a natural language UNIX consultant
UGURU is a natural language conversation program, implemented in Prolog, that manages a broad knowledge base of facts about Unix. The range and wording of the questions it understands are based on surveys of students, mostly Unix beginners. UGURU is also designed to accept English statements that can be added as facts to the knowledge base. Each fact is represented as a binding set: a verb-oriented semantic net with the characteristics of a directed acyclic graph. UGURU's main actions are divided between two primary modules, a parser and a retriever. To produce a binding set from an input, the parser combines a new kind of multi-level object-oriented grammar, parallel tracing of distinct parse trees by independent units called recognizers, concurrent use of syntactic and semantic knowledge, and a pragmatic criterion requiring the system to mimic the order in which humans parse. The retriever, invoked to answer input questions, tries to match the binding set representing a question to a fact in the knowledge base by applying semantic transformations to both sets.
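The retrieval step described above can be illustrated with a minimal sketch in Python. It is not UGURU's Prolog code: binding sets are flattened here into role-to-filler dicts (real binding sets are DAGs), the only semantic transformation is a hypothetical verb-synonym table, and the roles, facts, and function names are invented for illustration. Fillers beginning with `?` mark the unknowns a question asks about.

```python
# Hypothetical semantic transformation: map verb synonyms to one canonical verb.
SYNONYMS = {"delete": "remove", "erase": "remove", "show": "list", "display": "list"}

# Toy knowledge base: each fact is a flat binding set of role -> filler.
KNOWLEDGE_BASE = [
    {"verb": "remove", "object": "file", "instrument": "rm"},
    {"verb": "list", "object": "directory", "instrument": "ls"},
]


def normalize(binding_set):
    """Apply the synonym transformation to every filler."""
    return {role: SYNONYMS.get(filler, filler) for role, filler in binding_set.items()}


def match(question, fact):
    """Return variable bindings if the question matches the fact, else None.
    Fillers starting with '?' are variables that the fact must supply."""
    q, f = normalize(question), normalize(fact)
    answers = {}
    for role, filler in q.items():
        if role not in f:
            return None
        if filler.startswith("?"):
            answers[filler] = f[role]
        elif filler != f[role]:
            return None
    return answers


def retrieve(question, knowledge_base=KNOWLEDGE_BASE):
    """Return (fact, answers) for the first matching fact, or None."""
    for fact in knowledge_base:
        answers = match(question, fact)
        if answers is not None:
            return fact, answers
    return None
```

A question such as "How do I delete a file?", encoded as `{"verb": "delete", "object": "file", "instrument": "?cmd"}`, matches the first fact only after `delete` is normalized to `remove`, binding `?cmd` to `rm`. Normalizing both sides before comparison is what lets differently worded questions reach the same stored fact.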
- …