The effectiveness of bottom up technique with probabilistic approach for a Malay parser
Parsing is the process of analyzing an input sentence to determine its syntactic structure
according to the rules of a grammar. This task is performed by a parser, which produces a
parse tree as output. A problem arises when parsing yields two or more parse trees for the
same sentence and the parser cannot decide which one is correct. This limitation is caused
by ambiguity in the structure of sentences. Ambiguity occurs when a word can be classified
under more than one syntactic category and the choice of category affects the meaning of
the sentence. The parser therefore needs a way to resolve the ambiguity and select the most
appropriate parse tree for a sentence. Like other languages, Malay, the national language
of Malaysia, is not exempt from the ambiguity problem. However, because its grammar can be
described as a context-free grammar, the probabilistic context-free grammar approach can be
used to help the parser determine a more accurate parse tree. This study focuses on the
development of a statistical parser for the Malay language using a bottom-up technique. The
training data, in the form of simple Malay sentences, were collected from various sources.
From this training data, a statistical lexical corpus of Malay consisting of vocabulary,
grammar rules and their probabilities was developed. Bottom-up parsing is implemented with
the Cocke–Younger–Kasami (CYK) algorithm. The parser's performance is evaluated on how
effectively it overcomes ambiguity by suggesting a more precise parse tree. In conclusion,
the Malay Language Parser can help users identify the appropriate parse tree and resolve
ambiguity in the Malay language.
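To illustrate how a probabilistic CYK parser can choose between competing parse trees, here is a minimal sketch. The grammar rules, the lexicon and every probability below are invented for illustration and are not taken from the study's corpus; the function name `cyk_best_parse` is likewise hypothetical. The example sentence "saya makan nasi dengan sudu" ("I eat rice with a spoon") has two parses under this toy grammar, depending on whether the prepositional phrase attaches to the verb phrase or to the noun phrase.

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. All rules and probabilities are
# hypothetical and chosen only to make the example ambiguous.
BINARY = {  # (left child, right child) -> [(parent, rule probability)]
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}
LEXICON = {  # word -> [(category, rule probability)]
    "saya": [("NP", 0.3)],
    "nasi": [("NP", 0.3)],
    "sudu": [("NP", 0.2)],
    "makan": [("V", 1.0)],
    "dengan": [("P", 1.0)],
}


def cyk_best_parse(words, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # best[(i, j)][A] = (probability, tree) of the best A over words[i:j]
    best = defaultdict(dict)

    # Length-1 spans come straight from the lexicon.
    for i, word in enumerate(words):
        for cat, p in LEXICON.get(word, []):
            best[(i, i + 1)][cat] = (p, (cat, word))

    # Build longer spans bottom-up from pairs of shorter ones.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, (pb, tb) in best[(i, k)].items():
                    for c, (pc, tc) in best[(k, j)].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            # Keep only the highest-probability tree per category.
                            if p > best[(i, j)].get(a, (0.0, None))[0]:
                                best[(i, j)][a] = (p, (a, tb, tc))
    return best[(0, n)].get(start)


result = cyk_best_parse("saya makan nasi dengan sudu".split())
print(result)
```

Under these assumed probabilities the verb-phrase attachment wins, because the product of its rule probabilities is larger than that of the noun-phrase attachment. A different set of probabilities estimated from real training data could reverse that choice.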
Ambiguity Detection: Scaling to Scannerless
Static ambiguity detection would be an important aspect of language
workbenches for textual software languages. However, the challenge is
that automatic ambiguity detection in context-free grammars is undecidable
in general. Sophisticated approximations and optimizations do exist,
but these do not yet scale to grammars for so-called "scannerless parsers".
We extend previous work on ambiguity detection for context-free grammars to
cover disambiguation techniques that are typical for scannerless parsing,
such as longest match and reserved keywords.
This paper contributes a new algorithm for ambiguity detection in
character-level grammars, a prototype implementation of this algorithm and
validation on several real grammars. The total run-time of ambiguity
detection for character-level grammars for languages such as C and Java is
significantly reduced, without loss of precision.
The result is that efficient ambiguity detection in realistic grammars is
possible and may therefore become a tool in language workbenches.
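Because ambiguity of context-free grammars is undecidable in general, the simplest possible check is a bounded search: enumerate every string up to some length and count its parse trees. The sketch below does exactly that for a grammar in Chomsky normal form. It is not the algorithm contributed by the paper, only a brute-force baseline; the function names and the toy expression grammar are assumptions made for this example. Terminals are single characters, loosely mirroring a character-level grammar.

```python
from itertools import product


def count_parses(tokens, binary, lexical, start):
    """Count distinct parse trees of `tokens` with a CYK-style table."""
    n = len(tokens)
    count = {}  # (i, j, A) -> number of trees of A over tokens[i:j]
    for i, tok in enumerate(tokens):
        for parent, terminal in lexical:
            if terminal == tok:
                key = (i, i + 1, parent)
                count[key] = count.get(key, 0) + 1
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for parent, left, right in binary:
                    ways = count.get((i, k, left), 0) * count.get((k, j, right), 0)
                    if ways:
                        key = (i, j, parent)
                        count[key] = count.get(key, 0) + ways
    return count.get((0, n, start), 0)


def find_ambiguous(binary, lexical, start, max_len):
    """Return the first string of length <= max_len with more than one parse."""
    alphabet = sorted({terminal for _, terminal in lexical})
    for n in range(1, max_len + 1):
        for tokens in product(alphabet, repeat=n):
            if count_parses(tokens, binary, lexical, start) > 1:
                return "".join(tokens)
    return None  # nothing found within the bound; this proves nothing beyond it


# Hypothetical toy grammar: E -> E R | 'a', R -> O E, O -> '+'
binary_rules = [("E", "E", "R"), ("R", "O", "E")]
lexical_rules = [("E", "a"), ("O", "+")]
print(find_ambiguous(binary_rules, lexical_rules, "E", 5))
```

The search is exponential in the length bound, and a `None` result only means that no ambiguity exists within that bound. This is the kind of cost that motivates the approximations and optimizations the abstract refers to.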