On the Disambiguation of Weighted Automata
We present a disambiguation algorithm for weighted automata. The algorithm
admits two main stages: a pre-disambiguation stage followed by a transition
removal stage. We give a detailed description of the algorithm and the proof of
its correctness. The algorithm is not applicable to all weighted automata but
we prove sufficient conditions for its applicability in the case of the
tropical semiring by introducing the *weak twins property*. In particular, the
algorithm can be used with all acyclic weighted automata, relevant to
applications. While disambiguation can sometimes be achieved using
determinization, our disambiguation algorithm in some cases can return a result
that is exponentially smaller than any equivalent deterministic automaton. We
also present some empirical evidence of the space benefits of disambiguation
over determinization in speech recognition and machine translation
applications
Extracting Formal Models from Normative Texts
We are concerned with the analysis of normative texts - documents based on
the deontic notions of obligation, permission, and prohibition. Our goal is to
make queries about these notions and verify that a text satisfies certain
properties concerning causality of actions and timing constraints. This
requires taking the original text and building a representation (model) of it
in a formal language, in our case the C-O Diagram formalism. We present an
experimental, semi-automatic aid that helps to bridge the gap between a
normative text in natural language and its C-O Diagram representation. Our
approach consists of using dependency structures obtained from the
state-of-the-art Stanford Parser, and applying our own rules and heuristics in
order to extract the relevant components. The result is a tabular data
structure where each sentence is split into suitable fields, which can then be
converted into a C-O Diagram. The process is not fully automatic however, and
some post-editing is generally required of the user. We apply our tool and
perform experiments on documents from different domains, and report an initial
evaluation of the accuracy and feasibility of our approach.
Comment: Extended version of conference paper at the 21st International
Conference on Applications of Natural Language to Information Systems (NLDB
2016). arXiv admin note: substantial text overlap with arXiv:1607.0148
Rule-restricted Automaton-grammar transducers: Power and Linguistic Applications
This paper introduces the notion of a new transducer as a two-component system, which consists of a finite automaton and a context-free grammar. In essence, while the automaton reads its input string, the grammar produces its output string, and their cooperation is controlled by a set that restricts the usage of their rules. From a theoretical viewpoint, the paper discusses the power of this system working in an ordinary way as well as in a leftmost way. In addition, the paper introduces appearance checking, which allows us to check whether some symbols are present in the rewritten string, and studies its effect on the power. It achieves the following three main results. First, the system generates and accepts languages defined by matrix grammars and partially blind multi-counter automata, respectively. Second, if we place a leftmost restriction on derivation in the context-free grammar, both the accepting and the generating power of the system are equal to the generative power of context-free grammars. Third, the system with appearance checking can accept and generate all recursively enumerable languages. From a more pragmatic viewpoint, this paper describes several linguistic applications. Special attention is paid to Japanese-Czech translation
ON MONITORING LANGUAGE CHANGE WITH THE SUPPORT OF CORPUS PROCESSING
One of the fundamental characteristics of language is that it can change over time. One
way to monitor this change is to observe corpora, which are structured documentation of a
language. Recent developments in technology, especially in the field of Natural
Language Processing, allow robust linguistic processing, which supports the description of
diverse historical changes in the corpora. The intervention of a human linguist is inevitable, as
it determines the gold standard, but computer assistance provides considerable support by
incorporating a computational approach to exploring the corpora, especially historical
corpora. This paper proposes a model for corpus development in which corpora are annotated
to support further computational operations such as lexicogrammatical pattern matching,
automatic retrieval and extraction. The corpus processing operations are performed by
local-grammar-based corpus processing software on a contemporary Indonesian corpus. This
paper concludes that data collection and data processing in a corpus are of equally crucial
importance for monitoring language change, and neither can be set aside
ANNOTATION MODEL FOR LOANWORDS IN INDONESIAN CORPUS: A LOCAL GRAMMAR FRAMEWORK
There is a considerable number of loanwords in the Indonesian language, as it has been,
and continues to be, in contact with other languages. The contact takes place via different
media; one of them is the machine-readable medium. As information in different languages
can be obtained with a mouse click these days, the contact is becoming more and more intense. This
paper aims to propose an annotation model and lexical resource for loanwords in
Indonesian. The lexical resource is applied to a corpus by corpus processing software called
UNITEX. This software works under the local grammar framework
On becoming a physicist of mind
In 1976, the German Max Planck Society established a new research enterprise in psycholinguistics, which became the Max Planck Institute for Psycholinguistics in Nijmegen, the Netherlands. I was fortunate enough to be invited to direct this institute. It enabled me, with my background in visual and auditory psychophysics and the theory of formal grammars and automata, to develop a long-term chronometric endeavor to dissect the process of speaking. It led, among other work, to my book Speaking (1989) and to my research team's article in Behavioral and Brain Sciences “A Theory of Lexical Access in Speech Production” (1999). When I later became president of the Royal Netherlands Academy of Arts and Sciences, I helped initiate the Women for Science research project of the Inter Academy Council, a project chaired by my physicist sister at the National Institute of Standards and Technology. As an emeritus I published a comprehensive History of Psycholinguistics (2013). As will become clear, many people inspired and joined me in these undertakings
- …