Language Outlay for the Sokoto Cement Production System Using Deterministic Finite Automata Scheme
This paper constructs the compact, detailed and extended models, and focuses on the algebraic theoretic strings and language of each transition of the finite automata scheme. It was discovered that from the initial stage to the final stage of cement production processes, each transition or production process can have a particular language. In addition, a language scheme is developed for each of the sub-states (sub-model) that leads to a theoretic study of language scheme and semantics of the model. It can be deduced that when represented as binary codes, the established schemes in the sub-states can be studied as a Boolean algebraic scheme. 
A Grammatical Inference Approach to Language-Based Anomaly Detection in XML
False-positives are a problem in anomaly-based intrusion detection systems.
To counter this issue, we discuss anomaly detection for the eXtensible Markup
Language (XML) in a language-theoretic view. We argue that many XML-based
attacks target the syntactic level, i.e. the tree structure or element content,
and syntax validation of XML documents reduces the attack surface. XML offers
so-called schemas for validation, but in the real world, schemas are often
unavailable, ignored or too general. In this work-in-progress paper we describe
a grammatical inference approach to learn an automaton from example XML
documents for detecting documents with anomalous syntax.
We discuss properties and expressiveness of XML to understand limits of
learnability. Our contributions are an XML Schema compatible lexical datatype
system to abstract content in XML and an algorithm to learn visibly pushdown
automata (VPA) directly from a set of examples. The proposed algorithm does not
require the tree representation of XML, so it can process large documents or
streams. The resulting deterministic VPA then allows stream validation of
documents to recognize deviations in the underlying tree structure or
datatypes.
Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and
Countermeasures ECTCM 201
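To illustrate how a deterministic VPA can validate a document as a stream, here is a minimal sketch. It is not the learning algorithm from the paper: the event format, the state names (q0 to q5), the stack symbols and the transition tables are all hypothetical and written by hand for a toy document of the form <note><to>text</to></note>. Opening tags push a stack symbol, closing tags pop one, and element content is abstracted to a datatype name.

```python
def run_vpa(events, call, ret, internal, start, accepting):
    """Run a deterministic visibly pushdown automaton over a stream of events.

    events: iterable of (kind, symbol), kind in {"open", "close", "text"}.
    call:     (state, tag)            -> (next_state, stack_symbol)
    ret:      (state, tag, stack_top) -> next_state
    internal: (state, datatype)       -> next_state
    """
    state, stack = start, []
    for kind, sym in events:
        if kind == "open":            # call symbol: always pushes
            step = call.get((state, sym))
            if step is None:
                return False
            state, pushed = step
            stack.append(pushed)
        elif kind == "close":         # return symbol: always pops
            if not stack:
                return False
            top = stack.pop()
            state = ret.get((state, sym, top))
            if state is None:
                return False
        else:                         # internal symbol: stack untouched
            state = internal.get((state, sym))
            if state is None:
                return False
    return not stack and state in accepting


# Hypothetical hand-written automaton (assumption, not learned from data).
CALL = {("q0", "note"): ("q1", "N"), ("q1", "to"): ("q2", "T"),
        ("q4", "to"): ("q2", "T")}
INTERNAL = {("q2", "string"): "q3"}
RET = {("q3", "to", "T"): "q4", ("q4", "note", "N"): "q5"}


def validate(events):
    return run_vpa(events, CALL, RET, INTERNAL, "q0", {"q5"})
```

Because only the current state and the stack are kept, memory use grows with the nesting depth of the document and not with its length, which is what makes stream validation feasible.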
Learning Unions of k-Testable Languages
A classical problem in grammatical inference is to identify a language from a
set of examples. In this paper, we address the problem of identifying a union
of languages from examples that belong to several different unknown languages.
Indeed, decomposing a language into smaller pieces that are easier to represent
should make learning easier than aiming for a too generalized language. In
particular, we consider k-testable languages in the strict sense (k-TSS). These
are defined by a set of allowed prefixes, infixes (sub-strings) and suffixes
that words in the language may contain. We establish a Galois connection
between the lattice of all languages over alphabet Σ, and the lattice of
k-TSS languages over Σ. We also define a simple metric on k-TSS
languages. The Galois connection and the metric allow us to derive an efficient
algorithm to learn the union of k-TSS languages. We evaluate our algorithm on
an industrial dataset and thus demonstrate the relevance of our approach.
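The definition of a k-TSS language by allowed prefixes, infixes and suffixes can be sketched as follows. This is a simplified illustration, not the union-learning algorithm of the paper: the class name KTestable, the function learn and the handling of words shorter than k (stored verbatim) are assumptions made for this example, and k is assumed to be at least 2.

```python
from dataclasses import dataclass, field


@dataclass
class KTestable:
    """A k-testable language in the strict sense (simplified, k >= 2)."""
    k: int
    prefixes: set = field(default_factory=set)  # allowed prefixes, length k-1
    suffixes: set = field(default_factory=set)  # allowed suffixes, length k-1
    infixes: set = field(default_factory=set)   # allowed substrings, length k
    short: set = field(default_factory=set)     # accepted words shorter than k

    def accepts(self, w):
        k = self.k
        if len(w) < k:
            return w in self.short
        return (w[:k - 1] in self.prefixes
                and w[len(w) - (k - 1):] in self.suffixes
                and all(w[i:i + k] in self.infixes
                        for i in range(len(w) - k + 1)))


def learn(examples, k):
    """Smallest k-TSS language (in this simplified form) containing the examples."""
    lang = KTestable(k)
    for w in examples:
        if len(w) < k:
            lang.short.add(w)
            continue
        lang.prefixes.add(w[:k - 1])
        lang.suffixes.add(w[len(w) - (k - 1):])
        lang.infixes.update(w[i:i + k] for i in range(len(w) - k + 1))
    return lang


# Learning one language from mixed examples over-generalizes ...
mixed = learn(["aaa", "abab"], k=2)
# ... whereas a union of two separately learned languages stays tighter.
lang_a = learn(["aaa"], k=2)
lang_b = learn(["abab"], k=2)


def union_accepts(w):
    return lang_a.accepts(w) or lang_b.accepts(w)
```

In this toy case the single language learned from the mixed sample accepts "aab", which neither component accepts, so the union is a strictly tighter description of the examples.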
Statistical natural language generation for dialogue systems based on hierarchical models
Due to the increasing presence of natural-language interfaces in our lives,
natural language processing (NLP) is gaining popularity every year.
However, until recently, most research activity in this area was aimed at
natural language understanding (NLU), which is responsible for extracting
meaning from natural language input. This is explained by the larger number
of practical applications of NLU, such as machine translation, whereas
natural language generation was mainly used for providing output interfaces
and was therefore considered a user interface problem rather than a
functionality issue.
Generally speaking, natural language generation (NLG) is the process of
generating text from a semantic representation, which can be expressed in many
different forms. A common application of NLG is the so-called Spoken
Dialogue System (SDS), where the user interacts directly by voice with a
computer-based system to receive information or perform a certain type of
action, for example, buying a plane ticket or booking a table in a restaurant.
Dialogue systems represent one of the most interesting applications within the
field of speech technologies. Traditionally, the NLG part of such systems was
provided by templates, which only fill predefined gaps in canned text with the
requested information. But nowadays, since SDSs are increasing in complexity,
more advanced and user-friendly interfaces should be provided, thereby
creating a need for a more refined and adaptive approach.
One of the solutions to be considered is NLG models based on statistical
frameworks, where the system's response to the user is generated in real time
and adjusted to the user's performance, instead of just choosing a
pertinent template. Due to the corpus-based approach, these systems are easy to
adapt to different tasks in a range of informational domains.
The aim of this work is to present a statistical approach to the problem of
utterance generation, which uses cooperation between two different language
models (LM) in order to enhance the efficiency of the NLG module. At the
higher level, a class-based language model is used to build the syntactic
structure of the sentence. In the second layer, a specific language model acts
inside each class, dealing with the words.
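The two-level cooperation described above can be sketched as follows. This is only an illustration under stated assumptions, not the model from this work: the class names (GREETING, INFORM_TIME), the bigram tables and their probabilities are invented for the example, and greedy decoding (always taking the most probable successor) is used in place of whatever generation strategy the work actually employs.

```python
def greedy_path(bigrams, start="<s>", end="</s>", max_len=20):
    """Follow the most probable successor from start until end is reached."""
    out, cur = [], start
    for _ in range(max_len):
        successors = bigrams.get(cur)
        if not successors:
            break
        cur = max(successors, key=successors.get)
        if cur == end:
            break
        out.append(cur)
    return out


# Higher level: hypothetical class-based bigram LM over sentence structure.
CLASS_LM = {
    "<s>": {"GREETING": 0.6, "INFORM_TIME": 0.4},
    "GREETING": {"INFORM_TIME": 0.9, "</s>": 0.1},
    "INFORM_TIME": {"</s>": 1.0},
}

# Second level: one hypothetical word-level bigram LM inside each class.
WORD_LMS = {
    "GREETING": {
        "<s>": {"hello": 0.7, "hi": 0.3},
        "hello": {"</s>": 1.0},
        "hi": {"</s>": 1.0},
    },
    "INFORM_TIME": {
        "<s>": {"the": 1.0},
        "the": {"bus": 1.0},
        "bus": {"leaves": 0.8, "departs": 0.2},
        "leaves": {"at": 1.0},
        "departs": {"at": 1.0},
        "at": {"ten": 1.0},
        "ten": {"</s>": 1.0},
    },
}


def generate():
    """Pick a class sequence first, then let each class's LM emit its words."""
    classes = greedy_path(CLASS_LM)
    words = [w for c in classes for w in greedy_path(WORD_LMS[c])]
    return classes, " ".join(words)
```

Replacing the greedy choice with sampling from each distribution would give varied utterances for the same class sequence.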
In the dialogue system described in this work, a user asks for information
regarding bus schedules, route schemes, fares and special information.
Therefore, in each dialogue the user has a specific dialogue goal, which needs
to be met by the system. Goal completion can be used as one measure of system
performance, alongside the appropriateness of the generated utterances and the
average dialogue length, which is important for an interactive information
system.
The work is organized as follows. In Section 2 the basic approaches to the NLG
task are described, and their advantages and disadvantages are considered.
Section 3 presents the objective of this work. In Section 4 the basic model and
its novelty are explained. In Section 5 the details of the task features and
the corpora employed are presented. Section 6 contains the experimental results
and their explanation, as well as an evaluation of the obtained results.
Section 7 summarizes the conclusions and proposals for future research.