
    Language Outlay for the Sokoto Cement Production System Using Deterministic Finite Automata Scheme

    This paper constructs compact, detailed and extended models of the Sokoto cement production system and focuses on the algebraic, language-theoretic treatment of the strings and language of each transition of the finite automata scheme. It was found that, from the initial to the final stage of cement production, each transition (production process) can be associated with a particular language. In addition, a language scheme is developed for each sub-state (sub-model), which leads to a theoretical study of the language scheme and semantics of the model. When represented as binary codes, the schemes established for the sub-states can be studied as a Boolean algebraic scheme.
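    The abstract describes a deterministic finite automaton in which every production process is a transition with its own language, and whose states can be binary-coded for Boolean-algebraic study. A minimal Python sketch of that idea follows. The stage names, the process symbols `c`, `m`, `b`, `g`, `p` and the simple linear chain of stages are hypothetical placeholders, not the paper's compact, detailed or extended models.

```python
from dataclasses import dataclass

# Hypothetical stage names and process symbols, invented for illustration;
# the paper's actual Sokoto model states are not reproduced here.
STAGES = ["raw", "crushed", "raw_meal", "clinker", "cement", "packed"]
PROCESSES = {"c": "crush", "m": "raw-mill", "b": "burn in kiln",
             "g": "grind clinker", "p": "pack"}


@dataclass
class DFA:
    delta: dict      # (state, symbol) -> next state; a missing entry rejects
    start: str
    accepting: set

    def accepts(self, word: str) -> bool:
        """Run the automaton over `word`, one symbol per production process."""
        state = self.start
        for symbol in word:
            if (state, symbol) not in self.delta:
                return False  # no transition defined: word is not in the language
            state = self.delta[(state, symbol)]
        return state in self.accepting


# Chain the stages: each transition reads exactly one process symbol, so the
# language of a single transition is the one-symbol set {symbol}.
delta = {(src, sym): dst for src, sym, dst in zip(STAGES, PROCESSES, STAGES[1:])}
plant = DFA(delta=delta, start="raw", accepting={"packed"})

# Give each state a fixed-width binary code, the kind of encoding that lets
# the states be handled as Boolean variables.
width = (len(STAGES) - 1).bit_length()
codes = {s: format(i, f"0{width}b") for i, s in enumerate(STAGES)}
```

    Here `plant.accepts("cmbgp")` holds, while a word that skips or reorders a process is rejected; the three-bit strings in `codes` are just one possible binary encoding of the six hypothetical stages.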

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach that learns an automaton from example XML documents in order to detect documents with anomalous syntax. We discuss the properties and expressiveness of XML to understand the limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201
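    The abstract does not spell out the learning algorithm, so the following is only a drastically simplified stand-in, not the authors' VPA learner. It is a Python sketch of the visibly pushdown discipline the abstract relies on (the kind of input symbol alone decides whether the stack is pushed, popped or left alone), combined with a toy content abstraction. The datatype patterns, the `#root` context and the names `abstract`, `events`, `learn` and `anomalies` are hypothetical; the "learned model" is just the set of (parent, symbol) pairs seen in the examples, which is far less expressive than a learned VPA.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical, much-reduced stand-in for an XML Schema compatible lexical
# datatype system: each regex approximates the lexical space of one XSD type.
# Order matters: the first matching datatype wins.
DATATYPES = [
    ("integer", re.compile(r"[+-]?\d+")),
    ("decimal", re.compile(r"[+-]?(\d+\.\d*|\.\d+)")),
    ("boolean", re.compile(r"true|false")),
    ("date", re.compile(r"\d{4}-\d{2}-\d{2}")),
]


def abstract(text):
    """Map raw element content to a datatype name (None if whitespace-only)."""
    text = (text or "").strip()
    if not text:
        return None
    for name, pattern in DATATYPES:
        if pattern.fullmatch(text):
            return name
    return "string"


def events(xml_text):
    """Stream visibly pushdown symbols: start tags are calls, end tags are
    returns, and abstracted leaf content is an internal symbol."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            yield ("call", elem.tag)
        else:
            if len(elem) == 0:  # only leaf text is abstracted in this sketch
                dtype = abstract(elem.text)
                if dtype is not None:
                    yield ("text", dtype)
            yield ("return", elem.tag)
            elem.clear()  # drop the finished element's content


def learn(examples):
    """Record every (context, kind, label) triple seen in the training set,
    where the context is the tag on top of the stack."""
    allowed = set()
    for doc in examples:
        stack = ["#root"]
        for kind, label in events(doc):
            if kind == "return":
                stack.pop()  # return symbols always pop
                continue
            allowed.add((stack[-1], kind, label))
            if kind == "call":
                stack.append(label)  # call symbols always push
    return allowed


def anomalies(allowed, doc):
    """Validate a document in one pass; report triples never seen in training."""
    found, stack = [], ["#root"]
    for kind, label in events(doc):
        if kind == "return":
            stack.pop()
            continue
        if (stack[-1], kind, label) not in allowed:
            found.append((stack[-1], kind, label))
        if kind == "call":
            stack.append(label)
    return found
```

    Trained on documents such as `<order><id>17</id><paid>true</paid></order>`, this sketch flags an `id` holding non-numeric text or an unexpected `script` element inside `order`. Unlike a genuinely learned VPA, it cannot detect a missing child or a wrong sibling order.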

    Learning stochastic finite automata from experts


    Learning Unions of k-Testable Languages

    A classical problem in grammatical inference is to identify a language from a set of examples. In this paper, we address the problem of identifying a union of languages from examples that belong to several different unknown languages. Decomposing a language into smaller pieces that are easier to represent should make learning easier than aiming for a single, overly general language. In particular, we consider k-testable languages in the strict sense (k-TSS). These are defined by the sets of prefixes, infixes (substrings) and suffixes that words in the language are allowed to contain. We establish a Galois connection between the lattice of all languages over an alphabet Σ and the lattice of k-TSS languages over Σ. We also define a simple metric on k-TSS languages. The Galois connection and the metric allow us to derive an efficient algorithm to learn a union of k-TSS languages. We evaluate our algorithm on an industrial dataset and thereby demonstrate the relevance of our approach.
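    The abstract defines k-TSS languages through allowed prefixes, infixes and suffixes. A minimal Python sketch of that definition follows: it learns a single k-TSS language from positive examples by collecting the observed windows, and tests membership in a union of such languages. Following the usual convention (an assumption here), prefixes and suffixes have length k-1, infixes have length k, and words shorter than k are stored verbatim. `KTSS`, `learn_ktss` and `union_accepts` are hypothetical names, and the split of the examples into groups is given by hand; the paper derives its algorithm from the Galois connection and metric, which this sketch does not reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KTSS:
    """A strictly k-testable language given by its allowed windows."""
    k: int
    prefixes: frozenset  # allowed prefixes of length k-1
    suffixes: frozenset  # allowed suffixes of length k-1
    infixes: frozenset   # allowed substrings of length k
    short: frozenset     # words shorter than k, accepted only verbatim

    def accepts(self, word: str) -> bool:
        k = self.k
        if len(word) < k:
            return word in self.short
        return (word[:k - 1] in self.prefixes
                and word[len(word) - (k - 1):] in self.suffixes
                and all(word[i:i + k] in self.infixes
                        for i in range(len(word) - k + 1)))


def learn_ktss(words, k):
    """Build the smallest k-TSS language containing every example word."""
    if k < 1:
        raise ValueError("k must be at least 1")
    pre, suf, inf, short = set(), set(), set(), set()
    for w in words:
        if len(w) < k:
            short.add(w)
            continue
        pre.add(w[:k - 1])
        suf.add(w[len(w) - (k - 1):])
        inf.update(w[i:i + k] for i in range(len(w) - k + 1))
    return KTSS(k, frozenset(pre), frozenset(suf), frozenset(inf), frozenset(short))


def union_accepts(models, word):
    """A word belongs to the union if any component language accepts it."""
    return any(m.accepts(word) for m in models)
```

    With k = 2, learning one model from `["ab", "abab"]` and another from `["acb", "accb"]` gives a union that accepts `ababab` and `acccb` but rejects `abacb`, whereas a single model learned from all four words accepts `abacb`. That mixed word is the kind of over-generalization the abstract argues decomposition avoids.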

    Statistical natural language generation for dialogue systems based on hierarchical models

    Due to the increasing presence of natural-language interfaces in everyday life, natural language processing (NLP) is gaining popularity every year. Until recently, however, most research activity in this area was aimed at Natural Language Understanding (NLU), which is responsible for extracting meaning from natural language input. This is explained by the larger number of practical applications of NLU, such as machine translation, whereas Natural Language Generation was mainly used to provide output interfaces and was considered more a user-interface problem than a functionality issue. Generally speaking, natural language generation (NLG) is the process of generating text from a semantic representation, which can be expressed in many different forms. A common application of NLG is in so-called Spoken Dialogue Systems (SDS), where the user interacts directly by voice with a computer-based system to receive information or perform a certain type of action, for example buying a plane ticket or booking a table in a restaurant. Dialogue systems are one of the most interesting applications within the field of speech technologies. Traditionally, the NLG part of such systems was provided by templates, which merely fill canned gaps with the requested information. Nowadays, however, as SDSs grow in complexity, more advanced and user-friendly interfaces are needed, which calls for a more refined and adaptive approach. One solution worth considering is NLG models based on statistical frameworks, in which the system's response is generated in real time and adjusted to the user's behaviour, instead of simply choosing a pertinent template. Because they are corpus-based, these systems are easy to adapt to different tasks across a range of information domains.
    The aim of this work is to present a statistical approach to utterance generation that combines two different language models (LMs) in order to improve the efficiency of the NLG module. At the higher level, a class-based language model builds the syntactic structure of the sentence. In the second layer, a class-specific language model operates inside each class and generates the words. In the dialogue system described in this work, a user asks for information about bus schedules, route schemes, fares and special information. Each dialogue therefore has a specific goal that the system must meet. Goal completion can serve as one measure of system performance, alongside the appropriateness of the generated utterances and the average dialogue length, which matters for an interactive information system. The work is organized as follows. Section 2 describes the basic approaches to the NLG task and considers their advantages and disadvantages. Section 3 presents the objective of this work. Section 4 explains the basic model and its novelty. Section 5 presents the details of the task features and the corpora employed. Section 6 contains the experimental results and their explanation, as well as the evaluation of the obtained results. Section 7 summarizes the conclusions and proposals for future work.
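    The two-layer idea in the abstract, a class-level LM for sentence structure and a separate word-level LM inside each class, can be sketched with bigram models. This is a minimal Python illustration under stated assumptions, not the paper's model: both layers are plain maximum-likelihood bigrams sampled at random, each class covers one contiguous phrase, and the class labels and bus-domain phrases used below are invented. `bigram_counts`, `sample_sequence` and `TwoLevelGenerator` are hypothetical names.

```python
import random
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


def bigram_counts(sequences):
    """Count bigrams over each sequence padded with boundary markers."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = [BOS, *seq, EOS]
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sequence(counts, rng, max_len=20):
    """Sample tokens from a bigram model until EOS (or a length cap)."""
    out, prev = [], BOS
    for _ in range(max_len):
        successors = counts[prev]
        nxt = rng.choices(list(successors), weights=list(successors.values()))[0]
        if nxt == EOS:
            break
        out.append(nxt)
        prev = nxt
    return out


class TwoLevelGenerator:
    """Upper layer: bigram LM over class labels (sentence structure).
    Lower layer: one bigram LM per class over the words of its phrases."""

    def __init__(self, corpus):
        # corpus: list of utterances, each a list of (class_label, phrase) pairs
        self.class_lm = bigram_counts([[c for c, _ in utt] for utt in corpus])
        phrases = defaultdict(list)
        for utt in corpus:
            for c, phrase in utt:
                phrases[c].append(phrase.split())
        self.word_lms = {c: bigram_counts(p) for c, p in phrases.items()}

    def generate(self, rng):
        classes = sample_sequence(self.class_lm, rng)  # structure first
        words = []
        for c in classes:  # then fill each class with its own word LM
            words.extend(sample_sequence(self.word_lms[c], rng))
        return classes, " ".join(words)
```

    For example, `TwoLevelGenerator([[("GREET", "hello"), ("INFO", "the next bus leaves at ten")]]).generate(random.Random(0))` can only produce the class sequence `["GREET", "INFO"]` and the text "hello the next bus leaves at ten", because every bigram in that one-utterance corpus has a single successor. Separating the layers means a new class sequence can reuse phrases learned in other utterances.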