32,702 research outputs found

    On the Similarities Between Native, Non-native and Translated Texts

    We present a computational analysis of three language varieties: native, advanced non-native, and translation. Our goal is to investigate the similarities and differences between non-native language productions and translations, contrasting both with native language. Using a collection of computational methods, we establish three main results: (1) the three types of texts are easily distinguishable; (2) non-native language and translations are closer to each other than either is to native language; and (3) some of these characteristics depend on the source or native language, while others do not, perhaps reflecting unified principles that similarly affect translations and non-native language. Comment: ACL2016, 12 pages

    Testing a system specified using Statecharts and Z

    A hybrid specification language SZ, in which the dynamic behaviour of a system is described using Statecharts and the data and the data transformations are described using Z, has been developed for the specification of embedded systems. This paper describes an approach to testing from a deterministic sequential specification written in SZ. By considering the Z specifications of the operations, the extended finite state machine (EFSM) defined by the Statechart can be rewritten to produce an EFSM that has a number of properties that simplify test generation. Test generation algorithms are introduced and applied to an example. While this paper considers SZ specifications, the approaches described might be applied whenever the specification is an EFSM whose states and transitions are specified using a language similar to Z.

    Beyond an Anthropomorphic Template

    In our endeavours to explore all possible forms that non-terrestrial communication may encompass, eventually we must throw off our anthropomorphic bias and investigate the implications of post-biological intelligence for SETI search strategies. In the event a candidate signal is detected, our initial categorization and assessment will focus on analyzing its constituent constructs to ascertain whether information content, a fundamental signature of intelligence, is present. To ensure our systems are capable of encompassing such intelligent communicators, we need to investigate both the contrasts and similarities of such non-biological communication and how this extends the known spectrum. In this paper, we begin to investigate the likely signatures and contrasting structures such non-biological communicators may present to us, across a range of known machine communication phenomena, and discuss how such contrasting forms of information exchange can aid, extend and refine our detection and decipherment capabilities.

    Reasoning About a Service-oriented Programming Paradigm

    This paper is about a new way of programming distributed applications: the service-oriented one. It is a concept paper based upon our experience in developing a theory and a language for programming services. Both the theoretical formalization and the language interpreter gave us evidence that a new programming paradigm exists. In this paper we illustrate the basic features that characterize it.

    On the Acoustic Characterization of Ejective Stops in Waima’a

    We examine some acoustic properties of ejective stops in Waima’a (an Austronesian language spoken in East Timor), and compare them with other voiceless stop types that occur in the language. Previous studies of ejectives in other languages have suggested that they may fall into two classes, strong and weak. We compare our Waima’a results with some existing findings in the literature, and suggest that while Waima’a ejectives might appear to be more appropriately characterized as strong on some criteria, they do not sit squarely in either category.

    Learning probability distributions generated by finite-state machines

    We review methods for inference of probability distributions generated by probabilistic automata and related models for sequence generation. We focus on methods that can be proved to learn in the inference-in-the-limit and PAC formal models. The methods we review are state merging and state splitting methods for probabilistic deterministic automata and the recently developed spectral method for nondeterministic probabilistic automata. In both cases, we derive them from a high-level algorithm described in terms of the Hankel matrix of the distribution to be learned, given as an oracle, and then describe how to adapt that algorithm to account for the error introduced by a finite sample. Peer Reviewed. Postprint (author's final draft)

    A Grammatical Inference Approach to Language-Based Anomaly Detection in XML

    False positives are a problem in anomaly-based intrusion detection systems. To counter this issue, we discuss anomaly detection for the eXtensible Markup Language (XML) from a language-theoretic view. We argue that many XML-based attacks target the syntactic level, i.e. the tree structure or element content, and that syntax validation of XML documents reduces the attack surface. XML offers so-called schemas for validation, but in the real world, schemas are often unavailable, ignored or too general. In this work-in-progress paper we describe a grammatical inference approach to learn an automaton from example XML documents for detecting documents with anomalous syntax. We discuss properties and expressiveness of XML to understand limits of learnability. Our contributions are an XML Schema compatible lexical datatype system to abstract content in XML and an algorithm to learn visibly pushdown automata (VPA) directly from a set of examples. The proposed algorithm does not require the tree representation of XML, so it can process large documents or streams. The resulting deterministic VPA then allows stream validation of documents to recognize deviations in the underlying tree structure or datatypes. Comment: Paper accepted at First Int. Workshop on Emerging Cyberthreats and Countermeasures ECTCM 201

    Towards Understanding the Origin of Genetic Languages

    Molecular biology is a nanotechnology that works--it has worked for billions of years and in an amazing variety of circumstances. At its core is a system for acquiring, processing and communicating information that is universal, from viruses and bacteria to human beings. Advances in genetics and experience in designing computers have taken us to a stage where we can understand the optimisation principles at the root of this system, from the availability of basic building blocks to the execution of tasks. The languages of DNA and proteins are argued to be the optimal solutions to the information processing tasks they carry out. The analysis also suggests simpler predecessors to these languages, and provides fascinating clues about their origin. Obviously, a comprehensive unraveling of the puzzle of life would have a lot to say about what we may design or convert ourselves into. Comment: (v1) 33 pages, contributed chapter to "Quantum Aspects of Life", edited by D. Abbott, P. Davies and A. Pati, (v2) published version with some editing