12,422 research outputs found

    A Flexible Shallow Approach to Text Generation

    Full text link
    In order to support the efficient development of NL generation systems, two orthogonal methods are currently pursued with emphasis: (1) reusable, general, and linguistically motivated surface realization components, and (2) simple, task-oriented template-based techniques. In this paper we argue that, from an application-oriented perspective, the benefits of both are still limited. In order to improve this situation, we suggest and evaluate shallow generation methods associated with increased flexibility. We advise a close connection between domain-motivated and linguistic ontologies that supports the quick adaptation to new tasks and domains, rather than the reuse of general resources. Our method is especially designed for generating reports with limited linguistic variations.Comment: LaTeX, 10 page

    Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach

    Full text link
    Theory revision integrates inductive learning and background knowledge by combining training examples with a coarse domain theory to produce a more accurate theory. There are two challenges that theory revision and other theory-guided systems face. First, a representation language appropriate for the initial theory may be inappropriate for an improved theory. While the original representation may concisely express the initial theory, a more accurate theory forced to use that same representation may be bulky, cumbersome, and difficult to reach. Second, a theory structure suitable for a coarse domain theory may be insufficient for a fine-tuned theory. Systems that produce only small, local changes to a theory have limited value for accomplishing complex structural alterations that may be required. Consequently, advanced theory-guided learning systems require flexible representation and flexible structure. An analysis of various theory revision systems and theory-guided learning systems reveals specific strengths and weaknesses in terms of these two desired properties. Designed to capture the underlying qualities of each system, a new system uses theory-guided constructive induction. Experiments in three domains show improvement over previous theory-guided systems. This leads to a study of the behavior, limitations, and potential of theory-guided constructive induction.Comment: See http://www.jair.org/ for an online appendix and other files accompanying this articl

    Data re-engineering using formal transformations

    Get PDF
    This thesis presents and analyses a solution to the problem of formally re- engineering program data structures, allowing new representations of a program to be developed. The work is based around Ward's theory of program transformations which uses a Wide Spectrum Language, WSL, whose semantics were specially developed for use in proof of program transformations. The re-engineered code exhibits equivalent functionality to the original but differs in the degree of data abstraction and representation. Previous transformational re-engineering work has concentrated upon control flow restructuring, which has highlighted a lack of support for data restructuring in the maintainer's tool-set. Problems have been encountered during program transformation due to the lack of support for data re-engineering. A lack of strict data semantics and manipulation capabilities has left the maintainer unable to produce optimally re-engineered solutions. It has also hindered the migration of programs into other languages because it has not been possible to convert data structures into an appropriate form in the target language. The main contribution of the thesis is the Data Re-Engineering and Abstraction Mechanism (DREAM) which allows theories about type equivalence to be represented and used in a re-engineering environment. DREAM is based around the technique of "ghosting", a way of introducing different representations of data, which provides the theoretical underpinning of the changes applied to the program. A second major contribution is the introduction of data typing into the WSL language. This allows DREAM to be integrated into the existing transformation theories within WSL. These theoretical extensions of the original work have been shown to be practically viable by implementation within a prototype transformation tool, the Maintainer's Assistant. The extended tool has been used to re-engineer heavily modified, commercial legacy code. The results of this have shown that useful re-engineering work can be performed and that DREAM integrates well with existing control flow transformations

    Agile model-driven re-engineering

    Get PDF
    In this paper we describe an Agile model-driven engineering (MDE) approach, AMDRE, for the re-engineering of legacy systems. The objective is to support the reuse of business-critical functionality from such systems and the porting of legacy code to modernised platforms, together with technical debt reduction to improve the system maintainability and extend its useful life. AMDRE uses a lightweight MDE process which involves the automated abstraction of software systems to UML specifications and the interactive application of refactoring and rearchitecting transformations to remove quality flaws and architectural flaws. We demonstrate the approach on Visual Basic, COBOL and Python legacy codes, including a finance industry case. Significant quality improvements are achieved, and translation accuracy over 80\% is demonstrated. In comparison to other MDE re-engineering approaches, AMDRE does not require high MDE skills and should be usable by mainstream software practitioners

    Econometrics meets sentiment : an overview of methodology and applications

    Get PDF
    The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software

    Automated Analysis of ARM Binaries using the Low-Level Virtual Machine Compiler Framework

    Get PDF
    Binary program analysis is a critical capability for offensive and defensive operations in Cyberspace. However, many current techniques are ineffective or time-consuming and few tools can analyze code compiled for embedded processors such as those used in network interface cards, control systems and mobile phones. This research designs and implements a binary analysis system, called the Architecture-independent Binary Abstracting Code Analysis System (ABACAS), which reverses the normal program compilation process, lifting binary machine code to the Low-Level Virtual Machine (LLVM) compiler\u27s intermediate representation, thereby enabling existing security-related analyses to be applied to binary programs. The prototype targets ARM binaries but can be extended to support other architectures. Several programs are translated from ARM binaries and analyzed with existing analysis tools. Programs lifted from ARM binaries are an average of 3.73 times larger than the same programs compiled from a high-level language (HLL). Analysis results are equivalent regardless of whether the HLL source or ARM binary version of the program is submitted to the system, confirming the hypothesis that LLVM is effective for binary analysis

    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Get PDF
    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
    • …
    corecore