A Flexible Shallow Approach to Text Generation
In order to support the efficient development of NL generation systems, two
orthogonal methods are currently pursued with emphasis: (1) reusable, general,
and linguistically motivated surface realization components, and (2) simple,
task-oriented template-based techniques. In this paper we argue that, from an
application-oriented perspective, the benefits of both are still limited. In
order to improve this situation, we suggest and evaluate shallow generation
methods associated with increased flexibility. We advise a close connection
between domain-motivated and linguistic ontologies that supports the quick
adaptation to new tasks and domains, rather than the reuse of general
resources. Our method is especially designed for generating reports with
limited linguistic variations. Comment: LaTeX, 10 pages
Rerepresenting and Restructuring Domain Theories: A Constructive Induction Approach
Theory revision integrates inductive learning and background knowledge by
combining training examples with a coarse domain theory to produce a more
accurate theory. There are two challenges that theory revision and other
theory-guided systems face. First, a representation language appropriate for
the initial theory may be inappropriate for an improved theory. While the
original representation may concisely express the initial theory, a more
accurate theory forced to use that same representation may be bulky,
cumbersome, and difficult to reach. Second, a theory structure suitable for a
coarse domain theory may be insufficient for a fine-tuned theory. Systems that
produce only small, local changes to a theory have limited value for
accomplishing complex structural alterations that may be required.
Consequently, advanced theory-guided learning systems require flexible
representation and flexible structure. An analysis of various theory revision
systems and theory-guided learning systems reveals specific strengths and
weaknesses in terms of these two desired properties. Designed to capture the
underlying qualities of each system, a new system uses theory-guided
constructive induction. Experiments in three domains show improvement over
previous theory-guided systems. This leads to a study of the behavior,
limitations, and potential of theory-guided constructive induction. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article.
Data re-engineering using formal transformations
This thesis presents and analyses a solution to the problem of formally re-engineering program data structures, allowing new representations of a program to be developed. The work is based around Ward's theory of program transformations which uses a Wide Spectrum Language, WSL, whose semantics were specially developed for use in proof of program transformations. The re-engineered code exhibits equivalent functionality to the original but differs in the degree of data abstraction and representation. Previous transformational re-engineering work has concentrated upon control flow restructuring, which has highlighted a lack of support for data restructuring in the maintainer's tool-set. Problems have been encountered during program transformation due to the lack of support for data re-engineering. A lack of strict data semantics and manipulation capabilities has left the maintainer unable to produce optimally re-engineered solutions. It has also hindered the migration of programs into other languages because it has not been possible to convert data structures into an appropriate form in the target language. The main contribution of the thesis is the Data Re-Engineering and Abstraction Mechanism (DREAM) which allows theories about type equivalence to be represented and used in a re-engineering environment. DREAM is based around the technique of "ghosting", a way of introducing different representations of data, which provides the theoretical underpinning of the changes applied to the program. A second major contribution is the introduction of data typing into the WSL language. This allows DREAM to be integrated into the existing transformation theories within WSL. These theoretical extensions of the original work have been shown to be practically viable by implementation within a prototype transformation tool, the Maintainer's Assistant. The extended tool has been used to re-engineer heavily modified, commercial legacy code.
The results show that useful re-engineering work can be performed and that DREAM integrates well with existing control flow transformations.
Agile model-driven re-engineering
In this paper we describe an Agile model-driven engineering (MDE) approach, AMDRE, for the re-engineering of legacy systems. The objective is to support the reuse of business-critical functionality from such systems and the porting of legacy code to modernised platforms, together with technical debt reduction to improve system maintainability and extend the system's useful life. AMDRE uses a lightweight MDE process which involves the automated abstraction of software systems to UML specifications and the interactive application of refactoring and rearchitecting transformations to remove quality flaws and architectural flaws. We demonstrate the approach on Visual Basic, COBOL and Python legacy code, including a finance industry case. Significant quality improvements are achieved, and translation accuracy over 80% is demonstrated. In comparison to other MDE re-engineering approaches, AMDRE does not require high MDE skills and should be usable by mainstream software practitioners.
Econometrics meets sentiment: an overview of methodology and applications
The advent of massive amounts of textual, audio, and visual data has spurred the development of econometric methodology to transform qualitative sentiment data into quantitative sentiment variables, and to use those variables in an econometric analysis of the relationships between sentiment and other variables. We survey this emerging research field and refer to it as sentometrics, which is a portmanteau of sentiment and econometrics. We provide a synthesis of the relevant methodological approaches, illustrate with empirical results, and discuss useful software.
Automated Analysis of ARM Binaries using the Low-Level Virtual Machine Compiler Framework
Binary program analysis is a critical capability for offensive and defensive operations in Cyberspace. However, many current techniques are ineffective or time-consuming and few tools can analyze code compiled for embedded processors such as those used in network interface cards, control systems and mobile phones. This research designs and implements a binary analysis system, called the Architecture-independent Binary Abstracting Code Analysis System (ABACAS), which reverses the normal program compilation process, lifting binary machine code to the Low-Level Virtual Machine (LLVM) compiler's intermediate representation, thereby enabling existing security-related analyses to be applied to binary programs. The prototype targets ARM binaries but can be extended to support other architectures. Several programs are translated from ARM binaries and analyzed with existing analysis tools. Programs lifted from ARM binaries are an average of 3.73 times larger than the same programs compiled from a high-level language (HLL). Analysis results are equivalent regardless of whether the HLL source or ARM binary version of the program is submitted to the system, confirming the hypothesis that LLVM is effective for binary analysis.
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them. Comment: 44 pages, to appear in Computational Linguistics