
    A Neural Model for Generating Natural Language Summaries of Program Subroutines

    Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independently of the text in code. This helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1m Java methods. We find improvement over two baseline techniques from the SE literature and one from the NLP literature.
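
    To make the separate-inputs idea concrete, here is a minimal sketch (not the authors' implementation) of a decoder attending independently over a code-token encoder and a flattened-AST encoder. All class names, layer sizes, and the dot-product attention scheme are illustrative assumptions.

```python
# Minimal sketch: two encoders (code tokens, flattened AST) feed separate
# attention contexts into one decoder, so structure is learned independently
# of identifier text. Dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn

class DualEncoderSummarizer(nn.Module):
    def __init__(self, code_vocab, ast_vocab, summary_vocab, dim=256):
        super().__init__()
        self.code_emb = nn.Embedding(code_vocab, dim)
        self.ast_emb = nn.Embedding(ast_vocab, dim)
        self.sum_emb = nn.Embedding(summary_vocab, dim)
        self.code_enc = nn.GRU(dim, dim, batch_first=True)
        self.ast_enc = nn.GRU(dim, dim, batch_first=True)
        self.dec = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(3 * dim, summary_vocab)

    @staticmethod
    def attend(queries, keys):
        # Plain dot-product attention: one context vector per decoder step.
        scores = torch.bmm(queries, keys.transpose(1, 2))
        return torch.bmm(torch.softmax(scores, dim=-1), keys)

    def forward(self, code_ids, ast_ids, summary_ids):
        code_states, _ = self.code_enc(self.code_emb(code_ids))
        ast_states, _ = self.ast_enc(self.ast_emb(ast_ids))
        dec_states, _ = self.dec(self.sum_emb(summary_ids))
        # Each encoder contributes its own context at every decoding step.
        code_ctx = self.attend(dec_states, code_states)
        ast_ctx = self.attend(dec_states, ast_states)
        return self.out(torch.cat([dec_states, code_ctx, ast_ctx], dim=-1))
```

    Trained with ordinary cross-entropy against the next summary token, such a model can still draw on the AST stream when identifier names carry little information, which is the behavior the abstract describes.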

    Automated user documentation generation based on the Eclipse application model

    An application's user documentation, also referred to as the user manual, is one of the core elements required when distributing an application. While many tools exist to aid an application's developer in creating and maintaining documentation on and for the code itself, there are no tools that complement code development with user documentation for modern graphical applications. Approaches like literate programming are not applicable to this scenario, since what must be documented for an end user is not a library but a full application. Until now, documentation generation for applications was only partially feasible due to the gap between the code and its semantics. The new generation of applications developed on the Eclipse Rich Client Platform is based on an application model, closing a broad semantic gap between the code and the visible interface. We use this application model to provide a semantic description for the contained elements. Combined with the internal relationships of the application model, these semantic descriptions are aggregated into well-structured user documentation that complies with ISO/IEC 26514. This paper delivers a report on the Ecrit research project, in which the potentials and limitations of user documentation generation based on the Eclipse application model were investigated.
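
    As a sketch of how semantic descriptions attached to a hierarchical application model can be aggregated into structured documentation, consider the toy model below. The data classes are a hypothetical stand-in for the Eclipse application model, not its actual API, and the output format is only illustrative.

```python
# Sketch: walk a (hypothetical) application model and aggregate the semantic
# description of each element into nested documentation sections, following
# the model's containment relationships.
from dataclasses import dataclass, field

@dataclass
class Element:
    name: str                       # label shown in the user interface
    description: str                # semantic description of the element
    children: list = field(default_factory=list)

def render_docs(element, depth=1):
    """Emit one documentation section per element, parents before children."""
    lines = [f"{'=' * depth} {element.name}", element.description]
    for child in element.children:
        lines.extend(render_docs(child, depth + 1))
    return lines

app = Element("Mail Client", "Reads, organizes, and sends e-mail.", [
    Element("File Menu", "Commands for managing mailboxes.", [
        Element("New Mailbox", "Creates an empty mailbox."),
    ]),
])
print("\n".join(render_docs(app)))
```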

    Example-based controlled translation

    The first research on integrating controlled language data in an Example-Based Machine Translation (EBMT) system was published in [Gough & Way, 2003]. We improve on their sub-sentential alignment algorithm to populate the system's databases with more than six times as many potentially useful fragments. Together with two simple novel improvements (correcting mistranslations in the lexicon, and allowing multiple translations in the lexicon), translation quality improves considerably when target-language translations are constrained. We also develop the first EBMT system which attempts to filter the source-language data using controlled language specifications. We provide detailed automatic and human evaluations of a number of experiments carried out to test the quality of the system. We observe that our system outperforms Logomedia in a number of tests. Finally, despite conflicting results from different automatic evaluation metrics, we observe a preference for controlling the source data rather than the target translations.

    Controlled generation in example-based machine translation

    The theme of controlled translation is currently in vogue in the area of MT. Recent research (Schäler et al., 2003; Carl, 2003) hypothesises that EBMT systems are perhaps best suited to this challenging task. In this paper, we present an EBMT system where the generation of the target string is filtered by data written according to controlled language specifications. As far as we are aware, this is the only research available on this topic. In the field of controlled language applications, it is more usual to constrain the source language in this way rather than the target. We translate a small corpus of controlled English into French using the on-line MT system Logomedia, and seed the memories of our EBMT system with a set of automatically induced lexical resources, using the Marker Hypothesis as a segmentation tool. We test our system on a large set of sentences extracted from a Sun Translation Memory, and provide both an automatic and a human evaluation. For comparative purposes, we also provide results for Logomedia itself.
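
    To illustrate the segmentation step, here is a minimal sketch of Marker Hypothesis chunking: sentences are split into fragments headed by closed-class "marker" words. The marker inventory below is a small illustrative sample, not the set used in the paper.

```python
# Sketch of Marker Hypothesis segmentation: open a new chunk at each
# closed-class marker word, keeping marker runs like "of the" together.
# This tiny marker set is an illustrative sample only.
MARKERS = {
    "the", "a", "an",                               # determiners
    "in", "on", "at", "of", "to", "from", "with",   # prepositions
    "and", "or", "but",                             # conjunctions
    "he", "she", "it", "they",                      # pronouns
}

def marker_chunks(sentence):
    chunks, current = [], []
    for token in sentence.lower().split():
        # Start a new chunk at a marker, but only once the current chunk
        # already contains a non-marker word.
        if token in MARKERS and any(t not in MARKERS for t in current):
            chunks.append(" ".join(current))
            current = [token]
        else:
            current.append(token)
    if current:
        chunks.append(" ".join(current))
    return chunks

print(marker_chunks("the translation of the file is stored in the memory"))
# ['the translation', 'of the file is stored', 'in the memory']
```

    Aligning such chunks between source and target sentences is what allows an EBMT system to induce sub-sentential lexical resources automatically.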

    Semantic Component Composition

    Building complex software systems necessitates the use of component-based architectures. In theory, of the set of components needed for a design, only a small portion are "custom"; the rest are reused or refactored existing pieces of software. Unfortunately, this is an idealized situation. Just because two components should work together does not mean that they will work together. The "glue" that holds components together is not just technology. The contracts that bind complex systems together implicitly define more than their explicit type. These "conceptual contracts" describe essential aspects of extra-system semantics: e.g., object models, type systems, data representation, interface action semantics, legal and contractual obligations, and more. Designers and developers spend inordinate amounts of time technologically duct-taping systems to fulfill these conceptual contracts because system-wide semantics have not been rigorously characterized or codified. This paper describes a formal characterization of the problem and discusses an initial implementation of the resulting theoretical system.
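
    A minimal sketch of the core idea, under assumed representations rather than the paper's formalism: each component port carries an explicit type plus "conceptual contract" annotations, and two ports compose only when both layers agree.

```python
# Sketch: composition must check conceptual contracts (units, encodings,
# object-model assumptions, ...) in addition to explicit types. The Port
# representation and the properties used are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    type_name: str        # explicit type, e.g. "float"
    semantics: frozenset  # extra-system semantics, e.g. units of measure

def composable(producer: Port, consumer: Port) -> bool:
    # Matching types alone are not enough: everything the consumer assumes
    # must be guaranteed by the producer's conceptual contract.
    return (producer.type_name == consumer.type_name
            and consumer.semantics <= producer.semantics)

metres = Port("float", frozenset({"unit:m"}))
feet = Port("float", frozenset({"unit:ft"}))
print(composable(metres, metres))  # True: types and semantics both agree
print(composable(metres, feet))    # False: same type, conflicting units
```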

    Archaeological site monitoring: UAV photogrammetry can be an answer

    During archaeological excavations it is important to monitor the newly excavated areas and findings day by day in order to be able to plan future excavation activities. At present, this daily activity is usually performed using total stations, which survey the changes of the archaeological site: the surveyors are asked to produce day-by-day draft plans and sections which allow archaeologists to plan their future activities. The survey is carried out during the excavations or just at the end of every working day, and drawings have to be produced as soon as possible in order to allow the comprehension of the work done and to plan the activities for the following day. With this technique, all the measurements, even those not necessary for the day after, have to be acquired in order to avoid a 'loss of memory'. A possible alternative to this traditional approach is aerial photogrammetry, if the images can be acquired quickly and at a distance able to guarantee the necessary accuracy of a few centimeters. Today the use of UAVs (Unmanned Aerial Vehicles) can be considered a proven technology able to acquire images at distances ranging from 4 m up to 20 m, and therefore a possible monitoring system to provide the necessary information to the archaeologists day by day. The control network, usually present at each archaeological site, can provide the stable control points needed to orient a photogrammetric block acquired by a UAV equipped with a calibrated digital camera and a navigation control system able to fly the aircraft along a pre-planned flight scheme. Modern digital photogrammetric software can solve the block orientation and generate a DSM automatically, allowing rapid orthophoto generation and the production of sections and plans. The present paper describes a low-cost UAV system realized by the research group of the Politecnico di Torino and tested on a Roman villa archaeological site located in Aquileia (Italy), a well-known UNESCO WHL site. The results of automatic orientation and orthophoto production are described in terms of their accuracy and the completeness of the information guaranteed for archaeological site excavation management.
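
    To see why flying heights of 4 m to 20 m are compatible with an accuracy of a few centimeters, a rough ground sample distance (GSD) calculation helps; the camera parameters below (5 µm pixels, 20 mm focal length) are assumptions for illustration, not the project's actual camera.

```python
# Rough worked example: ground sample distance (ground footprint of one
# pixel) at the flying heights mentioned in the abstract. Camera parameters
# are illustrative assumptions.
def gsd_cm(height_m, pixel_size_um=5.0, focal_length_mm=20.0):
    """GSD in centimeters: flying height times pixel size over focal length."""
    return height_m * (pixel_size_um * 1e-6) / (focal_length_mm * 1e-3) * 100.0

for h in (4, 10, 20):
    print(f"flying height {h:>2} m -> GSD {gsd_cm(h):.2f} cm/pixel")
# flying height  4 m -> GSD 0.10 cm/pixel
# flying height 10 m -> GSD 0.25 cm/pixel
# flying height 20 m -> GSD 0.50 cm/pixel
```

    Since mapping accuracy is typically a small multiple of the GSD, sub-centimeter pixel footprints leave a comfortable margin for the few-centimeter requirement quoted above.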