137,561 research outputs found

    A comparison of tree- and line-oriented observational slicing

    Get PDF
    Observation-based slicing and its generalization observational slicing are recently-introduced, language-independent dynamic slicing techniques. They both construct slices based on the dependencies observed during program execution, rather than static or dynamic dependence analysis. The original implementation of the observation-based slicing algorithm used lines of source code as its program representation. A recent variation, developed to slice modelling languages (such as Simulink), used an XML representation of an executable model. We ported the XML slicer to source code by constructing a tree representation of traditional source code through the use of srcML. This work compares the tree- and line-based slicers using four experiments involving twenty different programs, ranging from classic benchmarks to million-line production systems. The resulting slices are essentially the same size for the majority of the programs and are often identical. However, structural constraints imposed by the tree representation sometimes force the slicer to retain enclosing control structures. It can also “bog down” trying to delete single-token subtrees. This occasionally makes the tree-based slices larger and the tree-based slicer slower than a parallelised version of the line-based slicer. In addition, a Java versus C comparison finds that the two languages lead to similar slices, but Java code takes noticeably longer to slice. The initial experiments suggest two improvements to the tree-based slicer: the addition of a size threshold, for ignoring small subtrees, and subtree replacement. The former enables the slicer to run 3.4 times faster while producing slices that are only about 9% larger. At the same time the subtree replacement reduces size by about 8–12% and allows the tree-based slicer to produce more natural slices

    A tree-to-tree model for statistical machine translation

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.Includes bibliographical references (p. 227-234).In this thesis, we take a statistical tree-to-tree approach to solving the problem of machine translation (MT). In a statistical tree-to-tree approach, first the source-language input is parsed into a syntactic tree structure; then the source-language tree is mapped to a target-language tree. This kind of approach has several advantages. For one, parsing the input generates valuable information about its meaning. In addition, the mapping from a source-language tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output. A main focus of this thesis is to develop a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parse-tree like structure that models clause-level phenomena such as verbal argument structure and lexical word-order. The AEP also contains alignment information that links the source-language input to the target-language output. Instead of learning a mapping from a source-language tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP. The AEP is a complex structure, and learning a mapping from parse trees to AEPs presents a challenging machine learning problem. In this thesis, we use a linear structured prediction model to solve this learning problem. A human evaluation of the AEP-based translation approach in a German-to-English task shows significant improvements in the grammaticality of translations. This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system.by Brooke Alissa Cowan.Ph.D

    Joint morphological-lexical language modeling for processing morphologically rich languages with application to dialectal Arabic

    Get PDF
    Language modeling for an inflected language such as Arabic poses new challenges for speech recognition and machine translation due to its rich morphology. Rich morphology results in large increases in out-of-vocabulary (OOV) rate and poor language model parameter estimation in the absence of large quantities of data. In this study, we present a joint morphological-lexical language model (JMLLM) that takes advantage of Arabic morphology. JMLLM combines morphological segments with the underlying lexical items and additional available information sources with regards to morphological segments and lexical items in a single joint model. Joint representation and modeling of morphological and lexical items reduces the OOV rate and provides smooth probability estimates while keeping the predictive power of whole words. Speech recognition and machine translation experiments in dialectal-Arabic show improvements over word and morpheme based trigram language models. We also show that as the tightness of integration between different information sources increases, both speech recognition and machine translation performances improve

    Thread Reconstruction in Conversational Data using Neural Coherence Models

    Get PDF
    Discussion forums are an important source of information. They are often used to answer specific questions a user might have and to discover more about a topic of interest. Discussions in these forums may evolve in intricate ways, making it difficult for users to follow the flow of ideas. We propose a novel approach for automatically identifying the underlying thread structure of a forum discussion. Our approach is based on a neural model that computes coherence scores of possible reconstructions and then selects the highest scoring, i.e., the most coherent one. Preliminary experiments demonstrate promising results outperforming a number of strong baseline methods.Comment: Neu-IR: Workshop on Neural Information Retrieval 201

    On Tree-Based Neural Sentence Modeling

    Full text link
    Neural networks with tree-based sentence encoders have shown better results on many downstream tasks. Most of existing tree-based encoders adopt syntactic parsing trees as the explicit structure prior. To study the effectiveness of different tree structures, we replace the parsing trees with trivial trees (i.e., binary balanced tree, left-branching tree and right-branching tree) in the encoders. Though trivial trees contain no syntactic information, those encoders get competitive or even better results on all of the ten downstream tasks we investigated. This surprising result indicates that explicit syntax guidance may not be the main contributor to the superior performances of tree-based neural sentence modeling. Further analysis show that tree modeling gives better results when crucial words are closer to the final representation. Additional experiments give more clues on how to design an effective tree-based encoder. Our code is open-source and available at https://github.com/ExplorerFreda/TreeEnc.Comment: To Appear at EMNLP 201

    Discourse Structure in Machine Translation Evaluation

    Full text link
    In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments both at the segment- and at the system-level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality.Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201

    Better Document-level Sentiment Analysis from RST Discourse Parsing

    Full text link
    Discourse structure is the hidden link between surface features and document-level properties, such as sentiment polarity. We show that the discourse analyses produced by Rhetorical Structure Theory (RST) parsers can improve document-level sentiment analysis, via composition of local information up the discourse tree. First, we show that reweighting discourse units according to their position in a dependency representation of the rhetorical structure can yield substantial improvements on lexicon-based sentiment analysis. Next, we present a recursive neural network over the RST structure, which offers significant improvements over classification-based methods.Comment: Published at Empirical Methods in Natural Language Processing (EMNLP 2015

    Neural Discourse Structure for Text Categorization

    Full text link
    We show that discourse structure, as defined by Rhetorical Structure Theory and provided by an existing discourse parser, benefits text categorization. Our approach uses a recursive neural network and a newly proposed attention mechanism to compute a representation of the text that focuses on salient content, from the perspective of both RST and the task. Experiments consider variants of the approach and illustrate its strengths and weaknesses.Comment: ACL 2017 camera ready versio
    • …
    corecore