155,478 research outputs found

    A New Paradigm for Identifying Reconciliation-Scenario Altering Mutations Conferring Environmental Adaptation

    Get PDF
    An important goal in microbial computational genomics is to identify crucial events in the evolution of a gene that severely alter the duplication, loss and mobilization patterns of the gene within the genomes in which it disseminates. In this paper, we formalize this microbiological goal as a new pattern-matching problem in the domain of Gene tree and Species tree reconciliation, denoted "Reconciliation-Scenario Altering Mutation (RSAM) Discovery". We propose an O(m * n * k) time algorithm to solve this new problem, where m and n are the number of vertices of the input Gene tree and Species tree, respectively, and k is a user-specified parameter that bounds from above the number of optimal solutions of interest. The algorithm first constructs a hypergraph representing the k highest scoring reconciliation scenarios between the given Gene tree and Species tree, and then interrogates this hypergraph for subtrees matching a pre-specified RSAM Pattern. Our algorithm is optimal in the sense that the number of hypernodes in the hypergraph can be lower bounded by Omega(m * n * k). We implement the new algorithm as a tool, denoted RSAM-finder, and demonstrate its application to the identification of RSAMs in toxins and drug resistance elements across a dataset spanning hundreds of species

    Knowledge discovering for document classification using tree matching in Texpros

    Get PDF
    This dissertation describes a knowledge-based system for classifying documents based upon the layout structure and conceptual information extracted from the content of the document. The spatial elements in a document are laid out in rectangular blocks which are represented by nodes in an ordered labelled tree, called the layout structure tree (L-S Tree). Each leaf node of a L-S Tree points to its corresponding block content. A knowledge Acquisition Tool (KAT) is devised to create a Document Sample Tree from L-S Tree, in which each of its leaves contains a node content conceptually describing its corresponding block content. Then, applying generalization rules, the KAT performs the inductive learning from Document Sample Trees of a type and generates fewer number of Document Type Trees to represent its type. A testing document is classified if a Document Type Tree is discovered as a substructure of the L-S Tree of the testing document; and then the exact format of the testing document can be found by matching the L-S Tree with the Document Sample Trees of the classified document type. The Document Sample Trees and Document Type Trees are called Structural Knowledge Base (SKB). The tree discovering and matching processes involve computing the edit distance and the degree of conceptual closeness between the SKB trees and the L-S Tree of a testing document by using pattern matching and discovering toolkits. Our experimental results demonstrate that many office documents can be classified correctly using the proposed approach

    Transforming XML Documents using fxt

    Get PDF
    As XML spreads to various application domains, transformation tasks on XML documents are accomplished by an ever increasing number of non-programmers. In this respect, rather than providing just a collection of basic operations via a library in a special purpose language, it is useful to provide a more intuitive, rule-based approach to XML transformation. The rule-based approach requires pattern-matching for identifying parts of the document to be processed. As XML document processing is basically a subarea of tree processing for which the functional programming style is very natural, we choose SML as implementation language. The functional style implies a processing model in which navigation is possible only to subtrees of a tree. This restriction can be compensated by using a tree pattern-matcher able to relate to ancestors, successors, as well as to siblings of a match. On top of the powerful fxgrep XML pattern-matcher, we build fxt, a transformation tool for XML documents. The functional processing model that fxt uses, allows an implementation more efficient than implementations permitted by the processing model of the popular XSLT, where navigation in the input tree can proceed in arbitrary directions. Usual transformations are specified in fxt in an intuitive, declarative way. More elaborate transformations can be flexibly achieved by the hooks provided to the full functionality of the SML programming language, as well as by the fxt’s variable mechanism

    A graphical environment for change detection in structured documents

    Get PDF
    Change detection in structured documents (e.g. SGML is important in data warehousing, digital libraries and Internet databases. This thesis presents a graphical environment for detecting changes in the structured documents. We represent. each document by alp ordered labeled tree based on the underlying markup language. We then compare two documents by invoking previously developed algorithms for approximate pattern matching and pattern discovery in trees. Several operators are developed to support. the comparison of the documents; graphical devices are provided to facilitate the use of the operators. We believe the proposed tool is useful for not only document management, but also software maintenance, particularly configuration management and version control, where programs aro represented as parse trees and detecting changes in the trees provides a way to find the syntactic differences of two program versions

    Model Transformations with Tom

    Get PDF
    International audienceModel Driven Engineering (MDE) advocates the use of Model Transformations (MT) in order to automate repetitive development tasks. Many different model transformation languages have been proposed with a significant tool development cost as common language elements like expressions, statements, ... must be built from scratch for each new language development tools. The Tom language is a shallow extension of Java tailored to describe and implement transformations of tree based data-structures. A key feature of Tom allows to map any Java data-structure to tree based data abstractions that can then be accessed by powerful non-linear, associative, commutative pattern matching. In this paper, we present how this approach can be used in order to develop model transformations, in particular relying on Eclipse Modeling Framework (EMF) based metamodeling facilities. This allows to provide a transformation language at a low cost both for the development of its tools and the training of its users

    Simple algebraic data types for C

    Get PDF
    ADT is a simple tool in the spirit of Lex and Yacc that makes algebraic data types and a restricted form of pattern matching on those data types as found in SML available in C programs. ADT adds runtime checks, which make C programs written with the aid of ADT less likely to dereference a NULL pointer. The runtime tests may consume a significant amount of CPU time; hence they can be switched off once the program is suitably debugged

    Solving String Problems on Graphs Using the Labeled Direct Product

    Get PDF
    Suffix trees are an important data structure at the core of optimal solutions to many fundamental string problems, such as exact pattern matching, longest common substring, matching statistics, and longest repeated substring. Recent lines of research focused on extending some of these problems to vertex-labeled graphs, either by using efficient ad-hoc approaches which do not generalize to all input graphs, or by indexing difficult graphs and having worst-case exponential complexities. In the absence of an ubiquitous and polynomial tool like the suffix tree for labeled graphs, we introduce the labeled direct product of two graphs as a general tool for obtaining optimal algorithms in the worst case: we obtain conceptually simpler algorithms for the quadratic problems of string matching (SMLG) and longest common substring (LCSP) in labeled graphs. Our algorithms run in time linear in the size of the labeled product graph, which may be smaller than quadratic for some inputs, and their run-time is predictable, because the size of the labeled direct product graph can be precomputed efficiently. We also solve LCSP on graphs containing cycles, which was left as an open problem by Shimohira et al. in 2011. To show the power of the labeled product graph, we also apply it to solve the matching statistics (MSP) and the longest repeated string (LRSP) problems in labeled graphs. Moreover, we show that our (worst-case quadratic) algorithms are also optimal, conditioned on the Orthogonal Vectors Hypothesis. Finally, we complete the complexity picture around LRSP by studying it on undirected graphs.Peer reviewe

    FixMiner: Mining Relevant Fix Patterns for Automated Program Repair

    Get PDF
    Patching is a common activity in software development. It is generally performed on a source code base to address bugs or add new functionalities. In this context, given the recurrence of bugs across projects, the associated similar patches can be leveraged to extract generic fix actions. While the literature includes various approaches leveraging similarity among patches to guide program repair, these approaches often do not yield fix patterns that are tractable and reusable as actionable input to APR systems. In this paper, we propose a systematic and automated approach to mining relevant and actionable fix patterns based on an iterative clustering strategy applied to atomic changes within patches. The goal of FixMiner is thus to infer separate and reusable fix patterns that can be leveraged in other patch generation systems. Our technique, FixMiner, leverages Rich Edit Script which is a specialized tree structure of the edit scripts that captures the AST-level context of the code changes. FixMiner uses different tree representations of Rich Edit Scripts for each round of clustering to identify similar changes. These are abstract syntax trees, edit actions trees, and code context trees. We have evaluated FixMiner on thousands of software patches collected from open source projects. Preliminary results show that we are able to mine accurate patterns, efficiently exploiting change information in Rich Edit Scripts. We further integrated the mined patterns to an automated program repair prototype, PARFixMiner, with which we are able to correctly fix 26 bugs of the Defects4J benchmark. Beyond this quantitative performance, we show that the mined fix patterns are sufficiently relevant to produce patches with a high probability of correctness: 81% of PARFixMiner's generated plausible patches are correct.Comment: 31 pages, 11 figure
    • 

    corecore