Search CORE

65,555 research outputs found

Recommended from our members

Comparison of Clang Abstract Syntax Trees using string kernels

Author: Dolz Manuel F.
Kunkel Julian M.
Ludwig Thomas
Torres Raul
Publication venue
Publication date: 01/01/2018
Field of study

Abstract Syntax Trees (ASTs) are intermediate representations widely used by compiler frameworks. One of their strengths is that they can be used to determine the similarity among a collection of programs. In this paper we propose a novel comparison method that converts ASTs into weighted strings in order to get similarity matrices and quantify the level of correlation among codes. To evaluate the approach, we leveraged the corresponding strings derived from the Clang ASTs of a set of 100 source code examples written in C. Our kernel and two other string kernels from the literature were used to obtain similarity matrices among those examples. Next, we used Hierarchical Clustering to visualize the results. Our solution was able to identify different clusters conformed by examples that shared similar semantics. We demonstrated that the proposed strategy can be promisingly applied to similarity problems involving trees or strings

Central Archive at the University of Reading

Constituency and Dependency Relationship from a Tree Adjoining Grammar and Abstract Categorial Grammar Perspective

Author: Maskharashvili Aleksandre
Pogodalla Sylvain
Publication venue: HAL CCSD
Publication date: 14/10/2013
Field of study

International audienceThis paper gives an Abstract Categorial Grammar (ACG) account of (Kallmeyer and Kuhlmann, 2012)'s process of transformation of the derivation trees of Tree Adjoining Grammar (TAG) into dependency trees. We make explicit how the requirement of keeping a direct interpretation of dependency trees into strings results into lexical ambiguity. Since the ACG framework has already been used to provide a logical semantics from TAG derivation trees, we have a unified picture where derivation trees and dependency trees are related but independent equivalent ways to account for the same surface--meaning relation

INRIA a CCSD electronic archive server

Succinct Dictionary Matching With No Slowdown

Author: A.V. Aho
J.I. Munro
K. Sadakane
P. Elias
R.M. Fano
S. Dori
W.-K. Hon
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The problem of dictionary matching is a classical problem in string matching: given a set S of d strings of total length n characters over an (not necessarily constant) alphabet of size sigma, build a data structure so that we can match in a any text T all occurrences of strings belonging to S. The classical solution for this problem is the Aho-Corasick automaton which finds all occ occurrences in a text T in time O(|T| + occ) using a data structure that occupies O(m log m) bits of space where m <= n + 1 is the number of states in the automaton. In this paper we show that the Aho-Corasick automaton can be represented in just m(log sigma + O(1)) + O(d log(n/d)) bits of space while still maintaining the ability to answer to queries in O(|T| + occ) time. To the best of our knowledge, the currently fastest succinct data structure for the dictionary matching problem uses space O(n log sigma) while answering queries in O(|T|log log n + occ) time. In this paper we also show how the space occupancy can be reduced to m(H0 + O(1)) + O(d log(n/d)) where H0 is the empirical entropy of the characters appearing in the trie representation of the set S, provided that sigma < m^epsilon for any constant 0 < epsilon < 1. The query time remains unchanged.Comment: Corrected typos and other minor error

arXiv.org e-Print Archive

CiteSeerX

Crossref

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication venue
Publication date: 03/06/2013
Field of study

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time

O(\log n)

per input symbol (as opposed to amortized

O(\log n)

time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves

O(\log n)

worst case time per input symbol. Searching for a pattern of length

m

in the resulting suffix tree takes

O(\min(m\log |\Sigma|, m + \log n) + tocc)

time, where

tocc

is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance

arXiv.org e-Print Archive

Crossref

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

FURY: Fuzzy unification and resolution based on edit distance

Author: Gilbert D
Schroeder M
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2000
Field of study

We present a theoretically founded framework for fuzzy unification and resolution based on edit distance over trees. Our framework extends classical unification and resolution conservatively. We prove important properties of the framework and develop the FURY system, which implements the framework efficiently using dynamic programming. We evaluate the framework and system on a large problem in the bioinformatics domain, that of detecting typographical errors in an enzyme name databas

CiteSeerX

Brunel University Research Archive