Search CORE

27 research outputs found

Biochemical network matching and composition

Author: Goodfellow Martin
Hunt Ela
Wilson John
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2010

This paper looks at biochemical network matching and composition.

Crossref

University of Strathclyde Institutional Repository

Tree mining application to matching of heterogeneous knowledge

Author: Chang Elizabeth
Dillon Tharam S.
Hadzic Fedja
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Matching of heterogeneous knowledge sources is of increasing importance in areas such as scientific knowledge management, e-commerce, enterprise application integration, and many emerging Semantic Web applications. Given the desire for knowledge sharing and reuse in these fields, knowledge coming from different organizations in the same domain commonly has to be matched. We propose a knowledge matching method based on our previously developed tree mining algorithms for extracting frequently occurring subtrees from a tree-structured database such as XML. Using the method, the common structure among the different representations can be extracted automatically. Our focus is on knowledge matching at the structural level, and we use a set of example XML schema documents from the same domain to evaluate the method. We discuss some important issues that arise when applying tree mining algorithms to the detection of common document structures. The experiments demonstrate the usefulness of the approach.

espace@Curtin

Comparing similar ordered trees in linear-time

Author: Touzet Hélène
Publication venue: Elsevier B.V.
Publication date: 31/12/2007

We describe a linear-time algorithm for comparing two similar ordered rooted trees with node labels. The method for comparing trees is the usual tree edit distance. We show that an optimal mapping that uses at most k insertions or deletions can then be constructed in O(nk^3) time, where n is the size of the trees. The approach is inspired by the Zhang–Shasha algorithm for tree edit distance in combination with an adequate pruning of the search space based on the tree edit graph.

Elsevier - Publisher Connector
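
The tree edit distance that the abstract above refers to can be illustrated with a minimal sketch. It assumes unit costs and uses the plain memoized forest recurrence, not the linear-time pruned algorithm of the paper and not the Zhang–Shasha dynamic program itself. The tuple encoding of trees and the names `size`, `forest_distance` and `tree_edit_distance` are illustrative assumptions.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple
# of trees; a forest is a tuple of trees. Tuples are hashable, so results
# can be cached.

def size(tree):
    """Number of nodes in a tree."""
    _, children = tree
    return 1 + sum(size(child) for child in children)

@lru_cache(maxsize=None)
def forest_distance(f1, f2):
    """Unit-cost edit distance between two ordered forests."""
    if not f1:
        return sum(size(t) for t in f2)   # insert every node of f2
    if not f2:
        return sum(size(t) for t in f1)   # delete every node of f1
    (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
    return min(
        # delete the root of the rightmost tree of f1
        forest_distance(f1[:-1] + kids1, f2) + 1,
        # insert the root of the rightmost tree of f2
        forest_distance(f1, f2[:-1] + kids2) + 1,
        # map the two rightmost roots onto each other (relabel if needed)
        forest_distance(kids1, kids2)
        + forest_distance(f1[:-1], f2[:-1])
        + (label1 != label2),
    )

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered, labeled, rooted trees."""
    return forest_distance((t1,), (t2,))
```

This recurrence explores far more subproblems than the algorithms listed here; it is meant only to make the insert, delete and relabel operations concrete.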

Mechanism for Change Detection in HTML Web Pages as XML Documents

Author: Tõnisson Kaarel
Publication date: 01/01/2015

Change detection of web pages is an important aspect of web monitoring. Automated web monitoring can be used for the collection of specific information, for example for detecting public announcements, news posts and changes of prices. If we store the HTML code of a page, we can compare the current and previous codes when we revisit the page, allowing us to find their changes. HTML code can be compared using ordinary text comparison, but this brings the risk of losing information about the structure of the page. HTML code is treelike in structure, and this is a desirable property to preserve when finding changes. In this work we describe a mechanism that can be applied to collected HTML pages to find their changes by transforming HTML pages into XML documents and comparing the resulting XML trees. We give a general list of the components needed for this task, describe our implementation, which uses NutchWAX, NekoHTML, XMLUnit, Jena and MongoDB, and show the results of applying the program to a dataset. We analyse the results of measurements collected when running our program on 1.1 million HTML pages. To our knowledge this mechanism has not been tested in previous works. We show that the mechanism is usable on real-world data.

DSpace at Tartu University Library
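
The idea of comparing two stored versions of a page as XML trees, rather than as plain text, can be sketched as follows. This is a simplified illustration using Python's standard `xml.etree.ElementTree`, not the thesis implementation (which uses NekoHTML and XMLUnit). The function name `diff_trees` and the positional child matching are assumptions made for the example, and text that follows a closing tag (tail text) is ignored.

```python
import xml.etree.ElementTree as ET

def diff_trees(old, new, path=""):
    """List structural and textual differences between two XML elements.

    Children are compared position by position, which is a simplifying
    assumption; real tree diff tools match nodes more carefully.
    """
    if old.tag != new.tag:
        return [f"{path}/{old.tag}: replaced by <{new.tag}>"]
    here = f"{path}/{old.tag}"
    changes = []
    if old.attrib != new.attrib:
        changes.append(f"{here}: attributes changed")
    if (old.text or "").strip() != (new.text or "").strip():
        changes.append(f"{here}: text changed")
    old_kids, new_kids = list(old), list(new)
    for a, b in zip(old_kids, new_kids):
        changes += diff_trees(a, b, here)
    for extra in old_kids[len(new_kids):]:
        changes.append(f"{here}: removed <{extra.tag}>")
    for extra in new_kids[len(old_kids):]:
        changes.append(f"{here}: added <{extra.tag}>")
    return changes

# Two well-formed snapshots of the same (hypothetical) page.
old_page = ET.fromstring("<html><body><p>price: 10</p></body></html>")
new_page = ET.fromstring("<html><body><p>price: 12</p><p>new</p></body></html>")
for change in diff_trees(old_page, new_page):
    print(change)
```

Because each reported change carries its path in the tree, the structural information that a line-based text diff would lose is kept.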

Archiving scientific data

Author: Altinel M.
Avilla-Campillo I.
Buneman P.
Chawathe S. S.
Chien S.
Cobena G.
Diao Y.
Keishi Tajima
Marian A.
Peter Buneman
Sanjeev Khanna
Schmidt A. R.
Tufte K.
Wang-Chiew Tan
Publication date: 01/01/2002

We present an archiving technique for hierarchical data with key structure. Our approach is based on the notion of timestamps, whereby an element appearing in multiple versions of the database is stored only once, along with a compact description of the versions in which it appears. The basic idea of timestamping was discovered by Driscoll et al. in the context of persistent data structures, where one wishes to track the sequences of changes made to a data structure. We extend this idea to develop an archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data, such as retrieval of any specific version from the archive and querying the temporal history of any element. This is in contrast to diff-based approaches, where such operations may require undoing a large number of changes or significant reasoning with the deltas. Surprisingly, our archiving technique does not incur any significant space overhead when contrasted with other approaches. Our experimental results support this and also show that the compacted archive file interacts well with other compression techniques. Finally, another useful property of our approach is that the resulting archive is also in XML and hence can directly leverage existing XML tools.

CiteSeerX

Crossref

Edinburgh Research Explorer

ScholarlyCommons@Penn
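
The timestamping idea in the abstract above, storing each keyed element once together with the versions in which it appears, can be sketched as follows. This is a toy illustration over nested Python dictionaries rather than XML, and it does not reproduce the paper's archive format. The names `merge_version`, `checkout` and `history`, and the node layout, are assumptions made for the example.

```python
def merge_version(archive, data, version):
    """Merge one snapshot (nested dicts with hashable leaf values) into the archive."""
    for key, value in data.items():
        node = archive.setdefault(
            key, {"versions": set(), "children": {}, "values": {}}
        )
        node["versions"].add(version)          # timestamp of the keyed element
        if isinstance(value, dict):
            merge_version(node["children"], value, version)
        else:
            # each distinct leaf value is stored once with its own timestamps
            node["values"].setdefault(value, set()).add(version)

def checkout(archive, version):
    """Reconstruct the snapshot that was merged under the given version."""
    out = {}
    for key, node in archive.items():
        if version not in node["versions"]:
            continue
        leaf = [v for v, seen in node["values"].items() if version in seen]
        out[key] = leaf[0] if leaf else checkout(node["children"], version)
    return out

def history(archive, path):
    """Versions in which the element at the given key path exists."""
    level, node = archive, None
    for key in path:
        node = level[key]
        level = node["children"]
    return sorted(node["versions"])

# Hypothetical two-version database.
v1 = {"gene1": {"name": "abc", "len": 10}, "gene2": {"name": "x"}}
v2 = {"gene1": {"name": "abc", "len": 12}}
archive = {}
merge_version(archive, v1, 1)
merge_version(archive, v2, 2)
```

Retrieving a version or the history of an element only filters on the stored timestamps, so no chain of deltas has to be undone, which is the contrast with diff-based approaches that the abstract draws.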

Edit Distance between Unrooted Trees in Cubic Time

Author: Dudek Bartlomiej
Gawrychowski Pawel
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 45th International Colloquium on Automata, Languages, and Programming (ICALP 2018)
Publication date: 01/01/2018

Edit distance between trees is a natural generalization of the classical edit distance between strings, in which the allowed elementary operations are contraction, uncontraction and relabeling of an edge. Demaine et al. [ACM Trans. on Algorithms, 6(1), 2009] showed how to compute the edit distance between rooted trees on n nodes in O(n^3) time. However, generalizing their method to unrooted trees seems quite problematic, and the most efficient known solution remains the earlier O(n^3 log n) time algorithm by Klein [ESA 1998]. Given the lack of progress on improving this complexity, it might appear that unrooted trees are simply more difficult than rooted trees. We show that this is, in fact, not the case, and edit distance between unrooted trees on n nodes can be computed in O(n^3) time. A significantly faster solution is unlikely to exist, as Bringmann et al. [SODA 2018] proved that the complexity of computing the edit distance between rooted trees cannot be decreased to O(n^{3-epsilon}) unless some popular conjecture fails, and the lower bound easily extends to unrooted trees. We also show that for two unrooted trees of size m and n, where m <= n, our algorithm can be modified to run in O(nm^2(1+log(n/m))) time. This, again, matches the complexity achieved by Demaine et al. for rooted trees, who also showed that this is optimal if we restrict ourselves to the so-called decomposition algorithms.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server

Subcubic Algorithm for (Unweighted) Unrooted Tree Edit Distance

Author
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Annual European Symposium on Algorithms (ESA 2023)
Publication date: 01/01/2023

Dagstuhl Research Online Publication Server

Decomposition algorithms for the tree edit distance problem

Author: Dulucq Serge
Touzet Hélène
Publication venue: Elsevier B.V.
Publication date: 01/01/2005

We study the behavior of dynamic programming methods for the tree edit distance problem, such as [P. Klein, Computing the edit-distance between unrooted ordered trees, in: Proceedings of 6th European Symposium on Algorithms, 1998, p. 91–102; K. Zhang, D. Shasha, SIAM J. Comput. 18 (6) (1989) 1245–1262]. We show that those two algorithms may be described as decomposition strategies. We introduce the general framework of cover strategies, and we provide an exact characterization of the complexity of cover strategies. This analysis allows us to define a new tree edit distance algorithm that is optimal for cover strategies.

Elsevier - Publisher Connector

Hal-Diderot

Subcubic algorithm for (Unweighted) Unrooted Tree Edit Distance

Author: Pióro Krzysztof
Publication date: 17/04/2023

The tree edit distance problem is a natural generalization of the classic string edit distance problem. Given two ordered, edge-labeled trees $T_1$ and $T_2$, the edit distance between $T_1$ and $T_2$ is defined as the minimum total cost of operations that transform $T_1$ into $T_2$. In one operation, we can contract an edge, split a vertex into two or change the label of an edge. For the weighted version of the problem, where the cost of each operation depends on the type of the operation and the label on the edge involved, $\mathcal{O}(n^3)$ time algorithms are known for both rooted and unrooted trees. The existence of a truly subcubic $\mathcal{O}(n^{3-\epsilon})$ time algorithm is unlikely, as it would imply a truly subcubic algorithm for the APSP problem. However, recently Mao (FOCS'21) showed that if we assume that each operation has a unit cost, then the tree edit distance between two rooted trees can be computed in truly subcubic time. In this paper, we show how to adapt Mao's algorithm to make it work for unrooted trees and we show an $\widetilde{\mathcal{O}}(n^{(7\omega + 15)/(2\omega + 6)}) \leq \mathcal{O}(n^{2.9417})$ time algorithm for the unweighted tree edit distance between two unrooted trees, where $\omega \leq 2.373$ is the matrix multiplication exponent. It is the first known subcubic algorithm for unrooted trees. The main idea behind our algorithm is the fact that to compute the tree edit distance between two unrooted trees, it is enough to compute the tree edit distance between an arbitrary rooting of the first tree and every rooting of the second tree.

Comment: 20 pages

arXiv.org e-Print Archive