Search CORE

15 research outputs found

Order preserving pattern matching on trees and DAGs

Author: A Amir
I Simon
J Kim
K Park
M Dubiner
M Kubica
P Bose
RA Baeza-Yates
S Cho
S Faro
T Chhabra
Publication date: 25/07/2017

The order preserving pattern matching (OPPM) problem is, given a pattern string $p$ and a text string $t$, find all substrings of $t$ which have the same relative orders as $p$. In this paper, we consider two variants of the OPPM problem where a set of text strings is given as a tree or a DAG. We show that the OPPM problem for a single pattern $p$ of length $m$ and a text tree $T$ of size $N$ can be solved in $O(m+N)$ time if the characters of $p$ are drawn from an integer alphabet of polynomial size. The time complexity becomes $O(m \log m + N)$ if the pattern $p$ is over a general ordered alphabet. We then show that the OPPM problem for a single pattern and a text DAG is NP-complete.

arXiv.org e-Print Archive

Crossref

On Tree Pattern Matching by Pushdown Automata

Author: Flouri T.
Publication venue: Czech Technical University in Prague - Central Library
Publication date: 02/01/2009

Tree pattern matching is an important operation in computer science, underlying a number of tasks such as mechanical theorem proving, term rewriting, symbolic computation, and non-procedural programming languages. Work has begun on a systematic approach to constructing tree pattern matchers as deterministic pushdown automata that read subject trees in prefix notation. The method is analogous to the construction of string pattern matchers: for given patterns, a non-deterministic pushdown automaton is created and then determinised. In this first paper, we present the proposed non-deterministic pushdown automaton, which will serve as the basis for the determinisation process, and prove its correctness.
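As an illustration of what reading a subject tree in prefix notation involves, here is a small sketch of a direct matcher over prefix-notation sequences. It is not the pushdown automaton construction from the paper; the tree encoding as `(label, [children])`, the wildcard symbol `"S"`, and all function names are assumptions made for this example. The `pending` counter in `skip_subtree` plays a role loosely similar to a pushdown store, tracking how many subtrees remain to be read.

```python
WILDCARD = "S"  # assumed placeholder symbol that matches any whole subtree


def to_prefix(tree):
    """Flatten a (label, [children]) tree into a list of (label, arity) pairs."""
    label, children = tree
    out = [(label, len(children))]
    for child in children:
        out.extend(to_prefix(child))
    return out


def skip_subtree(prefix, i):
    """Return the index just past the subtree that starts at prefix[i]."""
    pending = 1  # number of subtrees still to be consumed
    while pending > 0:
        pending += prefix[i][1] - 1  # one node read, its children now pending
        i += 1
    return i


def matches_at(subject, pattern, i):
    """Check whether `pattern` (in prefix notation) matches the subtree at subject[i]."""
    for label, arity in pattern:
        if i >= len(subject):
            return False
        if label == WILDCARD:
            i = skip_subtree(subject, i)  # wildcard absorbs a whole subtree
        elif subject[i] != (label, arity):
            return False
        else:
            i += 1
    return True


def find_matches(subject_tree, pattern_tree):
    """Return prefix-notation indices of subject nodes where the pattern matches."""
    subject = to_prefix(subject_tree)
    pattern = to_prefix(pattern_tree)
    return [i for i in range(len(subject)) if matches_at(subject, pattern, i)]


if __name__ == "__main__":
    leaf = lambda x: (x, [])
    subject = ("a", [("a", [leaf("b"), leaf("c")]),
                     ("a", [leaf("b"), ("a", [leaf("c"), leaf("c")])])])
    pattern = ("a", [leaf("b"), leaf(WILDCARD)])
    print(find_matches(subject, pattern))
```

The subject tree above flattens to `a2 a2 b0 c0 a2 b0 a2 c0 c0`, and the pattern `a(b, S)` matches at the two inner `a` nodes whose first child is `b`.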

Directory of Open Access Journals

CTU Open Journal Systems (Czech Technical University, Prague / České vysoké učení technické v Praze)

The Tree Inclusion Problem: In Linear Space and Faster

Author: Alstrup S.
Bender M. A.
Cole R.
Demaine E. D.
Ferragina P.
Inge Li Gortz
Muthukrishnan S.
Philip Bille
Schlieder T.
Termier A.
Yang L. H.
Zezula P.
Publication date: 01/01/2011

Given two rooted, ordered, and labeled trees $P$ and $T$, the tree inclusion problem is to determine if $P$ can be obtained from $T$ by deleting nodes in $T$. This problem has recently been recognized as an important query primitive in XML databases. Kilpeläinen and Mannila [SIAM J. Comput. 1995] presented the first polynomial time algorithm using quadratic time and space. Since then several improved results have been obtained for special cases when $P$ and $T$ have a small number of leaves or small depth. However, in the worst case these algorithms still use quadratic time and space. Let $n_S$, $l_S$, and $d_S$ denote the number of nodes, the number of leaves, and the depth of a tree $S \in \{P, T\}$. In this paper we show that the tree inclusion problem can be solved in space $O(n_T)$ and time $O(\min(l_P n_T,\; l_P l_T \log \log n_T + n_T,\; \frac{n_P n_T}{\log n_T} + n_T \log n_T))$. This improves or matches the best known time complexities while using only linear space instead of quadratic. This is particularly important in practical applications, such as XML databases, where the space is likely to be a bottleneck.
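The definition of tree inclusion can be illustrated with a short memoized recursion over ordered forests. This is only a sketch of the problem statement, not the linear-space algorithm of the paper, and it makes no attempt to reach the bounds above. The encoding of a tree as a hashable `(label, (children...))` tuple and the names `forest_included` and `tree_included` are assumptions for this example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def forest_included(pf, tf):
    """Decide whether ordered forest `pf` can be obtained from `tf` by deleting nodes.

    Forests are tuples of trees; a tree is (label, children_tuple).
    Deleting a node attaches its children, in order, to its parent.
    """
    if not pf:
        return True   # everything left in tf can simply be deleted
    if not tf:
        return False  # pattern nodes remain but nothing is left to match them
    p_label, p_kids = pf[-1]
    t_label, t_kids = tf[-1]
    # Option 1: delete the root of the rightmost tree in tf,
    # which promotes its children into the forest.
    if forest_included(pf, tf[:-1] + t_kids):
        return True
    # Option 2: map the rightmost pattern root onto the rightmost text root.
    return (
        p_label == t_label
        and forest_included(p_kids, t_kids)
        and forest_included(pf[:-1], tf[:-1])
    )


def tree_included(p, t):
    """Decide whether tree `p` is included in tree `t`."""
    return forest_included((p,), (t,))


if __name__ == "__main__":
    leaf = lambda x: (x, ())
    t = ("a", (("b", (leaf("c"),)), ("d", (leaf("e"), leaf("f")))))
    print(tree_included(("a", (leaf("c"), leaf("f"))), t))  # order respected
    print(tree_included(("a", (leaf("f"), leaf("c"))), t))  # order violated
```

In the usage example, `a(c, f)` is obtained from `a(b(c), d(e, f))` by deleting `b`, `d`, and `e`, whereas `a(f, c)` is not included because the left-to-right order of `c` and `f` would have to be swapped.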

arXiv.org e-Print Archive

Crossref

Online Research Database In Technology

Parameterized Algorithms for String Matching to DAGs: Funnels and Beyond

Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 34th Annual Symposium on Combinatorial Pattern Matching (CPM 2023)
Publication date: 01/01/2023

The problem of String Matching to Labeled Graphs (SMLG) asks to find all the paths in a labeled graph G = (V, E) whose spellings match that of an input string S ∈ Σ^m. SMLG can be solved in quadratic O(m|E|) time [Amir et al., JALG 2000], which was proven to be optimal by a recent lower bound conditioned on SETH [Equi et al., ICALP 2019]. The lower bound states that no strongly subquadratic time algorithm exists, even if restricted to directed acyclic graphs (DAGs). In this work we present the first parameterized algorithms for SMLG on DAGs. Our parameters capture the topological structure of G. All our results are derived from a generalization of the Knuth-Morris-Pratt algorithm [Park and Kim, CPM 1995] optimized to work in time proportional to the number of prefix-incomparable matches. To obtain the parameterization in the topological structure of G, we first study a special class of DAGs called funnels [Millani et al., JCO 2020] and generalize them to k-funnels and the class ST_k. We present several novel characterizations and algorithmic contributions on both funnels and their generalizations.
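The quadratic baseline mentioned above can be sketched as a simple layer-by-layer dynamic program. This is an illustrative sketch, not the parameterized algorithm of the paper, and it assumes each node carries a single-character label (the abstract does not fix the labelling model). The function name `smlg_end_nodes` and the input encoding are hypothetical.

```python
def smlg_end_nodes(labels, edges, s):
    """Return the set of nodes at which some path spelling `s` ends.

    labels: dict mapping node -> single-character label (assumed model)
    edges:  iterable of directed (u, v) pairs
    s:      the string to match

    After processing s[:i+1], `current` holds every node v such that some
    path ending at v spells s[:i+1]. Each of the len(s) rounds scans every
    node's predecessor list once, i.e. O(|V| + |E|) work per character.
    """
    if not s:
        return set()
    preds = {v: [] for v in labels}
    for u, v in edges:
        preds[v].append(u)

    current = {v for v, c in labels.items() if c == s[0]}
    for ch in s[1:]:
        current = {
            v
            for v, c in labels.items()
            if c == ch and any(u in current for u in preds[v])
        }
    return current


if __name__ == "__main__":
    labels = {0: "a", 1: "b", 2: "c", 3: "b", 4: "c"}
    edges = [(0, 1), (1, 2), (0, 3), (3, 4), (1, 4)]
    print(smlg_end_nodes(labels, edges, "abc"))
    print(smlg_end_nodes(labels, edges, "ac"))
```

In the usage example, the paths 0→1→2, 0→1→4, and 0→3→4 all spell `abc`, so the matches end at nodes 2 and 4; no path spells `ac` because no `c` node is a direct successor of the `a` node.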

Dagstuhl Research Online Publication Server

A survey on tree matching and XML retrieval

Author: Aho
Al-Khalifa
Alilaouar
Amer-Yahia
Aouicha
Ayala
Bille
Bille
Botev
Bruno
Buneman
Burghardt
Cai
Campi
Ceri
Chamberlin
Chase
Chen
Chen
Chen
Chen
Chen
Chen
Cheng
Cole
Cole
Cyril Laitang
Dalamagas
Dalamagas
Damiani
Damiani
Dao
de Vries
Demaine
Denoyer
Dubiner
Dulucq
Dürr
Hamamache Kheddouci
Haw
Haw
Hoffmann
Hubert
Hummel
Izadi
Jansson
Jiang
Jiang
Jiang
Kamps
Karen Pinel-Sauvagnat
Kazai
Kazai
Kilpelainen
Klein
Knuth
Kosaraju
Kuboyama
Laitang
Lalmas
Lalmas
Le
Lei Ning
Levenshtein
Levy
Li
Li
Li
Lu
Lu
Mass
Mihajlovic
Mohammed Amin Tahraoui
Mohand Boughanem
Ogilvie
Pehcevski
Pehcevski
Pinel-Sauvagnat
Piwowarski
Popovici
Qin
Rao
Richter
Robie
Runapongsa
Schenkel
Schenkel
Schlieder
Shasha
Stahl
Tai
Tekli
Theobald
Trotman
Trotman
Trotman
Trotman
Trotman
van Zwol
Wagner
Wang
Wang
Wang
Wang
Wu
Yang
Yao
Zezula
Zezula
Zhang
Zhang
Zhou
Publication venue: Elsevier BV
Publication date: 01/05/2013

With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by graph theory. In this paper, we aim at studying the theoretical approaches proposed in the literature for tree matching and at seeing how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

Open Archive Toulouse Archive Ouverte

HAL

Hal-Diderot

Security in Data Mining- A Comprehensive Survey

Author: Niranjan A
Nitish A
P Deepa Shenoy
Publication venue: Global Journals Inc. (US)
Publication date: 15/10/2016

Data mining techniques, while allowing individuals to extract hidden knowledge on the one hand, introduce a number of privacy threats on the other. In this paper, we study some of these issues along with a detailed discussion of the applications of various data mining techniques for providing security. An efficient classification technique, when used properly, allows a user to differentiate between a phishing website and a normal website, to classify users as normal users or criminals based on their activities on social networks (crime profiling), and to prevent users from executing malicious code by labelling it as malicious. The most important application of data mining is the detection of intrusions, where different data mining techniques can be applied to effectively detect an intrusion and report it in real time so that necessary actions are taken to thwart the attempts of the intruder. Privacy preservation, outlier detection, anomaly detection, and phishing website classification are discussed in this paper.

Global Journal of Computer Science and Technology (GJCST)

DETC2002/CIE-34462 WEB-BASED INNOVATION ALERT SERVICES TO SUPPORT PRODUCT DESIGN EVOLUTION

Author: Alexander J Lo
Changxin Xu
Edward Lin
Satyandra K Gupta
Publication date: 24/04/2020

Technological innovations provide an opportunity to improve product performance and reduce cost. Therefore, design organizations are interested in monitoring technological innovations. A large number of innovations are announced every year, and monitoring them manually is very time consuming. We are developing web-based innovation-alert services that can be used to monitor and communicate information about innovations relevant to a particular product design. In this paper, we discuss the required infrastructure, relevant design issues, and our approach to developing web-based innovation-alert services to support product design evolution. We also describe a prototype innovation monitoring service for computer components and an interactive tool to transform semi-structured web contents into semantic representations in XML.

CiteSeerX

Semi-automatic wrapper generation for semi-structured websites

Author: Zaccak Gabriel (Gabriel Gabra)
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (leaves 69-74).

Many information sources on the Web are semi-structured; hence there is an opportunity for automatic tools to process and extract their information for easy access through a uniform interface language. Wrapper generation is the creation of wrappers, which contain scripts that extract and integrate data from data sources, mostly Web data sources, owing to the large amount of data available on the World Wide Web. Despite ongoing efforts to automate the process of wrapper generation, wrappers frequently break due to formatting and layout changes in data sources. This thesis presents Wrapster, a new system that semi-automatically generates wrappers for semi-structured Web sources, improves wrapper robustness, and eliminates the need for programming skills and, to a large extent, the process of script creation. Wrapster's novel component is the repairing module, which constantly checks whether any wrapper script has failed and repairs the failing wrapper's script using stored extracted instances. In addition, Wrapster provides an interactive Web user interface to control the wrapper generation process, edit the generated wrappers, and test their scripts. Wrapster is being tested on the START Question Answering system; however, it is a generic tool that can be used by any QA system that uses the Web as its knowledge base.

DSpace@MIT

Security in Data Mining-A Comprehensive Survey

Author: Deepa Shenoy P.
Niranjan A.
Nitish A.
Venugopal K.R.
Publication date: 01/01/2016

ePrints@Bangalore University