Search CORE

1,031 research outputs found

The XQueC Project: Compressing and Querying XML

Author: Arion Andrei
Bonifati Angela
Manolescu Ioana
Pugliese Andrea
Publication venue: Dagstuhl Seminar Proceedings. 08261 - Structure-Based Compression of Complex Massive Data
Publication date: 01/01/2008
Field of study

Dagstuhl Research Online Publication Server

An efficient and scalable algorithm for clustering XML documents by structure

Author: D.W.-l. Cheung
N. Mamoulis
Siu-Ming Yiu
Wang Lian
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

DescribeX: A Framework for Exploring and Querying XML Web Collections

Author: Rizzolo Flavio
Publication venue
Publication date: 01/01/2008
Field of study

This thesis introduces DescribeX, a powerful framework that is capable of describing arbitrarily complex XML summaries of web collections, providing support for more efficient evaluation of XPath workloads. DescribeX permits the declarative description of document structure using all axes and language constructs in XPath, and generalizes many of the XML indexing and summarization approaches in the literature. DescribeX supports the construction of heterogeneous summaries where different document elements sharing a common structure can be declaratively defined and refined by means of path regular expressions on axes, or axis path regular expression (AxPREs). DescribeX can significantly help in the understanding of both the structure of complex, heterogeneous XML collections and the behaviour of XPath queries evaluated on them. Experimental results demonstrate the scalability of DescribeX summary refinements and stabilizations (the key enablers for tailoring summaries) with multi-gigabyte web collections. A comparative study suggests that using a DescribeX summary created from a given workload can produce query evaluation times orders of magnitude better than using existing summaries. DescribeX's light-weight approach of combining summaries with a file-at-a-time XPath processor can be a very competitive alternative, in terms of performance, to conventional fully-fledged XML query engines that provide DB-like functionality such as security, transaction processing, and native storage.Comment: PhD thesis, University of Toronto, 2008, 163 page

arXiv.org e-Print Archive

CiteSeerX

CERN Document Server

WAQS : a web-based approximate query system

Author: Chang George Jyh-Shian
Publication venue: Digital Commons @ NJIT
Publication date: 31/05/2001
Field of study

The Web is often viewed as a gigantic database holding vast stores of information and provides ubiquitous accessibility to end-users. Since its inception, the Internet has experienced explosive growth both in the number of users and the amount of content available on it. However, searching for information on the Web has become increasingly difficult. Although query languages have long been part of database management systems, the standard query language being the Structural Query Language is not suitable for the Web content retrieval. In this dissertation, a new technique for document retrieval on the Web is presented. This technique is designed to allow a detailed retrieval and hence reduce the amount of matches returned by typical search engines. The main objective of this technique is to allow the query to be based on not just keywords but also the location of the keywords within the logical structure of a document. In addition, the technique also provides approximate search capabilities based on the notion of Distance and Variable Length Don\u27t Cares. The proposed techniques have been implemented in a system, called Web-Based Approximate Query System, which contains an SQL-like query language called Web-Based Approximate Query Language. Web-Based Approximate Query Language has also been integrated with EnviroDaemon, an environmental domain specific search engine. It provides EnviroDaemon with more detailed searching capabilities than just keyword-based search. Implementation details, technical results and future work are presented in this dissertation

Digital Commons @ New Jersey Institute of Technology (NJIT)

A survey on tree matching and XML retrieval

Author: Aho
Al-Khalifa
Alilaouar
Amer-Yahia
Aouicha
Ayala
Bille
Bille
Botev
Bruno
Buneman
Burghardt
Cai
Campi
Ceri
Chamberlin
Chase
Chen
Chen
Chen
Chen
Chen
Chen
Cheng
Cole
Cole
Cyril Laitang
Dalamagas
Dalamagas
Damiani
Damiani
Dao
de Vries
Demaine
Denoyer
Dubiner
Dulucq
Dürr
Hamamache Kheddouci
Haw
Haw
Hoffmann
Hubert
Hummel
Izadi
Jansson
Jiang
Jiang
Jiang
Kamps
Karen Pinel-Sauvagnat
Kazai
Kazai
Kilpelainen
Klein
Knuth
Kosaraju
Kuboyama
Laitang
Lalmas
Lalmas
Le
Lei Ning
Levenshtein
Levy
Li
Li
Li
Lu
Lu
Mass
Mihajlovic
Mohammed Amin Tahraoui
Mohand Boughanem
Ogilvie
Pehcevski
Pehcevski
Pinel-Sauvagnat
Piwowarski
Popovici
Qin
Rao
Richter
Robie
Runapongsa
Schenkel
Schenkel
Schlieder
Shasha
Stahl
Tai
Tekli
Theobald
Trotman
Trotman
Trotman
Trotman
Trotman
van Zwol
Wagner
Wang
Wang
Wang
Wang
Wu
Yang
Yao
Zezula
Zezula
Zhang
Zhang
Zhou
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013
Field of study

International audienceWith the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by the graph theory. In this paper, we aim at studying the theoretical approaches proposed in the literature for tree matching and at seeing how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not been yet explored in XML retrieval

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

Open Archive Toulouse Archive Ouverte

Hal-Diderot

Exploiting Latent Features of Text and Graphs

Author: Sybrandt Justin George
Publication venue: Clemson University Libraries
Publication date: 01/05/2020
Field of study

As the size and scope of online data continues to grow, new machine learning techniques become necessary to best capitalize on the wealth of available information. However, the models that help convert data into knowledge require nontrivial processes to make sense of large collections of text and massive online graphs. In both scenarios, modern machine learning pipelines produce embeddings --- semantically rich vectors of latent features --- to convert human constructs for machine understanding. In this dissertation we focus on information available within biomedical science, including human-written abstracts of scientific papers, as well as machine-generated graphs of biomedical entity relationships. We present the Moliere system, and our method for identifying new discoveries through the use of natural language processing and graph mining algorithms. We propose heuristically-based ranking criteria to augment Moliere, and leverage this ranking to identify a new gene-treatment target for HIV-associated Neurodegenerative Disorders. We additionally focus on the latent features of graphs, and propose a new bipartite graph embedding technique. Using our graph embedding, we advance the state-of-the-art in hypergraph partitioning quality. Having newfound intuition of graph embeddings, we present Agatha, a deep-learning approach to hypothesis generation. This system learns a data-driven ranking criteria derived from the embeddings of our large proposed biomedical semantic graph. To produce human-readable results, we additionally propose CBAG, a technique for conditional biomedical abstract generation

Clemson University: TigerPrints

A node partitioning strategy for optimising the performance of XML queries

Author: Marks Gerard
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2011
Field of study

For ease of communication between heterogeneous systems, the eXtensible Markup Language (XML) has been widely adopted as a data storage format. However, XML query processing presents issues both in terms of query performance and updatability. Thus, many are choosing to shred XML data into relational databases in order to benet from its mature technology. The problem with this approach is that (often complex and time consuming) data transformation processes are required to transform XML data to relational tables and vice versa. Additionally, many of the benets of XML data can be lost during these processes. In this dissertation, we present a process that partitions nodes within an XML document into disjoint subsets. Briefly, as there are fewer partitions than there are nodes, a more efficient join operation can be performed between partitions, thus reducing the number of inefficient node comparisons. The number and size of partitions varies depending on the structure and layout in the XML document, and the number of partitions impacts query performance. Therefore, we also provide a partition classication process, which signicantly reduces the number of partitions because each partition class represents many equivalent partitions within the XML document. In this dissertation, we will demonstrate that our approach outperforms similar approaches for a large subset of XML queries by eliminating complex join operations (where possible) during the query process

DCU Online Research Access Service

Web Queries: From a Web of Data to a Semantic Web?

Author: Bry François
Furche Tim
Vossen Gottfried
Weiand Klara
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

Open Access LMU

POLIS: a probabilistic summarisation logic for structured documents

Author: Forst Jan Frederik
Publication venue
Publication date: 01/01/2009
Field of study

PhDAs the availability of structured documents, formatted in markup languages such as SGML, RDF, or XML, increases, retrieval systems increasingly focus on the retrieval of document-elements, rather than entire documents. Additionally, abstraction layers in the form of formalised retrieval logics have allowed developers to include search facilities into numerous applications, without the need of having detailed knowledge of retrieval models. Although automatic document summarisation has been recognised as a useful tool for reducing the workload of information system users, very few such abstraction layers have been developed for the task of automatic document summarisation. This thesis describes the development of an abstraction logic for summarisation, called POLIS, which provides users (such as developers or knowledge engineers) with a high-level access to summarisation facilities. Furthermore, POLIS allows users to exploit the hierarchical information provided by structured documents. The development of POLIS is carried out in a step-by-step way. We start by defining a series of probabilistic summarisation models, which provide weights to document-elements at a user selected level. These summarisation models are those accessible through POLIS. The formal definition of POLIS is performed in three steps. We start by providing a syntax for POLIS, through which users/knowledge engineers interact with the logic. This is followed by a definition of the logics semantics. Finally, we provide details of an implementation of POLIS. The final chapters of this dissertation are concerned with the evaluation of POLIS, which is conducted in two stages. Firstly, we evaluate the performance of the summarisation models by applying POLIS to two test collections, the DUC AQUAINT corpus, and the INEX IEEE corpus. This is followed by application scenarios for POLIS, in which we discuss how POLIS can be used in specific IR tasks

CiteSeerX

Queen Mary Research Online