Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh

Author: Bodor András
Budavári Tamás
Csabai István
Dobos László
Kondor Dániel
Szalay Alexander S.
Vattay Gábor
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2014

We present a case study about the spatial indexing and regional classification of billions of geographic coordinates from geo-tagged social network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft SQL Server. Due to the lack of certain features of the HTM library, we use it in conjunction with the GIS functions of SQL Server to significantly increase the efficiency of pre-filtering of spatial filter and join queries. For example, we implemented a new algorithm to compute the HTM tessellation of complex geographic regions and precomputed the intersections of HTM triangles and geographic regions for faster false-positive filtering. With full control over the index structure, HTM-based pre-filtering of simple containment searches outperforms SQL Server spatial indices by a factor of ten and HTM-based spatial joins run about a hundred times faster. Comment: appears in Proceedings of the 26th International Conference on Scientific and Statistical Database Management (2014).
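
The pre-filtering idea in this abstract can be illustrated with a small sketch. It is not the authors' algorithm or the HTM library: a flat square grid stands in for HTM triangles, the region is assumed to be a convex polygon, and all names (`CELL`, `build_cover`, `contains`) are hypothetical. Each cell is classified once as "full" or "partial"; points in full cells are accepted without an exact test, and only points in partial cells pay for the exact point-in-polygon check.

```python
from math import floor

CELL = 1.0  # hypothetical cell size in degrees; a stand-in for an HTM trixel

def point_in_polygon(x, y, poly):
    """Exact ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def cell_of(x, y):
    """Map a coordinate to the integer id of its grid cell."""
    return (floor(x / CELL), floor(y / CELL))

def build_cover(poly):
    """Precompute a cell cover for a CONVEX polygon (assumption).

    A cell whose four corners are all inside a convex region lies fully
    inside it. Every other cell overlapping the bounding box is kept as
    'partial', which is conservative but never wrong.
    """
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    cover = {}
    for cx in range(floor(min(xs) / CELL), floor(max(xs) / CELL) + 1):
        for cy in range(floor(min(ys) / CELL), floor(max(ys) / CELL) + 1):
            corners = [
                (cx * CELL, cy * CELL),
                ((cx + 1) * CELL, cy * CELL),
                ((cx + 1) * CELL, (cy + 1) * CELL),
                (cx * CELL, (cy + 1) * CELL),
            ]
            full = all(point_in_polygon(px, py, poly) for px, py in corners)
            cover[(cx, cy)] = "full" if full else "partial"
    return cover

def contains(x, y, cover, poly):
    """Pre-filter by cell, then run the exact test only where needed."""
    state = cover.get(cell_of(x, y))
    if state is None:       # cell outside the cover: reject immediately
        return False
    if state == "full":     # cell fully inside: accept without exact test
        return True
    return point_in_polygon(x, y, poly)  # boundary cell: exact test

REGION = [(0.5, 0.5), (4.5, 0.5), (4.5, 4.5), (0.5, 4.5)]
COVER = build_cover(REGION)
```

With this cover, `contains(2.5, 2.5, COVER, REGION)` is answered from the lookup alone, while `contains(0.7, 0.7, COVER, REGION)` falls into a partial cell and goes through the exact test.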

arXiv.org e-Print Archive

Crossref

ELTE Digital Institutional Repository (EDIT)

Content-Aware DataGuides for Indexing Large Collections of XML Documents

Author: Bry François
Meuss Holger
Schulz Klaus U.
Weigel Felix
Publication date: 01/01/2003

XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of query and document, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
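
The contrast between independent matching and a precomputed content/structure join can be sketched as follows. This is only an illustration under simplifying assumptions, not the CADG data structure: indices are plain dictionaries at document granularity, and the names `index_documents`, `path_index`, `keyword_index` and `joined_index` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def index_documents(docs):
    """Build three toy indices over a dict of {doc_id: xml_text}."""
    path_index = defaultdict(set)     # label path -> doc ids
    keyword_index = defaultdict(set)  # keyword -> doc ids
    joined_index = defaultdict(set)   # (label path, keyword) -> doc ids

    def walk(elem, prefix, doc_id):
        path = prefix + "/" + elem.tag
        path_index[path].add(doc_id)
        # Only the element's leading text is indexed in this sketch.
        for word in (elem.text or "").lower().split():
            keyword_index[word].add(doc_id)
            joined_index[(path, word)].add(doc_id)
        for child in elem:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return path_index, keyword_index, joined_index

DOCS = {
    "d1": "<book><title>xml indexing</title><author>smith</author></book>",
    "d2": "<book><title>graph storage</title><author>xml</author></book>",
}
PATHS, KEYWORDS, JOINED = index_documents(DOCS)

# Independent matching: intersect path hits and keyword hits. The result
# still contains d2, where "xml" occurs under a different path, so a
# further verification step would be needed.
independent = PATHS["/book/title"] & KEYWORDS["xml"]

# Joined matching: path and keyword are looked up together.
joined = JOINED[("/book/title", "xml")]
```

Here `independent` contains both documents, while `joined` contains only `d1`, which is the false positive that a combined lookup avoids.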

CiteSeerX

Open Access LMU

An overview of selected information storage and retrieval issues in computerized document processing

Author: Dominick Wayne D.
Ihebuzor Valentine U.

The rapid development of computerized information storage and retrieval techniques has introduced the possibility of extending the word processing concept to document processing. A major advantage of computerized document processing is that the immense speed and storage capacity of computers relieve traditional publishers of the tedious task of manual editing and composition. Furthermore, computerized document processing provides an author with centralized control, the lack of which is a handicap of the traditional publishing operation. A survey of some computerized document processing techniques is presented with emphasis on related information storage and retrieval issues. String matching algorithms are considered central to document information storage and retrieval and are also discussed.

NASA Technical Reports Server

Storing RDF as a Graph

Author: Bönström Valerie
Hinze Annika
Schweppe Heinz
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

RDF is the first W3C standard for enriching information resources of the Web with detailed metadata. The semantics of RDF data is defined using an RDF schema. The most expressive language for querying RDF is RQL, which enables querying of semantics. In order to support RQL, an RDF storage system has to map the RDF graph model onto its storage structure. Several storage systems for RDF data have been developed, which store the RDF data as triples in a relational database. To evaluate an RQL query on those triple structures, the graph model has to be rebuilt from the triples. In this paper, we present a new approach to store RDF data as a graph in an object-oriented database. Our approach avoids the costly rebuilding of the graph and efficiently queries the storage structure directly. The advantages of our approach have been shown by performance tests on our prototype implementation OO-Store.
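
A minimal sketch of the difference between a flat triple list and a graph-shaped store, assuming nothing about OO-Store or RQL themselves: the triples, the `ex:` names, and the functions `build_graph` and `superclasses` are hypothetical, and an in-memory dictionary stands in for the object-oriented database. Once the adjacency structure exists, a schema-level question such as "all superclasses of a class" becomes a direct traversal instead of repeated scans over the triples.

```python
from collections import defaultdict

# Hypothetical example data in subject/predicate/object form.
TRIPLES = [
    ("ex:Painter", "rdfs:subClassOf", "ex:Artist"),
    ("ex:Artist", "rdfs:subClassOf", "ex:Person"),
    ("ex:picasso", "rdf:type", "ex:Painter"),
]

def build_graph(triples):
    """Store triples as adjacency: subject -> predicate -> set of objects."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        graph[subject][predicate].add(obj)
    return graph

def superclasses(graph, cls):
    """Follow rdfs:subClassOf edges transitively from cls."""
    seen = set()
    stack = [cls]
    while stack:
        node = stack.pop()
        for parent in graph.get(node, {}).get("rdfs:subClassOf", ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

GRAPH = build_graph(TRIPLES)
```

For the data above, `superclasses(GRAPH, "ex:Painter")` returns the set containing `ex:Artist` and `ex:Person`.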

Research Commons@Waikato

A document management methodology based on similarity contents

Author: Meziane F
Rezgui Y
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

The advent of the WWW and distributed information systems has made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, rights and, most importantly, the consistency of documents. It is important that the people involved in the document management process have access to the most up-to-date version of documents, retrieve the correct documents, and are able to update the document repository in such a way that their documents are known to others. In this paper we propose a method for organising, storing and retrieving documents based on similarity contents. The method uses techniques based on information retrieval, document indexation and term extraction and indexing. This methodology is developed for the E-Cognos project, which aims at developing tools for the management and sharing of documents in the construction domain.

University of Salford Institutional Repository

Expressing the GIVE event in Papuan languages: A preliminary survey

Author: Reesink G.
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2013

The linguistic expression of the GIVE event is investigated in a sample of 72 Papuan languages, 33 belonging to the Trans New Guinea family, 39 of various non-TNG lineages. Irrespective of the verbal template (prefix, suffix, or no indexation of undergoer), in the majority of languages the recipient is marked as the direct object of a monotransitive verb, which sometimes involves stem suppletion for the recipient. While a few languages allow verbal affixation for all three arguments, a number of languages challenge the universal claim that the 'give' verb always has three arguments.

MPG.PuRe