9,998 research outputs found
Efficient classification of billions of points into complex geographic regions using hierarchical triangular mesh
We present a case study about the spatial indexing and regional
classification of billions of geographic coordinates from geo-tagged social
network data using Hierarchical Triangular Mesh (HTM) implemented for Microsoft
SQL Server. Due to the lack of certain features of the HTM library, we use it
in conjunction with the GIS functions of SQL Server to significantly increase
the efficiency of pre-filtering of spatial filter and join queries. For
example, we implemented a new algorithm to compute the HTM tessellation of
complex geographic regions and precomputed the intersections of HTM triangles
and geographic regions for faster false-positive filtering. With full control
over the index structure, HTM-based pre-filtering of simple containment
searches outperforms SQL Server spatial indices by a factor of ten and
HTM-based spatial joins run about a hundred times faster.Comment: appears in Proceedings of the 26th International Conference on
Scientific and Statistical Database Management (2014
Content-Aware DataGuides for Indexing Large Collections of XML Documents
XML is well-suited for modelling structured data with
textual content. However, most indexing approaches perform
structure and content matching independently, combining
the retrieved path and keyword occurrences in a third
step. This paper shows that retrieval in XML documents can
be accelerated significantly by processing text and structure
simultaneously during all retrieval phases. To this end,
the Content-Aware DataGuide (CADG) enhances the wellknown
DataGuide with (1) simultaneous keyword and path
matching and (2) a precomputed content/structure join. Extensive
experiments prove the CADG to be 50-90% faster
than the DataGuide for various sorts of query and document,
including difficult cases such as poorly structured
queries and recursive document paths. A new query classification
scheme identifies precise query characteristics with
a predominant influence on the performance of the individual
indices. The experiments show that the CADG is applicable
to many real-world applications, in particular large
collections of heterogeneously structured XML documents
An overview of selected information storage and retrieval issues in computerized document processing
The rapid development of computerized information storage and retrieval techniques has introduced the possibility of extending the word processing concept to document processing. A major advantage of computerized document processing is the relief of the tedious task of manual editing and composition usually encountered by traditional publishers through the immense speed and storage capacity of computers. Furthermore, computerized document processing provides an author with centralized control, the lack of which is a handicap of the traditional publishing operation. A survey of some computerized document processing techniques is presented with emphasis on related information storage and retrieval issues. String matching algorithms are considered central to document information storage and retrieval and are also discussed
Storing RDF as a Graph
RDF is the first W3C standard for enriching information resources of the Web with detailed meta data. The semantics of RDF data is defined using a RDF schema. The most expressive language for querying RDF is RQL, which enables querying of semantics. In order to support RQL, a RDF storage system has to map the RDF graph model onto its storage structure. Several storage systems for RDF data have been developed, which store the RDF data as triples in a relational database. To evaluate an RQL query on those triple structures, the graph model has to be rebuilt from the triples.
In this paper, we presented a new approach to store RDF data as a graph in a object-oriented database. Our approach avoids the costly rebuilding of the graph and efficiently queries the storage structure directly. The advantages of our approach have been shown by performance test on our prototype implementation OO-Store
A document management methodology based on similarity contents
The advent of the WWW and distributed information systems have made it possible to share documents between different users and organisations. However, this has created many problems related to the security, accessibility, right and most importantly the consistency of documents. It is important that the people involved in the documents management process have access to the most up-to-date version of documents, retrieve the correct documents and should be able to update the documents repository in such a way that his or her document are known to others. In this paper we propose a method for organising, storing and retrieving documents based on similarity contents. The method uses techniques based on information retrieval, document indexation and term extraction and indexing. This methodology is developed for the E-Cognos project which aims at developing tools for the management and sharing of documents in the construction domain
Expressing the GIVE event in Papuan languages: A preliminary survey
The linguistic expression of the GIVE event is investigated in a sample of 72 Papuan languages, 33 belonging to the Trans New Guinea family, 39 of various non-TNG lineages. Irrespective of the verbal template (prefix, suffix, or no indexation of undergoer), in the majority of languages the recipient is marked as the direct object of a monotransitive verb, which sometimes involves stem suppletion for the recipient. While a few languages allow verbal affixation for all three arguments, a number of languages challenge the universal claim that the `give' verb always has three arguments
- …