10 research outputs found

    空間Webデータにおけるm-最近接キーワード検索問題のトップダウン解法に関する研究

    Get PDF
    This thesis addresses the problem of m-closest keywords queries (mCK queries) over spatial web objects that contain descriptive texts and spatial information. The mCK query is a problem to find the optimal set of records in the sense that they are the spatially-closest records that satisfy m user-given keywords in their texts. The mCK query can be widely used in various applications to find the place of user’s interest. Generally, top-down search techniques using tree-style data structures are appropriate for finding optimal results of queries over spatial datasets. Thus in order to solve the mCK query problem, a previous study of NUS group assumed a specialized R*-tree (called bR*-tree) to store all records and proposed a top-down approach which uses an Apriori-based node-set enumeration in top-down process. However this assumption of prepared bR*-tree is not applicable to practical spatial web datasets, and the pruning ability of Apriori-based enumeration is highly dependent on the data distribution. In this thesis, we do not expect any prepared data-partitioning, but assume that we create a grid partitioning from necessary data only when an mCK query is given. Under this assumption, we propose a new search strategy termed Diameter Candidate Check (DCC), which can find a smaller node-set at an earlier stage of search so that it can reduce search space more efficiently. According to DCC search strategy, we firstly employ an implementation of DCC strategy in a nested loop search algorithm (called DCC-NL). Next, we improve the DCC-NL in a recursive way (called RDCC). RDCC can afford a more reasonable priority order of node-set enumeration. We also uses a tight lower bound to improve pruning ability in RDCC. RDCC performs well in a wide variey of data distributions, but it has still deficiency when one data-point has many query keywords and numerous node-sets are generated. Hence in order to avoid the generation of node-sets which is an unstable factor of search efficiency, we propose another different top-down search approach called Pairwise Expansion. Finally, we discuss some optimization techniques to enhance Pairwise Expansion approach. We first discuss the index structure in the Pairwise Expansion approach, and try to use an on-the-fly kd-tree to reduce building cost in the query process. Also a new lower bound and an upper bound are employed for more powerful pruning in Pairwise Expansion. We evaluate these approaches by using both real datasets and synthetic datasets for different data distributions, including 1.6 million of Flickr photo data. The result shows that DCC strategy can provide more stable search performance than the Apriori-based approach. And the Pairwise Expansion approach enhanced with lower/upper bounds, has more advantages over those algorithms having node-set generation, and is applicable for real spatial web data.電気通信大学201

    Maximizing Bichromatic Reverse Spatial and Textual k Nearest Neighbor Queries

    No full text
    ABSTRACT The problem of maximizing bichromatic reverse k nearest neighbor queries (BRkNN) has been extensively studied in spatial databases. In this work, we present a related query for spatial-textual databases that finds an optimal location, and a set of keywords that maximizes the size of bichromatic reverse spatial textual k nearest neighbors (MaxBRSTkNN). Such a query has many practical applications including social media advertisements where a limited number of relevant advertisements are displayed to each user. The problem is to find the location and the text contents to include in an advertisement so that it will be displayed to the maximum number of users. The increasing availability of spatial-textual collections allows us to answer these queries for both spatial proximity and textual similarity. This paper is the first to consider the MaxBRSTkNN query. We show that the problem is NP-hard and present both approximate and exact solutions

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volum

    Adaptive Methods for Robust Document Image Understanding

    Get PDF
    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document types available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation and logical layout analysis. We review the state of the art in each area, identify current defficiencies, point out promising directions and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions putting special focus on generality, computational efficiency and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typesetted prints, a theoretically optimal solution for the document binarization problem from both computational complexity- and threshold selection point of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy
    corecore