11 research outputs found

    Grade And Exact In Order Of Textual Substance

    Ranking and returning the most relevant results for a query is probably the most popular form of XML query processing. To address this, we first propose an elegant query relaxation framework to support difficult XML queries. The answers produced under this framework are not required to satisfy the query formulation exactly; they can instead be based on properties that can be inferred from the original query. Existing proposals, however, lack the power to combine structure and content elegantly when answering relaxed queries. In our solution, we classify nodes into two groups, categorical nodes and statistical nodes, and use pattern-based approaches to assess the similarity relations of each group. We also use a comprehensive set of experiments to demonstrate the effectiveness of the proposed approach in terms of accuracy and recall. Querying XML data often becomes difficult in practical applications because the hierarchical structure of XML documents can be heterogeneous, so any slight misunderstanding of the document structure increases the risk of formulating unsatisfactory queries. This is especially troublesome because such queries return empty results even though they contain no compilation errors. In addition, we design an evidence-based directed acyclic graph to generate and organize structure relaxations, and develop an efficient evaluation coefficient to assess similarity relations on structures. Finally, we design a new top-k retrieval approach that can intelligently generate the most promising answers in an order correlated with the ranking measure.

    The Structural Multiple and Information Satisfied Mixture of XML

    Ranking and returning the most relevant results for a query is perhaps the most common form of XML query processing. To solve this problem, we first propose an elegant query relaxation framework that supports approximate queries over XML data. The answers produced under this framework are not forced to conform strictly to the specified query formulation, but may be based on attributes that can be inferred from the original query. However, current proposals do not take sufficient account of structure, nor do they have the power to combine structure and content neatly to answer relaxed queries. Within our solution we divide nodes into two groups: categorical attribute nodes and statistical attribute nodes. We also use a comprehensive set of experiments to demonstrate the effectiveness of the proposed approach in terms of accuracy and recall. In practical applications, it is often difficult to query XML data because the hierarchical structure of XML documents can be heterogeneous, so any misunderstanding of the document structure increases the risk of formulating unsatisfactory queries. This is especially troublesome because such queries lead to empty results even though they contain no compilation errors. In addition, we propose an evidence-based acyclic graph that generates and organizes structure relaxations, and develop an efficient assessment coefficient to evaluate similarity relations on structures. We therefore develop a new top-k search approach that can intelligently generate promising answers in an order correlated with the ranking measure.

    Materialized View Selection in XML Databases

    Materialized views, an RDBMS silver bullet, have demonstrated their efficacy in many applications, especially as a tool for data warehousing and decision support systems. The key to using materialized views efficiently is view selection. Though studied for over thirty years in RDBMSs, view selection is hard in the context of XML databases, where both the semi-structured data and the expressiveness of XML query languages add challenges to the view selection problem. We start our discussion with producing minimal XML views (in terms of size) as candidates for a given workload (a query set). To facilitate intuitive view selection, we present a view graph (called vcube) to structurally maintain all generated views. Basing our selection on vcube, we propose two view selection strategies for materialization, targeting space optimization and a space-time tradeoff, respectively. We built our implementation on top of Berkeley DB XML, demonstrating that significant performance improvement can be obtained using our proposed approaches.

    ARRANGE AND EXTRACT ACCURATE INFORMATION ABOUT XML CONTENT

    Ranking and returning the most relevant results may be the most common form of XML query processing. To address this problem, we first propose an elegant query framework to support approximate queries over XML data. The answers produced under this framework do not have to satisfy the query formulation exactly, but may be based on attributes that can be inferred from the original query. However, current proposals do not take structure into account adequately, and they do not have the power to combine structure and content neatly to answer relaxed queries. Within our solution, we classify nodes into two groups, categorical attribute nodes and statistical attribute nodes, and use pattern-related methods to rate the similarity of each group. We also use a comprehensive set of experiments to demonstrate the effectiveness of the proposed approach in terms of accuracy and recall metrics. XML data is often hard to query in practical applications because the hierarchical structure of XML documents may be heterogeneous, and any slight misunderstanding of the document structure increases the risk of formulating unsatisfactory queries. This is especially troublesome because such queries return empty results even though they contain no compilation errors. In addition, we design a pattern-based graph to generate and organize structure relaxations, and develop an efficient evaluation coefficient to assess similarity relations on structures. We therefore create a new top-k retrieval approach that can intelligently generate promising answers in an order correlated with the ranking measure.

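    Partition and Covering Problems on Graphs: Computational Complexity Analysis and Algorithm Design (original title: グラフ上の分割問題と被覆問題:計算量解析とアルゴリズム設計)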

    This dissertation studies four combinatorial optimization problems on graphs: (1) Minimum Block Transfer problem (MBT for short), (2) Maximum k-Path Vertex Cover problem (MaxPkVC for short), (3) k-Path Vertex Cover Reconfiguration problem (k-PVCR for short), and (4) Minimum (Maximum) Weighted Path Cover problem (MinPC (MaxPC) for short). This dissertation provides hardness results, such as NP-hardness and inapproximabilities, and polynomial-time algorithms for each problem. In Chapter 2, we study MBT. Let G = (V, A) be a simple directed acyclic graph, i.e., G does not include any cycles, any multiple arcs, or any self-loops, with a node set V and an arc set A. Given a DAG G and a block size B, the objective of MBT is to find a partition of its node set such that it satisfies the following two conditions: (i) Each element (called a block) of the partition has a size which is at most B, and (ii) the maximum number of external arcs among directed paths from the roots to the leaves is minimized. The number of external arcs is defined as the number of arcs connecting two distinct blocks, that is, the number denotes the number of block transfers. The height of a DAG is defined as the length of the longest directed paths from its roots to the leaves. Let us consider the two-level I/O model for data transfers between an external memory with a large space and an internal memory with a limited space. Assume that the external memory is divided into fixed contiguous blocks of size B, and one query or modification transfers one block of B objects from the external memory to the internal one. Then, with our MBT problem, we can consider the efficient way to store data in the external memory such that the maximum number of data transfers between the external memory and the internal one is minimized. We first revisit the previous, naive bottom-up packing algorithm for MBT and show that its approximation ratio is 2 if B = 2.
Additionally, we show that the approximation ratio of that algorithm is at least B if B gets larger. Next, we explicitly show that MBT is NP-hard even if each block size B is at most two and the height of DAGs is three, and maximum indegree and outdegree of a node are two and three, respectively. Our proof of the NP-hardness also shows that, if B = 2 and P ≠ NP, MBT does not admit any polynomial-time (3/2 - ε)-approximation ((4/3 - ε)-approximation, resp.) algorithm for any ε > 0 even if the input is restricted to DAGs of height at most five (at least six, resp.). Fortunately, however, we can obtain a linear time exact algorithm if the height of DAGs is bounded above by two. Also, for MBT with B = 2, we provide the following linear-time algorithms: A simple 2-approximation algorithm and improved (2 - ε)-approximation algorithms, where ε = 2/h and ε = 2/(h + 1) for the case where the height of the input DAGs is even and odd, respectively. If h = 3, the last algorithm achieves a 3/2-approximation ratio, matching the inapproximability. In Chapter 3, we study MaxPkVC. Let G = (V, E) be a simple undirected graph, where V and E denote the set of vertices and the set of edges, respectively. A path of length k - 1 is called a k-path. If a k-path Pk contains a vertex v in a vertex set S, then we say that the vertex v or the set S covers Pk. Given a graph G and an integer s, the goal of MaxPkVC is to find a vertex subset S of size at most s such that the number of k-paths covered by S is maximized. Given a graph G, MinPkVC problem, a minimization version of MaxPkVC, is to find a minimum vertex subset of G such that it covers all the k-paths of G. A great focus has been on MinPkVC since it was introduced in 2011, and it is known that MinPkVC has an application for maintaining the security of a network. MinVC is a classical, very famous problem in this field such that it seeks to find a minimum vertex subset to cover all the 2-paths, i.e., the edges of the graph.
Also, its maximization version, MaxVC, is well studied. One can see that MaxPkVC is a generalized problem of MaxVC since MaxVC is a special case of MaxPkVC, in the case where k = 2. MaxPkVC, for example, has an application when we would like to cover as many areas as possible with a restricted amount of budget. First, we show that MaxP3VC (MaxP4VC, resp.) is NP-hard on split graphs (chordal graphs, resp.). Then, we show that MaxP3VC is in FPT with respect to the combined parameter s + tw, where s and tw are the prescribed size of 3-path vertex cover and treewidth parameter, respectively. Treewidth is a well-known graph parameter, and it defines a tree-likeness of a graph; see Chapter 3. Our algorithm runs in O((s + 1)^(2tw+4) · 4^tw · n) time, where |V| = n. In Chapter 4, we discuss k-PVCR. Let G = (V, E) be a simple graph. In a reconfiguration setting, two feasible solutions of a computational problem are given, along with a reconfiguration rule that describes an adjacency relation between solutions. A reconfiguration problem asks if one feasible solution can be transformed into the other via a sequence of adjacent feasible solutions where each intermediate member is obtained from its predecessor by applying the given reconfiguration rule exactly once. Such a sequence is called a reconfiguration sequence, if it exists. For any fixed integer k ≥ 2, given two distinct k-path vertex covers I and J of a graph G and a single reconfiguration rule, the goal of k-PVCR is to determine if there is a reconfiguration sequence between I and J. For the reconfiguration rule, we consider the following three well-known rules: Token Sliding (TS), Token Jumping (TJ), and Token Addition or Removal (TAR). For the precise descriptions of each rule, refer to Chapter 4.
The reconfiguration variant of MinVC (called VCR) has been well studied; the goal of our study is to find the difference between VCR and k-PVCR, such as the difference of the computational complexity on graph subclasses, and to design polynomial-time algorithms. We can again see that k-PVCR is a generalized problem of VCR, since VCR is a special case of k-PVCR if k = 2. First, we confirm that several hardness results for VCR remain true for k-PVCR; we show the PSPACE-completeness of k-PVCR on general graphs under each rule TS, TJ, and TAR using a reduction from a variant of VCR. As our reduction preserves some nice graph properties, we claim that the hardness results for VCR on several graphs (planar graphs, bounded bandwidth graphs, chordal graphs, bipartite graphs) can be converted into those for k-PVCR. Using another reduction, we moreover show that k-PVCR remains PSPACE-complete even on planar graphs of bounded bandwidth and maximum degree 3. On the other hand, we design polynomial-time algorithms for k-PVCR on trees (under each of TJ and TAR), paths and cycles (under each reconfiguration rule). Furthermore, on paths, our algorithm constructs a shortest reconfiguration sequence. In Chapter 5, we investigate MinPC (MaxPC), especially the (in)tractabilities of MinPC. Given a graph G = (V, E), a collection P of vertex disjoint paths is called a path cover on G if every vertex v ∈ V is in exactly one path of P. The goal of path cover problem (PC for short) is to find a path cover with the minimum number of paths on G. As a generalized variant of PC, we introduce MinPC (MaxPC) as follows: Let U = {0, 1,...,n-1} denote a set of path lengths. Given a graph G = (V, E) and a cost (profit) function f : U → R ∪ {+∞, -∞}, which defines a cost (profit) for each path according to its length, find a path cover P of G such that the total cost (profit) of the paths in P is minimized (maximized). Let L be a subset of U. We denote the set of paths of length l ∈ L as PL.
In particular, we consider MinPC with the cost function f(l) = 1 if l ∈ L and f(l) = 0 otherwise. The problem is denoted by MinPLPC and is to find a path cover with the minimum number of paths with length l ∈ L. We can also define the problem MaxPLPC with f(l) = l + 1, if l ∈ L, and f(l) = 0, otherwise. Note that several classical problems can be seen as special cases of MinPC or MaxPC. For example, Hamiltonian Path Problem (to seek a single path visiting every vertex exactly once) and Maximum Matching Problem are equivalent to MinP{n-1}PC and MaxP{1}PC, respectively. It is known that MinP{0}PC and MinP{0, 1}PC with the same cost function as ours can be solved in polynomial time. First, we show that MinP{0, 1, 2}PC is NP-hard on planar bipartite graphs with maximum degree three, reduced from Planar 3-SAT. Our reduction also shows that there exist no approximation algorithms for MinP{0, 1, 2}PC unless P = NP. As a positive result, we show that MinP{0,...,k}PC for any fixed integer k can be solved in polynomial time on graphs with bounded treewidth. Specifically, our algorithm runs in O(4^(2W) · W^(2W+2) · (k + 2)^(2W+2) · n) time, assuming we are given an n-vertex graph of width at most W with its tree decomposition. Finally, a conclusion of this dissertation and open problems are given in Chapter 6.
Kyushu Institute of Technology doctoral dissertation. Degree number: 情工博甲第355号. Date conferred: March 25, 2021 (Reiwa 3).
Contents: 1 Introduction | 2 Minimum Block Transfer problem | 3 Maximum k-Path Vertex Cover problem | 4 k-Path Vertex Cover Reconfiguration problem | 5 Minimum (Maximum) Weighted Path Cover problem | 6 Conclusion and Open Problems
Kyushu Institute of Technology, academic year 2020 (Reiwa 2)
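
The MaxPkVC definitions in the abstract above (a k-path is a path on k vertices, and a set S covers a k-path if the path contains a vertex of S) can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the dissertation's algorithm: the function names, the adjacency-dictionary input, and the toy graph are all hypothetical, vertex labels are assumed to be mutually comparable, and the search is plain brute force that is exponential in the graph size.

```python
from itertools import combinations


def k_paths(adj, k):
    """Return every simple path on k vertices (a k-path), each counted once."""
    found = set()

    def extend(path):
        if len(path) == k:
            # Store one canonical orientation so a path and its reverse coincide.
            found.add(min(tuple(path), tuple(reversed(path))))
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                extend(path + [nxt])

    for start in adj:
        extend([start])
    return found


def covered_count(adj, k, cover):
    """Number of k-paths that contain at least one vertex of `cover`."""
    cover = set(cover)
    return sum(1 for p in k_paths(adj, k) if cover.intersection(p))


def max_pk_vc_bruteforce(adj, k, s):
    """Try every vertex subset of size at most s (tiny graphs only)."""
    paths = k_paths(adj, k)
    best_count, best_set = 0, ()
    for size in range(s + 1):
        for cand in combinations(sorted(adj), size):
            chosen = set(cand)
            count = sum(1 for p in paths if chosen.intersection(p))
            if count > best_count:
                best_count, best_set = count, cand
    return best_count, set(best_set)


# Toy input (hypothetical): the path graph 1 - 2 - 3 - 4.
path_graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(covered_count(path_graph, 3, {2}))       # 3-paths containing vertex 2
print(max_pk_vc_bruteforce(path_graph, 3, 1))  # best choice with s = 1
```

With k = 2 the same functions count covered edges, which is the MaxVC special case mentioned in the abstract.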

    Effective searching of RDF knowledge bases

    RDF data has become a vital source of information for many applications. In this thesis, we present a set of models and algorithms to effectively search large RDF knowledge bases. These knowledge bases contain a large set of subject-predicate-object (SPO) triples where subjects and objects are entities and predicates express relationships between them. Searching such knowledge bases can be done using the W3C-endorsed SPARQL language or by similarly designed triple-pattern search. However, the exact-match semantics of triple-pattern search might fall short of satisfying the users' needs by returning too many or too few results. Thus, IR-style searching and ranking techniques are crucial. This thesis develops models and algorithms to enhance triple-pattern search. We propose a keyword extension to triple-pattern search that allows users to augment triple-pattern queries with keyword conditions. To improve the recall of triple-pattern search, we present a framework to automatically reformulate triple-pattern queries in such a way that the intention of the original user query is preserved while returning a sufficient number of ranked results. For efficient query processing, we present a set of top-k query processing algorithms and for ease of use, we develop methods for plain keyword search over RDF knowledge bases. Finally, we propose a set of techniques to diversify query results and we present several methods to allow users to interactively explore RDF knowledge bases to find additional contextual information about their query results.
    Abstract (translated from the German): A large number of current applications rely on RDF data as an essential source of information. Models and algorithms for efficient search in RDF knowledge bases are therefore a decisive factor for success or failure. Such knowledge bases consist of a large set of subject-predicate-object triples (SPO triples), where subjects and objects represent entities and predicates describe relationships between these entities. Queries are usually formulated using the W3C query standard SPARQL or similarly structured query languages and are based on triple patterns. If only exact matches are included in the result set, the user's information need is often not satisfied, because too few or too many results are returned. Techniques that originate in information retrieval, together with a suitable ranking, can counteract this problem. This dissertation therefore presents models and algorithms for improving search based on triple patterns. The strategy developed in the dissertation to address the problem described above is based on the idea of extending the triple patterns of a query with keywords. To improve the recall of this search variant, a framework is presented that automatically reformulates the queries submitted by the user in such a way that the intention of the original user query is preserved and a sufficient number of ranked results is returned. To process such queries efficiently, top-k algorithms and methods for keyword search over RDF databases are presented. Finally, several methods for diversifying query results are presented, along with several approaches that allow users to explore RDF databases interactively and thereby obtain additional contextual information about the query results.
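
To illustrate what exact-match triple-pattern search means, and why it can return too few results, here is a minimal sketch. It is not the thesis's system: the function names, the `?`-prefixed variable convention, and the toy triples are assumptions made for this example, and no ranking, keyword extension, or query reformulation is attempted.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            # A constant must match exactly (exact-match semantics).
            return None
    return new


def evaluate(patterns, triples):
    """Join the patterns left to right with a naive nested-loop evaluation."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_pattern(pattern, t, b)) is not None
        ]
    return bindings


# Toy knowledge base (hypothetical identifiers, for illustration only).
kb = [
    ("Einstein", "bornIn", "Ulm"),
    ("Einstein", "wonPrize", "NobelPrize"),
    ("Bohr", "bornIn", "Copenhagen"),
    ("Bohr", "wonPrize", "NobelPrize"),
]

# Two joined patterns: prize winners together with their birthplaces.
print(evaluate([("?x", "wonPrize", "NobelPrize"), ("?x", "bornIn", "?c")], kb))

# Exact matching finds nothing here, because no triple names "Germany".
print(evaluate([("?x", "bornIn", "Germany")], kb))
```

The second query is the kind of case the abstract describes: the pattern is well formed, but exact-match semantics yields an empty result.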

    Structure and Content Scoring for XML

    No full text
    XML repositories are usually queried both on structure and content. Due to structural heterogeneity of XML, queries are often interpreted approximately and their answers are returned ranked by scores. Computing answer scores in XML is an active area of research that oscillates between pure content scoring such as the well-known tf*idf and taking structure into account. However, none of the existing proposals fully accounts for structure and combines it with content to score query answers. We propose novel XML scoring methods that are inspired by tf*idf and that account for both structure and content while considering query relaxations. Twig scoring accounts for the most structure and content and is thus used as our reference method. Path scoring is an approximation that loosens correlations between query nodes, hence reducing the amount of time required to manipulate scores during top-k query processing. We propose efficient data structures in order to speed up ranked query processing. We run extensive experiments that validate our scoring methods and that show that path scoring provides very high precision while improving score computation time.
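
For readers unfamiliar with the pure content scoring that the abstract takes as its starting point, the sketch below shows one common tf*idf variant (raw term frequency times the natural log of N/df). It is not the twig or path scoring proposed in the paper, it ignores structure entirely, and the function names and toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(term, doc_id, docs):
    """Raw term frequency times log(N / document frequency)."""
    tf = Counter(docs[doc_id])[term]
    df = sum(1 for tokens in docs.values() if term in tokens)
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(docs) / df)


def score(query_terms, doc_id, docs):
    """Score a document as the sum of the tf*idf values of the query terms."""
    return sum(tf_idf(t, doc_id, docs) for t in query_terms)


def rank(query_terms, docs):
    """Return document ids ordered from highest to lowest score."""
    return sorted(docs, key=lambda d: score(query_terms, d, docs), reverse=True)


# Toy collection (hypothetical): each document is a list of tokens.
docs = {
    "a": ["xml", "query", "xml"],
    "b": ["query", "ranking"],
    "c": ["xml", "ranking", "score"],
}
print(rank(["xml", "score"], docs))
```

Document "c" ranks first in this toy collection because "score" is rare (it appears in one document only), which is the effect the idf factor is meant to capture.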