165 research outputs found

    Keyword-based search in peer-to-peer networks

    Ph.D. (Doctor of Philosophy)

    Answering Tag-Term Keyword Queries over XML Documents in DHT Networks

    Abstract. The emergence of the Peer-to-Peer (P2P) computing model and the popularity of Extensible Markup Language (XML) as the web data format have fueled extensive research on retrieving XML data in P2P networks. In this paper, we develop an efficient and effective keyword search framework that supports tag-term keyword queries in Distributed Hash Table (DHT) networks. We employ a concise Bloom filter data structure to index XML metadata in the DHT repository. We also develop an effective algorithm that supports tag-term keyword queries over our Bloom-filter-encoded XML metadata in the DHT network. We conducted extensive experiments to demonstrate the efficiency of our indexing scheme, the effectiveness of our keyword query algorithm, and the scalability of our framework.
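
    As a rough illustration of the idea in this abstract, the sketch below encodes tag-term pairs from one XML document in a Bloom filter and tests a query against it. It is not the paper's algorithm: the class `BloomFilter`, the helpers `index_document` and `may_match`, the `tag=term` key format, and the filter parameters are all assumptions made for this example, and the DHT placement of the filters is left out.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; an int serves as the bit array."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True may be a false positive; False is always correct.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def index_document(tag_term_pairs, **filter_args):
    """Encode the (tag, term) pairs of one document (hypothetical key format)."""
    bloom = BloomFilter(**filter_args)
    for tag, term in tag_term_pairs:
        bloom.add(f"{tag}={term}")
    return bloom


def may_match(bloom, query):
    """A document can satisfy the query only if every pair may be present."""
    return all(f"{tag}={term}" in bloom for tag, term in query)


doc_filter = index_document([("title", "xml"), ("author", "smith")])
print(may_match(doc_filter, [("title", "xml"), ("author", "smith")]))
```

    A filter that answers "no" rules the document out with certainty, so a peer only needs to fetch documents whose filters answer "maybe"; the price is an occasional false positive that must be checked against the real data.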

    Investigation into Indexing XML Data Techniques

    The rapid development of XML technology has improved the WWW, since XML data has many advantages and has become a common means of transferring data across the Internet. The objective of this research is therefore to investigate XML indexing techniques in terms of their structures. The main goal of the investigation is to identify the main limitations of these techniques and any other open issues. To that end, the research considers the most common XML indexing techniques, compares them, and uses the comparison to argue what their limitations are. It concludes that the main problem of all XML indexing techniques is the trade-off between the size and the efficiency of the indexes: every index becomes large in order to perform well, and none of them suits all users' requirements. However, each of these techniques has some advantages.

    Fast and Tiny Structural Self-Indexes for XML

    XML document markup is highly repetitive and therefore compresses well using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees have been used before as synopses for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows arbitrary tree algorithms to be executed with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). For example, for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows XML results (including texts) to be serialized faster than previous systems, by a factor of about 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is avoided entirely. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness. Comment: 13 pages
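
    To make the "no decompression" point concrete, here is a minimal sketch that uses DAG sharing of identical subtrees, the simpler of the two dictionary-based methods the abstract names, rather than the grammar index the paper presents. The tree encoding `(label, [children])` and the functions `compress` and `count_label` are assumptions made for this example.

```python
def compress(tree, table, nodes):
    """Hash-cons a (label, [children]) tree into a DAG; return the node id.

    Identical subtrees receive the same id, so each is stored only once.
    """
    label, children = tree
    key = (label, tuple(compress(child, table, nodes) for child in children))
    if key not in table:
        table[key] = len(nodes)
        nodes.append(key)
    return table[key]


def count_label(root, nodes, label):
    """Count nodes of the original tree carrying `label`, using only the DAG.

    Children are always created before their parents, so one pass in id
    order sees every child count before it is needed.
    """
    counts = [0] * len(nodes)
    for node_id, (node_label, child_ids) in enumerate(nodes):
        counts[node_id] = int(node_label == label) + sum(
            counts[child] for child in child_ids
        )
    return counts[root]


# Three identical <book> subtrees collapse into one shared DAG node.
book = ("book", [("title", []), ("author", [])])
library = ("library", [book, book, book])

table, nodes = {}, []
root = compress(library, table, nodes)
print(len(nodes), count_label(root, nodes, "title"))
```

    The count visits each distinct subtree once and reuses its result for every repetition, so the work grows with the size of the compressed structure, not with the size of the document.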

    On the use of query-driven XML auto-indexing


    Wittgenstein on line / on the line

    Two independent publishing projects have thoroughly changed the state of Wittgenstein scholarship in recent years. Michael Nedo's 'Wiener Ausgabe' offers a traditional critical edition of Wittgenstein's philosophical writings ranging from 1929 up to and including the 'Big Typescript' (1933). Considering the eclectic and, at times, arbitrary editorial policy underlying previous publications from the Nachlass, Nedo's project offers unprecedented philosophical rigor as well as textual criticism in volumes designed for comfortable reading. A second, more ambitious, attempt at a critical edition is the Bergen electronic edition. It is planned to include 4 CD-ROMs, covering the entire range of the philosopher's unpublished writing. Two disks are currently available, comprising all of Wittgenstein's manuscripts from 1929-1939, as well as typescripts, beginning with 'Notes on Logic' (1913) and leading up to Typescript 226, composed in 1939. Wittgenstein's writings from the Thirties are, therefore, available in independent, reliable printed and electronic editions respectively. Readers can, for the first time, observe the philosopher at work, transferring paragraphs from pocket notebooks to handwritten 'volumes'; picking acceptable remarks to be included in typescripts that are, at a later stage, cut up into slips of paper which are again annotated, rearranged and put together in further volumes and typescripts. But this is only half the excitement. The 'Wiener Ausgabe' and the 'Bergen Edition' stake their success on different media, inevitably provoking a comparison between the well-known features of printed scholarly editions and the not so familiar realm of digitized texts.

    XML query processing: Indices and histograms

    Ph.D. (Doctor of Philosophy)

    Overview of query optimization in XML database systems


    Approximate information filtering in structured peer-to-peer networks

    Today's content providers are naturally distributed and produce large amounts of information every day, making peer-to-peer data management a promising approach that offers scalability, adaptivity to dynamics, and failure resilience. In such systems, subscribing with a continuous query is as important as one-time querying, since it allows the user to cope with the high rate of information production and avoid the cognitive overload of repeated searches. In the information filtering setting, users specify continuous queries, thus subscribing to newly appearing documents that satisfy the query conditions. Contrary to existing approaches providing exact information filtering functionality, this doctoral thesis introduces the concept of approximate information filtering, where users subscribe to only a few selected sources most likely to satisfy their information demand. This way, efficiency and scalability are enhanced by trading a small reduction in recall for lower message traffic. This thesis contains the following contributions: (i) the first architecture to support approximate information filtering in structured peer-to-peer networks, (ii) novel strategies to select the most appropriate publishers by taking into account correlations among keywords, (iii) a prototype implementation for approximate information retrieval and filtering, and (iv) a digital library use case to demonstrate the integration of retrieval and filtering in a unified system.