86,348 research outputs found

    Broadcasting in Prefix Space: P2P Data Dissemination with Predictable Performance

    Full text link
    A broadcast mode may augment peer-to-peer overlay networks with an efficient, scalable data replication function, but may also give rise to a virtual link layer in VPN-type solutions. We introduce a simple broadcasting mechanism that operates in the prefix space of distributed hash tables without signaling. This paper concentrates on the performance analysis of the prefix flooding scheme. Starting from simple models of recursive kk-ary trees, we analytically derive distributions of hop counts and the replication load. Extensive simulation results are presented further on, based on an implementation within the OverSim framework. Comparisons are drawn to Scribe, taken as a general reference model for group communication according to the shared, rendezvous-point-centered distribution paradigm. The prefix flooding scheme thereby confirmed its widely predictable performance and consistently outperformed Scribe in all metrics. Reverse path selection in overlays is identified as a major cause of performance degradation.Comment: final version for ICIW'0

    280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification

    Get PDF
    We propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages

    Web Data Extraction, Applications and Techniques: A Survey

    Full text link
    Web Data Extraction is an important problem that has been studied by means of different scientific tools and in a broad range of applications. Many approaches to extracting data from the Web have been designed to solve specific problems and operate in ad-hoc domains. Other approaches, instead, heavily reuse techniques and algorithms developed in the field of Information Extraction. This survey aims at providing a structured and comprehensive overview of the literature in the field of Web Data Extraction. We provided a simple classification framework in which existing Web Data Extraction applications are grouped into two main classes, namely applications at the Enterprise level and at the Social Web level. At the Enterprise level, Web Data Extraction techniques emerge as a key tool to perform data analysis in Business and Competitive Intelligence systems as well as for business process re-engineering. At the Social Web level, Web Data Extraction techniques allow to gather a large amount of structured data continuously generated and disseminated by Web 2.0, Social Media and Online Social Network users and this offers unprecedented opportunities to analyze human behavior at a very large scale. We discuss also the potential of cross-fertilization, i.e., on the possibility of re-using Web Data Extraction techniques originally designed to work in a given domain, in other domains.Comment: Knowledge-based System
    • …
    corecore