86,348 research outputs found
Broadcasting in Prefix Space: P2P Data Dissemination with Predictable Performance
A broadcast mode may augment peer-to-peer overlay networks with an efficient,
scalable data replication function, but may also give rise to a virtual link
layer in VPN-type solutions. We introduce a simple broadcasting mechanism that
operates in the prefix space of distributed hash tables without signaling. This
paper concentrates on the performance analysis of the prefix flooding scheme.
Starting from simple models of recursive -ary trees, we analytically derive
distributions of hop counts and the replication load. Extensive simulation
results are presented further on, based on an implementation within the OverSim
framework. Comparisons are drawn to Scribe, taken as a general reference model
for group communication according to the shared, rendezvous-point-centered
distribution paradigm. The prefix flooding scheme thereby confirmed its widely
predictable performance and consistently outperformed Scribe in all metrics.
Reverse path selection in overlays is identified as a major cause of
performance degradation.Comment: final version for ICIW'0
280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
We propose a simple, yet effective, approach towards inducing multilingual
taxonomies from Wikipedia. Given an English taxonomy, our approach leverages
the interlanguage links of Wikipedia followed by character-level classifiers to
induce high-precision, high-coverage taxonomies in other languages. Through
experiments, we demonstrate that our approach significantly outperforms the
state-of-the-art, heuristics-heavy approaches for six languages. As a
consequence of our work, we release presumably the largest and the most
accurate multilingual taxonomic resource spanning over 280 languages
Web Data Extraction, Applications and Techniques: A Survey
Web Data Extraction is an important problem that has been studied by means of
different scientific tools and in a broad range of applications. Many
approaches to extracting data from the Web have been designed to solve specific
problems and operate in ad-hoc domains. Other approaches, instead, heavily
reuse techniques and algorithms developed in the field of Information
Extraction.
This survey aims at providing a structured and comprehensive overview of the
literature in the field of Web Data Extraction. We provided a simple
classification framework in which existing Web Data Extraction applications are
grouped into two main classes, namely applications at the Enterprise level and
at the Social Web level. At the Enterprise level, Web Data Extraction
techniques emerge as a key tool to perform data analysis in Business and
Competitive Intelligence systems as well as for business process
re-engineering. At the Social Web level, Web Data Extraction techniques allow
to gather a large amount of structured data continuously generated and
disseminated by Web 2.0, Social Media and Online Social Network users and this
offers unprecedented opportunities to analyze human behavior at a very large
scale. We discuss also the potential of cross-fertilization, i.e., on the
possibility of re-using Web Data Extraction techniques originally designed to
work in a given domain, in other domains.Comment: Knowledge-based System
- …