8,029 research outputs found
NETEMBED: A Network Resource Mapping Service for Distributed Applications
Emerging configurable infrastructures such as large-scale overlays and grids, distributed testbeds, and sensor networks comprise diverse sets of available computing resources (e.g., CPU and OS capabilities and memory constraints) and network conditions (e.g., link delay, bandwidth, loss rate, and jitter) whose characteristics are both complex and time-varying. At the same time, distributed applications to be deployed on these infrastructures exhibit increasingly complex constraints and requirements on resources they wish to utilize. Examples include selecting nodes and links to schedule an overlay multicast file transfer across the Grid, or embedding a network experiment with specific resource constraints in a distributed testbed such as PlanetLab. Thus, a common problem facing the efficient deployment of distributed applications on these infrastructures is that of "mapping" application-level requirements onto the network in such a manner that the requirements of the application are realized, assuming that the underlying characteristics of the network are known. We refer to this problem as the network embedding problem. In this paper, we propose a new approach to tackle this combinatorially-hard problem. Thanks to a number of heuristics, our approach greatly improves performance and scalability over previously existing techniques. It does so by pruning large portions of the search space without overlooking any valid embedding. We present a construction that allows a compact representation of candidate embeddings, which is maintained by carefully controlling the order via which candidate mappings are inserted and invalid mappings are removed. We present an implementation of our proposed technique, which we call NETEMBED – a service that identify feasible mappings of a virtual network configuration (the query network) to an existing real infrastructure or testbed (the hosting network). We present results of extensive performance evaluation experiments of NETEMBED using several combinations of real and synthetic network topologies. Our results show that our NETEMBED service is quite effective in identifying one (or all) possible embeddings for quite sizable queries and hosting networks – much larger than what any of the existing techniques or services are able to handle.National Science Foundation (CNS Cybertrust 0524477, NSF CNS NeTS 0520166, NSF CNS ITR 0205294, EIA RI 0202067
Multilingual Schema Matching for Wikipedia Infoboxes
Recent research has taken advantage of Wikipedia's multilingualism as a
resource for cross-language information retrieval and machine translation, as
well as proposed techniques for enriching its cross-language structure. The
availability of documents in multiple languages also opens up new opportunities
for querying structured Wikipedia content, and in particular, to enable answers
that straddle different languages. As a step towards supporting such queries,
in this paper, we propose a method for identifying mappings between attributes
from infoboxes that come from pages in different languages. Our approach finds
mappings in a completely automated fashion. Because it does not require
training data, it is scalable: not only can it be used to find mappings between
many language pairs, but it is also effective for languages that are
under-represented and lack sufficient training samples. Another important
benefit of our approach is that it does not depend on syntactic similarity
between attribute names, and thus, it can be applied to language pairs that
have distinct morphologies. We have performed an extensive experimental
evaluation using a corpus consisting of pages in Portuguese, Vietnamese, and
English. The results show that not only does our approach obtain high precision
and recall, but it also outperforms state-of-the-art techniques. We also
present a case study which demonstrates that the multilingual mappings we
derive lead to substantial improvements in answer quality and coverage for
structured queries over Wikipedia content.Comment: VLDB201
Crosslingual Document Embedding as Reduced-Rank Ridge Regression
There has recently been much interest in extending vector-based word
representations to multiple languages, such that words can be compared across
languages. In this paper, we shift the focus from words to documents and
introduce a method for embedding documents written in any language into a
single, language-independent vector space. For training, our approach leverages
a multilingual corpus where the same concept is covered in multiple languages
(but not necessarily via exact translations), such as Wikipedia. Our method,
Cr5 (Crosslingual reduced-rank ridge regression), starts by training a
ridge-regression-based classifier that uses language-specific bag-of-word
features in order to predict the concept that a given document is about. We
show that, when constraining the learned weight matrix to be of low rank, it
can be factored to obtain the desired mappings from language-specific
bags-of-words to language-independent embeddings. As opposed to most prior
methods, which use pretrained monolingual word vectors, postprocess them to
make them crosslingual, and finally average word vectors to obtain document
vectors, Cr5 is trained end-to-end and is thus natively crosslingual as well as
document-level. Moreover, since our algorithm uses the singular value
decomposition as its core operation, it is highly scalable. Experiments show
that our method achieves state-of-the-art performance on a crosslingual
document retrieval task. Finally, although not trained for embedding sentences
and words, it also achieves competitive performance on crosslingual sentence
and word retrieval tasks.Comment: In The Twelfth ACM International Conference on Web Search and Data
Mining (WSDM '19
Accelerating Nearest Neighbor Search on Manycore Systems
We develop methods for accelerating metric similarity search that are
effective on modern hardware. Our algorithms factor into easily parallelizable
components, making them simple to deploy and efficient on multicore CPUs and
GPUs. Despite the simple structure of our algorithms, their search performance
is provably sublinear in the size of the database, with a factor dependent only
on its intrinsic dimensionality. We demonstrate that our methods provide
substantial speedups on a range of datasets and hardware platforms. In
particular, we present results on a 48-core server machine, on graphics
hardware, and on a multicore desktop
Tangible user interfaces : past, present and future directions
In the last two decades, Tangible User Interfaces (TUIs) have emerged as a new interface type that interlinks the digital and physical worlds. Drawing upon users' knowledge and skills of interaction with the real non-digital world, TUIs show a potential to enhance the way in which people interact with and leverage digital information. However, TUI research is still in its infancy and extensive research is required in or- der to fully understand the implications of tangible user interfaces, to develop technologies that further bridge the digital and the physical, and to guide TUI design with empirical knowledge. This paper examines the existing body of work on Tangible User In- terfaces. We start by sketching the history of tangible user interfaces, examining the intellectual origins of this field. We then present TUIs in a broader context, survey application domains, and review frame- works and taxonomies. We also discuss conceptual foundations of TUIs including perspectives from cognitive sciences, phycology, and philoso- phy. Methods and technologies for designing, building, and evaluating TUIs are also addressed. Finally, we discuss the strengths and limita- tions of TUIs and chart directions for future research
Acquiring Word-Meaning Mappings for Natural Language Interfaces
This paper focuses on a system, WOLFIE (WOrd Learning From Interpreted
Examples), that acquires a semantic lexicon from a corpus of sentences paired
with semantic representations. The lexicon learned consists of phrases paired
with meaning representations. WOLFIE is part of an integrated system that
learns to transform sentences into representations such as logical database
queries. Experimental results are presented demonstrating WOLFIE's ability to
learn useful lexicons for a database interface in four different natural
languages. The usefulness of the lexicons learned by WOLFIE are compared to
those acquired by a similar system, with results favorable to WOLFIE. A second
set of experiments demonstrates WOLFIE's ability to scale to larger and more
difficult, albeit artificially generated, corpora. In natural language
acquisition, it is difficult to gather the annotated data needed for supervised
learning; however, unannotated data is fairly plentiful. Active learning
methods attempt to select for annotation and training only the most informative
examples, and therefore are potentially very useful in natural language
applications. However, most results to date for active learning have only
considered standard classification tasks. To reduce annotation effort while
maintaining accuracy, we apply active learning to semantic lexicons. We show
that active learning can significantly reduce the number of annotated examples
required to achieve a given level of performance
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Maximum Inner Product Search or top-k retrieval on sparse vectors is
well-understood in information retrieval, with a number of mature algorithms
that solve it exactly. However, all existing algorithms are tailored to text
and frequency-based similarity measures. To achieve optimal memory footprint
and query latency, they rely on the near stationarity of documents and on laws
governing natural languages. We consider, instead, a setup in which collections
are streaming -- necessitating dynamic indexing -- and where indexing and
retrieval must work with arbitrarily distributed real-valued vectors. As we
show, existing algorithms are no longer competitive in this setup, even against
naive solutions. We investigate this gap and present a novel approximate
solution, called Sinnamon, that can efficiently retrieve the top-k results for
sparse real valued vectors drawn from arbitrary distributions. Notably,
Sinnamon offers levers to trade-off memory consumption, latency, and accuracy,
making the algorithm suitable for constrained applications and systems. We give
theoretical results on the error introduced by the approximate nature of the
algorithm, and present an empirical evaluation of its performance on two
hardware platforms and synthetic and real-valued datasets. We conclude by
laying out concrete directions for future research on this general top-k
retrieval problem over sparse vectors
- …