Updating collection representations for federated search
To facilitate the search for relevant information across a set of online distributed collections, a federated information retrieval system typically represents each collection, centrally, by a set of vocabularies or sampled documents. Accurate retrieval therefore depends on how precisely each representation reflects the underlying content stored in that collection. As collections evolve over time, collection representations should also be updated to reflect any change; however, no solution has yet been proposed. In this study we examine the implications of out-of-date representation sets on retrieval accuracy and propose three different policies for managing necessary updates. Each policy is evaluated on a testbed of forty-four dynamic collections over an eight-week period. Our findings show that out-of-date representations significantly degrade performance over time; however, adopting a suitable update policy can minimise this problem.
Distributed Information Retrieval using Keyword Auctions
This report motivates the need for large-scale distributed approaches to information retrieval, and proposes solutions based on keyword auctions.
The NASA Astrophysics Data System: Architecture
The powerful discovery capabilities available in the ADS bibliographic
services are possible thanks to the design of a flexible search and retrieval
system based on a relational database model. Bibliographic records are stored
as a corpus of structured documents containing fielded data and metadata, while
discipline-specific knowledge is segregated in a set of files independent of
the bibliographic data itself.
The creation and management of links to both internal and external resources
associated with each bibliography in the database is made possible by
representing them as a set of document properties and their attributes.
To improve global access to the ADS data holdings, a number of mirror sites
have been created by cloning the database contents and software on a variety of
hardware and software platforms.
The procedures used to create and manage the database and its mirrors have
been written as a set of scripts that can be run in either an interactive or
unsupervised fashion.
The ADS can be accessed at http://adswww.harvard.edu
Comment: 25 pages, 8 figures, 3 tables
LightFR: Lightweight Federated Recommendation with Privacy-preserving Matrix Factorization
Federated recommender system (FRS), which enables many local devices to train
a shared model jointly without transmitting local raw data, has become a
prevalent recommendation paradigm with privacy-preserving advantages. However,
previous work on FRS performs similarity search via inner product in continuous
embedding space, which causes an efficiency bottleneck when the scale of items
is extremely large. We argue that such a scheme in federated settings ignores
the limited capacities in resource-constrained user devices (i.e., storage
space, computational overhead, and communication bandwidth), and makes it
harder to be deployed in large-scale recommender systems. Besides, it has been
shown that transmitting local gradients in real-valued form between server and
clients may leak users' private information. To this end, we propose a
lightweight federated recommendation framework with privacy-preserving matrix
factorization, LightFR, that is able to generate high-quality binary codes by
exploiting learning to hash technique under federated settings, and thus enjoys
both fast online inference and economic memory consumption. Moreover, we devise
an efficient federated discrete optimization algorithm to collaboratively train
model parameters between the server and clients, which can effectively prevent
real-valued gradient attacks from malicious parties. Through extensive
experiments on four real-world datasets, we show that our LightFR model
outperforms several state-of-the-art FRS methods in terms of recommendation
accuracy, inference efficiency and data privacy.
Comment: Accepted by ACM Transactions on Information Systems (TOIS)
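The efficiency argument in the LightFR abstract rests on a general property of binary codes: for two codes with entries in {-1, +1}, the inner product equals the code length minus twice the Hamming distance, so ranking by inner product can be done with XOR and bit counting. The sketch below illustrates only that general property and is not the LightFR algorithm; the function names, the 8-bit code length, and the example codes are hypothetical.

```python
# Illustrative sketch only: ranking items by Hamming distance between
# binary codes. Codes are stored as Python ints, one bit per dimension
# (bit 1 stands for +1, bit 0 stands for -1). All names and values here
# are hypothetical and do not come from the LightFR paper.

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which the two codes differ."""
    return bin(a ^ b).count("1")

def sign_inner_product(a: int, b: int, bits: int) -> int:
    """Inner product of the corresponding {-1, +1} vectors of length `bits`."""
    return bits - 2 * hamming(a, b)

def top_k(user_code: int, item_codes: dict, k: int) -> list:
    """Return the k item names whose codes are closest to the user's code."""
    ranked = sorted(item_codes, key=lambda name: hamming(user_code, item_codes[name]))
    return ranked[:k]

# Hypothetical 8-bit codes for one user and three items.
user = 0b10110010
items = {"a": 0b10110011, "b": 0b01001101, "c": 0b10100000}
closest = top_k(user, items, 2)
```

Because a smaller Hamming distance always means a larger inner product for codes of equal length, sorting by either quantity gives the same ranking, while each code occupies one bit per dimension instead of one floating-point value.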
Towards interoperability in heterogeneous database systems
Distributed heterogeneous databases consist of systems which differ physically and logically, containing different data models and data manipulation languages. Although these databases are independently created and administered, they must cooperate and interoperate. Users need to access and manipulate data from several databases, and applications may require data from a wide variety of independent databases. Therefore, a new system architecture is required to manipulate and manage multiple distinct databases in a transparent way, while preserving their autonomy. This report contains an extensive survey on heterogeneous databases, analysing and comparing the different aspects, concepts and approaches related to the topic. It introduces an architecture to support interoperability among heterogeneous database systems. The architecture avoids the use of a centralised structure to assist in the different phases of the interoperability process. It aims to support scalability, and to assure privacy and confidentiality of the data. The proposed architecture allows the databases to decide when to participate in the system, what type of data to share and with which other databases, thereby preserving their autonomy. The report also describes an approach to information discovery in the proposed architecture that neither uses centralised structures such as repositories and dictionaries nor broadcasts to all databases. It attempts to reduce the number of databases searched and to preserve the privacy of the shared data. The main idea is to visit a database that either contains the requested data or knows about another database that possibly contains this data.