Shortest Path Computation with No Information Leakage
Shortest path computation is one of the most common queries in location-based
services (LBSs). Although particularly useful, such queries raise serious
privacy concerns. Exposing to a (potentially untrusted) LBS the client's
position and her destination may reveal personal information, such as social
habits, health condition, shopping preferences, lifestyle choices, etc. The
only existing method for privacy-preserving shortest path computation follows
the obfuscation paradigm; it prevents the LBS from inferring the source and
destination of the query with a probability higher than a threshold. This
implies, however, that the LBS still deduces some information (albeit not
exact) about the client's location and her destination. In this paper we aim at
strong privacy, where the adversary learns nothing about the shortest path
query. We achieve this via established private information retrieval
techniques, which we treat as black-box building blocks. Experiments on real,
large-scale road networks assess the practicality of our schemes.
Comment: VLDB201
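The abstract treats private information retrieval (PIR) as a black box. To illustrate what such a building block guarantees, here is a minimal Python sketch of the classic two-server XOR scheme. It assumes two non-colluding servers that hold identical copies of a database of equal-length records. It is not necessarily the PIR protocol used in the paper, and the function names (`server_answer`, `pir_fetch`) are hypothetical.

```python
import secrets
from typing import List


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length records."""
    return bytes(x ^ y for x, y in zip(a, b))


def server_answer(db: List[bytes], selection: List[int]) -> bytes:
    """Server side: XOR together every record whose selection bit is 1.
    The server sees only a bit vector that, on its own, is uniformly random."""
    acc = bytes(len(db[0]))  # all-zero record of the right length
    for record, bit in zip(db, selection):
        if bit:
            acc = xor_blocks(acc, record)
    return acc


def pir_fetch(db_server1: List[bytes], db_server2: List[bytes], index: int) -> bytes:
    """Client side: retrieve db[index] without revealing index to either server."""
    n = len(db_server1)
    # Query 1: a uniformly random subset of record positions.
    q1 = [secrets.randbits(1) for _ in range(n)]
    # Query 2: the same subset with only the bit at `index` flipped.
    q2 = list(q1)
    q2[index] ^= 1
    a1 = server_answer(db_server1, q1)
    a2 = server_answer(db_server2, q2)
    # Every record except db[index] appears in both answers and cancels out.
    return xor_blocks(a1, a2)
```

Each query, viewed alone, is a uniformly random bit vector, so neither server learns `index` unless the two collude. Encoding road-network data (for example, precomputed path information) into fixed-size records would be a separate design step and is only a hypothetical illustration here.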
Memories for Life: A Review of the Science and Technology
This paper discusses scientific, social and technological aspects of memory. Recent developments in our understanding of memory processes and mechanisms, and their digital implementation, have placed the encoding, storage, management and retrieval of information at the forefront of several fields of research. At the same time, the divisions between the biological, physical and the digital worlds seem to be dissolving. Hence opportunities for interdisciplinary research into memory are being created, between the life sciences, social sciences and physical sciences. Such research may benefit from immediate application in information management technology, which can serve as a testbed. The paper describes one initiative, Memories for Life, as a potential common problem space for the various interested disciplines.
Text Embeddings Reveal (Almost) As Much As Text
How much private information do text embeddings reveal about the original
text? We investigate the problem of embedding inversion,
reconstructing the full text represented in dense text embeddings. We frame the
problem as controlled generation: generating text that, when re-embedded, is
close to a fixed point in latent space. We find that although a naïve model
conditioned on the embedding performs poorly, a multi-step method that
iteratively corrects and re-embeds text is able to recover … of
text inputs exactly. We train our model to decode text
embeddings from two state-of-the-art embedding models, and also show that our
model can recover important personal information (full names) from a dataset of
clinical notes. Our code is available on GitHub:
https://github.com/jxmorris12/vec2text.
Comment: Accepted at EMNLP 202
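The multi-step idea ("iteratively corrects and re-embeds text") can be illustrated with a toy Python loop. This is a sketch under strong assumptions: a bag-of-characters count vector stands in for a real dense embedding model, and exhaustive single-edit search stands in for the learned correction model the abstract describes. All names (`embed`, `candidate_edits`, `invert`) are hypothetical and are not part of vec2text.

```python
import math
import string
from collections import Counter
from typing import Iterator, List

ALPHABET = string.ascii_lowercase + " "


def embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: raw character counts."""
    counts = Counter(text)
    return [float(counts[ch]) for ch in ALPHABET]


def distance(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two embeddings."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def candidate_edits(text: str) -> Iterator[str]:
    """All strings one insertion, deletion, or substitution away from text."""
    for i in range(len(text) + 1):
        for ch in ALPHABET:
            yield text[:i] + ch + text[i:]  # insertion
    for i in range(len(text)):
        yield text[:i] + text[i + 1:]  # deletion
        for ch in ALPHABET:
            yield text[:i] + ch + text[i + 1:]  # substitution


def invert(target: List[float], max_steps: int = 200) -> str:
    """Greedy correct-and-re-embed loop, starting from an empty hypothesis."""
    hypothesis = ""
    best = distance(embed(hypothesis), target)
    for _ in range(max_steps):
        # Correct: propose edits. Re-embed: score each edit against the target.
        score, candidate = min(
            (distance(embed(c), target), c) for c in candidate_edits(hypothesis)
        )
        if score >= best:  # no edit moves closer to the target point
            break
        hypothesis, best = candidate, score
    return hypothesis
```

Because this toy embedding discards character order, the loop recovers a string with the same embedding (an anagram of the input) rather than the exact text. That gap is one reason real inversion needs richer embeddings and a trained corrector.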
Repetition, pattern and the domestic: notes on the relationship between pattern and home-making
Repetition constitutes the very essence of pattern. Repetition is also the basis of our most ordinary actions. Repetitive gestures are usually so integrated in our lives that we tend to take them for granted. It is only when repetition is excessive or absent that we become aware of its importance to us. Not least because of their everyday properties, pattern and repetition are also closely related to the domain of the domestic. On the one hand, patterned artifacts, such as wallpapers, rugs, latticed curtains, and other fabrics seem to operate naturally as signifiers of an idea of domesticity, denoting privacy, comfort and, eventually, also seclusion and confinement. On the other hand, the repetitive rituals of pattern fabrication bear strong resonance with the traditional routines of household maintenance—cleaning, sorting, laundering, and so on. Not only are both dependent on a logic of continuous reiteration, but they also tend to be considered equally mindless and prosaic, as their processes are often rated inferior in comparison to less repetitive forms of production. In “Repetition, Pattern, and the Domestic” I investigate the foundations and implications of the identification between pattern and the home, drawing on material from historical, mythological, and psychological sources. This investigation aims to show how the repetitive mechanisms of pattern-making integrate the very dynamics of inhabitation, being essentially entangled, if sometimes inconspicuously, with the practice of spatial design
SimplyRetrieve: A Private and Lightweight Retrieval-Centric Generative AI Tool
Large Language Model (LLM) based Generative AI systems have seen significant
progress in recent years. Integrating a knowledge retrieval architecture allows
private data to be incorporated seamlessly into publicly available Generative AI
systems built on pre-trained LLMs, without requiring additional model fine-tuning.
Moreover, the Retrieval-Centric Generation (RCG) approach, a promising future
research direction that explicitly separates the roles of LLMs and retrievers in
context interpretation and knowledge memorization, potentially leads to more
efficient implementation. SimplyRetrieve is an open-source tool that aims to
provide the machine learning community with a localized, lightweight, and
user-friendly interface to these sophisticated advancements. SimplyRetrieve
features a GUI- and API-based RCG platform, assisted by a Private Knowledge Base
Constructor and a Retrieval Tuning Module. By leveraging these capabilities,
users can explore the potential of RCG for improving generative AI performance
while maintaining privacy standards. The tool is available at
https://github.com/RCGAI/SimplyRetrieve under an MIT license.
Comment: 12 pages, 6 figures
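The retrieval-centric split described above (the retriever holds the knowledge, the LLM only interprets the retrieved context) can be sketched in a few lines of Python. This is an illustrative assumption, not SimplyRetrieve's API: bag-of-words cosine similarity stands in for a real retriever, and the hypothetical `build_prompt` only prepares the text that would be sent to some locally hosted LLM.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, knowledge_base: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages from the private knowledge base most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(doc))), doc) for doc in knowledge_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Ask the LLM to interpret the retrieved context rather than rely on its own memory."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below. If the context does not contain "
        "the answer, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\nAnswer:"
    )
```

For example, with a knowledge base of internal notes, `retrieve("Which port does the VPN use?", kb, k=1)` ranks the note that mentions the VPN port first. Keeping the knowledge base and retriever local is what lets private data stay off third-party services in this sketch.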
Talk the Walk: Synthetic Data Generation for Conversational Music Recommendation
Recommendation systems are ubiquitous yet often difficult for users to
control and adjust when recommendation quality is poor. This has motivated the
development of conversational recommendation systems (CRSs), with control over
recommendations provided through natural language feedback. However, building
conversational recommendation systems requires conversational training data
involving user utterances paired with items that cover a diverse range of
preferences. Such data has proved challenging to collect scalably using
conventional methods like crowdsourcing. We address this challenge in the context of
item-set recommendation, noting the increasing attention to this task motivated
by use cases like music, news and recipe recommendation. We present a new
technique, TalkTheWalk, that synthesizes realistic high-quality conversational
data by leveraging domain expertise encoded in widely available curated item
collections, showing how these can be transformed into corresponding item set
curation conversations. Specifically, TalkTheWalk generates a sequence of
hypothetical yet plausible item sets returned by a system, then uses a language
model to produce corresponding user utterances. Applying TalkTheWalk to music
recommendation, we generate over one million diverse playlist curation
conversations. A human evaluation shows that the conversations contain
consistent utterances with relevant item sets, nearly matching the quality of
small human-collected conversational data for this task. At the same time, when
the synthetic corpus is used to train a CRS, it improves Hits@100 by 10.5
points on a benchmark dataset over standard baselines and is preferred over the
top-performing baseline in an online evaluation.
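The reported gain is in Hits@100. A common definition, assumed here because the abstract does not spell it out, is the fraction of test cases whose held-out target item appears among the model's top K recommendations. An improvement of 10.5 points then means, for instance, moving from 40.0% to 50.5% (illustrative numbers, not from the paper). A minimal Python sketch, with the hypothetical helper name `hits_at_k`:

```python
from typing import Sequence


def hits_at_k(
    ranked_lists: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 100,
) -> float:
    """Fraction of cases whose target item appears in the top-k of its ranked list."""
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not targets:
        return 0.0
    hits = sum(1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k])
    return hits / len(targets)
```

Some papers instead count a hit when any of several ground-truth items is ranked within the top K, so the exact variant used in the benchmark should be checked before comparing numbers.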
- …