Evaluating Music Recommender Systems for Groups
Recommendation to groups of users is a challenging and currently only
passingly studied task. Evaluation in particular often appears ad hoc:
instead of truly evaluating on groups of users, studies synthesize groups by
merging individual preferences.
In this paper, we present a user study, recording the individual and shared
preferences of actual groups of participants, resulting in a robust,
standardized evaluation benchmark. Using this benchmarking dataset, which we
share with the research community, we compare the respective performance of a
wide range of music group recommendation techniques proposed in the
Comment: Presented at the 2017 Workshop on Value-Aware and Multistakeholder
Recommendatio
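To make the idea of synthesizing a group by merging individual preferences concrete, here is a minimal sketch of two aggregation strategies that are common in the group recommendation literature (average and least misery). The abstract does not say which merging strategies it has in mind, so the function name, the `strategy` parameter, and the sample ratings are illustrative assumptions only.

```python
def aggregate_group_ratings(ratings_by_user, strategy="average"):
    """Merge per-user ratings into one synthetic group score per item.

    ratings_by_user: dict mapping user -> dict mapping item -> rating.
    Only items rated by every group member are scored (an assumption).
    """
    if not ratings_by_user:
        return {}
    # Items that every member of the group has rated.
    shared = set.intersection(*(set(r) for r in ratings_by_user.values()))
    group_scores = {}
    for item in shared:
        values = [ratings[item] for ratings in ratings_by_user.values()]
        if strategy == "average":
            group_scores[item] = sum(values) / len(values)
        elif strategy == "least_misery":
            # The group is only as happy as its least happy member.
            group_scores[item] = min(values)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return group_scores


# Hypothetical ratings for two users and two tracks.
ratings = {
    "ann": {"track_a": 5, "track_b": 1},
    "bob": {"track_a": 3, "track_b": 4},
}
average_scores = aggregate_group_ratings(ratings, "average")
misery_scores = aggregate_group_ratings(ratings, "least_misery")
```

Scores produced this way describe a synthetic group. The abstract contrasts this practice with recording the shared preferences of actual groups.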
Unsupervised Learning of Parsimonious General-Purpose Embeddings for User and Location Modelling
Many social network applications depend on robust representations of
spatio-temporal data. In this work, we present an embedding model based on
feed-forward neural networks which transforms social media check-ins into dense
feature vectors encoding geographic, temporal, and functional aspects for
modelling places, neighborhoods, and users. We employ the embedding model in a
variety of applications including location recommendation, urban functional
zone study, and crime prediction. For location recommendation, we propose a
Spatio-Temporal Embedding Similarity algorithm (STES) based on the embedding
model.
In a range of experiments on real life data collected from Foursquare, we
demonstrate our model's effectiveness at characterizing places and people and
its applicability in the aforementioned problem domains. Finally, we select eight
major cities around the globe and verify the robustness and generality of our
model by porting pre-trained models from one city to another, thereby
alleviating the need for costly local training
Embedding Electronic Health Records for Clinical Information Retrieval
Neural network representation learning frameworks have recently been shown to be
highly effective at a wide range of tasks ranging from radiography
interpretation via data-driven diagnostics to clinical decision support. This
often superior performance comes at the price of dramatically increased
training data requirements that cannot be satisfied in every given institution
or scenario. As a means of countering such data sparsity effects, distant
supervision alleviates the need for scarce in-domain data by relying on a
related, resource-rich, task for training.
This study presents an end-to-end neural clinical decision support system
that recommends relevant literature for individual patients (few available
resources) via distant supervision on the well-known MIMIC-III collection
(abundant resource). Our experiments show significant improvements in retrieval
effectiveness over traditional statistical as well as purely locally supervised
retrieval models.
Comment: Published in AMIA Annual Symposium 201
Biomedical Question Answering via Weighted Neural Network Passage Retrieval
The amount of publicly available biomedical literature has been growing
rapidly in recent years, yet question answering systems still struggle to
exploit the full potential of this source of data. In a preliminary processing
step, many question answering systems rely on retrieval models for identifying
relevant documents and passages. This paper proposes a weighted cosine distance
retrieval scheme based on neural network word embeddings. Our experiments are
based on publicly available data and tasks from the BioASQ biomedical question
answering challenge and demonstrate significant performance gains over a wide
range of state-of-the-art models.
Comment: To appear in ECIR 201
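As an illustration of what a weighted cosine distance retrieval scheme over word embeddings might look like, the sketch below averages token embeddings with per-token weights and ranks passages by cosine distance to the question. The abstract does not specify how the weights are defined or applied, so the weighting used here, and every name in the code (`weighted_average`, `rank_passages`, the toy embeddings and weights), is an assumption and not the authors' method.

```python
import math


def weighted_average(tokens, embeddings, weights):
    """Weighted mean of the embeddings of known tokens, or None if none are known."""
    known = [t for t in tokens if t in embeddings]
    total = sum(weights.get(t, 1.0) for t in known)
    if not known or total == 0:
        return None
    dim = len(embeddings[known[0]])
    return [
        sum(weights.get(t, 1.0) * embeddings[t][i] for t in known) / total
        for i in range(dim)
    ]


def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rank_passages(question, passages, embeddings, weights):
    """Return passage indices ordered from closest to farthest from the question."""
    q_vec = weighted_average(question, embeddings, weights)
    scored = []
    for idx, passage in enumerate(passages):
        p_vec = weighted_average(passage, embeddings, weights)
        if q_vec is None or p_vec is None:
            distance = 1.0  # treat unrepresentable text as maximally distant
        else:
            distance = cosine_distance(q_vec, p_vec)
        scored.append((distance, idx))
    return [idx for _, idx in sorted(scored)]


# Toy two-dimensional embeddings and weights, invented for illustration.
toy_embeddings = {"gene": [1.0, 0.0], "protein": [0.9, 0.1], "car": [0.0, 1.0]}
toy_weights = {"gene": 2.0, "protein": 1.5, "car": 0.5}
ranking = rank_passages(["gene"], [["car"], ["protein"]], toy_embeddings, toy_weights)
```

In this toy setup the passage containing "protein" ranks ahead of the one containing "car", because its embedding points in nearly the same direction as the question's.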
A Cross-Platform Collection of Social Network Profiles
The proliferation of Internet-enabled devices and services has led to a
shifting balance between digital and analogue aspects of our everyday lives. In
the face of this development there is a growing demand for the study of privacy
hazards, the potential for unique user de-anonymization and information leakage
between the various social media profiles many of us maintain. To enable the
structured study of such adversarial effects, this paper presents a dedicated
dataset of cross-platform social network personas (i.e., the same person has
accounts on multiple platforms). The corpus comprises 850 users who generate
predominantly English content. Each user object contains the online footprint
of the same person in three distinct social networks: Twitter, Instagram and
Foursquare. In total, it encompasses over 2.5M tweets, 340k check-ins and 42k
Instagram posts. We describe the collection methodology, characteristics of the
dataset, and how to obtain it. Finally, we discuss a common use case,
cross-platform user identification.
Comment: 4 pages, 5 figures, SIGIR 2016, short paper. SIGIR 2016 Proceedings
of the 39th International ACM SIGIR conference on Research and Development in
Information Retrieva
Privacy Leakage through Innocent Content Sharing in Online Social Networks
The increased popularity and ubiquitous availability of online social
networks and globalised Internet access have affected the way in which people
share content. The information that users willingly disclose on these platforms
can be used for various purposes, from building consumer models for
advertising, to inferring personal, potentially invasive, information. In this
work, we use Twitter, Instagram and Foursquare data to convey the idea that the
content shared by users, especially when aggregated across platforms, can
potentially disclose more information than was originally intended. We perform
two case studies: First, we perform user de-anonymization by mimicking the
scenario of finding the identity of a user making anonymous posts within a
group of users. Empirical evaluation on a sample of real-world social network
profiles suggests that cross-platform aggregation introduces significant
performance gains in user identification. In the second task, we show that it
is possible to infer physical location visits of a user on the basis of shared
Twitter and Instagram content. We present an informativeness scoring function
which estimates the relevance and novelty of a shared piece of information with
respect to an inference task. This measure is validated using an active
learning framework which chooses the most informative content at each given
point in time. Based on a large-scale data sample, we show that by doing this,
we can attain an improved inference performance. In some cases this performance
exceeds even the use of the user's full timeline.
Comment: 8 pages, 10 figures, submitted to Privacy Preserving Workshop, Sigi
Semantic Place Descriptors for Classification and Map Discovery
Urban environments develop complex, non-obvious structures that are often
hard to represent in the form of maps or guides. Finding the right place to go
often requires intimate familiarity with the location in question and cannot
easily be deduced by visitors. In this work, we exploit large-scale samples of
usage information, in the form of mobile phone traces and geo-tagged Twitter
messages in order to automatically explore and annotate city maps via kernel
density estimation. Our experiments are based on one year's worth of mobile
phone activity collected by Nokia's Mobile Data Challenge (MDC). We show that
usage information can be a strong predictor of semantic place categories,
allowing us to automatically annotate maps based on the behavior of the local
user base.
Comment: 13 pages, 1 figure, 1 tabl
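Kernel density estimation over geo-tagged points can be sketched as follows. This is a minimal Gaussian-kernel version that treats coordinates as planar, which is only reasonable for a small, projected area. Labelling a location with the category of highest estimated density is an illustrative assumption, not a procedure described in the abstract, and the category names and sample points are invented.

```python
import math


def gaussian_kde_2d(points, bandwidth):
    """Return a function that estimates the density at (x, y) from 2-D points."""
    if not points:
        raise ValueError("at least one sample point is required")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    # Normalisation of an isotropic 2-D Gaussian kernel, averaged over samples.
    norm = 1.0 / (len(points) * 2.0 * math.pi * bandwidth ** 2)

    def density(x, y):
        total = 0.0
        for px, py in points:
            squared_distance = (x - px) ** 2 + (y - py) ** 2
            total += math.exp(-squared_distance / (2.0 * bandwidth ** 2))
        return norm * total

    return density


def most_likely_category(samples_by_category, x, y, bandwidth):
    """Pick the category whose density estimate is highest at (x, y).

    Class priors are ignored, so this compares densities only.
    """
    estimators = {
        category: gaussian_kde_2d(points, bandwidth)
        for category, points in samples_by_category.items()
        if points
    }
    return max(estimators, key=lambda category: estimators[category](x, y))


# Hypothetical usage traces grouped by place category.
traces = {"cafe": [(0.0, 0.0), (0.1, 0.0)], "park": [(5.0, 5.0)]}
label = most_likely_category(traces, 0.05, 0.0, bandwidth=0.5)
```

The bandwidth controls how far each observation spreads its influence: small values give sharply localized annotations, large values smooth them across neighbouring areas.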
Axial quasinormal modes of static neutron stars in the nonminimal derivative coupling sector of Horndeski gravity: spectrum and universal relations for realistic equations of state
We study axial quasinormal modes of static neutron stars in the nonminimal
derivative coupling sector of Horndeski theory. We focus on the fundamental
curvature mode, which we analyse for ten different equations of state with
different matter content. A comparison with the results obtained in pure
General Relativity reveals that, apart from modifying the spectrum of the
frequencies and the damping times of the stars, this theory modifies several
universal relations between the modes and physical parameters of the stars,
which are otherwise matter-independent.
Comment: 10 pages, 9 figures. References added, typos corrected. Matches
published versio
Indonesian Chinese in the Netherlands and the Legacies of Violence in Colonial and Post-colonial Indonesia
After Indonesian independence in 1945, thousands of Indonesian Chinese repatriated to the Netherlands, the former colonizer. As opposed to other repatriates from Indonesia, who organized themselves into pressure groups and fought for a place in the national memory culture, the Indonesian Chinese in the Netherlands only formed strictly socio-cultural associations and have generally stayed clear of identity politics. Usually, this divergence is attributed to the smooth integration and socio-economic success of the latter group, as well as to Chinese values, such as conflict avoidance. This article adds to this explanation by positing that this phenomenon has also been induced by the legacy of anti-Chinese violence in colonial and post-colonial Indonesia: respectively, Dutch discomfort in acknowledging the violent and discriminatory elements of its own colonial history, as well as a fear of offending the Indonesian government. Consequently, many Indonesian Chinese in the Netherlands have engaged in some form of public self-silencing
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language
processing and information retrieval tasks. Extracting the main content from
those documents is essential for the performance of derived applications. To
address this issue, we introduce a novel model that performs sequence labeling
to collectively classify all text blocks in an HTML page as either boilerplate
or main content. Our method uses a hidden Markov model on top of potentials
derived from DOM tree features using convolutional neural networks. The
proposed method sets a new state-of-the-art performance for boilerplate removal
on the CleanEval benchmark. As a component of information retrieval pipelines,
it improves retrieval performance on the ClueWeb12 collection.
Comment: To appear in ECIR 201
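To illustrate how a chain-structured model can label all text blocks of a page jointly, the sketch below runs Viterbi decoding over per-block (unary) and block-to-block (pairwise) log-scores. In the described system these potentials come from convolutional networks over DOM tree features. Here they are hand-written numbers, and the function name and label encoding (0 for boilerplate, 1 for main content) are assumptions for illustration.

```python
def viterbi(unary, pairwise):
    """Return the highest-scoring label sequence.

    unary[t][s]: log-score of assigning label s to block t.
    pairwise[a][b]: log-score of label a being followed by label b.
    """
    if not unary:
        return []
    n_labels = len(unary[0])
    score = list(unary[0])  # best score of any path ending in each label
    backpointers = []
    for t in range(1, len(unary)):
        new_score, pointers = [], []
        for b in range(n_labels):
            # Best previous label for ending in label b at block t.
            best_a = max(range(n_labels), key=lambda a: score[a] + pairwise[a][b])
            new_score.append(score[best_a] + pairwise[best_a][b] + unary[t][b])
            pointers.append(best_a)
        score = new_score
        backpointers.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_labels), key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hand-written scores: the middle block alone slightly prefers label 1.
unary_scores = [[0.0, -2.0], [-1.0, -0.9], [0.0, -2.0]]
smooth_pairwise = [[0.0, -1.0], [-1.0, 0.0]]  # penalise label switches
no_pairwise = [[0.0, 0.0], [0.0, 0.0]]

joint_labels = viterbi(unary_scores, smooth_pairwise)
independent_labels = viterbi(unary_scores, no_pairwise)
```

With the switch penalty, the weakly scored middle block is pulled to the label of its neighbours. Without it, each block is labelled independently. That difference is the practical motivation for classifying blocks collectively.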